District-Administration-Varanasi / court-judgement

2 stars 2 forks source link

Court Judgement Drafting: JSON Datastore, Closeness check and BenchMarking #13

Open AryanPrakhar opened 1 month ago

AryanPrakhar commented 1 month ago

Ticket Contents

  1. Create JSON schemas that capture the details of the sample court docs The goal is to consolidate all the important data points from the documents and create an exhaustive list of questions that could be asked of the user to start the drafting process of court orders. The information provided by the user here would serve as the basis for all further document generation steps or training steps. The schema must be separated into common data and unique data so that the questionnaire for the user can be structured correctly. #5

  2. Closeness check Evaluate if the generated documents are consistent with the original flow and structure of the documents. Implement techniques to quantitatively check the closeness between the original and generated documents.

  3. Benchmarking Test the system on variety of user responses, LLMs and operational techniques, identify the chokepoints and assess the quality of output under a spectrum of conditions.

Goals & Mid-Point Milestone

Goals

  1. Structure and Semantics Store

    • [ ] Create a schema containing generic information common to all the docs
    • [ ] Create a schema containing information that is relatively uncommon.
    • [ ] Create schema filled with information from the docs to be taken as example during few shot prompting.
    • [ ] Create schema of details which are frozen across the docs so that the user is not asked to fill those
  2. Closeness Evaluation

    • [ ] Study a variety of techniques suitable for checking the closeness of documents
    • [ ] Experiment with different techniques and document the results; Select the most appropriate algorithm or a mix of it.
    • [ ] Incorporate the closeness check algorithm in the pipeline
  3. Benchmarking

    • [ ] Integrate the system with a variety of LLMs both from open source and close source nature.
    • [ ] Extensively evaluate the performance of the system across different models quantitively.
    • [ ] Test your system's performance against adversarial examples, noisy data, or other forms of input that might cause errors.

Setup/Installation

No response

Expected Outcome

Acceptance Criteria

No response

Implementation Details

JSON Schemas as structure-semantics store

Closeness check

Benchmarking

Mockups/Wireframes

No response

Product Name

Court judgement drafting

Organisation Name

SamagraX

Domain

⁠Service Delivery

Tech Skills Needed

Machine Learning, Natural Language Processing, Python

Mentor(s)

@ChakshuGautam @GautamR-Samagra

Category

Machine Learning

AryanPrakhar commented 1 month ago

Weekly Learnings & Updates

Week 1

Week 2

Week 3

Week 4