29th April: Breakdown into subparts and clean-up of initial documents
13th May: Fully stitched-up MVP which is able to ask questions to the user based on the document type
27th May: Evaluation framework fully set up for the overall system and its individual parts; test set of benchmarks created. All models set up as APIs interacting with one another
10th June: Improvement of the individual components, support for Hindi in PDF breakdown into schema, adding language components to the user interface.
24th June:
8th July:
22nd July:
Documented the discussion around project implementation strategy.
Took up tickets 1 and 3, i.e. 'Document Analysis and Section Building' and 'Closeness Evaluation'.
Researched optimum methods to extract semantics from the documents, especially in the case of Hindi.
The initial idea for semantics extraction was converting the orders to English and applying Named Entity Recognition, Dependency Parsing, and Semantic Role Labelling.
In my quest for solutions, I found that LLMs do really well on the information retrieval task, even in the case of Hindi documents.
Read about storing retrieved information in a JSON schema. Experimented with this capability and confirmed that LLMs do indeed perform quite well at retrieving information into a fixed schema.
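One practical wrinkle when storing LLM-retrieved information as JSON is that models often wrap the object in markdown fences or surround it with prose. A minimal, hedged sketch of how the response could be parsed robustly (the helper name and example response are mine, not from the project code):

```python
import json
import re

def extract_json(llm_output: str) -> dict:
    """Pull the first JSON object out of an LLM response.

    Models often wrap JSON in ```json fences or add prose around it,
    so we strip fences first and then fall back to locating the
    outermost braces if a direct parse fails.
    """
    # Remove ``` or ```json fence markers if present
    text = re.sub(r"```(?:json)?", "", llm_output).strip()
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Fallback: grab the substring between the first '{' and last '}'
        start, end = text.find("{"), text.rfind("}")
        if start == -1 or end == -1:
            raise ValueError("no JSON object found in model output")
        return json.loads(text[start:end + 1])

# Example: a typical fenced response from a model (illustrative only)
response = """Here is the extracted information:
```json
{"case_number": "123/2024", "petitioner": "A. Kumar"}
```"""
print(extract_json(response)["case_number"])
```

This kind of defensive parsing keeps the pipeline from crashing when the model adds commentary around the schema output.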
Read the provided court documents. Identified the common components in these documents. Created a JSON schema as a baseline format for all the necessary information required to draft a court order.
Experimented with the same using an LLM; the JSON schema was populated accurately.
Provided an example court order to the LLM and checked how accurately it could generate a court order draft. Got great results.
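The baseline schema described above might look roughly like the following sketch. The field names here are hypothetical illustrations, not the project's actual schema, together with a minimal completeness check on the extracted output:

```python
# A hypothetical baseline schema for a court order; the actual field
# names used in the project may differ -- this is only an illustration.
COURT_ORDER_SCHEMA = {
    "type": "object",
    "required": [
        "case_number", "court_name", "parties",
        "order_date", "operative_part",
    ],
    "properties": {
        "case_number": {"type": "string"},
        "court_name": {"type": "string"},
        "parties": {
            "type": "object",
            "properties": {
                "petitioner": {"type": "string"},
                "respondent": {"type": "string"},
            },
        },
        "order_date": {"type": "string"},
        "operative_part": {"type": "string"},
    },
}

def has_required_fields(extracted: dict) -> bool:
    """Minimal check that an LLM extraction covers every required field."""
    return all(key in extracted for key in COURT_ORDER_SCHEMA["required"])
```

A full pipeline would likely validate against the schema properly (e.g. with the `jsonschema` package) rather than this required-keys check alone.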
Researched optimum measures for closeness of generated court orders. Read about text edit distance, but it won't be the best match for our case.
Instead, looking at the possibility of using ROUGE/BLEU, LLM-based evaluation, semantic similarity matching, etc., based on the feedback of mentors.
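To make the ROUGE option concrete, here is an illustrative from-scratch ROUGE-1 F1 (unigram overlap) between a reference order and a generated draft. A real evaluation would more likely use an established package such as `rouge-score` and add ROUGE-L / BLEU alongside it:

```python
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """ROUGE-1 F1: unigram overlap between a reference text and a draft.

    Precision = overlap / candidate length, recall = overlap / reference
    length; F1 is their harmonic mean. Whitespace tokenisation only --
    a deliberately simple sketch, not a production metric.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

print(rouge1_f1("the court dismisses the petition",
                "the court dismisses the petition"))  # -> 1.0
```

Identical texts score 1.0 and disjoint texts score 0.0, which makes the metric easy to sanity-check before wiring it into the evaluation framework.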
Jeet /Aryan :
29th April: Breakdown into subparts and clean-up of initial documents
13th May: Fully stitched-up MVP which is able to ask questions to the user based on the document type
27th May: Evaluation framework fully set up for the overall system and its individual parts; test set of benchmarks created. All models set up as APIs interacting with one another
10th June: Improvement of the individual components, support for Hindi in PDF breakdown into schema, adding language components to the user interface.
24th June:
8th July:
22nd July: