29th April: Breakdown into subparts and clean-up of initial documents
13th May: Fully stitched-up MVP which is able to ask questions to the user based on the document type
27th May: Evaluation framework fully set up for the overall system and its individual parts; test set of benchmarks created. All models set up as APIs interacting with one another
10th June: Improvement of the individual components, support for Hindi in PDF breakdown into schema, adding language components to the user interface.
24th June:
8th July:
22nd July:
Documented the discussion around project implementation strategy.
Took up tickets 1 and 3, i.e. 'Document Analysis and Section Building' and 'Closeness Evaluation'.
Researched optimum methods to extract semantics from the documents, especially in the case of Hindi.
The initial idea for semantics extraction was converting the orders to English and applying Named Entity Recognition, Dependency Parsing, and Semantic Role Labelling.
In my quest for solutions, I found that LLMs do really well on the information retrieval task, even in the case of Hindi documents.
Read about storing retrieved information in a JSON schema. Experimented with this capability and confirmed that LLMs do indeed perform quite well at retrieving information into a fixed schema.
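One practical wrinkle when storing LLM-retrieved information as JSON is that models often wrap the object in markdown fences or surround it with prose. A minimal, hedged sketch of how the response could be parsed robustly (the helper name and example response are mine, not from the project code):

```python
import json
import re

def extract_json(llm_output: str) -> dict:
    """Pull the first JSON object out of an LLM response.

    Models often wrap JSON in ```json fences or add prose around it,
    so we strip fences first and then fall back to locating the
    outermost braces if a direct parse fails.
    """
    # Remove ``` or ```json fence markers if present
    text = re.sub(r"```(?:json)?", "", llm_output).strip()
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Fallback: grab the substring between the first '{' and last '}'
        start, end = text.find("{"), text.rfind("}")
        if start == -1 or end == -1:
            raise ValueError("no JSON object found in model output")
        return json.loads(text[start:end + 1])

# Example: a typical fenced response from a model (illustrative only)
response = """Here is the extracted information:
```json
{"case_number": "123/2024", "petitioner": "A. Kumar"}
```"""
print(extract_json(response)["case_number"])
```

This kind of defensive parsing keeps the pipeline from crashing when the model adds commentary around the schema output.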
Read the provided court documents. Identified the common components in these documents. Created a JSON schema as a baseline format for all the necessary information required to draft a court order.
Experimented with the same using an LLM; the JSON schema was populated accurately.
Provided an example court order to the LLM and checked how accurately it could generate a court order draft. Got great results.
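The baseline schema described above might look roughly like the following sketch. The field names here are hypothetical illustrations, not the project's actual schema, together with a minimal completeness check on the extracted output:

```python
# A hypothetical baseline schema for a court order; the actual field
# names used in the project may differ -- this is only an illustration.
COURT_ORDER_SCHEMA = {
    "type": "object",
    "required": [
        "case_number", "court_name", "parties",
        "order_date", "operative_part",
    ],
    "properties": {
        "case_number": {"type": "string"},
        "court_name": {"type": "string"},
        "parties": {
            "type": "object",
            "properties": {
                "petitioner": {"type": "string"},
                "respondent": {"type": "string"},
            },
        },
        "order_date": {"type": "string"},
        "operative_part": {"type": "string"},
    },
}

def has_required_fields(extracted: dict) -> bool:
    """Minimal check that an LLM extraction covers every required field."""
    return all(key in extracted for key in COURT_ORDER_SCHEMA["required"])
```

A full pipeline would likely validate against the schema properly (e.g. with the `jsonschema` package) rather than this required-keys check alone.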
Researched optimum measures for closeness of generated court orders. Read about text edit distance, but it won't be the best match for our case.
Instead, looking at the possibility of using ROUGE/BLEU, LLM-based evaluation, semantic similarity matching, etc., based on the feedback of mentors.
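To make the ROUGE option concrete, here is an illustrative from-scratch ROUGE-1 F1 (unigram overlap) between a reference order and a generated draft. A real evaluation would more likely use an established package such as `rouge-score` and add ROUGE-L / BLEU alongside it:

```python
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """ROUGE-1 F1: unigram overlap between a reference text and a draft.

    Precision = overlap / candidate length, recall = overlap / reference
    length; F1 is their harmonic mean. Whitespace tokenisation only --
    a deliberately simple sketch, not a production metric.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

print(rouge1_f1("the court dismisses the petition",
                "the court dismisses the petition"))  # -> 1.0
```

Identical texts score 1.0 and disjoint texts score 0.0, which makes the metric easy to sanity-check before wiring it into the evaluation framework.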
Jeet /Aryan :
29th April: Breakdown into subparts and clean-up of initial documents
13th May: Fully stitched-up MVP which is able to ask questions to the user based on the document type
27th May: Evaluation framework fully set up for the overall system and its individual parts; test set of benchmarks created. All models set up as APIs interacting with one another
10th June: Improvement of the individual components, support for Hindi in PDF breakdown into schema, adding language components to the user interface.
24th June:
8th July:
22nd July: