OpenNyAI / Environment-Legal-Data-Helper


Implement Indexer and Retriever functionalities for given environment clearance data #2

Open KaranrajM opened 1 week ago

KaranrajM commented 1 week ago

Description

Devise a suitable chunking technique for indexing the given environment clearance (EC) data, where each file contains a list of projects and their details. Additionally, implement a retriever that supports the lookups listed under Implementation Details.

Goal

To develop an information retrieval system specific to environment clearance data.

Expected Outcome

Acceptance Criteria

An information retrieval system specific to environment clearance data with high accuracy.

Implementation Details

  1. Implement a suitable and efficient chunking technique for the given dataset.
  2. Build a retriever that can:
    • Search for and look up a project by its location details (latitude and longitude, or state and city).
    • Search for specific details within a single project only.
  3. Sample parsed and cleaned EC data can be found here. The corresponding bare data can be found here.
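
As a starting point for discussion, points 1 and 2 could be sketched roughly as below. This is only a minimal illustration, not the expected solution: the field names (`project_id`, `state`, `city`, `latitude`, `longitude`), the sample records, and the helpers `chunk_projects`, `find_projects`, and `search_within_project` are all hypothetical and do not reflect the actual schema of the parsed EC data. It assumes one chunk per project field with the location copied onto every chunk as metadata, a haversine radius check for latitude/longitude lookups, and plain keyword overlap standing in for a real embedding-based retriever.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt
from typing import Optional


@dataclass
class Chunk:
    project_id: str
    field: str
    text: str
    state: Optional[str] = None
    city: Optional[str] = None
    lat: Optional[float] = None
    lon: Optional[float] = None


# Hypothetical keys; the real parsed EC files may name these differently.
LOCATION_KEYS = {"state", "city", "latitude", "longitude"}


def chunk_projects(projects):
    """Create one chunk per (project, field), tagging each with the project's location."""
    chunks = []
    for p in projects:
        meta = dict(state=p.get("state"), city=p.get("city"),
                    lat=p.get("latitude"), lon=p.get("longitude"))
        for key, value in p.items():
            if key in LOCATION_KEYS or key == "project_id" or value in (None, ""):
                continue
            chunks.append(Chunk(p["project_id"], key, str(value), **meta))
    return chunks


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres, using a mean Earth radius of 6371 km."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def find_projects(chunks, *, state=None, city=None, lat=None, lon=None, radius_km=10.0):
    """Return IDs of projects matching state/city and/or lying within radius_km of (lat, lon)."""
    ids = []
    for c in chunks:
        if state is not None and (c.state or "").lower() != state.lower():
            continue
        if city is not None and (c.city or "").lower() != city.lower():
            continue
        if lat is not None and lon is not None:
            if c.lat is None or c.lon is None:
                continue
            if haversine_km(lat, lon, c.lat, c.lon) > radius_km:
                continue
        if c.project_id not in ids:
            ids.append(c.project_id)
    return ids


def search_within_project(chunks, project_id, query):
    """Rank one project's chunks by keyword overlap (a stand-in for vector similarity)."""
    terms = set(query.lower().split())
    scored = []
    for c in chunks:
        if c.project_id != project_id:
            continue
        words = set((c.field.replace("_", " ") + " " + c.text).lower().split())
        score = len(terms & words)
        if score:
            scored.append((score, c))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored]


# Invented records for illustration only; not taken from the actual dataset.
SAMPLE_PROJECTS = [
    {"project_id": "P1", "state": "Karnataka", "city": "Bengaluru",
     "latitude": 12.97, "longitude": 77.59,
     "project_name": "Example housing project",
     "conditions": "Install rainwater harvesting pits"},
    {"project_id": "P2", "state": "Maharashtra", "city": "Pune",
     "latitude": 18.52, "longitude": 73.86,
     "project_name": "Example industrial park",
     "conditions": "Maintain green belt along boundary"},
]

chunks = chunk_projects(SAMPLE_PROJECTS)
nearby = find_projects(chunks, lat=18.5, lon=73.85, radius_km=10)
hits = search_within_project(chunks, "P1", "rainwater harvesting")
```

Keeping the location on every chunk means a location filter can narrow the candidate set to one project before any text search runs, which is one way to guarantee that "details within the project" never leak in from a neighbouring project in the same file.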

Mockups/Wireframes

NOT APPLICABLE

Product Name

Jugalbandi

Organisation Name

OpenNyAI

Domain

Legal

Tech Skills Needed

Requisites

Complexity

Medium

Category

Backend

SudoSu-bham commented 1 day ago

I can't tell whether the environmental clearance data is stored in a spreadsheet, as text in some files, or in an existing database.

"where each file contains a list of projects and their details": could you please elaborate on this?

KaranrajM commented 1 day ago

Hi @SudoSu-bham, I have updated the issue with the Drive link to the data. The two issues in this repo are linked: the parsed data from the previous issue will be used to build and train RAG systems here. To help you get started, I have also provided some already-parsed EC data as an example, and added links to the corresponding bare data files for your reference. Let me know if you have any other questions.