Renovus-Tech / solarec-python

GNU Affero General Public License v3.0
0 stars 0 forks source link

Use an LLM API to extract onboarding information from unstructured raw text data #14

Closed fcggamou closed 7 months ago

fcggamou commented 7 months ago

Ticket Description:

Objective: Enable the extraction of onboarding information from unstructured raw text data using a Language Model (LLM) API. The goal is to integrate the API into the system to interpret user input, which may contain details such as location, capacity, and installation date related to solar installations. Upon processing the input text, the extracted data should be structured and presented in JSON format for further processing and storage.

Tasks:

  1. API Integration:

    • Research and select an appropriate Language Model API capable of extracting structured data from unstructured text inputs.
    • Obtain necessary API credentials and authentication tokens to enable integration with the project.
    • Integrate the selected LLM API into the system, allowing for the extraction of relevant information from user input text.
  2. Data Extraction:

    • Define the specific types of onboarding information to be extracted from user input (e.g., location, capacity, installation date).
    • Configure the LLM API to identify and extract the required data fields from unstructured text inputs.
    • Implement error handling mechanisms to address cases where certain information may not be extractable or ambiguous.
  3. Data Structuring:

    • Structure the extracted data into a standardized format, such as JSON, to ensure consistency and compatibility with downstream processes.
    • Define a schema or template for the JSON format, outlining the key-value pairs corresponding to different onboarding information categories.
    • Ensure that the structured data format facilitates easy parsing and integration with other system components.

Deliverables:

fcggamou commented 6 months ago

Added an abstract base class LLMClient, serving as a foundational structure for implementing specialized classes for different Language Model API integrations. One such specialized class, OpenAIClient, has been implemented, focusing on interacting with the LLM API provided by OpenAI.