DanielZambSB / Equipo-IA

Feature: Extract info from email inquiries #17

Closed DanielZambSB closed 2 months ago

DanielZambSB commented 5 months ago

User Role:

User Stories:

Acceptance Criteria:

  1. Additional Notes or Context:
    • Emails can contain a variety of formats, including images, so it would be well advised to standardize the format for inquiries over email. Also, an OCR process should be run over the images and PDF documents attached to every email.
DanielZambSB commented 5 months ago

Question: if the email does not state a deadline for the quotation, when should the quotation visit be scheduled?

DanielZambSB commented 5 months ago

In emails with only an image, some fields are missing:

The Fields that the OCR process is covering:

DanielZambSB commented 4 months ago

Remember to set the city field to an empty string when creating the inquiry.

DanielZambSB commented 2 months ago

Fixed a problem with the regex pattern: when an email is forwarded, the formatting of the text changes, so the pattern has to change too. Also fixed a line-break problem in the JSON response from the LLM, mainly by using a regex to remove unwanted line breaks that were not escaped.
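
For reference, the line-break fix can be sketched roughly as follows. This is a minimal illustration and not the code from the repository: the helper name `escapeRawLineBreaks` and the sample field names are assumptions. Note that this variant escapes raw line breaks inside JSON string literals instead of deleting them, so the text survives and the response can be passed to `JSON.parse`.

```javascript
// Hypothetical helper (not from the repository): escape raw line breaks
// that appear inside JSON string literals in an LLM response.
function escapeRawLineBreaks(jsonText) {
  // Matches one double-quoted JSON string, honoring backslash escapes.
  const stringLiteral = /"(?:[^"\\]|\\[\s\S])*"/g;
  return jsonText.replace(stringLiteral, (literal) =>
    // Inside the literal, turn each raw line break into the two characters \n.
    literal.replace(/\r\n|\r|\n/g, "\\n")
  );
}

// Sample response with a raw line break inside a string value
// (field names are made up for the example).
const raw = '{"descripcion": "Revisar fuga\nen el techo", "ciudad": ""}';
const parsed = JSON.parse(escapeRawLineBreaks(raw));
console.log(parsed.descripcion);
```

Line breaks between JSON tokens are left alone, since they are valid whitespace there.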

DanielZambSB commented 2 months ago

Beware: this feature could break if the formatting of the emails changes in any way, so it would be wise to think of another method to parse the email information and feed it to the current backend structure.

DanielZambSB commented 2 months ago

Code Documentation

Related Files:

Related Functions:

Code Changes Proposed:

Documentation for parseEmailTable Function

The parseEmailTable function is a critical component designed for parsing unread emails from a Gmail inbox that contain specific subject lines related to invitations for quotations. The function operates through the following sequence of tasks:

  1. Initializes arrays to store email data and logs.
  2. Searches for unread emails with two possible subject variations related to quotation invitations.
  3. Consolidates these email threads into a single array for processing.
  4. Iterates through each email message in these threads:
    • Retrieves and logs message attributes like subject, body, and date.
    • Extracts the relevant content from the email body using predefined cut-off strings to handle forwarded and original email formats differently.
    • Applies a set of regex patterns to extract specific data fields such as order number, location, and description from the email content.
    • If essential fields are missing or too little data is captured, the function passes the email to an AI-based parser (requestBisonEmailtable) and marks the result with a flag (aiParsedThis, which addEmailInfoToEmailTableDB later checks).
  5. Marks each processed email as read.
  6. Logs various attributes for each email, capturing both successfully parsed data and any anomalies or special cases.
  7. Returns two arrays: one containing the parsed email information and another with detailed logs of the processing for each email.

This function is vital for automating the extraction and processing of email data for subsequent operations, such as database entries or further administrative processing.
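
The extraction step in the list above can be sketched as follows. This is only an illustration under stated assumptions: the cut-off markers, the field labels, the regex patterns, and the `needsAiFallback` name are all hypothetical, since the real ones are not shown in this thread. In Apps Script the message bodies would come from `GmailApp.search(...)`, `thread.getMessages()` and `message.getPlainBody()`, with `message.markRead()` called afterwards; here the logic is kept as plain functions so it can be run on its own.

```javascript
// Hypothetical cut-off markers; the real strings are not shown in this thread.
const FORWARDED_MARKER = "---------- Forwarded message ---------";
const FOOTER_MARKER = "Saludos";

// Hypothetical field labels and patterns, for illustration only.
const FIELD_PATTERNS = {
  orderNumber: /Orden\s*(?:No\.?)?\s*:\s*(\S+)/i,
  location: /Ubicaci[oó]n\s*:\s*(.+)/i,
  description: /Descripci[oó]n\s*:\s*(.+)/i,
};

// Keep only the part of the body that is expected to hold the inquiry.
function relevantContent(body) {
  let text = body;
  const fwd = text.indexOf(FORWARDED_MARKER);
  if (fwd !== -1) text = text.slice(fwd + FORWARDED_MARKER.length);
  const end = text.indexOf(FOOTER_MARKER);
  return end === -1 ? text : text.slice(0, end);
}

// Apply every pattern; a missing field becomes an empty string.
function extractFields(body) {
  const content = relevantContent(body);
  const fields = {};
  for (const [name, pattern] of Object.entries(FIELD_PATTERNS)) {
    const match = content.match(pattern);
    fields[name] = match ? match[1].trim() : "";
  }
  // If any field is empty, the caller would hand the email to the AI parser.
  const needsAiFallback = Object.values(fields).some((value) => value === "");
  return { fields, needsAiFallback };
}

const sampleBody =
  "Orden: 12345\nUbicación: Bogotá\nDescripción: Fuga de agua\nSaludos,\nJuan";
console.log(extractFields(sampleBody));
```

Keeping the patterns in one table makes it easier to adjust them when the email layout changes, which is the fragility noted earlier in this thread.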

Documentation for requestBisonEmailtable Function

The requestBisonEmailtable function is designed to interact with a machine learning model hosted on Google's AI Platform to process text data and extract specific information in a structured format. This function primarily handles the automation of data extraction from maintenance request emails, which involves the following steps:

  1. Resets any previous session or state to ensure a clean operation.
  2. Initializes a service that handles authentication for Google Cloud services.
  3. Constructs the API endpoint URL using predefined project and model identifiers.
  4. Prepares a payload that describes the task for the AI model, which involves analyzing the text to identify and extract details such as order numbers, locations, descriptions, and contact information.
  5. Sets the request headers to include content type and authorization token.
  6. Sends the payload to the AI model's endpoint using the HTTP POST method.
  7. Catches and logs any errors encountered during the HTTP request.
  8. Parses and logs the response from the AI model, which includes the extracted information in a structured JSON format.

This function is essential for enhancing the efficiency of processing and organizing email content by leveraging AI capabilities to automate the extraction of critical data, which is then used in subsequent administrative processes.
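
A rough sketch of how such a request could be assembled is shown below. It assumes the model is a Vertex AI publisher model such as `text-bison` called through the `:predict` REST endpoint; the project ID, location, model ID, prompt wording, and the helper name `buildPredictRequest` are placeholders, not values from the repository. The request is only built here, not sent.

```javascript
// Assumed identifiers; the real project and model values are not shown in this thread.
const PROJECT_ID = "my-project";
const LOCATION = "us-central1";
const MODEL_ID = "text-bison";

// Hypothetical helper: build the URL and fetch options for one email.
function buildPredictRequest(emailText, accessToken) {
  const url =
    "https://" + LOCATION + "-aiplatform.googleapis.com/v1/projects/" +
    PROJECT_ID + "/locations/" + LOCATION +
    "/publishers/google/models/" + MODEL_ID + ":predict";

  // Illustrative prompt; the actual task description is not shown in this thread.
  const prompt =
    "Extract the order number, location, description and contact " +
    "information from the following email and answer with JSON only.\n\n" +
    emailText;

  const options = {
    method: "post",
    contentType: "application/json",
    headers: { Authorization: "Bearer " + accessToken },
    payload: JSON.stringify({
      instances: [{ prompt: prompt }],
      parameters: { temperature: 0, maxOutputTokens: 512 },
    }),
    muteHttpExceptions: true, // inspect error responses instead of throwing
  };
  return { url: url, options: options };
}

// In Apps Script the request would then be sent along these lines:
//   try {
//     const response = UrlFetchApp.fetch(request.url, request.options);
//     const body = JSON.parse(response.getContentText());
//   } catch (err) {
//     Logger.log(err);
//   }
const request = buildPredictRequest("Orden: 12345 ...", "ACCESS_TOKEN");
console.log(request.url);
```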

Documentation for addEmailInfoToEmailTableDB Function

The function addEmailInfoToEmailTableDB is designed to process email information parsed from an email table and append this data to a specified Google Spreadsheet, as well as log specific actions or events. This function primarily involves two main tasks: appending rows to the 'Consolidado TablaEmail' sheet in a Google Spreadsheet, and inserting inquiry details into a database. The function operates as follows:

  1. Opens a specific Google Spreadsheet using its ID.
  2. Calls the parseEmailTable function to retrieve arrays of email data (emailArr) and logs (loggerArr).
  3. Iterates through each email entry:
    • Logs each email.
    • Constructs a row from the email's properties.
    • If the email has been parsed by an AI (aiParsedThis is present), the entire email data is appended as a new row. An inquiry object is then created with detailed fields initialized from the email data and subsequently sent to the database via the createInquiryDB function.
    • If the email hasn't been parsed by AI, modifies the data row and appends it. Another form of the inquiry object is created with a different structure and also pushed to the database.
  4. Iterates through each log entry and logs them using Logger.log, and then calls createLoggerEmail to store log information.

This function is crucial for handling batch email data processing and updates, making it an essential part of the system's backend infrastructure for email management and logging activities.
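
The branching on aiParsedThis described above might look roughly like this. The row layout, the inquiry fields, and the helper names are assumptions made for the example; the real structures are not shown in this thread. The sheet and database writes are passed in as callbacks, so in Apps Script they could be `(row) => sheet.appendRow(row)` and `createInquiryDB`.

```javascript
// Hypothetical row layout for the 'Consolidado TablaEmail' sheet.
function buildRow(email) {
  return [
    email.date,
    email.subject,
    email.orderNumber,
    email.location,
    email.description,
    email.aiParsedThis ? "AI" : "regex", // which parser produced the data
  ];
}

// Hypothetical inquiry shape; the real fields are not shown in this thread.
function buildInquiry(email) {
  return {
    orderNumber: email.orderNumber || "",
    location: email.location || "",
    description: email.description || "",
    city: "", // left empty on creation, as noted earlier in this thread
    parsedBy: email.aiParsedThis ? "ai" : "regex",
  };
}

// appendRow and createInquiry are injected so the loop has no hard
// dependency on SpreadsheetApp or the database layer.
function storeEmails(emailArr, appendRow, createInquiry) {
  for (const email of emailArr) {
    appendRow(buildRow(email));
    createInquiry(buildInquiry(email));
  }
}

const rows = [];
const inquiries = [];
storeEmails(
  [{ date: "2024-05-01", subject: "Cotización", orderNumber: "12345",
     location: "Bogotá", description: "Fuga de agua", aiParsedThis: true }],
  (row) => rows.push(row),
  (inquiry) => inquiries.push(inquiry)
);
console.log(rows, inquiries);
```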

Documentation for processFolderFiles and processResultsIntoObjects Functions

processFolderFiles Function

The processFolderFiles function is designed to automate the processing of documents stored in a Google Drive folder, extract textual content using OCR (Optical Character Recognition), and integrate results into a Google Spreadsheet. Here’s how it functions:

  1. Initialization:

    • Connects to a specific Google Spreadsheet and selects a sheet named "Consolidado OCR" to store the processed data.
    • Retrieves files from a designated Google Drive folder.
  2. File Processing Loop:

    • Iterates through each file, logs the file name, and determines its MIME type.
    • Performs OCR on the file content using the processDocument function, which likely handles the conversion of image-based text into editable text.
    • Integrates the OCR result with additional predefined text data and processes it further using the requestBisonEmailImage function, presumably to parse structured data from the combined text.
  3. Data Handling:

    • Cleans up the OCR and processed results to remove unwanted formatting.
    • Parses the cleaned result into a JavaScript object and constructs a data row that combines various data points including timestamps, file names, original text snippets, OCR results, and the parsed structured data.
    • Appends the constructed row to the spreadsheet for record-keeping.
    • Constructs an inquiry object based on the parsed data and inserts it into a database via createInquiryDB.
  4. Data Management:

    • Ensures the data integrity and consistency throughout the processing, logging, and storage phases, facilitating traceability and further analysis of processed documents.
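
The clean-up and parsing step can be illustrated with a small sketch. It assumes the model sometimes wraps its JSON answer in extra text, and it simply keeps what lies between the first `{` and the last `}`; the helper names, the row layout, and the sample field name are hypothetical. In Apps Script the files would be iterated with `DriveApp.getFolderById(id).getFiles()` and the row written with `sheet.appendRow(row)`.

```javascript
// Hypothetical helper: pull a JSON object out of model output that may be
// surrounded by extra text. Returns null when nothing parseable is found.
function extractJsonObject(modelOutput) {
  const start = modelOutput.indexOf("{");
  const end = modelOutput.lastIndexOf("}");
  if (start === -1 || end <= start) return null;
  try {
    return JSON.parse(modelOutput.slice(start, end + 1));
  } catch (err) {
    return null; // caller decides how to log or skip the file
  }
}

// Hypothetical row layout for the "Consolidado OCR" sheet.
function buildOcrRow(timestamp, fileName, ocrText, parsed) {
  return [timestamp, fileName, ocrText, JSON.stringify(parsed)];
}

const modelOutput = 'Resultado:\n{"NumeroOrden": "42"}\nFin';
const parsedResult = extractJsonObject(modelOutput);
console.log(buildOcrRow(new Date().toISOString(), "scan.pdf", "texto OCR", parsedResult));
```

Returning `null` instead of throwing lets the loop skip a bad file and continue with the next one.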

processResultsIntoObjects Function

The processResultsIntoObjects function is aimed at retrieving processed results from a Google Spreadsheet and reformatting them for consistency and further usage:

  1. Data Retrieval:

    • Connects to the same "Consolidado OCR" sheet in the Google Spreadsheet.
    • Retrieves a specific range of OCR results previously stored in the spreadsheet.
  2. Result Parsing and Formatting:

    • Iterates over the retrieved OCR results, parsing each JSON-formatted string into an object.
    • Checks for the presence of specific fields (e.g., "NotaAdicional"), adding default values if certain fields are missing to maintain data structure integrity.
    • Converts each parsed object into an array of values, preparing them for bulk spreadsheet operations.
  3. Spreadsheet Update:

    • Constructs a matrix of result values from the parsed data.
    • Updates a specified range in the spreadsheet with the newly formatted data, effectively normalizing the data presentation and making it more accessible for users or subsequent processes.
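
A minimal sketch of this normalization, under stated assumptions, is shown below. Only the "NotaAdicional" field name comes from the description above; the other field names, the column order, and the helper name `resultsToMatrix` are made up for the example. In Apps Script the input would come from `range.getValues()` and the output would be written back with `range.setValues(matrix)`, which requires every row to have the same length.

```javascript
// Assumed column order; only "NotaAdicional" is named in the description above.
const FIELD_ORDER = ["NumeroOrden", "Ubicacion", "Descripcion", "NotaAdicional"];

// Parse each stored JSON string and emit one fixed-width row per result.
function resultsToMatrix(jsonStrings) {
  return jsonStrings.map((text) => {
    let obj;
    try {
      obj = JSON.parse(text);
    } catch (err) {
      obj = {}; // unparseable cell: fall back to an all-default row
    }
    if (obj === null || typeof obj !== "object") obj = {};
    // Missing fields get an empty-string default so rows stay rectangular.
    return FIELD_ORDER.map((field) => (field in obj ? obj[field] : ""));
  });
}

const matrix = resultsToMatrix([
  '{"NumeroOrden": "1", "Ubicacion": "Bogotá", "Descripcion": "Fuga"}',
  "not json",
]);
console.log(matrix);
```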

These functions collectively enhance the automation of document handling and data processing within a digital environment, streamlining operations related to document management, OCR processing, and data integration into structured formats for business or operational needs.

DanielZambSB commented 2 months ago

[Four attached images, not reproduced here.]