Eviaiy / Extract-Food-Facts

This tool uses LLM and OCR to extract nutrition information from food images, converting it into structured JSON for easy analysis by data enthusiasts.
MIT License

output classification #1

Open Eviaiy opened 4 months ago

Eviaiy commented 4 months ago

Since we want to work with the recognized data, we could build a classifier that categorizes the recognized text and converts it into JSON. The pipeline would have two steps: the table on the food label is first extracted into a text file, and that text file is then converted into a JSON file.

Eviaiy commented 4 months ago

Creating an automated system that converts tabular data from text files into JSON format involves several steps, each of which can be approached in different ways depending on the complexity and variability of the data. Here are some strategies you can consider:

  1. Rule-Based Parsing
     - Regular Expressions: Craft specific regular expressions that match and capture the structure of the data. This works well if the data follows a consistent pattern.

  2. Natural Language Processing (NLP)
     - Named Entity Recognition (NER): Use NLP to identify and classify the entities in the text (e.g., "Energy" as a category and "2081 kJ / 497 kcal" as a value).

  3. Machine Learning Models
     - Custom Classifier: Train a classifier to identify the parts of the text that correspond to different categories of the table.
     - Sequence Labeling: Use a token-classification model such as a BiLSTM or BERT to tag each token with a BIO-style label (e.g., B-category, I-value), where B marks the beginning of a category or value and I marks a token inside one.

  4. OCR with Built-in Structuring
     - Advanced OCR Solutions: Some OCR tools return structured output that identifies tables and lists (e.g., Google Cloud Vision API, Amazon Textract).

  5. Hybrid Approaches: Combine rule-based and ML-based approaches where rules handle standard cases and ML handles edge cases.
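
As a rough illustration of strategies 1 and 5, here is a minimal Python sketch of a rule-based parser with an optional fallback hook for lines the rules cannot handle. Everything in it is an assumption made for illustration: the line layout ("label, then one or more amount/unit values separated by `/`"), the unit list, the function names `parse_line` and `table_text_to_json`, and the output keys `nutrients` and `unparsed`. None of it is taken from the repository's code.

```python
import json
import re

# Assumed layout: "<label> <amount> <unit> [/ <amount> <unit> ...]",
# e.g. "Energy 2081 kJ / 497 kcal". The unit list is illustrative only.
VALUE = r"\d+(?:[.,]\d+)?\s*(?:kJ|kcal|mg|g)"
LINE_RE = re.compile(
    rf"^\s*(?P<label>[A-Za-z][A-Za-z \-]*?)\s*:?\s+"
    rf"(?P<values>{VALUE}(?:\s*/\s*{VALUE})*)\s*$"
)
ITEM_RE = re.compile(r"(?P<amount>\d+(?:[.,]\d+)?)\s*(?P<unit>kJ|kcal|mg|g)")


def parse_line(line):
    """Return (label, values) for a line that matches the rule, else None."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    values = [
        {
            # Accept a decimal comma as well as a decimal point.
            "amount": float(item.group("amount").replace(",", ".")),
            "unit": item.group("unit"),
        }
        for item in ITEM_RE.finditer(match.group("values"))
    ]
    return match.group("label").strip(), values


def table_text_to_json(text, fallback=None):
    """Convert OCR'd table text to a JSON string.

    `fallback` is a hypothetical hook (e.g. an ML model or LLM call) that
    takes a line and returns (label, values) or None. Lines that neither
    the rule nor the fallback can handle are kept under "unparsed".
    """
    nutrients, unparsed = {}, []
    for line in text.splitlines():
        if not line.strip():
            continue
        result = parse_line(line)
        if result is None and fallback is not None:
            result = fallback(line)
        if result is None:
            unparsed.append(line.strip())
        else:
            label, values = result
            nutrients[label] = values
    return json.dumps({"nutrients": nutrients, "unparsed": unparsed}, indent=2)


if __name__ == "__main__":
    print(table_text_to_json("Energy 2081 kJ / 497 kcal"))
```

Keeping unmatched lines in a separate `unparsed` list, rather than dropping them, makes it easy to see which label layouts the rules miss and which cases would need the ML side of a hybrid setup.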