CatchTheTornado / pdf-extract-api

Document (PDF) extraction and parsing API using state-of-the-art OCRs and Ollama-supported models. Anonymize documents, remove PII, and convert any document or picture to structured JSON or Markdown.
https://demo.doctractor.com
GNU General Public License v3.0

[feat] Add MetaData LLM call #16

Open: pkarw opened this issue 2 weeks ago

pkarw commented 2 weeks ago

If we add another, optional LLM call (for example, one that generates tags and a summary for the file produced by the main LLM call), we could use that data as a naming strategy for the storage adapters. Related to #10.
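A minimal sketch of what such a naming strategy could look like, assuming the optional metadata call returns a list of tags and a one-line summary. `FileMetadata`, `slugify`, `metadata_file_name`, `name_for_output`, and the `output.md` fallback are hypothetical names chosen for illustration, not part of the project's code. The LLM call itself is passed in as a plain callable so the sketch runs without Ollama.

```python
from __future__ import annotations

import re
import unicodedata
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FileMetadata:
    """Hypothetical shape of the optional metadata LLM call's result."""
    tags: list[str]
    summary: str


def slugify(text: str, max_len: int = 40) -> str:
    # Drop accents, lowercase, and collapse runs of non-alphanumerics into "-".
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
    if len(slug) > max_len:
        # Prefer cutting at a word boundary so names don't end mid-word.
        cut = slug.rfind("-", 0, max_len + 1)
        slug = slug[:cut] if cut > 0 else slug[:max_len]
    return slug


def metadata_file_name(meta: FileMetadata, extension: str = "md", max_tags: int = 3) -> str:
    # Up to `max_tags` tag slugs, then a summary slug, joined with "_".
    parts = [slugify(tag, max_len=20) for tag in meta.tags[:max_tags]]
    parts.append(slugify(meta.summary))
    stem = "_".join(p for p in parts if p) or "untitled"
    return f"{stem}.{extension}"


def name_for_output(
    extracted_text: str,
    fetch_metadata: Optional[Callable[[str], FileMetadata]] = None,
    default: str = "output.md",
) -> str:
    # The extra LLM call is optional: without it, keep a fixed default name.
    if fetch_metadata is None:
        return default
    return metadata_file_name(fetch_metadata(extracted_text))


if __name__ == "__main__":
    # Hypothetical stand-in for the real metadata LLM call; returns canned data.
    def fake_metadata_call(text: str) -> FileMetadata:
        return FileMetadata(
            tags=["Invoice", "ACME Corp"],
            summary="Invoice for March 2024 consulting services",
        )

    print(name_for_output("...", fake_metadata_call))
    # invoice_acme-corp_invoice-for-march-2024-consulting.md
```

Tags plus a summary keep names human-readable, and truncating at word boundaries keeps them short. Two documents with identical metadata would still collide, so a real adapter would probably need to append some unique suffix.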

If anyone is interested in making this feature happen, let me know and I can specify the details.