CDCgov / IDWA

Intelligent Data Workflow Automation
Apache License 2.0

SPIKE: Research the use of offline LLM for PDF data extraction #6

Closed zdeveloper closed 5 months ago

zdeveloper commented 5 months ago

Research the use of an offline LLM, namely Llama 2, for data extraction from fillable PDFs, and potentially from non-fillable scanned PDFs.

Acceptance Criteria: Write up a spike doc, save it to the drive, and present the findings either in Slack or as a tech talk in the dev sync meeting.

Additional context: https://huggingface.co/docs/transformers/en/model_doc/llama2
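As a starting point for the spike, here is a rough sketch of the prompt-and-parse step around the model. It assumes the PDF text has already been pulled out by some other tool, and every name in it (`build_prompt`, `parse_reply`, `extract_fields`, `fake_generate`, and the field names) is hypothetical. The `generate` argument stands in for whatever call ends up running Llama 2 locally; it is stubbed out here so the sketch runs without a model.

```python
import json
from typing import Callable, Dict, List, Optional


def build_prompt(pdf_text: str, fields: List[str]) -> str:
    """Build an instruction asking the model for one JSON object."""
    field_list = ", ".join(fields)
    return (
        "Extract the following fields from the document text and reply "
        f"with a single JSON object using exactly these keys: {field_list}.\n"
        "Use null for any field that is not present.\n\n"
        f"Document text:\n{pdf_text}\n\nJSON:"
    )


def parse_reply(reply: str, fields: List[str]) -> Dict[str, Optional[object]]:
    """Pull the outermost {...} span out of the reply and keep only known keys.

    Models often wrap JSON in chatter, so we search for the braces instead of
    parsing the whole reply. Any failure yields None for every field.
    """
    empty = {name: None for name in fields}
    start = reply.find("{")
    end = reply.rfind("}")
    if start == -1 or end <= start:
        return empty
    try:
        data = json.loads(reply[start:end + 1])
    except json.JSONDecodeError:
        return empty
    if not isinstance(data, dict):
        return empty
    return {name: data.get(name) for name in fields}


def extract_fields(
    pdf_text: str,
    fields: List[str],
    generate: Callable[[str], str],
) -> Dict[str, Optional[object]]:
    """Run one prompt through `generate` (assumed: a local Llama 2 call)."""
    return parse_reply(generate(build_prompt(pdf_text, fields)), fields)


def fake_generate(prompt: str) -> str:
    """Stand-in for the offline model; returns a canned, chatty reply."""
    return 'Sure! {"facility_name": "Example Clinic", "report_date": "2024-01-15"} Done.'


if __name__ == "__main__":
    text = "Facility: Example Clinic\nDate of report: 2024-01-15"
    print(extract_fields(text, ["facility_name", "report_date", "case_id"], fake_generate))
```

Keeping the model behind a plain callable means the same parsing code could be reused for both the fillable and the scanned PDF paths, whichever text extraction each one needs.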

2 spike findings doc