CatchTheTornado / pdf-extract-api

Document (PDF) extraction and parse API using state of the art modern OCRs + Ollama supported models. Anonymize documents. Remove PII. Convert any document or picture to structured JSON or Markdown
https://demo.doctractor.com
GNU General Public License v3.0
1.33k stars, 86 forks

Add S3 storage strategy #18

Closed: chavan-arvind closed this 4 days ago

chavan-arvind commented 2 weeks ago

Related to #15

Add S3 storage strategy for output files.

pkarw commented 2 weeks ago

Hey @chavan-arvind! Thanks for this PR. Isn't it a duplicate of #19?

pkarw commented 2 weeks ago

I think you should review #10: defining the storage strategy as an OCR strategy is definitely not the right approach. Please check PR #10 and rework the S3 strategy based on it, OK?

I cannot accept this PR as it stands due to architectural concerns.
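
To illustrate the separation being asked for, here is a minimal sketch in which storage is its own strategy interface rather than a kind of OCR strategy. All class and function names (`StorageStrategy`, `LocalStorageStrategy`, `S3StorageStrategy`, `store_result`) are hypothetical and are not taken from PR #10 or from the project's code. The S3 variant assumes it is handed a client exposing `put_object(Bucket=..., Key=..., Body=...)`, such as one created with `boto3.client("s3")`.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class StorageStrategy(ABC):
    """Hypothetical interface: decides where output goes, independent of OCR."""

    @abstractmethod
    def save(self, file_name: str, content: str) -> str:
        """Persist content and return a string describing its location."""


class LocalStorageStrategy(StorageStrategy):
    """Writes output files under a local directory."""

    def __init__(self, root: str):
        self.root = Path(root)

    def save(self, file_name: str, content: str) -> str:
        self.root.mkdir(parents=True, exist_ok=True)
        path = self.root / file_name
        path.write_text(content, encoding="utf-8")
        return str(path)


class S3StorageStrategy(StorageStrategy):
    """Uploads output files to an S3 bucket through an injected client."""

    def __init__(self, client, bucket: str, prefix: str = ""):
        # Assumption: client behaves like boto3.client("s3"),
        # i.e. it provides put_object(Bucket=..., Key=..., Body=...).
        self.client = client
        self.bucket = bucket
        self.prefix = prefix

    def save(self, file_name: str, content: str) -> str:
        key = f"{self.prefix}{file_name}"
        self.client.put_object(
            Bucket=self.bucket, Key=key, Body=content.encode("utf-8")
        )
        return f"s3://{self.bucket}/{key}"


def store_result(text: str, file_name: str, storage: StorageStrategy) -> str:
    """Store already-extracted text; the OCR step never sees the storage type."""
    return storage.save(file_name, text)
```

With this shape, the OCR code produces text and hands it to whichever storage strategy is configured, so adding S3 would not require touching the OCR strategies at all.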