DataFog / datafog-python

Open source PII detection and anonymization tool: easy-to-use, configurable, and extensible
https://www.datafog.ai
MIT License

[Investigate] scope PDF parsing functionality #18

Closed sidmohan0 closed 5 months ago

sidmohan0 commented 5 months ago

This is still an active topic within the RAG extraction space.

TODO:

bazooka720 commented 5 months ago

This will be a good one to have.

sidmohan0 commented 5 months ago

@bazooka720 we have a beta release (datafog==2.4.0b3) if you're interested in giving it a spin. Planning on releasing 2.4.0 tonight.

Main update:

sample usage: image

TODO: expose OCR/image capabilities (with Unstructured, this means passing 'hi_res' as the partitioning strategy). Will try to fit that in for tonight's release.

Let me know if you have any questions/feedback and thanks for all the input!
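
For anyone following along, the OCR option in the TODO above maps, in Unstructured, to the `strategy` argument of `partition_pdf`, whose high-resolution value is spelled `"hi_res"`. The sketch below is only an illustration of calling Unstructured directly. It is not DataFog's API, and `build_partition_kwargs` and `extract_pdf_text` are hypothetical helper names made up for this example.

```python
from typing import Any


def build_partition_kwargs(path: str, use_ocr: bool = False) -> dict[str, Any]:
    """Hypothetical helper: pick the Unstructured partitioning strategy."""
    # "hi_res" runs layout detection and OCR (needed for scanned pages and images);
    # "fast" only extracts text already embedded in the PDF.
    return {"filename": path, "strategy": "hi_res" if use_ocr else "fast"}


def extract_pdf_text(path: str, use_ocr: bool = False) -> str:
    """Hypothetical helper: return the PDF's text, one element per line."""
    # Imported lazily so build_partition_kwargs works without Unstructured installed.
    from unstructured.partition.pdf import partition_pdf

    elements = partition_pdf(**build_partition_kwargs(path, use_ocr))
    return "\n".join(el.text for el in elements if el.text)
```

Calling `extract_pdf_text("scan.pdf", use_ocr=True)` would need Unstructured's PDF extras and an OCR backend installed. The resulting text could then be handed to whatever PII scan is being used.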