HazyResearch / fonduer

A knowledge base construction engine for richly formatted data
https://fonduer.readthedocs.io/
MIT License

hOCR preprocessor not available in latest release despite documentation suggesting otherwise #529

Closed by AmitPoonia 3 years ago

AmitPoonia commented 3 years ago

Hi, I am a new Fonduer user trying to use the hOCR preprocessor during parsing, but apparently it can't be imported. I am using the latest release (v0.8.3), and the documentation for the latest release says that the hOCR API is available (https://fonduer.readthedocs.io/en/latest/user/parser.html#fonduer.parser.preprocessors.HOCRDocPreprocessor).

So I think either I am missing something, or the latest release doesn't have the hOCR functionality and the API documentation was updated in advance. If that's the case, can you please tell me when the hOCR functionality is planned to be released? Thanks.

lukehsiao commented 3 years ago

That's correct, the latest release (v0.8.3) does not support hOCR. You can confirm this in our Changelog, where hOCR is listed under "Unreleased".

The "latest" docs you linked describe the master branch, not the v0.8.3 release. For v0.8.3's documentation, refer to the stable docs (https://fonduer.readthedocs.io/en/stable/dev/changelog.html). Or, if you want to try Fonduer with hOCR, you can install from the master branch directly.
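
To check whether a given install exposes the hOCR preprocessor, something like the following minimal sketch may help. It assumes the import path `fonduer.parser.preprocessors.HOCRDocPreprocessor` taken from the docs URL in the question; the helper name `has_attribute` is hypothetical and not part of Fonduer. The pip command in the comment is also an assumption, based on the repository name and pip's usual git syntax.

```python
import importlib


def has_attribute(module_name: str, attr_name: str) -> bool:
    """Return True if `module_name` can be imported and exposes `attr_name`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The package (or submodule) is not installed at all.
        return False
    return hasattr(module, attr_name)


if __name__ == "__main__":
    # Import path assumed from the documentation URL, not verified here.
    if has_attribute("fonduer.parser.preprocessors", "HOCRDocPreprocessor"):
        print("HOCRDocPreprocessor is available in this install.")
    else:
        # Assumed install command for the master branch:
        #   pip install git+https://github.com/HazyResearch/fonduer.git@master
        print("HOCRDocPreprocessor is not available; this is expected on v0.8.3.")
```

The check only reports whether the name can be imported; it says nothing about whether the preprocessor works on a particular hOCR file.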

AmitPoonia commented 3 years ago

Thank you.