NanoNets / ocr-python

OCR library to extract text & tables from PDF files and images. Convert any image or PDF to CSV / TXT / JSON / Searchable PDF.
https://nanonets.com
MIT License

Feature Request : Text and Table extraction #1

Open a1012 opened 1 year ago

a1012 commented 1 year ago

Hi @karan-nanonets, I am currently working on a project where I need to extract text and tables from PDFs in page order and feed the combined result to another model. So far I am able to extract only the tables. Is there any way Nanonets can do both and extract the PDF as is? For example, could convert_to_text extract the text and also the tables in tabular format (as convert_to_csv does)? Please let me know at the earliest.
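To make the request concrete, here is a minimal sketch of the combined output described above. It does not call the Nanonets library at all: it assumes the text and the tables for each page have already been extracted by some other step. The `pages` structure and the function names `table_to_csv` and `combine_pages` are hypothetical, chosen only for illustration.

```python
import csv
import io


def table_to_csv(rows):
    """Render one table (a list of rows, each a list of cells) as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue().rstrip("\n")


def combine_pages(pages):
    """Join text and tables page by page, preserving page order.

    `pages` is a hypothetical structure: a list of dicts, each with a "text"
    string and a "tables" list, assumed to come from a prior extraction step.
    """
    parts = []
    for number, page in enumerate(pages, start=1):
        parts.append(f"--- page {number} ---")
        if page.get("text"):
            parts.append(page["text"].strip())
        # Tables are appended after the page text, each in CSV form.
        for rows in page.get("tables", []):
            parts.append(table_to_csv(rows))
    return "\n".join(parts)


if __name__ == "__main__":
    sample = [
        {"text": "Invoice summary", "tables": [[["item", "qty"], ["pen", "2"]]]},
        {"text": "Terms and conditions", "tables": []},
    ]
    print(combine_pages(sample))
```

Note that this sketch places all of a page's text before its tables, so it preserves page order but not the exact position of each table within a page.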

karan-nanonets commented 1 year ago

Hey Aayushi, this is definitely possible.

One of our AI experts can help you by creating a custom model with both text and table extraction, along with support for line item extraction. If you are interested, you can book a time here: nanonets.com/call

Regards,
Karan

On Wed, Mar 15, 2023 at 2:38 PM Aayushi Gupta wrote:

Hi @karan-nanonets, I am currently working on a project where I need to extract text and tables from PDFs in page order and feed the combined result to another model. So far I am able to extract only the tables. Is there any way Nanonets can do both and extract the PDF as is? For example, could convert_to_text extract the text and also the tables in tabular format (as convert_to_csv does)? Please let me know at the earliest.
