Clarification on the Nougat Transformers

NielsRogge / Transformers-Tutorials

This repository contains demos I made with the Transformers library by HuggingFace.

MIT License

Clarification on the Nougat Transformers #354

Open Dipankar1997161 opened 8 months ago

Dipankar1997161 commented 8 months ago

@NielsRogge, thanks for the tutorials. I am particularly interested in the Nougat one and have a question.

Nougat can extract text from PDFs. I was wondering whether it can also extract tables/structured data and images. Have you tried this? By images, I don't mean PDF pages rendered as images, but rather the images embedded within the PDFs.

My end goal is to store the extracted data separately in 3 sections: 1. Text, 2. Images, 3. Tables. I would love to hear your thoughts on this.
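A minimal sketch of the text/table part of that split, assuming the model output is markup in which tables appear as LaTeX `\begin{table}` ... `\end{table}` blocks. That format is an assumption here, not something confirmed in this thread, and the function name `split_text_and_tables` is hypothetical. Images are not handled: generated markup contains no image data, so embedded images would need a separate extraction step.

```python
import re

# Assumption: tables show up as LaTeX table environments in the generated markup.
TABLE_RE = re.compile(r"\\begin\{table\}.*?\\end\{table\}", re.DOTALL)


def split_text_and_tables(markup: str) -> dict:
    """Separate table blocks from the remaining text (hypothetical helper)."""
    tables = TABLE_RE.findall(markup)
    # Remove the table blocks, then collapse the blank lines they leave behind.
    text = TABLE_RE.sub("", markup)
    text = re.sub(r"\n{3,}", "\n\n", text).strip()
    return {"text": text, "tables": tables}


# Illustrative sample only, not real model output.
sample = r"""# Results

Accuracy is reported below.

\begin{table}
\begin{tabular}{l c}
Model & Acc \\
A & 0.9 \\
\end{tabular}
\end{table}

The table shows one model."""

sections = split_text_and_tables(sample)
print(len(sections["tables"]), "table block(s) found")
print(sections["text"])
```

Each entry in `sections["tables"]` can then be stored separately from `sections["text"]`.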

NielsRogge commented 8 months ago

Yes, you can train a Nougat/Donut model which takes in images of tables and generates the corresponding content as key-value pairs. You just need a high-quality dataset of (table image, table content) pairs.
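As a rough illustration of what one (table image, table content) pair could look like, here is a sketch that flattens key-value content into a single target string the decoder could be trained to generate. The tag scheme (`<s_key>...</s_key>`, `<sep/>`) is loosely modeled on Donut-style serialization and is an assumption, as are the file name, the field names, and the helper `table_to_target`.

```python
def table_to_target(obj) -> str:
    """Serialize nested dicts/lists into a tagged string (hypothetical helper)."""
    if isinstance(obj, dict):
        # Each key becomes an opening/closing tag pair around its value.
        return "".join(
            f"<s_{key}>{table_to_target(value)}</s_{key}>"
            for key, value in obj.items()
        )
    if isinstance(obj, list):
        # List items (here: table rows) are joined with a separator tag.
        return "<sep/>".join(table_to_target(item) for item in obj)
    return str(obj)


# One hypothetical training pair: an image path plus the table's content.
pairs = [
    {
        "image_path": "table_0001.png",  # hypothetical file name
        "content": {
            "rows": [
                {"model": "A", "accuracy": "0.90"},
                {"model": "B", "accuracy": "0.85"},
            ]
        },
    },
]

targets = [table_to_target(pair["content"]) for pair in pairs]
print(targets[0])
```

In an actual fine-tuning setup, the image would go through the model's image processor and the target string through its tokenizer; the new tags would typically need to be added to the tokenizer's vocabulary.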

Dipankar1997161 commented 8 months ago

I mean, Nougat was specifically designed for academic papers, right? So it should already have been trained to extract "structured data", since any research paper will contain tables and text together?

NielsRogge commented 8 months ago

Yes, you could fine-tune Nougat on additional data and benefit from Nougat's pre-training.

AbdulDD commented 3 months ago

@NielsRogge, does it work for the DocVQA task?