XFastDataLab opened this issue 1 year ago
Nougat is designed to work with PDF files, so if you're feeding it an image when it expects a PDF, that may be where the issue lies. A scanning app such as Adobe Scan will capture the page so that it looks like a printer scan and save it as a PDF. Feed that PDF to Nougat.
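If a scanning app isn't handy, an image can also be wrapped in a single-page PDF programmatically. The dependency-free sketch below (`image_to_pdf` is a hypothetical helper, not part of Nougat) embeds raw 8-bit RGB pixels as a Flate-compressed image that fills the page. The page size is derived from an assumed scan resolution. Note that this only changes the container: a phone photo wrapped this way still looks like a phone photo, unlike the cleaned-up output of a scanning app.

```python
import zlib

def image_to_pdf(width, height, rgb, dpi=300):
    """Wrap raw 8-bit RGB pixels (row-major, top row first) in a one-page PDF."""
    if len(rgb) != width * height * 3:
        raise ValueError("rgb must contain width * height * 3 bytes")
    # Page size in PDF points (1/72 inch) for the assumed scan resolution.
    pw, ph = width * 72 / dpi, height * 72 / dpi
    data = zlib.compress(bytes(rgb))
    # Content stream: scale the unit square to the page, then paint image /Im0.
    content = f"q {pw:.2f} 0 0 {ph:.2f} 0 0 cm /Im0 Do Q".encode()
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        (f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {pw:.2f} {ph:.2f}] "
         "/Resources << /XObject << /Im0 4 0 R >> >> /Contents 5 0 R >>").encode(),
        (f"<< /Type /XObject /Subtype /Image /Width {width} /Height {height} "
         "/ColorSpace /DeviceRGB /BitsPerComponent 8 /Filter /FlateDecode "
         f"/Length {len(data)} >>\nstream\n").encode() + data + b"\nendstream",
        f"<< /Length {len(content)} >>\nstream\n".encode() + content + b"\nendstream",
    ]
    # Header plus a comment of high bytes, marking the file as binary.
    out = bytearray(b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "num 0 obj" for the xref table
        out += f"{num} 0 obj\n".encode() + body + b"\nendobj\n"
    xref_pos = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode()
    out += b"0000000000 65535 f \n"  # each xref entry is exactly 20 bytes
    for off in offsets:
        out += f"{off:010d} 00000 n \n".encode()
    out += (f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
            f"startxref\n{xref_pos}\n%%EOF\n").encode()
    return bytes(out)
```

For example, a 2550×3300 pixel image at `dpi=300` becomes a 612×792 pt (US Letter) page. With Pillow installed, `Image.open(path).convert("RGB").save("page.pdf")` does the same in one call, and `.tobytes()` on an RGB image gives the raw bytes this function expects.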
If you still don't get a result (or a good one), it's probably because your image is not similar enough to the training data. For example, if you get something like [Missing page: {number}], then Nougat didn't detect any content there that is similar enough to its training data, or it skipped the page because it thought the page was empty. That's my understanding of it so far.
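To find such pages programmatically, you can scan Nougat's Markdown output for the placeholders. This sketch assumes they look like `[MISSING_PAGE_EMPTY:3]` (page judged empty or too unlike the training data) or `[MISSING_PAGE_FAIL:7]` (output abandoned, e.g. because of repetition); adjust the pattern if your version writes something different. `missing_pages` is a hypothetical helper.

```python
import re

# Assumed placeholder shape, e.g. "[MISSING_PAGE_EMPTY:3]" or "[MISSING_PAGE_FAIL:7]".
MARKER = re.compile(r"\[MISSING_PAGE_([A-Z]+):(\d+)\]")

def missing_pages(mmd_text):
    """Return (kind, page_number) pairs for every missing-page placeholder."""
    return [(m.group(1), int(m.group(2))) for m in MARKER.finditer(mmd_text)]
```

Running it on a `.mmd` file with an empty page 2 and a failed page 5 would give `[('EMPTY', 2), ('FAIL', 5)]`, which tells you which pages to re-scan or preprocess.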
If you only need to detect equations, you might be better off with Lukas Blecher's LaTeX-OCR library, available here: https://github.com/lukas-blecher/LaTeX-OCR. It works like the Mathpix Snip tool for equations.
It is feasible to process images directly. During data processing, the PDF is first converted into images before being fed into the model. Look at the processing flow in `LazyDataset` and modify it accordingly.
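As a rough illustration of what happens to an image before it reaches the model, the sketch below computes the geometry of fitting an arbitrary photo into a fixed-size input: scale to fit while keeping the aspect ratio, then pad the remainder. The 672×896 (width×height) target is an assumption based on the default checkpoint; check the input size in your checkpoint's config. `fit_and_pad` is a hypothetical helper and does not reproduce Nougat's exact cropping or padding code.

```python
from fractions import Fraction

def fit_and_pad(width, height, target_w=672, target_h=896):
    """Scale (width, height) to fit inside target_w x target_h, keeping the
    aspect ratio, and return the new size plus the padding on each side."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # Exact rational scale factor: the tighter of the two axis constraints.
    scale = min(Fraction(target_w, width), Fraction(target_h, height))
    new_w = min(target_w, round(width * scale))
    new_h = min(target_h, round(height * scale))
    # Center the resized image; any odd leftover pixel goes to the right/bottom.
    pad_w, pad_h = target_w - new_w, target_h - new_h
    return {
        "size": (new_w, new_h),
        "pad_left": pad_w // 2,
        "pad_right": pad_w - pad_w // 2,
        "pad_top": pad_h // 2,
        "pad_bottom": pad_h - pad_h // 2,
    }
```

A 3024×4032 phone photo maps to 672×896 with no padding, whereas a 1200×300 crop of a single equation shrinks to 672×168 with 364 px of padding above and below. A lone equation therefore ends up as a thin strip on an otherwise blank canvas, which looks nothing like the full pages in the training data.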
Actually, FWIW, why did you choose arXiv papers as the dataset, rather than generalizing the features found in academic papers (text in different languages, block and inline LaTeX equations, tables, etc.) and generating a dataset with a separate generator script? PDFs pulled from arXiv may or may not contain those generalized characteristics.
Generating the dataset from a few generalized characteristics could improve output quality across a) different languages and b) different subject domains, e.g. better performance on a pure-text PDF from the humanities rather than (only) on STEM papers.
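As a rough illustration of the generator-script idea, the sketch below emits the source of a synthetic page that mixes prose in several languages, inline and display LaTeX, and a Markdown table. Everything here is hypothetical: the tiny vocabularies, the Markdown-with-`$`-math target format, and `synth_page` itself. A real pipeline would also have to render each source to a PDF (e.g. with LaTeX) so the model sees the page image and its text as a pair.

```python
import random

# Tiny, hypothetical vocabularies; a real generator would sample from corpora.
VOCAB = {
    "en": ["the", "model", "page", "result", "method", "data", "shows", "we"],
    "de": ["die", "Seite", "Modell", "Ergebnis", "Daten", "zeigt", "wir"],
    "fr": ["la", "page", "modèle", "résultat", "données", "montre", "nous"],
}
SYMBOLS = ["x", "y", "\\alpha", "\\beta", "n", "k"]

def sentence(rng, lang, n_words=8):
    text = " ".join(rng.choice(VOCAB[lang]) for _ in range(n_words))
    return text[0].upper() + text[1:] + "."  # keep the case of later words

def inline_math(rng):
    a, b = rng.sample(SYMBOLS, 2)
    return f"${a}^{{2}} + {b}$"

def display_math(rng):
    a, b = rng.sample(SYMBOLS, 2)
    return f"$$\\sum_{{i=1}}^{{n}} {a}_i {b}_i$$"

def table(rng, rows=2, cols=3):
    header = "| " + " | ".join(f"col{j}" for j in range(cols)) + " |"
    rule = "|" + "---|" * cols
    body = ["| " + " | ".join(str(rng.randint(0, 99)) for _ in range(cols)) + " |"
            for _ in range(rows)]
    return "\n".join([header, rule, *body])

def synth_page(seed, lang="en"):
    """Return the Markdown/LaTeX source of one synthetic page (deterministic per seed)."""
    rng = random.Random(seed)
    paragraph = f"{sentence(rng, lang)} {inline_math(rng)} {sentence(rng, lang)}"
    blocks = [paragraph, display_math(rng), table(rng)]
    rng.shuffle(blocks)  # vary the layout order between pages
    return "\n\n".join(blocks)
```

Seeding each page makes the dataset reproducible, and swapping in new vocabularies or block types is how one would target other languages or domains.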
Would this be an argument for fine-tuning?
I ran two experiments. In the first, I saved a page of a PDF paper as an image, and Nougat identified all the words and equations well. In the second, I took a photo and converted it into an image resembling a page of a paper, and this time Nougat failed.
Did you do all the necessary processing before feeding it to Nougat?
I suspect there is little chance of it working well on images that don't resemble the training data in quality and appearance.
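A phone photo typically has grayish paper, shadows and color noise, whereas rendered paper pages are clean black on white. One common (though not guaranteed) preprocessing step is to convert to grayscale and binarize with Otsu's method. The sketch below is plain Python over a flat list of pixel values so it runs without dependencies; the helper names are hypothetical. In practice you would use OpenCV's `cv2.threshold` with `cv2.THRESH_BINARY + cv2.THRESH_OTSU`, or `cv2.adaptiveThreshold` when the lighting is uneven.

```python
def to_gray(rgb_pixels):
    """Convert (r, g, b) tuples to 0-255 luma values (ITU-R BT.601 weights)."""
    return [min(255, round(0.299 * r + 0.587 * g + 0.114 * b)) for r, g, b in rgb_pixels]

def otsu_threshold(gray):
    """Pick the threshold t that maximizes between-class variance,
    where one class is pixels <= t (ink) and the other pixels > t (paper)."""
    hist = [0] * 256
    for p in gray:
        hist[p] += 1
    total = len(gray)
    sum_all = sum(i * h for i, h in enumerate(hist))
    sum_bg = w_bg = 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def binarize(gray):
    """Map ink to pure black (0) and paper to pure white (255)."""
    t = otsu_threshold(gray)
    return [0 if p <= t else 255 for p in gray]
```

A global threshold cannot remove a shadow that spans half the page, which is why an adaptive (local) threshold is often the better choice for photos.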
What processing are you performing to improve Nougat's accuracy?
I wrote an equation on a sheet of white paper, took a photo of it with my phone, and then used Nougat to extract the equation, but I got nothing. What's the problem?