Nougat-LaTeX-based is fine-tuned from facebook/nougat-base on im2latex-100k to improve its ability to generate LaTeX code from images. The original encoder input size of Nougat is unsuitable for equation image segments: rescaling them to that size can introduce artifacts that degrade the quality of the generated LaTeX code. To address this, Nougat-LaTeX-based adjusts the input resolution and uses an adaptive padding approach so that equation image segments in the wild are resized to closely match the resolution of the training data. Download the model here 👈🏻.
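To illustrate the idea behind padding before resizing, here is a minimal sketch. It is not the code used in this repository: the function name `pad_to_aspect` and the 560x224 target size are hypothetical, chosen only for the example. It computes how much border to add so that an equation crop has the same aspect ratio as the target resolution, which lets a later resize scale both axes by the same factor.

```python
def pad_to_aspect(width, height, target_width, target_height):
    """Return (left, top, right, bottom) padding that gives the image the target aspect ratio.

    Hypothetical helper for illustration; not part of this repository.
    """
    if min(width, height, target_width, target_height) <= 0:
        raise ValueError("all dimensions must be positive")
    # Compare aspect ratios by cross-multiplication to stay in integer arithmetic.
    if width * target_height >= height * target_width:
        # The image is relatively wider than the target: keep the width, grow the height.
        padded_w = width
        padded_h = -(-width * target_height // target_width)  # ceiling division
    else:
        # The image is relatively taller than the target: keep the height, grow the width.
        padded_w = -(-height * target_width // target_height)  # ceiling division
        padded_h = height
    extra_w, extra_h = padded_w - width, padded_h - height
    # Split the padding evenly so the equation stays centered.
    left, top = extra_w // 2, extra_h // 2
    return left, top, extra_w - left, extra_h - top


# A wide 400x50 equation crop, with an assumed 560x224 target resolution.
print(pad_to_aspect(400, 50, 560, 224))  # (0, 55, 0, 55)
```

Padding first means the crop is scaled uniformly afterwards, so the glyphs are not stretched the way they would be if a wide, short equation were resized directly to the target shape.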
Evaluated on an image-equation pair dataset collected from Wikipedia, arXiv, and im2latex-100k, curated by lukas-blecher.
model | token_acc ↑ | normed edit distance ↓ |
---|---|---|
pix2tex | 0.5346 | 0.10312 |
pix2tex* | 0.60 | 0.10 |
nougat-latex-based | 0.623850 | 0.06180 |
pix2tex is a ResNet + ViT + Text Decoder architecture introduced in LaTeX-OCR.
pix2tex*: reported by LaTeX-OCR; pix2tex: my evaluation with the released checkpoint; nougat-latex-based: evaluated on results generated with the beam-search strategy.
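For readers unfamiliar with the normed edit distance column, the sketch below shows one common definition: the Levenshtein distance between prediction and reference, divided by the length of the longer sequence. This normalization is an assumption, and the evaluation script behind the table may tokenize or normalize differently. The function names are illustrative only.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, free when equal
            ))
        prev = curr
    return prev[-1]


def normed_edit_distance(pred, ref):
    """Edit distance scaled to [0, 1] by the longer sequence length (assumed normalization)."""
    longest = max(len(pred), len(ref))
    return edit_distance(pred, ref) / longest if longest else 0.0


# One wrong token out of four.
pred = ["\\frac", "{", "a", "}"]
ref = ["\\frac", "{", "b", "}"]
print(normed_edit_distance(pred, ref))  # 0.25
```

Lower is better: 0 means the prediction matches the reference exactly, and 1 means nothing lines up.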
config/base.yaml
python tools/train_experiment.py --config_file config/base.yaml --phase 'train'
pip install -r all_requirements.txt
python examples/run_latex_ocr.py --img_path "examples/test_data/eq1.png"
Q: Why did you copy the image_processor_nougat.py file into the repository rather than simply importing it from the transformers library, if it has no changes compared to the one in huggingface/transformers?
A: transformers 4.34.0 is the first version that natively supports Nougat. However, there is a bug in the Nougat processor in this version that can cause a run failure. You can review the details of this issue here. Fortunately, the developers have already fixed this bug, and I expect that you will be able to import it directly from transformers in the next released version.
Please consider leaving me a star if you find this repo helpful :)