NormXU / nougat-latex-ocr

Codebase for fine-tuning / evaluating nougat-based image2latex generation models
https://arxiv.org/abs/2308.13418
Apache License 2.0
image-to-text

Nougat-LaTeX-OCR

Nougat-LaTeX-based is fine-tuned from facebook/nougat-base on im2latex-100k to improve its ability to generate LaTeX code from images. The original encoder input size of nougat is unsuitable for equation image segments: rescaling such crops to that size can introduce artifacts that degrade the quality of the generated LaTeX. To address this, Nougat-LaTeX-based adjusts the input resolution and uses an adaptive padding approach so that equation image segments in the wild are resized to closely match the resolution of the training data. Download the model here 👈🏻.
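The README does not spell out the adaptive padding procedure or the adjusted input resolution. As a rough illustration of the general idea (not this repo's implementation), the sketch below computes how to fit an equation crop into a fixed encoder canvas: scale it down to fit while preserving aspect ratio, optionally refuse to upscale small crops, and center it with padding. The function name and the target size are hypothetical; use the resolution your model was actually trained at. The actual pixel resize (e.g. with an image library) is left out.

```python
def fit_with_padding(width, height, target_w, target_h, allow_upscale=False):
    """Plan an aspect-preserving resize plus centered padding (hypothetical helper).

    Returns (new_w, new_h, pad_left, pad_top, pad_right, pad_bottom) such that
    new_w + pad_left + pad_right == target_w and new_h + pad_top + pad_bottom == target_h.
    """
    # Largest scale that fits both dimensions inside the target canvas.
    scale = min(target_w / width, target_h / height)
    if not allow_upscale:
        # Keep small crops at native size instead of blowing them up.
        scale = min(scale, 1.0)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    # Split the leftover space evenly; any odd pixel goes to the right/bottom.
    pad_w = target_w - new_w
    pad_h = target_h - new_h
    pad_left = pad_w // 2
    pad_top = pad_h // 2
    return new_w, new_h, pad_left, pad_top, pad_w - pad_left, pad_h - pad_top
```

For example, `fit_with_padding(1000, 200, 500, 200)` halves a wide equation to 500x100 and pads 50 pixels above and below, while a small 100x50 crop is left at its native size and centered. Skipping upscaling is one way to avoid blurring thin strokes; whether it helps depends on how the training images were prepared.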

Evaluation

Evaluated on an image-equation pair dataset collected from Wikipedia, arXiv, and im2latex-100k, curated by lukas-blecher.

model                 token_acc ↑    normed edit distance ↓
pix2tex               0.5346         0.10312
pix2tex*              0.60           0.10
nougat-latex-based    0.623850       0.06180

pix2tex is the ResNet + ViT + text decoder architecture introduced in LaTeX-OCR.

pix2tex*: as reported in LaTeX-OCR. pix2tex: my evaluation with the released checkpoint. nougat-latex-based: evaluated on results generated with beam search.
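The README does not define the two metrics. A minimal sketch of one common convention follows, assuming both are computed over whitespace-separated LaTeX tokens and that both are normalized by the longer of the two sequences. The evaluation code behind the table (here or in LaTeX-OCR) may tokenize or normalize differently, so this is not guaranteed to reproduce the numbers above.

```python
def levenshtein(a, b):
    """Dynamic-programming edit distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,             # delete x
                           cur[j - 1] + 1,          # insert y
                           prev[j - 1] + (x != y)))  # substitute (or match)
        prev = cur
    return prev[-1]


def normed_edit_distance(pred, truth):
    """Edit distance divided by the longer sequence length (assumed convention)."""
    denom = max(len(pred), len(truth))
    return levenshtein(pred, truth) / denom if denom else 0.0


def token_accuracy(pred, truth):
    """Position-wise token matches divided by the longer sequence length (assumed convention)."""
    denom = max(len(pred), len(truth))
    if denom == 0:
        return 1.0
    hits = sum(p == t for p, t in zip(pred, truth))
    return hits / denom
```

For instance, with `truth = r"\frac { a } { b }".split()` and a prediction that writes `c` instead of `b`, one of seven tokens is wrong, giving a normalized edit distance of 1/7 and a token accuracy of 6/7.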

Uses

fine-tune on your custom dataset

  1. Prepare your dataset in this format
  2. Change config/base.yaml
  3. Run the training script
    python tools/train_experiment.py --config_file config/base.yaml --phase 'train'
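If you prefer to keep config/base.yaml untouched as a reference, one option is to edit a copy and pass that copy to the training script. A minimal sketch follows; the copy's path (config/my_run.yaml) and the helper name are hypothetical, and only the --config_file and --phase arguments from the command above are used.

```python
import shutil
import sys
from pathlib import Path


def prepare_training_run(base_config="config/base.yaml", run_config="config/my_run.yaml"):
    """Copy the base config once and return the training command that uses the copy.

    Hypothetical helper: run from the repository root, edit the copy, then launch.
    """
    src, dst = Path(base_config), Path(run_config)
    dst.parent.mkdir(parents=True, exist_ok=True)
    if not dst.exists():  # never overwrite edits made to an earlier copy
        shutil.copyfile(src, dst)
    return [sys.executable, "tools/train_experiment.py",
            "--config_file", str(dst), "--phase", "train"]
```

After editing the copy, launch with `subprocess.run(prepare_training_run(), check=True)`. Note that `--phase 'train'` in the shell command is the same as passing the plain string `train` in an argument list.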

predict

  1. Download the model
  2. Install the dependencies
    pip install -r all_requirements.txt
  3. Run the example script in the examples folder
    python examples/run_latex_ocr.py --img_path "examples/test_data/eq1.png"
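The example script takes one image per call. To run it over a whole folder, a small wrapper like the sketch below can build one command per image. It assumes you run it from the repository root and that `--img_path` is the only argument needed, as in the command above; the helper names, folder, and glob pattern are placeholders.

```python
import subprocess
import sys
from pathlib import Path


def build_predict_commands(img_dir, pattern="*.png"):
    """Return one run_latex_ocr.py invocation per image matching `pattern` (hypothetical helper)."""
    images = sorted(Path(img_dir).glob(pattern))
    return [[sys.executable, "examples/run_latex_ocr.py", "--img_path", str(p)]
            for p in images]


def predict_all(img_dir, pattern="*.png"):
    """Run the example script on each image in turn, stopping at the first failure."""
    for cmd in build_predict_commands(img_dir, pattern):
        subprocess.run(cmd, check=True)
```

For example, `predict_all("examples/test_data")` processes every PNG in the test folder. Each call starts a fresh process and reloads the model, which is fine for a handful of images; for large batches it is cheaper to load the model once in your own script.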

QA

Please consider leaving me a star if you find this repo helpful :)