DARPA-ASKEM / terarium

https://app.terarium.ai
Apache License 2.0

[TASK]: Test Nougat locally #4417

Closed YohannParis closed 1 week ago

YohannParis commented 1 month ago

Describe the task

Task Items

bigglesandginger commented 1 month ago

Mac build

I tried building the Python code on an M1 Mac, since torch may run better on Apple Silicon now, but the build failed with an aiohttp error:

aiohttp/_websocket.c:196:12: fatal error: 'longintrepr.h' file not found

Maybe it is the wrong version of aiohttp; however, there is no reference to it in the nougat code base.
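
The compiler message above says the header 'longintrepr.h' was not found in the Python include directory. A minimal diagnostic sketch follows; it only checks whether the interpreter used for the build ships that header at the top level of its include directory. The function name is hypothetical. The likely cause is an assumption that the thread does not confirm: as far as I know, CPython 3.11 moved this header under cpython/, so older aiohttp source builds can fail on newer interpreters.

```python
import os
import sys
import sysconfig


def has_top_level_longintrepr() -> bool:
    """Return True if this interpreter ships longintrepr.h at the top of its include dir."""
    # sysconfig reports where the C headers for this interpreter are expected to live.
    include_dir = sysconfig.get_paths()["include"]
    return os.path.isfile(os.path.join(include_dir, "longintrepr.h"))


if __name__ == "__main__":
    print("Python", ".".join(str(part) for part in sys.version_info[:3]))
    print("top-level longintrepr.h present:", has_top_level_longintrepr())
```

If this reports that the header is missing, trying a different Python version or a newer aiohttp release would be the first things to check; neither is verified here.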

Docker build

On the Docker front, I tried to build the CUDA container, but it fails with:

102.2 error: torch 2.0.1 is installed but torch<4.0,>=2.1.0 is required by {'lightning'}

I have tried changing setup.py to cap lightning below its latest release, but that does not appear to work. Next I will try pinning the torch version that pip installs, since it does not appear to install the latest for "reasons".
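
To make the conflict in the error message concrete, here is a small sketch that checks a torch version string against the range quoted above (>=2.1.0 and <4.0). The helper names are hypothetical, and the parsing is deliberately simplified: it handles plain numeric versions with an optional local suffix such as "+cu118", not pre-releases. It is an illustration, not how pip resolves dependencies.

```python
def parse_version(text: str) -> tuple:
    """Turn '2.0.1+cu118' into (2, 0, 1), ignoring any local build suffix."""
    public = text.split("+", 1)[0]
    parts = [int(part) for part in public.split(".")]
    # Pad short versions such as '2.1' so that tuple comparison behaves as expected.
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def satisfies_lightning_pin(torch_version: str) -> bool:
    """Check torch_version against the range from the error: >=2.1.0,<4.0."""
    return (2, 1, 0) <= parse_version(torch_version) < (4, 0, 0)


print(satisfies_lightning_pin("2.0.1+cu118"))  # False: the version the build installed
print(satisfies_lightning_pin("2.1.0"))        # True: the minimum lightning accepts
```

This suggests two ways out: pin lightning to a release that still accepts torch 2.0.1, or install torch 2.1.0 or newer in the container.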

YohannParis commented 1 month ago

Per @j2whiting's recommendation, we could run the model through the Hugging Face transformers library, using the checkpoint hosted on huggingface.co:

from huggingface_hub import hf_hub_download
from PIL import Image

from transformers import NougatProcessor, VisionEncoderDecoderModel
import torch

processor = NougatProcessor.from_pretrained("facebook/nougat-base")
model = VisionEncoderDecoderModel.from_pretrained("facebook/nougat-base")

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
# prepare PDF image for the model
filepath = hf_hub_download(repo_id="hf-internal-testing/fixtures_docvqa", filename="nougat_paper.png", repo_type="dataset")
image = Image.open(filepath)
pixel_values = processor(image, return_tensors="pt").pixel_values

# generate transcription (here we only generate 30 tokens)
outputs = model.generate(
    pixel_values.to(device),
    min_length=1,
    max_new_tokens=30,
    bad_words_ids=[[processor.tokenizer.unk_token_id]],
)

sequence = processor.batch_decode(outputs, skip_special_tokens=True)[0]
sequence = processor.post_process_generation(sequence, fix_markdown=False)
# note: we're using repr here so that the \n characters are printed; feel free to just print the sequence
print(repr(sequence))

j2whiting commented 1 month ago

A very simple way to benchmark the model against our data is to generate the LaTeX, render it as a PDF, and then compare it visually with the original PDF.
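
A rough sketch of the render step described above. It assumes the generated text is already valid LaTeX body content (which may not hold for raw model output) and that pdflatex is installed and on the PATH. The function names and the default file name are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Optional


def wrap_latex(body: str) -> str:
    """Wrap generated LaTeX body text in a minimal standalone document."""
    return "\n".join([
        r"\documentclass{article}",
        r"\usepackage{amsmath,amssymb}",
        r"\begin{document}",
        body,
        r"\end{document}",
        "",
    ])


def render_pdf(body: str, out_dir: Path, name: str = "nougat_output") -> Optional[Path]:
    """Write the .tex file and compile it; return the PDF path, or None on failure."""
    out_dir.mkdir(parents=True, exist_ok=True)
    tex_path = out_dir / f"{name}.tex"
    tex_path.write_text(wrap_latex(body), encoding="utf-8")
    if shutil.which("pdflatex") is None:
        return None  # no LaTeX toolchain available; only the .tex file is written
    subprocess.run(
        ["pdflatex", "-interaction=nonstopmode",
         f"-output-directory={out_dir}", str(tex_path)],
        check=False,
        capture_output=True,
    )
    pdf_path = out_dir / f"{name}.pdf"
    return pdf_path if pdf_path.exists() else None
```

The resulting PDF can then be opened side by side with the original for the visual comparison.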

[image]

bigglesandginger commented 1 month ago

I find the best way is to use software that compiles.

bigglesandginger commented 1 month ago

So, maybe CUDA is great on a Linux machine with an NVIDIA card, but that is not what I have or can access (at home I could potentially use a Windows box with a much more affordable and equally great AMD graphics card...). The Python code refuses to run on my Mac because of the wrong chip type, and the Docker container fails because the cu118 build cannot load torch > 2.0.1, even though all the versions of torch are in the distribution directory. This is non-starter software.

YohannParis commented 1 month ago

@YohannParis to bug @AndrewjUncharted to get ssh access to the ML box

AndrewjUncharted commented 1 week ago

One moment @YohannParis