Closed YohannParis closed 1 week ago
Tried building the Python code on an M1 Mac, since torch may run better on Apple Silicon now, but the build failed with an error from aiohttp:
aiohttp/_websocket.c:196:12: fatal error: 'longintrepr.h' file not found
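Maybe it is the wrong version of aiohttp; however, there is no reference to aiohttp in the nougat code base, so it is probably arriving as a transitive dependency. This particular error is commonly seen when an older aiohttp release is compiled from source under Python 3.11 or newer, where that header was moved.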
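To track down where aiohttp comes from, one option is to ask the installed package metadata which distributions declare it as a requirement. The sketch below uses only the standard library; `requirement_name` and `dependents_of` are hypothetical helper names, and it can only see packages that pip managed to install before the build failed.

```python
import re
from importlib import metadata


def requirement_name(req: str) -> str:
    """Return the normalized project name at the start of a requirement string."""
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", req.strip())
    if not match:
        return ""
    # Normalize so "Typing_Extensions" and "typing-extensions" compare equal.
    return re.sub(r"[-_.]+", "-", match.group(0)).lower()


def dependents_of(package: str) -> list[str]:
    """List installed distributions that declare `package` as a requirement."""
    target = re.sub(r"[-_.]+", "-", package).lower()
    found = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if not name:
            continue  # skip distributions with broken metadata
        for req in dist.requires or []:
            if requirement_name(req) == target:
                found.add(name)
    return sorted(found)


if __name__ == "__main__":
    print(dependents_of("aiohttp"))
```

An empty list would only mean that no installed package declares the dependency, not that nothing in the requirements tree needs it.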
On the Docker front, I tried to build the CUDA container, but it fails with:
102.2 error: torch 2.0.1 is installed but torch<4.0,>=2.1.0 is required by {'lightning'}
I have tried changing setup.py to pin lightning to an older release instead of the latest, but that doesn't appear to work. Next I will try specifying the torch version that pip installs, as it does not appear to install the latest for "reasons".
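To make the conflict concrete, here is a minimal sketch of the check behind that error: does an installed version fall inside the required range? It is a simplified, stdlib-only comparison (numeric release parts only, not full PEP 440), and `parse_version` and `satisfies` are hypothetical helpers, not part of pip or lightning.

```python
import operator

# Two-character operators come first so ">=" is not mistaken for ">".
_OPS = [
    (">=", operator.ge),
    ("<=", operator.le),
    ("==", operator.eq),
    (">", operator.gt),
    ("<", operator.lt),
]


def parse_version(text: str) -> tuple:
    """Turn "2.0.1" or "2.0.1+cu118" into (2, 0, 1); the local suffix is dropped."""
    release = text.strip().split("+", 1)[0]
    return tuple(int(part) for part in release.split("."))


def satisfies(installed: str, specifier: str) -> bool:
    """Check an installed version against a comma-separated range like "<4.0,>=2.1.0"."""
    have = parse_version(installed)
    for clause in specifier.split(","):
        clause = clause.strip()
        for symbol, compare in _OPS:
            if clause.startswith(symbol):
                want = parse_version(clause[len(symbol):])
                # Pad to equal length so (2, 1) and (2, 1, 0) compare as equal.
                width = max(len(have), len(want))
                left = have + (0,) * (width - len(have))
                right = want + (0,) * (width - len(want))
                if not compare(left, right):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


print(satisfies("2.0.1", "<4.0,>=2.1.0"))  # False: 2.0.1 is below the 2.1.0 floor
print(satisfies("2.1.0", "<4.0,>=2.1.0"))  # True
```

Under this reading, either torch has to move up to at least 2.1.0 or lightning has to be pinned to a release whose own torch floor is at or below 2.0.1.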
Per @j2whiting's recommendation, we could run the model through huggingface.co instead:
from huggingface_hub import hf_hub_download
import re
from PIL import Image
from transformers import NougatProcessor, VisionEncoderDecoderModel
from datasets import load_dataset
import torch
processor = NougatProcessor.from_pretrained("facebook/nougat-base")
model = VisionEncoderDecoderModel.from_pretrained("facebook/nougat-base")
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
# prepare PDF image for the model
filepath = hf_hub_download(repo_id="hf-internal-testing/fixtures_docvqa", filename="nougat_paper.png", repo_type="dataset")
image = Image.open(filepath)
pixel_values = processor(image, return_tensors="pt").pixel_values
# generate transcription (here we only generate 30 tokens)
outputs = model.generate(
    pixel_values.to(device),
    min_length=1,
    max_new_tokens=30,
    bad_words_ids=[[processor.tokenizer.unk_token_id]],
)
sequence = processor.batch_decode(outputs, skip_special_tokens=True)[0]
sequence = processor.post_process_generation(sequence, fix_markdown=False)
# note: we're using repr here for the sake of printing the \n characters; feel free to just print the sequence
print(repr(sequence))
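The snippet above only chooses between CUDA and CPU. On an Apple Silicon Mac, PyTorch also exposes an `mps` backend, so the device choice could be extended as sketched below. `pick_device` is a hypothetical helper, and whether Nougat actually runs correctly on `mps` is an assumption that has not been verified here.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then Apple's Metal backend (mps), then fall back to CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


# With torch installed, the flags would come from the real availability checks:
#   import torch
#   device = pick_device(torch.cuda.is_available(),
#                        torch.backends.mps.is_available())
#   model.to(device)
print(pick_device(cuda_available=False, mps_available=True))  # mps
```

Keeping the decision in a plain function makes it easy to force `"cpu"` if an operator turns out to be unsupported on `mps`.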
A very simple way to benchmark the model against our data is to generate the LaTeX, render it as a PDF, and then compare that visually to the original PDF.
I find the best way is to use software that compiles
So, maybe CUDA is great on a Linux machine with an NVIDIA card, but that is not what I have or have access to (at home I potentially have a Windows box with a much more affordable and equally great AMD graphics card...). The Python code refuses to run on my Mac, apparently because of the chip type, and the Docker container fails because the cu118 build cannot load torch > 2.0.1, even though all the versions of torch are in the distribution directory. As it stands, this is non-starter software.
@YohannParis to bug @AndrewjUncharted to get ssh access to the ML box
One moment @YohannParis