tranhoangnguyen03 closed 1 year ago
Can you share an example file so I can try and reproduce it?
The title and abstract are missing from the output. Below is the article I used. Is this a limitation of the base model, or are some formulas not being recognized? MCA.pdf
When I run nougat on this test file on Colab (A100, high memory), I get a CUDA out-of-memory error.
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
0% 0/2 [00:17<?, ?it/s]
Traceback (most recent call last):
File "/usr/local/bin/nougat", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.10/dist-packages/predict.py", line 143, in main
model_output = model.inference(image_tensors=sample, early_stopping=args.skipping)
File "/usr/local/lib/python3.10/dist-packages/nougat/model.py", line 589, in inference
decoder_output = self.decoder.model.generate(
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1602, in generate
return self.greedy_search(
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2450, in greedy_search
outputs = self(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/mbart/modeling_mbart.py", line 1850, in forward
outputs = self.model.decoder(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/mbart/modeling_mbart.py", line 1108, in forward
layer_outputs = decoder_layer(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/mbart/modeling_mbart.py", line 424, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/mbart/modeling_mbart.py", line 211, in forward
value_states = torch.cat([past_key_value[1], value_states], dim=2)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 50.00 MiB (GPU 0; 39.56 GiB total capacity; 17.41 GiB already allocated; 30.56 MiB free; 37.69 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
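The error message above itself suggests setting max_split_size_mb through PYTORCH_CUDA_ALLOC_CONF, since reserved memory (37.69 GiB) is far larger than allocated memory (17.41 GiB). A minimal sketch of how that hint could be passed to a nougat run from Python follows. The helper name `build_nougat_run` and the value 128 are assumptions chosen for illustration, not something from this thread, and the setting only targets fragmentation, so it may not resolve the error.

```python
import os
import shlex


def build_nougat_run(pdf_path, out_dir, max_split_size_mb=128):
    """Return (argv, env) for a nougat run with the allocator hint set.

    Hypothetical helper: 128 is an arbitrary starting value, not a
    recommendation from the nougat or PyTorch maintainers.
    """
    # Copy the current environment so the parent process is not modified.
    env = dict(os.environ)
    # Format expected by PyTorch's caching allocator: "max_split_size_mb:<int>".
    env["PYTORCH_CUDA_ALLOC_CONF"] = f"max_split_size_mb:{max_split_size_mb}"
    # Only flags that already appear in this thread are used here.
    argv = ["nougat", pdf_path, "-o", out_dir]
    return argv, env


argv, env = build_nougat_run("MCA.pdf", "output/")
print(shlex.join(argv))
print(env["PYTORCH_CUDA_ALLOC_CONF"])
# To actually launch the process:
#   import subprocess; subprocess.run(argv, env=env, check=True)
```

The variable has to be set before the process initializes CUDA, which is why the sketch builds an environment for a fresh subprocess instead of changing it after torch is loaded.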
I'm running into a similar issue. I'm not getting the [MISSING_PAGE_FAIL:1] error, but the first page is missing from the extraction.
EDIT: It works with --model 0.1.0-base
@lukas-blecher @maxbeaudoin
nougat example.pdf -o output/ -m 0.1.0-base --no-skipping
The process runs and shows GPU load, but then it hangs. What should I do?
I consistently see this message
[MISSING_PAGE_FAIL:1]
at the top of every PDF I parse. The first page is also consistently skipped. I am running the test on Google Colab using this command:
!nougat "path/to/file" -o output_directory
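To see which pages are affected across many outputs, the markers can be collected from the generated text. The following is a rough sketch, assuming the number inside [MISSING_PAGE_FAIL:n] is the page number (the reports above are consistent with that, but it is not confirmed here). The function name `failed_pages` is made up for illustration.

```python
import re

# Matches the marker exactly as quoted in this thread, e.g. [MISSING_PAGE_FAIL:1].
MISSING_PAGE = re.compile(r"\[MISSING_PAGE_FAIL:(\d+)\]")


def failed_pages(text):
    """Return the sorted, de-duplicated numbers found in failure markers."""
    return sorted({int(n) for n in MISSING_PAGE.findall(text)})


# Made-up sample standing in for the contents of one output file.
sample = "[MISSING_PAGE_FAIL:1]\n\n## Introduction\nSome extracted text."
print(failed_pages(sample))
```

Running this over each output file gives a quick list of documents where the first page was dropped, which can then be retried with `-m 0.1.0-base` or `--no-skipping` as mentioned earlier in the thread.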