facebookresearch / nougat

Implementation of Nougat: Neural Optical Understanding for Academic Documents
https://facebookresearch.github.io/nougat/
MIT License

TypeError: BARTDecoder.prepare_inputs_for_inference() got an unexpected keyword argument 'cache_position' #228

Open hongyi-zhao opened 4 months ago

hongyi-zhao commented 4 months ago

On Ubuntu 22.04.4 LTS, I tried to use nougat as follows but failed:

$ pyenv shell datasci
(datasci) werner@x13dai-t:~$ cd "/home/werner/Public/repo/github.com/facebookresearch/nougat.git" && pip install -e .
(datasci) werner@x13dai-t:~$ nougat 'Public/hpc/servers/gpu/多种平台运行VASP、Quantum Espresso、Lammps和DeePMD-kit的性能测试/扫描全能王 2024-06-11 16.30.pdf' -o .
/home/werner/.pyenv/versions/3.11.1/envs/datasci/lib/python3.11/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
/home/werner/.pyenv/versions/3.11.1/envs/datasci/lib/python3.11/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3483.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
  0%|                                                                                                                                           | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/werner/.pyenv/versions/datasci/bin/nougat", line 33, in <module>
    sys.exit(load_entry_point('nougat-ocr', 'console_scripts', 'nougat')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/werner/Public/repo/github.com/facebookresearch/nougat.git/predict.py", line 167, in main
    model_output = model.inference(
                   ^^^^^^^^^^^^^^^^
  File "/home/werner/Public/repo/github.com/facebookresearch/nougat.git/nougat/model.py", line 592, in inference
    decoder_output = self.decoder.model.generate(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/werner/.pyenv/versions/3.11.1/envs/datasci/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/werner/.pyenv/versions/3.11.1/envs/datasci/lib/python3.11/site-packages/transformers/generation/utils.py", line 1758, in generate
    result = self._sample(
             ^^^^^^^^^^^^^
  File "/home/werner/.pyenv/versions/3.11.1/envs/datasci/lib/python3.11/site-packages/transformers/generation/utils.py", line 2394, in _sample
    model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: BARTDecoder.prepare_inputs_for_inference() got an unexpected keyword argument 'cache_position'
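
A likely mechanism behind this error (a minimal sketch, not Nougat's actual code): transformers 4.39+ forwards extra keyword arguments such as `cache_position` from `generate()` into the model's `prepare_inputs_for_generation` hook, while Nougat's `BARTDecoder.prepare_inputs_for_inference` declares a fixed signature without `**kwargs`, so the call fails. The class names below are illustrative stand-ins:

```python
# Illustrative sketch of the signature mismatch; these classes are
# stand-ins, not Nougat's real BARTDecoder.
class OldStyleDecoder:
    # Fixed signature, in the style of prepare_inputs_for_inference
    def prepare_inputs_for_generation(self, input_ids, past_key_values=None):
        return {"input_ids": input_ids, "past_key_values": past_key_values}

class TolerantDecoder(OldStyleDecoder):
    # Accepting **kwargs absorbs newly introduced arguments such as
    # cache_position -- the usual forward-compatible fix.
    def prepare_inputs_for_generation(self, input_ids, past_key_values=None, **kwargs):
        return super().prepare_inputs_for_generation(input_ids, past_key_values)

decoder = OldStyleDecoder()
try:
    # transformers >= 4.39 effectively calls the hook like this:
    decoder.prepare_inputs_for_generation([1, 2, 3], cache_position=[0])
except TypeError as exc:
    print(exc)  # unexpected keyword argument 'cache_position'
```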

Then I tried to install CUDA as follows:

Download Installer for Linux Ubuntu 22.04 x86_64

The base installer is available for download below.

Base Installer
Installation Instructions:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.5.0/local_installers/cuda-repo-ubuntu2204-12-5-local_12.5.0-555.42.02-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-12-5-local_12.5.0-555.42.02-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-12-5-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-5

Additional installation options are detailed here.

Driver Installer

NVIDIA Driver Instructions (choose one option). To install the legacy kernel module flavor:

# I used this method:
sudo apt-get install -y cuda-drivers

But the error was the same when I reran the nougat test above.

Regards, Zhao

sparsh35 commented 4 months ago

Try downgrading to transformers 4.38.2. I believe Transformers introduced this `cache_position` argument somewhere around 4.39.0.
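
If downgrading, a small pre-flight check (a hypothetical helper, not part of Nougat or transformers) can confirm the pinned version sits below the 4.39 series where `cache_position` reportedly appeared:

```python
# Hypothetical version guard: Nougat's decoder override predates the
# cache_position argument, so transformers versions below 4.39 should be safe.
def transformers_compatible(ver: str) -> bool:
    """Return True for transformers versions below 4.39."""
    major, minor = (int(part) for part in ver.split(".")[:2])
    return (major, minor) < (4, 39)

print(transformers_compatible("4.38.2"))  # True  (the suggested pin)
print(transformers_compatible("4.39.0"))  # False
```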

hongyi-zhao commented 4 months ago

This indeed fixed the problem reported here, but the result file generated by nougat was empty in my test, as shown below:

$ proxychains-ng-country-control uv pip install transformers==4.38.2
$ nougat 扫描全能王-2024-06-11-16.30.pdf -o .
/home/werner/.pyenv/versions/3.11.1/envs/datasci/lib/python3.11/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
/home/werner/.pyenv/versions/3.11.1/envs/datasci/lib/python3.11/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3483.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
  0%|                                                                                                                                           | 0/1 [00:00<?, ?it/s][nltk_data] Error loading words: <urlopen error [Errno 111] Connection
[nltk_data]     refused>
INFO:root:Processing file 扫描全能王-2024-06-11-16.30.pdf with 1 pages
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  3.44it/s]

$ cat 扫描全能王-2024-06-11-16.30.mmd
$ 
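
For batch runs, a stdlib-only guard (a hypothetical helper, not part of Nougat) can flag this silent failure mode, where the command exits successfully but the `.mmd` file is empty:

```python
from pathlib import Path

def conversion_succeeded(mmd_path: str) -> bool:
    """Treat a missing or zero-byte .mmd result as a failed conversion."""
    p = Path(mmd_path)
    return p.exists() and p.stat().st_size > 0
```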

扫描全能王-2024-06-11-16.30.zip

Regards, Zhao

sparsh35 commented 4 months ago

Try adding --no-skipping as an argument to the command, like this:

nougat 扫描全能王-2024-06-11-16.30.pdf -o . -m 0.1.0-base --no-skipping

hongyi-zhao commented 4 months ago

Still no usable output, as shown below:

(datasci) werner@x13dai-t:~$ proxychains-ng-country-control nougat 扫描全能王-2024-06-11-16.30.pdf -o . -m 0.1.0-base --no-skipping
/home/werner/.pyenv/versions/3.11.1/envs/datasci/lib/python3.11/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
/home/werner/.pyenv/versions/3.11.1/envs/datasci/lib/python3.11/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3483.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
  0%|                                                                                         | 0/1 [00:00<?, ?it/s]INFO:root:Processing file 扫描全能王-2024-06-11-16.30.pdf with 1 pages
100%|█████████████████████████████████████████████████████████████████████████████████| 1/1 [00:17<00:00, 17.27s/it]

(datasci) werner@x13dai-t:~$ cat 扫描全能王-2024-06-11-16.30.mmd
\(\frac{1}{2}\).

## 5. Conclusion

In this paper, we have proposed a new method for the estimation of the \(\frac{1}{2}

扫描全能王-2024-06-11-16.30.mmd.zip

sparsh35 commented 4 months ago

Is the original PDF also 1 page?

hongyi-zhao commented 4 months ago

> Is the original PDF also 1 page?

Yes.

Then, I tried the following method, but the result was the same:

(datasci) werner@x13dai-t:~$ proxychains-ng-country-control nougat 扫描全能王-2024-06-11-16.30.pdf -o . -m 0.1.0-base --no-skipping -p 1
/home/werner/.pyenv/versions/3.11.1/envs/datasci/lib/python3.11/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
/home/werner/.pyenv/versions/3.11.1/envs/datasci/lib/python3.11/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3483.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
INFO:root:Skipping 扫描全能王-2024-06-11-16.30.pdf, already computed. Run with --recompute to convert again.
(datasci) werner@x13dai-t:~$ proxychains-ng-country-control nougat 扫描全能王-2024-06-11-16.30.pdf -o . -m 0.1.0-base --no-skipping -p 1 --recompute
/home/werner/.pyenv/versions/3.11.1/envs/datasci/lib/python3.11/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
/home/werner/.pyenv/versions/3.11.1/envs/datasci/lib/python3.11/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3483.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
  0%|                                                                                                                                           | 0/1 [00:00<?, ?it/s]INFO:root:Processing file 扫描全能王-2024-06-11-16.30.pdf with 1 pages
100%|███████████████████████████████████████████████████████████████████████████████████| 1/1 [00:17<00:00, 17.38s/it]

(datasci) werner@x13dai-t:~$ cat 扫描全能王-2024-06-11-16.30.mmd
\(\frac{1}{2}\).

## 5. Conclusion

In this paper, we have proposed a new method for the estimation of the \(\frac{1}{2}

The content of the pdf file is as follows:

![D9B580EAC044C1DB4774CB929C8F068C](https://github.com/facebookresearch/nougat/assets/11155854/fc5f9aa0-d8d9-4d03-a5b5-0598fe17be24)

sparsh35 commented 4 months ago

Yeah, it is not foolproof. If you only have a small amount of material to convert, I would suggest using Mathpix; I think it gives 10 free PDF conversions and is much more accurate.

sparsh35 commented 4 months ago

That's why it is not working: the model is trained on arXiv research paper data, so this kind of document is out of domain for it.

hongyi-zhao commented 4 months ago

In fact, I've tried Mathpix before asking here, but the results were equally unsatisfactory.

ivanmladek commented 3 months ago

pip install transformers==4.38.2 pyarrow==14.0.1 requests==2.31.0 git+https://github.com/facebookresearch/nougat

hongyi-zhao commented 3 months ago

@ivanmladek What do you mean? Will this solve the issue here?