hongyi-zhao opened 4 months ago
Try downgrading to transformers 4.38.2; I believe Transformers introduced this cache position somewhere around 4.39.0.
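As a quick sanity check before running nougat, you can compare the installed transformers version against that cutoff. A minimal sketch; the 4.39.0 boundary is an assumption based on the comment above, and `compatible_with_nougat` is a hypothetical helper, not part of either library:

```python
def parse_version(ver: str):
    # crude numeric parse, enough for "major.minor.patch" strings
    return tuple(int(p) for p in ver.split(".")[:3])

def compatible_with_nougat(ver: str) -> bool:
    # transformers is assumed to have introduced cache_position around
    # 4.39.0, which breaks nougat's generation loop on newer releases
    return parse_version(ver) < parse_version("4.39.0")

print(compatible_with_nougat("4.38.2"))  # True
print(compatible_with_nougat("4.40.0"))  # False
```

You could run this against `transformers.__version__` after installing to confirm the pin took effect.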
This did fix the problem reported here, but the result file generated by nougat is empty in my test, as shown below:
$ proxychains-ng-country-control uv pip install transformers==4.38.2
$ nougat 扫描全能王-2024-06-11-16.30.pdf -o .
/home/werner/.pyenv/versions/3.11.1/envs/datasci/lib/python3.11/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.__get__(instance, owner)()
/home/werner/.pyenv/versions/3.11.1/envs/datasci/lib/python3.11/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3483.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
0%| | 0/1 [00:00<?, ?it/s][nltk_data] Error loading words: <urlopen error [Errno 111] Connection
[nltk_data] refused>
INFO:root:Processing file 扫描全能王-2024-06-11-16.30.pdf with 1 pages
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 3.44it/s]
$ cat 扫描全能王-2024-06-11-16.30.mmd
$
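An empty `.mmd` like this usually means the page was silently dropped rather than transcribed. A hedged sketch of a post-run check you could script around nougat's output (the `[MISSING_PAGE` placeholder test is based on markers nougat emits for failed pages; `conversion_looks_failed` is a hypothetical helper, not part of nougat):

```python
from pathlib import Path

def conversion_looks_failed(mmd_path: Path) -> bool:
    # empty output, or nougat's [MISSING_PAGE...] placeholders, both
    # indicate the page was not actually transcribed
    text = mmd_path.read_text(encoding="utf-8").strip()
    return not text or "[MISSING_PAGE" in text
```

Running this over the output directory would let you collect the PDFs that need a retry with `--recompute`.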
Regards, Zhao
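On the `nltk_data` error in the log above: it only means NLTK could not reach its download server at runtime (connection refused behind the proxy). Pre-fetching the `words` corpus once, with network access, avoids the runtime download. A minimal sketch of the presence check, assuming the default `~/nltk_data` layout:

```python
import os

def words_corpus_present(data_dirs) -> bool:
    # NLTK searches each data dir for corpora/words (unzipped) or
    # corpora/words.zip; if either exists, no download is attempted
    for d in data_dirs:
        base = os.path.join(d, "corpora", "words")
        if os.path.isdir(base) or os.path.isfile(base + ".zip"):
            return True
    return False

# default user-level search location; the NLTK_DATA env var can add more
print(words_corpus_present([os.path.expanduser("~/nltk_data")]))
```

If it is missing, `python -m nltk.downloader words` (run once through the proxy) populates it.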
Try passing --no-skipping as an argument in the command, like this:
nougat 扫描全能王-2024-06-11-16.30.pdf -o . -m 0.1.0-base --no-skipping
Still not usable, as shown below:
(datasci) werner@x13dai-t:~$ proxychains-ng-country-control nougat 扫描全能王-2024-06-11-16.30.pdf -o . -m 0.1.0-base --no-skipping
/home/werner/.pyenv/versions/3.11.1/envs/datasci/lib/python3.11/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.__get__(instance, owner)()
/home/werner/.pyenv/versions/3.11.1/envs/datasci/lib/python3.11/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3483.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
0%| | 0/1 [00:00<?, ?it/s]INFO:root:Processing file 扫描全能王-2024-06-11-16.30.pdf with 1 pages
100%|█████████████████████████████████████████████████████████████████████████████████| 1/1 [00:17<00:00, 17.27s/it]
(datasci) werner@x13dai-t:~$ cat 扫描全能王-2024-06-11-16.30.mmd
\(\frac{1}{2}\).
## 5. Conclusion
In this paper, we have proposed a new method for the estimation of the \(\frac{1}{2}
Is the original PDF also 1 page?
Yes.
Then, I tried the following method, but the result was the same:
(datasci) werner@x13dai-t:~$ proxychains-ng-country-control nougat 扫描全能王-2024-06-11-16.30.pdf -o . -m 0.1.0-base --no-skipping -p 1
/home/werner/.pyenv/versions/3.11.1/envs/datasci/lib/python3.11/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.__get__(instance, owner)()
/home/werner/.pyenv/versions/3.11.1/envs/datasci/lib/python3.11/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3483.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
INFO:root:Skipping 扫描全能王-2024-06-11-16.30.pdf, already computed. Run with --recompute to convert again.
(datasci) werner@x13dai-t:~$ proxychains-ng-country-control nougat 扫描全能王-2024-06-11-16.30.pdf -o . -m 0.1.0-base --no-skipping -p 1 --recompute
/home/werner/.pyenv/versions/3.11.1/envs/datasci/lib/python3.11/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.__get__(instance, owner)()
/home/werner/.pyenv/versions/3.11.1/envs/datasci/lib/python3.11/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3483.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
0%| | 0/1 [00:00<?, ?it/s]INFO:root:Processing file 扫描全能王-2024-06-11-16.30.pdf with 1 pages
100%|█████████████████████████████████████████████████████████████████████████████████| 1/1 [00:17<00:00, 17.38s/it]
(datasci) werner@x13dai-t:~$ cat 扫描全能王-2024-06-11-16.30.mmd
\(\frac{1}{2}\).
## 5. Conclusion
In this paper, we have proposed a new method for the estimation of the \(\frac{1}{2}
The content of the pdf file is as follows:
![D9B580EAC044C1DB4774CB929C8F068C](https://github.com/facebookresearch/nougat/assets/11155854/fc5f9aa0-d8d9-4d03-a5b5-0598fe17be24)
Yeah, it is not foolproof. If you have only a little to convert, I would suggest using Mathpix; I think it gives 10 free PDF conversions and is much more accurate.
That's why it is not working: nougat is trained on arXiv research-paper data, so a scanned document like this is out of domain for the model.
In fact, I've tried Mathpix before asking here, but the results were equally unsatisfactory.
pip install transformers==4.38.2 pyarrow==14.0.1 requests==2.31.0 git+https://github.com/facebookresearch/nougat
@ivanmladek What do you mean? Will this solve the issue here?
On Ubuntu 22.04.4 LTS, I tried to use nougat as follows but failed:
Then, I tried to install CUDA following NVIDIA's instructions for Linux Ubuntu 22.04 x86_64: the base installer, plus the driver installer using the legacy kernel module flavor.
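After a CUDA install like that, it can help to confirm the toolchain is actually visible before blaming nougat. A hypothetical sanity check, not part of nougat or CUDA, that only inspects PATH:

```python
import shutil

def cuda_tools_present() -> bool:
    # after a successful CUDA install, the driver utility (nvidia-smi)
    # and the compiler (nvcc) should both be on PATH
    return (shutil.which("nvidia-smi") is not None
            and shutil.which("nvcc") is not None)

print(cuda_tools_present())
```

If this prints False, the installation (or the shell's PATH) is the problem; `python -c "import torch; print(torch.cuda.is_available())"` then tells you whether torch can see the GPU.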
But the error is the same when running nougat for the above test.
Regards, Zhao