Closed (laurislopata closed 1 year ago)
It looks like the model fails on these pages. Try adding the flag --no-skipping
when calling nougat to see the partial output.
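A minimal sketch of that re-run from Python, for anyone scripting it. The --no-skipping flag is the one named above; the -o output-directory option is assumed from the nougat README, and both the helper name build_nougat_command and the /content/out directory are hypothetical.

```python
import shlex

def build_nougat_command(pdf_path, out_dir):
    """Build the argument list for re-running nougat without skipping.

    Assumption: `-o` selects the output directory, as in the nougat README.
    `--no-skipping` is the flag suggested in this thread.
    """
    return ["nougat", pdf_path, "-o", out_dir, "--no-skipping"]

# /content/out is a made-up output directory, used only for illustration.
cmd = build_nougat_command("/content/exam_2017_q19.pdf", "/content/out")
print(shlex.join(cmd))
```

To actually run it, the list can be passed to subprocess.run(cmd, check=True), provided nougat is installed and on the PATH.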
Yes, I can see that it stops generating when I add the flag. Are there any known kinds of input where it tends to fail, or are its exact capabilities not well understood yet?
Hello,
Sometimes when I try to parse a PDF I get one of these errors (after displaying it):
[MISSING_PAGE_FAIL:1]
[MISSING_PAGE_EMPTY:1]
Do you know what could be causing this, and what would be a good way to handle them? Most of the time the tool works perfectly!
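One possible way to handle these markers is to scan the generated text for them and collect the affected page numbers, so those pages can be retried or reviewed by hand. The sketch below assumes the markers appear in the output exactly as quoted above; the function name find_missing_pages and the sample string are made up for illustration.

```python
import re

# Matches the two markers quoted in this thread, e.g. [MISSING_PAGE_FAIL:1]
MARKER = re.compile(r"\[MISSING_PAGE_(FAIL|EMPTY):(\d+)\]")

def find_missing_pages(text):
    """Return the page numbers reported as FAIL or EMPTY in the output text."""
    found = {"FAIL": [], "EMPTY": []}
    for kind, page in MARKER.findall(text):
        found[kind].append(int(page))
    return found

# Made-up sample standing in for the contents of an output file.
sample = "Some parsed text\n\n[MISSING_PAGE_FAIL:1]\n\nmore text\n\n[MISSING_PAGE_EMPTY:3]\n"
report = find_missing_pages(sample)
print(report)  # {'FAIL': [1], 'EMPTY': [3]}
```

Keeping the two kinds separate can be useful because they may call for different follow-up: a FAIL page is a candidate for a retry, while an EMPTY page may simply have nothing the model could transcribe.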
This is the output of the run:
2023-09-25 21:05:08.089068: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
/usr/local/lib/python3.10/dist-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3483.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
  0% 0/1 [00:00<?, ?it/s]INFO:root:Processing file /content/exam_2017_q19.pdf with 1 pages
100% 1/1 [00:04<00:00, 4.11s/it]
2023-09-25 21:05:26.087183: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
/usr/local/lib/python3.10/dist-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3483.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
  0% 0/1 [00:00<?, ?it/s]INFO:root:Processing file /content/exam_2020_q15.pdf with 1 pages
100% 1/1 [00:19<00:00, 19.95s/it]
And here are the two PDFs I'm trying to process: exam 2020 q15.pdf, exam 2017 q19.pdf