facebookresearch / nougat

Implementation of Nougat Neural Optical Understanding for Academic Documents
https://facebookresearch.github.io/nougat/
MIT License

zipfile.BadZipFile: File is not a zip file #57

Closed (huangwei2913 closed 1 year ago)

huangwei2913 commented 1 year ago

/home/huangwei/.local/lib/python3.8/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3483.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
  0%| | 0/3 [00:07<?, ?it/s]
Traceback (most recent call last):
  File "/home/huangwei/.local/bin/nougat", line 8, in <module>
    sys.exit(main())
  File "/home/huangwei/.local/lib/python3.8/site-packages/predict.py", line 130, in main
    model_output = model.inference(image_tensors=sample)
  File "/home/huangwei/.local/lib/python3.8/site-packages/nougat/model.py", line 653, in inference
    output["predictions"] = postprocess(
  File "/home/huangwei/.local/lib/python3.8/site-packages/nougat/postprocessing.py", line 504, in postprocess
    return [
  File "/home/huangwei/.local/lib/python3.8/site-packages/nougat/postprocessing.py", line 505, in <listcomp>
    postprocess_single(s, markdown_fix=markdown_fix) for s in generation
  File "/home/huangwei/.local/lib/python3.8/site-packages/nougat/postprocessing.py", line 435, in postprocess_single
    if last_word in words.words():
  File "/home/huangwei/.local/lib/python3.8/site-packages/nltk/corpus/util.py", line 121, in __getattr__
    self.__load()
  File "/home/huangwei/.local/lib/python3.8/site-packages/nltk/corpus/util.py", line 81, in __load
    root = nltk.data.find(f"{self.subdir}/{self.__name}")
  File "/home/huangwei/.local/lib/python3.8/site-packages/nltk/data.py", line 555, in find
    return find(modified_name, paths)
  File "/home/huangwei/.local/lib/python3.8/site-packages/nltk/data.py", line 542, in find
    return ZipFilePathPointer(p, zipentry)
  File "/home/huangwei/.local/lib/python3.8/site-packages/nltk/compat.py", line 41, in _decorator
    return init_func(*args, **kwargs)
  File "/home/huangwei/.local/lib/python3.8/site-packages/nltk/data.py", line 394, in __init__
    zipfile = OpenOnDemandZipFile(os.path.abspath(zipfile))
  File "/home/huangwei/.local/lib/python3.8/site-packages/nltk/compat.py", line 41, in _decorator
    return init_func(*args, **kwargs)
  File "/home/huangwei/.local/lib/python3.8/site-packages/nltk/data.py", line 935, in __init__
    zipfile.ZipFile.__init__(self, filename)
  File "/usr/lib/python3.8/zipfile.py", line 1269, in __init__
    self._RealGetContents()
  File "/usr/lib/python3.8/zipfile.py", line 1336, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file

lukas-blecher commented 1 year ago

Looks like nltk is trying to read a broken (probably partially downloaded) file. Try re-downloading the corpus:

import nltk
nltk.download("words", force=True)
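If the forced download alone does not help, it may be worth checking by hand whether a damaged archive is still sitting on the nltk search path. The following is only a rough sketch, not part of nougat or nltk: the helper names `find_corrupt_archives` and `redownload_words` are made up for illustration, and it assumes the corpus archive lives at `corpora/words.zip` under one of the directories in `nltk.data.path`. Note that `zipfile.is_zipfile` only performs a basic validity check, so it may not catch every kind of corruption.

```python
import os
import zipfile


def find_corrupt_archives(search_dirs, package="words", subdir="corpora"):
    """Return paths of <dir>/<subdir>/<package>.zip that exist but are not valid zips.

    Hypothetical helper for illustration; uses only the standard library.
    """
    corrupt = []
    for base in search_dirs:
        candidate = os.path.join(base, subdir, package + ".zip")
        # Only flag files that are present and fail the basic zip check.
        if os.path.isfile(candidate) and not zipfile.is_zipfile(candidate):
            corrupt.append(candidate)
    return corrupt


def redownload_words():
    """Delete any broken words.zip on the nltk search path, then fetch it again."""
    import nltk  # imported lazily so the helper above stays stdlib-only

    for path in find_corrupt_archives(nltk.data.path):
        print("removing corrupt archive:", path)
        os.remove(path)
    return nltk.download("words")
```

Calling `redownload_words()` once in a Python shell and then re-running nougat should be enough if a truncated download was the cause.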