VikParuchuri / marker

Convert PDF to markdown quickly with high accuracy
https://www.datalab.to
GNU General Public License v3.0

Error no file named pytorch_model.bin, model.safetensors #198

Open mrliangcb opened 2 weeks ago

mrliangcb commented 2 weeks ago

python: 3.9.19, torch: 1.12.1, marker-pdf: 0.2.13

code: python convert.py doc_dir ouput

error info:
Traceback (most recent call last):
  File "/root/marker/convert.py", line 135, in <module>
    main()
  File "/root/marker/convert.py", line 111, in main
    model_lst = load_all_models()
  File "/root/marker/marker/models.py", line 76, in load_all_models
    texify = setup_texify_model(device, dtype)
  File "/root/marker/marker/models.py", line 37, in setup_texify_model
    texify_model = load_texify_model(checkpoint=settings.TEXIFY_MODEL_NAME, device=settings.TORCH_DEVICE_MODEL, dtype=settings.TEXIFY_DTYPE)
  File "/opt/conda/envs/py39/lib/python3.9/site-packages/texify/model/model.py", line 17, in load_model
    model = VisionEncoderDecoderModel.from_pretrained(checkpoint, config=config, torch_dtype=dtype)
  File "/opt/conda/envs/py39/lib/python3.9/site-packages/transformers/models/vision_encoder_decoder/modeling_vision_encoder_decoder.py", line 371, in from_pretrained
    return super().from_pretrained(pretrained_model_name_or_path, *model_args, **kwargs)
  File "/opt/conda/envs/py39/lib/python3.9/site-packages/transformers/modeling_utils.py", line 3305, in from_pretrained
    raise EnvironmentError(
OSError: Error no file named pytorch_model.bin, model.safetensors, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory /root/.cache/huggingface/vikp/texify.

pranshuchaurasia commented 1 week ago

This error occurs because the model is being run in offline mode and the model files it needs are not available locally. To fix it, download the required models from Hugging Face (https://huggingface.co/vikp), including the models for the Surya and Texify libraries that Marker uses. Then update settings.py in the Marker project to point to the downloaded model directories, and do the same in the configuration files of the Surya and Texify libraries. After these changes, your script should load the models from disk instead of trying to download them.
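
To confirm the diagnosis before editing any settings, it can help to check whether the directory named in the error actually contains a weight file. The sketch below is a minimal example, not part of Marker: `find_weight_files` and `download_model` are hypothetical helper names, the file list is copied from the OSError message, and the repo id `vikp/texify` is an assumption inferred from the path in the traceback. `download_model` relies on `snapshot_download` from the `huggingface_hub` package and needs network access, so run it on a machine that can reach Hugging Face and copy the directory over if the target host is offline.

```python
from pathlib import Path

# Weight file names that the OSError in the traceback says transformers looks for.
WEIGHT_FILES = (
    "pytorch_model.bin",
    "model.safetensors",
    "tf_model.h5",
    "model.ckpt.index",
    "flax_model.msgpack",
)


def find_weight_files(model_dir):
    """Return the names of known weight files present directly in model_dir."""
    root = Path(model_dir)
    if not root.is_dir():
        return []
    return [name for name in WEIGHT_FILES if (root / name).is_file()]


def download_model(repo_id, target_dir):
    """Download a full model snapshot from the Hugging Face Hub into target_dir.

    Hypothetical helper: requires network access and the huggingface_hub
    package, which is imported lazily so the check above works without it.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=str(target_dir))


if __name__ == "__main__":
    # Directory taken from the error message; adjust it for your machine.
    model_dir = Path("/root/.cache/huggingface/vikp/texify")
    found = find_weight_files(model_dir)
    if found:
        print(f"Found weights in {model_dir}: {found}")
    else:
        # Assumed repo id, inferred from the directory path in the traceback.
        print(f"No weights in {model_dir}; try download_model('vikp/texify', model_dir)")
```

If the check finds nothing, the directory was probably created without the weights ever being downloaded. Once the directory holds a weight file, `settings.TEXIFY_MODEL_NAME` (the setting shown in the traceback) can point at it. Note that this check only looks for the single-file names listed in the error, so a sharded checkpoint would not be detected.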