clovaai / donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
https://arxiv.org/abs/2111.15664
MIT License

Could not find image processor class in the image processor config or the model config. #276

Open felixnguyen258 opened 10 months ago

felixnguyen258 commented 10 months ago

Hi Donut team,

I trained the "donut-base" model on my custom dataset for testing, but I cannot run inference: loading the model fails with the error below.

################ Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration.

You are using a model of type donut to instantiate a model of type vision-encoder-decoder. This is not supported for all configurations of models and can yield errors.

Traceback (most recent call last):
  File "C:\Users\Admin\projects\training\donut\inference.py", line 19, in <module>
    processor, model = load_model(model_id)
  File "C:\Users\Admin\projects\training\donut\inference.py", line 16, in load_model
    model = VisionEncoderDecoderModel.from_pretrained(model_id)
  File "C:\Users\Admin\anaconda3\envs\aienv\lib\site-packages\transformers\models\vision_encoder_decoder\modeling_vision_encoder_decoder.py", line 363, in from_pretrained
    return super().from_pretrained(pretrained_model_name_or_path, *model_args, **kwargs)
  File "C:\Users\Admin\anaconda3\envs\aienv\lib\site-packages\transformers\modeling_utils.py", line 2535, in from_pretrained
    config, model_kwargs = cls.config_class.from_pretrained(
  File "C:\Users\Admin\anaconda3\envs\aienv\lib\site-packages\transformers\configuration_utils.py", line 598, in from_pretrained
    return cls.from_dict(config_dict, **kwargs)
  File "C:\Users\Admin\anaconda3\envs\aienv\lib\site-packages\transformers\configuration_utils.py", line 747, in from_dict
    config = cls(**config_dict)
  File "C:\Users\Admin\anaconda3\envs\aienv\lib\site-packages\transformers\models\vision_encoder_decoder\configuration_vision_encoder_decoder.py", line 85, in __init__
    raise ValueError(

ValueError: A configuraton of type donut cannot be instantiated because not both encoder and decoder sub-configurations are passed, but only {'_name_or_path': './models/donut-base', 'align_long_axis': False, 'architectures': ['DonutModel'], 'decoder_layer': 4, 'encoder_layer': [2, 2, 14, 2], 'input_size': [1280, 960], 'max_length': 768, 'max_position_embeddings': 768, 'model_type': 'donut', 'torch_dtype': 'float32', 'transformers_version': '4.33.3', 'window_size': 10, '_commit_hash': None} ################

Could you please advise on this case?

Thanks & Regards, Felix

benjaminfh commented 10 months ago

I have had this - I can't remember the exact cause but I'm fairly sure it's a transformers version issue. I'd first check that you're trying to load your model into an env with the same transformers version as the env you did training in. Also check version requirements in general - circa 4.25.1 works best in my experience.

I wrote up my experience with Python module version management here a few days ago. It might be useful: https://github.com/clovaai/donut/issues/132#issuecomment-1837611702
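The version check suggested above can be sketched in a few lines of Python. This is only a rough sketch, not part of the Donut repo: it assumes the checkpoint directory contains a `config.json` with the `model_type` and `transformers_version` fields shown in the error output, and the helper names `installed_transformers_version` and `check_checkpoint` are made up for illustration.

```python
import json
from importlib import metadata
from pathlib import Path


def installed_transformers_version():
    """Return the installed transformers version, or None if it is not installed."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


def check_checkpoint(model_dir, installed=None):
    """Compare a checkpoint's config.json with the current environment.

    Hypothetical helper: assumes config.json holds the 'model_type' and
    'transformers_version' keys seen in the error message of this issue.
    """
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    if installed is None:
        installed = installed_transformers_version()
    saved = config.get("transformers_version")
    return {
        "model_type": config.get("model_type"),
        "saved_version": saved,
        "installed_version": installed,
        "versions_match": saved is not None and saved == installed,
    }
```

Calling `check_checkpoint("./models/donut-base")` in the inference environment returns a small dict. If `versions_match` is false, the checkpoint was saved under a different transformers version than the one currently installed, which is the first thing to rule out. The `model_type` entry also shows what kind of config the checkpoint carries (here `donut`, per the error message).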