NielsRogge / Transformers-Tutorials

This repository contains demos I made with the Transformers library by HuggingFace.

Input image size doesn't match model #107

Closed · dhea1323 closed this issue 2 years ago

dhea1323 commented 2 years ago

Hello @NielsRogge and community,

I tried to combine microsoft/beit-base-patch16-224 and cahya/roberta-base-indonesian-1.5G by changing the following code from Fine_tune_TrOCR_on_IAM_Handwriting_Database_using_native_PyTorch.ipynb, using my own dataset:

from transformers import (ViTFeatureExtractor, RobertaTokenizer,
                          TrOCRProcessor, VisionEncoderDecoderModel)

encode = 'microsoft/beit-base-patch16-224'
decode = 'cahya/roberta-base-indonesian-1.5G'

# build a processor from the encoder's feature extractor and the decoder's tokenizer
feature_extractor = ViTFeatureExtractor.from_pretrained(encode)
tokenizer = RobertaTokenizer.from_pretrained(decode)
processor = TrOCRProcessor(feature_extractor=feature_extractor, tokenizer=tokenizer)

# IAMDataset is defined in the notebook
train_dataset = IAMDataset(root_dir='/path/to/dataset/',
                           df=train_df,
                           processor=processor)
eval_dataset = IAMDataset(root_dir='/path/to/dataset/',
                          df=test_df,
                          processor=processor)

model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(encode, decode)

The following error occurs: ValueError: Input image size (384*384) doesn't match model (224*224)

Is there anything else that must be adjusted, or did I make a mistake somewhere? When I run the notebook with my own dataset without changing anything else, no error occurs.

Thank you.

dhea1323 commented 2 years ago

Already solved

archwolf118 commented 2 years ago

> Already solved

How did you solve this problem?

hokhaminh commented 2 years ago

> Already solved

bro, how did you solve this??

NielsRogge commented 2 years ago

You can check feature_extractor.size to see the resolution images will be resized to. Note that a multimodal model like TrOCR consists of a feature extractor that prepares the images and a tokenizer that prepares the text targets; a processor combines both.
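
For reference, a minimal sketch of checking and aligning the two sizes, assuming the same checkpoints as above (in older transformers versions feature_extractor.size is an int; in newer versions it may be a dict). One way to fix the mismatch is to force the feature extractor's resize target to match the image_size the encoder was trained with:

from transformers import ViTFeatureExtractor, VisionEncoderDecoderModel

encode = 'microsoft/beit-base-patch16-224'
decode = 'cahya/roberta-base-indonesian-1.5G'

feature_extractor = ViTFeatureExtractor.from_pretrained(encode)
model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(encode, decode)

# resolution the encoder expects (224 for beit-base-patch16-224)
print(model.config.encoder.image_size)

# resolution the feature extractor resizes images to
print(feature_extractor.size)

# if the two differ, make the feature extractor match the encoder
feature_extractor.size = model.config.encoder.image_size

With the sizes aligned, the pixel values produced by the processor will have the spatial dimensions the encoder's patch embedding layer expects.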