NielsRogge / Transformers-Tutorials

This repository contains demos I made with the Transformers library by HuggingFace.
MIT License

Inference on pretrained microsoft/git-base with custom tokenizer #269

Open GlockPL opened 1 year ago

GlockPL commented 1 year ago

Hi, I followed the tutorial on how to train microsoft/git-base on my own dataset. I have 2 million images, so it takes a long time to train even one epoch. By the way, how many epochs are good for this model? I'm trying to run inference on the trained model using model.generate, but for every image I get the same output, so I thought maybe I should try running inference using pure PyTorch. However, I get this error:

ValueError: You have to specify either input_ids or inputs_embeds

But for inference I should only need pixel_values, so I don't understand why I get this error. I trained the model this way; it feels kind of stitched together:

from transformers import AutoProcessor, RobertaTokenizer

# Image processor from the base checkpoint, custom tokenizer from disk
processor = AutoProcessor.from_pretrained("microsoft/git-base")
tokenizer = RobertaTokenizer.from_pretrained("./Tokenizer/")

# Encode the image and the SMILES string, then merge both into one batch dict
encoding = processor(images=item_image, max_length=max_len, padding="max_length", return_tensors="pt")
token = tokenizer(item_smile, padding="max_length", max_length=max_len, truncation=True, return_tensors="pt")
encoding.update(token)
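For what it's worth, the ValueError typically comes from calling the model's forward() directly without input_ids: GIT is a causal LM, so forward() needs token ids, while generate() builds them internally and accepts pixel_values alone. A minimal sketch of generation-based inference, assuming the base microsoft/git-base checkpoint (swap in your fine-tuned checkpoint path and decode with your custom tokenizer instead of the processor):

```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM

# Base checkpoint used here only for illustration; replace with your
# fine-tuned model directory for real inference.
processor = AutoProcessor.from_pretrained("microsoft/git-base")
model = AutoModelForCausalLM.from_pretrained("microsoft/git-base")
model.eval()

image = Image.new("RGB", (224, 224))  # placeholder; use a real image
pixel_values = processor(images=image, return_tensors="pt").pixel_values

with torch.no_grad():
    # generate() accepts pixel_values alone and creates input_ids internally,
    # which is why forward() complains but generate() does not.
    generated_ids = model.generate(pixel_values=pixel_values, max_length=50)

caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(caption)
```

If every image yields the same caption, it is worth comparing greedy decoding against sampling (e.g. do_sample=True) and checking that the fine-tuned weights, not the base ones, are actually being loaded.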