NielsRogge / Transformers-Tutorials

This repository contains demos I made with the Transformers library by HuggingFace.
MIT License

TrOCR decoder_start_token should be `eos` instead of `cls`. #362

Open · thariq-nugrohotomo opened 8 months ago

thariq-nugrohotomo commented 8 months ago

Using the pretrained model, when I pass `cls` or `bos` as the initial decoder token, the output (the first decoded token) is rarely correct. But when I use `eos` instead, the output is correct, or at least matches the output returned by `model.generate()`.

In the official code from Microsoft, the generator falls back to `eos` when the start token is not specified: https://github.com/microsoft/unilm/blob/6f60612e7cc86a2a1ae85c47231507a587ab4e01/trocr/generator.py#L84

Code excerpt to manually inspect the first decoded token:

```python
import torch

# Start the decoder from eos; swap in processor.tokenizer.bos_token_id to compare
decoder_start_token_id = processor.tokenizer.eos_token_id
outputs = model(pixel_values, decoder_input_ids=torch.tensor([[decoder_start_token_id]]))
# Greedy argmax over the vocabulary gives the first decoded token
predicted_ids = torch.argmax(outputs.logits, -1)
print(processor.tokenizer.batch_decode(predicted_ids))
```

Switch `eos_token_id` to `bos_token_id` and observe how the output changes.
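
For completeness, here is a minimal sketch of the comparison, assuming the `microsoft/trocr-base-handwritten` checkpoint and a PIL image already loaded into a variable named `image` (both are placeholders, not from the original report). It prints the output of `model.generate()` alongside the first token obtained by manually starting the decoder from `eos`:

```python
import torch
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

# Assumed checkpoint; other TrOCR checkpoints should show the same behavior
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten")

# `image` is a placeholder for a PIL image loaded elsewhere
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# Reference: generate() uses the decoder_start_token_id stored in the model config
generated_ids = model.generate(pixel_values)
print("generate():", processor.batch_decode(generated_ids, skip_special_tokens=True))

# Manual single decoding step, starting from eos as suggested in this issue
start_id = processor.tokenizer.eos_token_id
logits = model(pixel_values, decoder_input_ids=torch.tensor([[start_id]])).logits
first_token_id = logits.argmax(-1)  # shape (1, 1): greedy choice for the first position
print("first token from eos start:", processor.tokenizer.batch_decode(first_token_id))
```

If the manually decoded first token only matches the `generate()` output when starting from `eos`, that supports using `eos` rather than `cls`/`bos` as the `decoder_start_token_id`.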

review-notebook-app[bot] commented 8 months ago

Check out this pull request on ReviewNB.
