NielsRogge / Transformers-Tutorials

This repository contains demos I made with the Transformers library by HuggingFace.
MIT License

TrOCR decoder_start_token should be `eos` instead of `cls`. #362

Open · thariq-nugrohotomo opened 8 months ago

thariq-nugrohotomo commented 8 months ago

Using the pretrained model, when I pass `cls` or `bos` as the initial decoder token, the output (the first decoded token) is rarely correct. But when I use `eos` instead, the output is correct, or at least matches the output returned by `model.generate()`.

In the official code from Microsoft, the generator falls back to `eos` when the start token is not specified: https://github.com/microsoft/unilm/blob/6f60612e7cc86a2a1ae85c47231507a587ab4e01/trocr/generator.py#L84

Code excerpt to manually inspect the first decoded token:

```python
import torch

# Start the decoder from eos; swap in processor.tokenizer.bos_token_id to compare
decoder_start_token_id = processor.tokenizer.eos_token_id
outputs = model(pixel_values, decoder_input_ids=torch.tensor([[decoder_start_token_id]]))
# Greedy argmax over the vocabulary gives the first decoded token
predicted_ids = torch.argmax(outputs.logits, -1)
print(processor.tokenizer.batch_decode(predicted_ids))
```

Switch `eos_token_id` to `bos_token_id` and observe how the output changes.
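
For completeness, here is a minimal sketch of the comparison, assuming the `microsoft/trocr-base-handwritten` checkpoint and a PIL image already loaded into a variable named `image` (both are placeholders, not from the original report). It prints the output of `model.generate()` alongside the first token obtained by manually starting the decoder from `eos`:

```python
import torch
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

# Assumed checkpoint; other TrOCR checkpoints should show the same behavior
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten")

# `image` is a placeholder for a PIL image loaded elsewhere
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# Reference: generate() uses the decoder_start_token_id stored in the model config
generated_ids = model.generate(pixel_values)
print("generate():", processor.batch_decode(generated_ids, skip_special_tokens=True))

# Manual single decoding step, starting from eos as suggested in this issue
start_id = processor.tokenizer.eos_token_id
logits = model(pixel_values, decoder_input_ids=torch.tensor([[start_id]])).logits
first_token_id = logits.argmax(-1)  # shape (1, 1): greedy choice for the first position
print("first token from eos start:", processor.tokenizer.batch_decode(first_token_id))
```

If the manually decoded first token only matches the `generate()` output when starting from `eos`, that supports using `eos` rather than `cls`/`bos` as the `decoder_start_token_id`.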

review-notebook-app[bot] commented 8 months ago

Check out this pull request on ReviewNB.
