Closed · NareshGuru77 closed 1 month ago
The following line in `git_det_head` skips the last token (height) when forming the input tokens:

```python
input_tokens = targets_tokens[:, :-1]
```
This leads to seq_embed skipping height during training. Is this intended? If so, could you please clarify why this is done?
Thank you!
We use autoregressive training, so the last token is not needed as an input: the second-to-last token is the position from which the last token is predicted.
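For illustration, here is a minimal sketch of that teacher-forcing shift. The variable names mirror the snippet above, but the token values and sequence layout are hypothetical, not taken from the repository:

```python
import numpy as np

# Hypothetical batch of one target sequence of 4 token ids
# (e.g. the last id standing in for the height token).
targets_tokens = np.array([[101, 57, 33, 72]])

# The model consumes every token except the last ...
input_tokens = targets_tokens[:, :-1]    # [[101, 57, 33]]

# ... and is supervised on the sequence shifted left by one,
# so the embedding at position i predicts the token at i + 1.
# In particular, the second-to-last input token predicts the last token.
prediction_targets = targets_tokens[:, 1:]  # [[57, 33, 72]]
```

So the last token never needs to appear among the inputs; it only appears as a prediction target.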
Hi. Thank you for your quick response and clarification!