Haiyang-W / GiT

[ECCV2024 Oral🔥] Official Implementation of "GiT: Towards Generalist Vision Transformer through Universal Language Interface"
https://arxiv.org/abs/2403.09394
Apache License 2.0

Why does seq_embed during training skip last target token #11

Closed NareshGuru77 closed 1 month ago

NareshGuru77 commented 1 month ago

The following line in git_det_head skips the last token (height) of the input tokens:

    # input tokens for parallel training
    input_tokens = targets_tokens[:, :-1]

This leads to seq_embed skipping height during training. Is this intended? If so, could you please clarify why this is done?

Thank you!

nnnth commented 1 month ago

We use autoregressive training, so the last token is not needed. The second-to-last token will predict the last token.
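To illustrate the shift described above, here is a minimal sketch of the standard autoregressive (teacher-forcing) setup. The names `targets_tokens`, `vocab_size`, and `decoder` are placeholders for illustration only, not the actual GiT modules or shapes.

    import torch
    import torch.nn as nn

    # Hypothetical sizes, chosen only for the sketch.
    vocab_size = 2100
    batch, seq_len = 2, 6

    targets_tokens = torch.randint(0, vocab_size, (batch, seq_len))

    # Inputs are every token except the last: each position t is trained to
    # predict token t+1, so the final token never needs to be fed as input.
    input_tokens = targets_tokens[:, :-1]    # (batch, seq_len - 1)
    target_tokens = targets_tokens[:, 1:]    # (batch, seq_len - 1)

    # Stand-in for the transformer decoder head (not the real GiT head).
    decoder = nn.Sequential(
        nn.Embedding(vocab_size, 32),
        nn.Linear(32, vocab_size),
    )

    logits = decoder(input_tokens)           # (batch, seq_len - 1, vocab_size)
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, vocab_size),
        target_tokens.reshape(-1),
    )

Under this scheme, the second-to-last token is the one that predicts the last token (height), which is why seq_embed never sees the last target token during parallel training.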

NareshGuru77 commented 1 month ago

Hi. Thank you for your quick response and clarification!