Hon-Wong / Elysium

[ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM
https://hon-wong.github.io/Elysium/

T-selector #12

Closed qyc-98 closed 1 month ago

qyc-98 commented 1 month ago

Hi, thanks for your wonderful work! I'd like to ask how to implement the Gumbel softmax in the T-selector for its training (in this code): image

My implementation is:

image

Hon-Wong commented 1 month ago

Most of it is correct, but "hard" should be set to False, as indicated here:

We simply replace the softmax operation with: video_token_scores = F.gumbel_softmax(video_token_logits, tau=1, hard=False)
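
For readers who want to see what the hard flag changes, here is a minimal pure-Python sketch of the Gumbel softmax computation. It is not the repository's code: the function below, its rng parameter, and the example logits are hypothetical and only illustrate the standard formulation (add Gumbel(0, 1) noise to the logits, divide by tau, apply softmax). With hard=False the result is a soft distribution over the tokens; with hard=True it collapses to a one-hot vector. PyTorch's F.gumbel_softmax additionally keeps gradients flowing through the hard output with a straight-through estimator, which this sketch does not model.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, hard=False, rng=None):
    """Illustrative Gumbel softmax over a 1-D list of logits (no gradients)."""
    rng = rng or random.Random()

    # Perturb each logit with Gumbel(0, 1) noise: g = -log(-log(u)), u ~ U(0, 1).
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)  # clamp to avoid log(0)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / tau)

    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    soft = [e / total for e in exps]

    if not hard:
        return soft  # soft scores: non-negative and summing to 1

    # hard=True: one-hot vector at the argmax of the soft scores.
    winner = soft.index(max(soft))
    return [1.0 if i == winner else 0.0 for i in range(len(soft))]


# Hypothetical logits for three video tokens; the same seed gives the same noise.
video_token_logits = [2.0, 0.5, -1.0]
soft_scores = gumbel_softmax(video_token_logits, tau=1.0, hard=False,
                             rng=random.Random(0))
hard_scores = gumbel_softmax(video_token_logits, tau=1.0, hard=True,
                             rng=random.Random(0))
```

Here soft_scores plays the role of video_token_scores in the line quoted above: every token keeps a non-zero weight, whereas hard_scores keeps only the single sampled token.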