IDEA-Research / T-Rex

API for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
https://deepdataspace.com/home

About the training process of T-Rex2 #62

Closed hengseuer closed 1 week ago

hengseuer commented 1 month ago

Hello,

I have a question about the training process of T-Rex2. Does T-Rex2 first train the text prompt branch and then train both the text and visual prompt branches in subsequent iterations?

Thank you!

Mountchicken commented 1 month ago

Hi @hengseuer. Yes. The text prompt branch needs more data and a longer time to converge, so we train it first.
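For anyone following along, the two-stage schedule described above could be sketched roughly like this. This is only an illustration; the function and stage names are hypothetical and not from the T-Rex2 codebase:

```python
# Hypothetical sketch of a two-stage training schedule: the text prompt
# branch is trained alone first, then jointly with the visual prompt branch.
# All names here are illustrative, not actual T-Rex2 identifiers.

def training_plan(total_iters, text_only_iters):
    """Return, per iteration, which prompt branches are trained."""
    plan = []
    for it in range(total_iters):
        if it < text_only_iters:
            plan.append(("text",))           # stage 1: text prompts only
        else:
            plan.append(("text", "visual"))  # stage 2: joint training
    return plan

plan = training_plan(total_iters=10, text_only_iters=6)
```

The exact iteration split is not stated in the thread; `text_only_iters` here is just a placeholder.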

hengseuer commented 1 month ago

Thank you for your response.

I have another question: When training text and visual prompts simultaneously, do the negative samples for the visual prompts come from the image itself, the current batch, or is there a maintained pool of negative samples?

Mountchicken commented 1 month ago

The negative samples for visual prompts are sampled from the current mini-batch.
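To make the idea concrete: for each image, negative visual prompts would be drawn from the *other* images in the same mini-batch. The sketch below is a hedged, simplified illustration of that sampling pattern, not the actual T-Rex2 implementation (the real code operates on prompt embeddings and applies category-aware filtering this toy version omits):

```python
import random

def sample_batch_negatives(batch_prompts, num_negatives, seed=0):
    """Illustrative in-batch negative sampling.

    batch_prompts: list with one entry per image, each a list of that
    image's visual prompts. For image i, negatives are drawn from the
    prompts of every other image in the mini-batch.
    """
    rng = random.Random(seed)
    negatives = []
    for i in range(len(batch_prompts)):
        # Pool = all prompts in the batch except image i's own prompts.
        pool = [p for j, prompts in enumerate(batch_prompts)
                if j != i for p in prompts]
        k = min(num_negatives, len(pool))
        negatives.append(rng.sample(pool, k))
    return negatives
```

For example, with a batch of three images, the negatives for image 0 all come from images 1 and 2.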

hengseuer commented 1 month ago

Thanks a lot.

Are all the samples in the current mini-batch from the same dataset?

If, during the current iteration, all samples across the GPUs come from the same dataset and negative examples are sampled from the entire batch, similar to the approach used in DINOv, would that result in better performance?

Mountchicken commented 1 month ago

Our implementation only samples negative prompts from the current GPU. Using DINOv's sampling strategy might bring a further performance boost.
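The difference between the two negative pools being discussed can be sketched as follows. This is a toy illustration only; in practice the cross-GPU variant would use a collective such as `torch.distributed.all_gather`, and none of these names come from the T-Rex2 or DINOv code:

```python
def negative_pool_per_gpu(shards, my_rank, my_image):
    """Per-GPU pool (the implementation described above): negatives come
    only from the other images on this GPU's shard of the batch."""
    return [p for img in shards[my_rank] if img is not my_image for p in img]

def negative_pool_all_gpus(shards, my_image):
    """DINOv-style pool: prompts are gathered across every GPU's shard
    (e.g. via all_gather) before sampling, giving a larger and more
    diverse set of negatives."""
    return [p for shard in shards
            for img in shard if img is not my_image for p in img]
```

Whether the larger pool actually improves T-Rex2's results is, as the reply says, an open question; the sketch only shows that the cross-GPU pool is a superset of the per-GPU one.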

hengseuer commented 1 month ago

Got it. Thanks.