Closed Uason-Chen closed 6 months ago
Thank you for your interest in this work!
We found that different datasets behave differently in the CLIP joint embedding space, so we provide customized settings per dataset to (1) further boost performance and (2) encourage in-depth analysis and future exploration. The proposed T-MASS may achieve even better performance if you try more settings.
Please consider starring or forking this repo if you find the code helpful. Much appreciated!
Thanks for the quick response. My problem is solved. I will close this issue and star the repo.
No worries. Thank you!
Thank you to the author for sharing the open-source code. I noticed that the official training scripts use slightly different settings for different datasets. For example, the MSRVTT dataset uses `support_loss_weight`, but the other two datasets do not. For the LSMDC dataset, the stochastic prior is set to `normal` and `std` is set to 3e-3, but these settings are not applied to the other two datasets. For the DiDeMo dataset, there are no settings for `support_loss_weight`, the stochastic prior, or `std`. I would like to know whether it is indeed necessary to adjust the training parameters for each dataset, or whether there are errors in the current training scripts.
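To summarize the differences I see, the scripts can be modeled as per-dataset overrides layered over shared defaults. This is only a sketch for discussion: the option names (`support_loss_weight`, `stochastic_prior`, `std`) come from the scripts, but the base defaults and the MSRVTT weight below are placeholders I made up, not the repo's actual values.

```python
# Hypothetical model of the per-dataset settings described above.
# Only the LSMDC values (prior="normal", std=3e-3) are taken from the scripts;
# everything else is a placeholder for illustration.
BASE_CONFIG = {
    "support_loss_weight": 0.0,     # placeholder default when the flag is omitted
    "stochastic_prior": "uniform",  # placeholder default prior
    "std": 1e-3,                    # placeholder default std
}

DATASET_OVERRIDES = {
    "MSRVTT": {"support_loss_weight": 0.8},  # weight value is a placeholder
    "LSMDC": {"stochastic_prior": "normal", "std": 3e-3},
    "DiDeMo": {},  # no overrides: falls back entirely to the defaults
}

def build_config(dataset: str) -> dict:
    """Merge the shared defaults with a dataset's overrides."""
    cfg = dict(BASE_CONFIG)
    cfg.update(DATASET_OVERRIDES.get(dataset, {}))
    return cfg
```

Under this reading, DiDeMo simply trains with the defaults, while MSRVTT and LSMDC each override a different subset of options.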