Hi, I am evaluating the hybrid_space model on the VATEX dataset. My reproduced text-to-video results are a bit lower than those reported in the paper, and they are inferior to the HGR model.
I tried retraining the model, but I still cannot reproduce results that are superior to HGR.
Could you give me some tips on this? Many thanks.