Hello, thank you for your nice work on "TransVOD"!
I have a question: as I understand it, "single train" trains only the first half of the network. Once the output head after the STD has been learned, those weights are fixed and training of the full network begins. Why not train the output head and the temporal network together? Is it because of slow convergence?
Waiting for your reply!