Open lxrrrrrr opened 4 months ago
I feel like I might have missed something somewhere. Let me take a closer look.
I can reach roughly 60% accuracy on MSVD.
But I can only get about 42% on MSRVTT.
I think part of the reason is the way the dataset is processed. Are you using the annotations provided by the author?
Yes, I use the annotations provided by the author. Maybe the problem is related to this.
Many thanks. I processed the data according to the code you provided and re-downloaded the MSVD dataset using the download_scripts in the code, but I can't use the annotations provided by the author: many entries have a data length mismatch, which raises an error. How should I deal with this? Looking forward to your reply.
You may simply reduce the total num_frames by 1 or 2 in the dataset.py for each dataset.
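One way to read this suggestion is to treat every video as slightly shorter than its annotated length when sampling frame indices. The sketch below illustrates that idea only. `sample_frame_indices` and its `margin` parameter are hypothetical names and are not taken from the repository's dataset.py, which may sample frames differently.

```python
def sample_frame_indices(frame_length, num_frames, margin=2):
    """Pick num_frames evenly spaced indices, ignoring the last `margin` frames.

    frame_length: frame count listed in the annotation (may overstate what is on disk).
    margin: how many trailing frames to treat as unavailable (assumed value: 1 or 2).
    """
    usable = max(frame_length - margin, 1)  # never drop below one usable frame
    if num_frames <= 1:
        return [0]
    step = (usable - 1) / (num_frames - 1)  # spacing between sampled frames
    return [int(round(i * step)) for i in range(num_frames)]


indices = sample_frame_indices(frame_length=10, num_frames=4, margin=2)
```

With a margin of 2, the largest index returned is `frame_length - 3`, so an annotation that overstates the frame count by one or two no longer points past the last extracted frame.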
See https://github.com/boheumd/MA-LMM/issues/3#issuecomment-2053855973. You can update the "frame_length" field in the annotation file to the number of frames you actually extracted for each video.
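A minimal sketch of that annotation fix, in case it helps. The layout is an assumption: it supposes the annotation is a list of dicts, each with a "video" key naming a directory of extracted frames under `frames_root`. Only "frame_length" comes from the comment above. `count_frames`, `update_frame_lengths`, and `video_key` are hypothetical names, so adjust them to the real file structure.

```python
import os


def count_frames(frame_dir, extensions=(".jpg", ".jpeg", ".png")):
    """Count image files in one video's frame directory (0 if it is missing)."""
    if not os.path.isdir(frame_dir):
        return 0
    return sum(1 for name in os.listdir(frame_dir) if name.lower().endswith(extensions))


def update_frame_lengths(annotations, frames_root, video_key="video"):
    """Set each entry's "frame_length" to the frame count found on disk.

    Assumes annotations is a list of dicts and entry[video_key] names a
    sub-directory of frames_root. Returns how many entries were changed.
    """
    changed = 0
    for entry in annotations:
        actual = count_frames(os.path.join(frames_root, str(entry[video_key])))
        if entry.get("frame_length") != actual:
            entry["frame_length"] = actual
            changed += 1
    return changed
```

To use it, load the annotation file with `json.load`, call `update_frame_lengths`, and write the result back with `json.dump`. Entries that end up with a frame length of 0 point at videos whose frames were never extracted, and are worth checking by hand.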
Hello, I ran into the same problem. May I ask whether you have managed to reproduce the numbers in the paper?
I just used an A800 and changed the batch size to 32. The other parameters are consistent with the appendix of the paper. Why can I only reach 53%?