Also, about putting the last k word entries into the length-50 buffer: if the text is 'I love you', should it be stored as [0, 0, ..., you, love, I] or as [0, 0, ..., I, love, you]? Thanks.
And, in the README, should the latter line be `cont_data[:-t, n] = 0`? Thanks.
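For what it's worth, here is a minimal sketch of how I read the two questions above: left-pad the length-50 buffer so the last k slots hold the sentence in its original order, and zero the cont mask over the padded prefix. The function name `encode_sentence`, the `word2idx` mapping, and the 1-D layout are my own assumptions, not the author's code.

```python
import numpy as np

MAX_WORDS = 50

def encode_sentence(tokens, word2idx, max_words=MAX_WORDS):
    """Left-pad so the last len(tokens) slots hold the sentence in its
    original order, e.g. [0, 0, ..., I, love, you]."""
    assert tokens, "expects a non-empty token list"
    sent_data = np.zeros(max_words, dtype=np.int64)           # padded word indices
    cont_data = np.ones(max_words, dtype=np.float32)          # 1 = real token
    idx = [word2idx.get(w, 0) for w in tokens][-max_words:]   # keep the last k words
    t = len(idx)
    sent_data[-t:] = idx    # original word order, right-aligned
    cont_data[:-t] = 0      # zero the mask over the padded prefix
    return sent_data, cont_data

# Example: 'I love you' with a toy vocabulary.
vocab = {'i': 1, 'love': 2, 'you': 3}
print(encode_sentence('I love you'.lower().split(), vocab))
```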
Also, how do you preprocess the text? There are words like "someone's" and "baby's" that cannot be encoded but are in fact meaningful. Thanks!
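On the possessives question, here is a rough sketch of one workable preprocessing step, assuming GloVe embeddings; the file name `glove.6B.300d.txt`, the splitting rule, and the zero-vector fallback are placeholders, not the author's pipeline.

```python
import re
import numpy as np

def load_glove(path='glove.6B.300d.txt'):
    """Load word -> vector from a GloVe text file (path is a placeholder)."""
    vectors = {}
    with open(path, encoding='utf-8') as f:
        for line in f:
            parts = line.rstrip().split(' ')
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

def tokenize(sentence):
    # Split possessives ("baby's" -> "baby", "'s") so the base word keeps
    # its embedding, then keep simple word tokens.
    sentence = re.sub(r"(\w)'s\b", r"\1 's", sentence.lower())
    return re.findall(r"[a-z0-9']+", sentence)

def embed(tokens, vectors, dim=300):
    # Fall back to a zero vector for words the embedding file does not cover.
    return np.stack([vectors.get(w, np.zeros(dim, dtype=np.float32))
                     for w in tokens])

# Example: tokenize("Someone's baby is crying")
#   -> ['someone', "'s", 'baby', 'is', 'crying']
```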
@h-bo did you run the code that the author provided, or did you implement it yourself?
@mengliu1991 I ran the code the author provided.
@h-bo can you tell me how you set up the input data for the code the author provided? I am not sure about loc_data: is it the location of the ground-truth moment, or the time of the currently input video clip? Also, the loss function in the author's code is different from the one in her paper; did you change it?
Thank you.
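Not an authoritative answer on loc_data, but if it follows CTRL-style location regression it would hold the offsets between the sliding-window clip and the ground-truth moment rather than the clip's absolute time. A toy sketch, with names of my own choosing:

```python
import numpy as np

def make_loc_target(clip_start, clip_end, gt_start, gt_end):
    """Regression target: how far the clip boundaries must shift to match
    the ground-truth moment (start offset, end offset), in seconds."""
    return np.array([gt_start - clip_start, gt_end - clip_end], dtype=np.float32)

# Example: clip [10s, 20s] vs ground-truth moment [12s, 25s] -> offsets [2., 5.]
print(make_loc_target(10.0, 20.0, 12.0, 25.0))
```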
Just so you know, I do not get the result she reports. @mengliu1991
Thanks for your good work. I am trying to run your model but get much lower results (0.10@1, 0.40@5 at IoU 0.25); I guess it may come from the GloVe version (I am using glove.6B). Could you tell me which version you use?
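One hypothetical way to check whether the GloVe release explains the gap is to measure how much of the query vocabulary the embedding file covers; the file name and word list below are placeholders.

```python
def glove_coverage(query_words, glove_path='glove.6B.300d.txt'):
    """Fraction of the dataset's query vocabulary found in the GloVe file."""
    with open(glove_path, encoding='utf-8') as f:
        glove_vocab = {line.split(' ', 1)[0] for line in f}
    covered = sum(w in glove_vocab for w in query_words)
    return covered / max(len(query_words), 1)

# Example with a tiny hypothetical word list:
# print(glove_coverage({'person', 'opens', 'door', 'baby'}))
```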
By the way, the stacked feature mode I use is 'overall-video_mean + local-video_mean + segment' (by experiment it outperforms the other stacking orders, but I still want to make sure of the way you use...). Thank you.
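For reference, a minimal sketch of the stacking order described above; whether this matches the intended order is exactly the open question, so treat the layout and names as assumptions.

```python
import numpy as np

def stack_features(clip_feats, start, end, context=1):
    """clip_feats: (num_clips, dim) visual features for the whole video;
    [start, end) indexes the candidate segment's clips.
    Returns [overall-video mean | local-context mean | segment feature]."""
    overall_mean = clip_feats.mean(axis=0)                      # whole video
    lo, hi = max(start - context, 0), min(end + context, len(clip_feats))
    local_mean = clip_feats[lo:hi].mean(axis=0)                 # segment + neighbours
    segment = clip_feats[start:end].mean(axis=0)                # the segment itself
    return np.concatenate([overall_mean, local_mean, segment])

# Example: 20 clips of 4096-d features, candidate segment = clips 5..8.
feats = np.random.rand(20, 4096).astype(np.float32)
print(stack_features(feats, 5, 8).shape)   # (3 * 4096,)
```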