LisaAnne / LocalizingMoments

Github for my ICCV 2017 paper: "Localizing Moments in Video with Natural Language"

Cannot reproduce your result with the model you provide #5

Closed h-bo closed 6 years ago

h-bo commented 6 years ago

Thanks for your good work. I am trying to run your model but get a much lower result (0.10 @1, 0.4 @5, 0.25 IoU). I guess it may result from the GloVe version you use (I am using glove.6B); could you tell me which version you use?
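For reference, a minimal sketch of loading a pre-trained GloVe text file (such as one of the glove.6B files) into a lookup table; the function name and path are hypothetical, and the exact file/dimension the author used is what this question is asking about:

```python
import numpy as np

def load_glove(path):
    """Load a GloVe text file into a dict of word -> float32 vector.

    Each line of a GloVe file is: <word> <v1> <v2> ... <vd>.
    """
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors
```

A mismatch in vocabulary coverage between GloVe releases (different corpora and dimensions) can plausibly shift results, which is why pinning down the exact file matters.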

By the way, the stacked mode I use is 'overall-video_mean + local-video_mean + segment' (in my experiments it outperforms the other stacking orders, but I still want to confirm the one you use...). Thank you.
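A minimal sketch of the stacked visual input described above, following the paper's description of global (whole-video mean) features, local (moment) features, and temporal endpoint features concatenated together; the function name and shapes are illustrative assumptions, not the repo's confirmed layout:

```python
import numpy as np

def stack_features(frame_feats, start, end):
    """Concatenate global, local, and temporal endpoint features.

    frame_feats: (num_frames, d) array of per-frame features;
    start/end index the candidate moment within the video.
    """
    num_frames = len(frame_feats)
    global_feat = frame_feats.mean(axis=0)            # whole-video mean
    local_feat = frame_feats[start:end].mean(axis=0)  # moment mean
    # normalized temporal endpoints of the candidate moment
    temporal = np.array([start / num_frames, end / num_frames])
    return np.concatenate([global_feat, local_feat, temporal])

feats = np.random.rand(30, 500)
v = stack_features(feats, 5, 10)  # v has shape (1002,): 500 + 500 + 2
```

The ordering of the concatenated parts is exactly what the question above is trying to confirm.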

h-bo commented 6 years ago

Also, about the way you put the last k entries among the 50 slots: if the text is 'I love you', should it be padded as [0, 0, ..., you, love, I] or [0, 0, ..., I, love, you]? Thanks
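To make the two conventions concrete, here is a small sketch of right-aligned padding; both place the k words in the last k of the 50 slots and differ only in whether word order is reversed (which of the two the repo expects is exactly the question):

```python
def pad_right_aligned(tokens, max_len=50, reverse=False):
    """Return a length-max_len list with tokens placed in the last slots.

    reverse=False -> [0, ..., 'I', 'love', 'you']
    reverse=True  -> [0, ..., 'you', 'love', 'I']
    """
    words = list(reversed(tokens)) if reverse else list(tokens)
    return [0] * (max_len - len(words)) + words

tokens = ["I", "love", "you"]
pad_right_aligned(tokens, max_len=5)                # [0, 0, 'I', 'love', 'you']
pad_right_aligned(tokens, max_len=5, reverse=True)  # [0, 0, 'you', 'love', 'I']
```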

h-bo commented 6 years ago

Also, in the readme, should the latter be cont_data[:-t, n] = 0? Thanks
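For context, a minimal sketch of what that indexing would do, assuming Caffe-style continuation markers of shape (timesteps, batch) and right-aligned captions; the shape and variable names are assumptions about the repo's setup:

```python
import numpy as np

T, N = 50, 2                    # timesteps x batch (assumed layout)
cont_data = np.ones((T, N))     # 1 = continue the LSTM state

t = 3   # caption length for sample n (e.g. "I love you")
n = 0
# With right-aligned padding, the first T - t steps are padding, so
# zeroing cont_data there keeps the recurrent state reset until the
# first real word arrives.
cont_data[:-t, n] = 0
```

That is, `cont_data[:-t, n] = 0` zeroes the leading padding steps, while `cont_data[-t:, n] = 0` would instead zero the final t steps where the real words sit, which is presumably not intended.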

h-bo commented 6 years ago

Also, how do you preprocess the text? There are words like "someone's" and "baby's" that cannot be encoded directly but are in fact meaningful. Thanks!
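One common way to handle this before a GloVe lookup is to split the possessive clitic into its own token (glove.6B-style vocabularies typically contain "'s" as a token). A sketch under that assumption; this is not the repo's confirmed preprocessing pipeline:

```python
import re

def tokenize(sentence):
    """Lowercase and split, separating the possessive clitic 's."""
    sentence = sentence.lower()
    # "baby's" -> "baby 's", so both pieces can be looked up in GloVe
    sentence = re.sub(r"'s\b", " 's", sentence)
    return [w for w in sentence.split() if w]

tokenize("The baby's toy")  # ['the', 'baby', "'s", 'toy']
```

An alternative is simply stripping the apostrophe and suffix ("baby's" -> "baby"), which loses the possessive but keeps the content word in vocabulary.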

mengliu1991 commented 6 years ago

@h-bo are you running the code the author provided, or did you implement it yourself?

h-bo commented 6 years ago

@mengliu1991 The code the author provided.

mengliu1991 commented 6 years ago

@h-bo can you tell me how you set up the input data for the code the author provided? I am not sure about loc_data: is it the location of the ground-truth moment, or the time of the currently input video clip? Also, the loss function in the author's code seems different from the one in her paper; did you change it?

Thank you.
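For comparison with the code, here is a sketch of the ranking loss as described in the paper (intra-video and inter-video negative moments, combined with a margin); the function name, margin b, and weight lam are hypothetical values, written from the paper's description rather than the repo:

```python
def moment_ranking_loss(d_pos, d_intra, d_inter, b=0.1, lam=0.5):
    """Hinge ranking loss over moment-sentence distances.

    d_pos   -- distance between the sentence and the correct moment
    d_intra -- distance to a negative moment from the same video
    d_inter -- distance to a moment from a different video
    """
    intra = max(0.0, d_pos - d_intra + b)  # rank correct moment within its video
    inter = max(0.0, d_pos - d_inter + b)  # rank it against other videos
    return intra + lam * inter
```

Comparing each hinge term and the inter-video weight against the implementation is one way to check whether the code's loss really diverges from the paper's.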

h-bo commented 6 years ago

I did not get the result she reports, as you can see. @mengliu1991