26hzhang / VSLNet

Span-based Localizing Network for Natural Language Video Localization (ACL 2020)
MIT License
100 stars 17 forks

Reproduce on TACoS #21

Closed minjoong507 closed 8 months ago

minjoong507 commented 1 year ago

Hi there. Thanks for sharing your great work!

I'm trying to reproduce the results in Table 2 on the TACoS dataset.

However, my results differ substantially from those reported in Table 2.

Could you share the details needed to reproduce the results in Table 2?


EricLina commented 9 months ago

Hello, have you solved the problem? Did you use the PyTorch implementation?

EricLina commented 9 months ago

Rank@1, IoU=0.3: 39.39
Rank@1, IoU=0.5: 28.12
Rank@1, IoU=0.7: 15.90
mean IoU: 28.23
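
For anyone comparing numbers: the sketch below shows how these metrics are conventionally computed for moment localization, so that results from different runs can be checked on the same footing. It is not taken from this repository. The function names `temporal_iou` and `summarize` are hypothetical, and the choice of `>=` at the threshold is an assumption that may differ between implementations.

```python
from typing import Dict, Sequence, Tuple

Span = Tuple[float, float]  # (start, end) of a moment, e.g. in seconds


def temporal_iou(pred: Span, gt: Span) -> float:
    """Temporal IoU between a predicted span and a ground-truth span."""
    # Overlap length, clamped to zero when the spans do not intersect.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    # Hull length; it equals the true union whenever the spans overlap,
    # and when they do not overlap the numerator is already zero.
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0


def summarize(
    preds: Sequence[Span],
    gts: Sequence[Span],
    thresholds: Sequence[float] = (0.3, 0.5, 0.7),
) -> Dict[str, float]:
    """Rank@1 at each IoU threshold and mean IoU, both as percentages.

    Assumes one top-1 prediction per query, aligned with `gts`.
    """
    ious = [temporal_iou(p, g) for p, g in zip(preds, gts)]
    n = len(ious)
    result = {
        # Assumption: a prediction counts as a hit when IoU >= threshold.
        f"Rank@1, IoU={t}": 100.0 * sum(iou >= t for iou in ious) / n
        for t in thresholds
    }
    result["mean IoU"] = 100.0 * sum(ious) / n
    return result
```

Calling `summarize(preds, gts)` on the top-1 predicted spans and the annotated spans returns a dictionary keyed in the same style as the lines above.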

minjoong507 commented 8 months ago

Hi @EricLina,

If you are asking about the difference in the model's performance, a likely cause is the use of different visual features.

Quoting from [2], "Note 2D-TAN [1] pre-processes the TACoS dataset, making it slightly different from the original one."

As shown in [2], there is a clear difference in performance when using the pre-processed visual features from [1].

I achieved higher performance on TACoS than was reported in the original paper because this repository uses the visual features from [1].

[1] Learning 2D Temporal Adjacent Networks for Moment Localization with Natural Language, AAAI 2020

[2] Parallel Attention Network with Sequence Matching for Video Grounding, ACL 2021