JonghwanMun / LGI4temporalgrounding

Repository for the CVPR-20 paper "Local-Global Video-Text Interactions for Temporal Grounding"

lower evaluation metrics with pretrained model #6

Open peijunbao opened 4 years ago

peijunbao commented 4 years ago

Hi, I evaluated the provided pretrained model on ActivityNet. However, the results seem much worse than yours: `[Test] -1 epoch 0 iter, R1-0.1 = 0.5197, R1-0.3 = 0.3310, R1-0.5 = 0.1842, R1-0.7 = 0.0779, mIoU = 0.2282`.

My torch version is 1.1.0 and my Python version is 3.7.6. Have you encountered similar problems?

Thank you very much.

JonghwanMun commented 4 years ago

I tested the code on Python 3.7.4 and torch 1.5.0+cu101 with torchvision 0.6.0+cu101.

Can you give me more details about how you ran it?

peijunbao commented 4 years ago

My lower results seem to be due to some bugs in my generation of preprocess/grounding_info. I generated the data again and achieved results similar to those in the paper.

Thank you.

peijunbao commented 4 years ago

The lower results seem to be caused by some bugs when generating preprocess/grounding_info with the training command.

More specifically, when I generate preprocess/grounding_info with the testing command, i.e.

```bash
python -m src.experiment.eval \
    --config pretrained_models/anet_LGI/config.yml \
    --checkpoint pretrained_models/anet_LGI/model.pkl \
    --method tgn_lgi \
    --dataset anet
```

I can reproduce the results of the pretrained evaluation.

When I generate preprocess/grounding_info with the training command, i.e.

```bash
python src/experiment/train.py \
    --config_path src/experiment/options/anet/tgn_lgi/LGI.yml \
    --method_type tgn_lgi \
    --dataset anet \
    --num_workers 4
```

I cannot reproduce the results of the pretrained evaluation.

Are there some small bugs when generating preprocess/grounding_info this way, or is there an error in how I use the training command to generate it?

JonghwanMun commented 4 years ago

For me, generating the preprocessed files with the training script (`bash scripts/train_model.sh LGI tgn_lgi anet 0 4 0`) gives exactly the same scores.

[screenshot of the matching evaluation scores]

peijunbao commented 4 years ago

Thank you, I will check it.

Is the config file provided with the pretrained model the same as the one under experiment/options? i.e. `pretrained_models/anet_LGI/config.yml` and `src/experiment/options/anet/tgn_lgi/LGI.yml`.

They seem to be written in different styles, but do they give the same configuration?

JonghwanMun commented 4 years ago

`pretrained_models/anet_LGI/config.yml` is generated by loading `src/experiment/options/anet/tgn_lgi/LGI.yml` and updating it (the options related to the model). Thus, the options for the data are the same; I checked it.
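
If you want to verify this yourself, a quick sanity check is to load both YAML files and diff their top-level sections. This is just a sketch, assuming both files are plain YAML readable by PyYAML; the exact section names it prints may differ from what is in the actual configs:

```python
import yaml

# Load the two config files discussed above.
with open("pretrained_models/anet_LGI/config.yml") as f:
    cfg_pretrained = yaml.safe_load(f)
with open("src/experiment/options/anet/tgn_lgi/LGI.yml") as f:
    cfg_base = yaml.safe_load(f)

# Report every top-level section whose contents differ between the files;
# only model-related sections should show up if the data options match.
for key in sorted(set(cfg_pretrained) | set(cfg_base)):
    if cfg_pretrained.get(key) != cfg_base.get(key):
        print("differs:", key)
```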

peijunbao commented 4 years ago

The Distinct Query Attention loss (DQA loss) works well to regularize the query attention, but I have a few questions about its implementation.

Assume that a query sentence has K words, where K is less than the predefined max length (i.e. 25). Then its query attention weights from the Kth to the 25th position are all zero. However, the implementation at https://github.com/JonghwanMun/LGI4temporalgrounding/blob/8fb3ee1751eb98caf97821a5456161cc6dea6bbb/src/model/building_blocks.py#L705 regularizes the attention over all positions (0th to 25th) toward an identity matrix I. Thus the diagonal elements of this loss term from the Kth to the 25th position always stay constant at 1 and can never be minimized.

Will this lead to some irrationality?
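
For concreteness, here is a minimal PyTorch sketch of the behavior I mean (illustrative shapes and names only, not the repository's exact code): with zero rows for the padded positions, each padded diagonal entry of A Aᵀ is 0, so its squared distance to the identity contributes a constant 1 to the loss.

```python
import torch

# Illustrative setup: a query with K real words, padded to L_max = 25.
# Attention rows/columns for padded positions (indices K..L_max-1) are zero.
L_max, K = 25, 10
A = torch.zeros(L_max, L_max)
A[:K, :K] = torch.softmax(torch.randn(K, K), dim=1)

# DQA-style penalty: push A @ A.T toward the identity matrix.
gram = A @ A.t()                               # (L_max, L_max)
loss = (gram - torch.eye(L_max)).pow(2).sum()

# For each padded position k >= K, gram[k, k] == 0, so the diagonal term
# (0 - 1)^2 == 1 is a constant that gradient descent can never reduce.
print(loss.item() >= (L_max - K))              # True
```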

JonghwanMun commented 4 years ago

As you recognized, in the DQA loss implementation I should have applied zero-masking to the attention weights from the Kth to the 25th position. However, I think it will not lead to significant irrationality, because the non-word positions from the Kth to the 25th are already masked in the earlier steps of computing the attention weights, so no gradient is back-propagated through them; the unmasked diagonal terms only add a constant offset to the loss value.
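
A sketch of the zero-masking I mention (again with illustrative shapes, not the actual tensors in the repository): masking the Gram matrix with the word mask removes the constant padded diagonal terms while leaving the real-word terms unchanged.

```python
import torch

L_max, K = 25, 10
A = torch.zeros(L_max, L_max)
A[:K, :K] = torch.softmax(torch.randn(K, K), dim=1)

# Word mask: 1 for real words, 0 for padding.
mask = torch.zeros(L_max)
mask[:K] = 1.0
outer = mask.unsqueeze(1) * mask.unsqueeze(0)  # (L_max, L_max)

# Zero out padded positions before comparing to the identity, so the
# padded diagonal entries contribute nothing to the loss.
gram = A @ A.t()
loss = (outer * (gram - torch.eye(L_max))).pow(2).sum()
```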