Pilhyeon / WTAL-Uncertainty-Modeling

Official Pytorch Implementation of 'Weakly-supervised Temporal Action Localization by Uncertainty Modeling' (AAAI-21)
MIT License

Some questions in your paper #13

liming-ai closed this issue 3 years ago

liming-ai commented 3 years ago

Hi @Pilhyeon

Thanks for your contribution. I tried again and could reproduce your results! It is really amazing work!

I read your paper carefully, but there are still some details I cannot understand. Could you please answer the questions below if you have time?

  1. Could you please explain Figure 2 in your paper? What does "Density" on the Y-axis mean?
  2. Am I right in understanding that the original features are the embedded features obtained when only the main pipeline is used as the whole model, while the separated features are obtained when both the main pipeline and uncertainty modeling are used as the final model?
  3. Does "softmax score" in Table 3 of the ablation study mean that only the main pipeline in Figure 3 is used to obtain the result?
  4. If I understand correctly, the softmax score is obtained from the original features, which are not separated and have unconstrained magnitudes. If so, is the description in the figure below wrong? Should it read "For the **first**, as the original......"?
  5. Could you please provide your extracted features and pre-trained models for ActivityNet 1.2 and ActivityNet 1.3?

Thanks again for your contribution and patience. I hope you can reply!

Pilhyeon commented 3 years ago

Hello, thanks for your interest! I hope the replies below would help.

  1. Density in Fig. 2 indicates the proportion of samples; e.g., a density value of 0.008 means that 0.8% of the samples are located there. It plays exactly the same role as normalization, which is necessary because the numbers of action and background frames differ greatly.
  2. Yes, you're right.
  3. Yes, it is. To clarify further, the softmax score uses the first term (softmax score) in Eq. 3.
  4. In fact, the softmax score is unrelated to the feature magnitudes, as they are never used. On the other hand, consider the case where the fusion score is used without the uncertainty modeling loss. As you mentioned, the magnitudes are not separated, so we need to perform min-max normalization rather than using the margin m. Therefore, it should read "the second".
  5. For certain reasons, we have put the ActivityNet features on hold. They may be released after the conference. We are sorry for the delay.
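
Point 1 can be made concrete with a small sketch, assuming "density" here means the fraction of one class's samples (action frames or background frames) that fall into each magnitude bin. Normalizing each class by its own count keeps the two histograms on the same scale even though background frames far outnumber action frames. The helper name `per_bin_fraction`, the bin range, and the toy values are hypothetical and not taken from the repository. Note that `numpy.histogram` and matplotlib's `hist` with `density=True` additionally divide by the bin width, so with those a bar's height times its width gives the fraction.

```python
from typing import List, Sequence


def per_bin_fraction(values: Sequence[float], lo: float, hi: float, n_bins: int) -> List[float]:
    """Fraction of `values` in each of `n_bins` equal-width bins on [lo, hi] (hypothetical helper).

    Each class is normalized by its own sample count, so one class's bars sum to 1
    regardless of how many samples that class has.
    """
    if n_bins <= 0 or hi <= lo:
        raise ValueError("need n_bins > 0 and hi > lo")
    if not values:
        return [0.0] * n_bins
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp so out-of-range values and the right edge (v == hi) land in a valid bin.
        idx = min(max(int((v - lo) / width), 0), n_bins - 1)
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts]


# Toy feature magnitudes: many background frames, only a few action frames.
background = [0.1, 0.2, 0.2, 0.3, 0.1, 0.4, 0.2, 0.3]
action = [0.8, 0.9]
bg_frac = per_bin_fraction(background, 0.0, 1.0, 5)
act_frac = per_bin_fraction(action, 0.0, 1.0, 5)
```

With raw counts, the eight background frames would dwarf the two action frames; after per-class normalization both histograms sum to 1 and can be compared directly.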

If you have further questions, feel free to let me know. Thanks!
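
Point 4 contrasts two ways of turning per-snippet feature magnitudes into scores in [0, 1]. The sketch below assumes that "using m" means clipping each magnitude at the margin m (the magnitude that action features are pushed toward by the uncertainty modeling loss) and dividing by m. That formula is an assumption for illustration, not taken from the paper or the repository. Without the uncertainty modeling loss, magnitudes have no fixed scale, so per-video min-max normalization is used instead. The function name `magnitude_scores` is hypothetical.

```python
from typing import List, Optional, Sequence


def magnitude_scores(mags: Sequence[float], margin: Optional[float] = None) -> List[float]:
    """Map per-snippet feature magnitudes to [0, 1] (hypothetical helper).

    - margin given: ASSUMES magnitudes were trained to be ~0 for background and
      >= margin for actions, so min(mag, margin) / margin is on a fixed scale.
    - margin None: magnitudes are unconstrained, so rescale with per-video min-max.
    """
    if not mags:
        return []
    if margin is not None:
        if margin <= 0:
            raise ValueError("margin must be positive")
        return [min(m, margin) / margin for m in mags]
    lo, hi = min(mags), max(mags)
    if hi == lo:
        # Degenerate case: every snippet has the same magnitude.
        return [0.0 for _ in mags]
    return [(m - lo) / (hi - lo) for m in mags]


mags = [2.0, 50.0, 130.0, 20.0]
print(magnitude_scores(mags, margin=100.0))  # with uncertainty modeling (assumed form)
print(magnitude_scores(mags))                # without it: min-max normalization
```

The practical difference: min-max always maps the largest magnitude in a video to 1 and the smallest to 0, whereas the margin-based version keeps a scale that is shared across videos.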

liming-ai commented 3 years ago

Thanks for your reply!

liming-ai commented 3 years ago

@Pilhyeon Could you please tell me whether any videos were also excluded during training or testing on ActivityNet v1.2 or v1.3?

xumh-9 commented 3 years ago

Can you reproduce the results by training the model yourself in your own environment, without using the pre-trained model?

Pilhyeon commented 3 years ago

@mitming In fact, some of the ActivityNet videos are currently unavailable, so the sets of training/validation videos used in experiments differ slightly from paper to paper. In our case, 9,272 training videos and 4,541 validation videos were available.
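
Because some ActivityNet videos can no longer be downloaded, the usable split is effectively the intersection of the annotation file with the features that were actually extracted. A minimal sketch, assuming ActivityNet-style annotations (a dict keyed by video ID whose entries carry a `subset` field such as `"training"` or `"validation"`) and one feature file per video named `<video_id>.npy`. The file layout, the function name `available_split`, and the toy IDs are hypothetical, not taken from this repository.

```python
import os
import tempfile
from typing import Dict, List


def available_split(database: Dict[str, dict], feature_dir: str, subset: str) -> List[str]:
    """Sorted IDs of `subset` videos that have a feature file in `feature_dir`.

    ASSUMES one feature file per video named '<video_id>.npy' (hypothetical layout).
    """
    present = {
        os.path.splitext(name)[0]
        for name in os.listdir(feature_dir)
        if name.endswith(".npy")
    }
    return sorted(
        vid for vid, info in database.items()
        if info.get("subset") == subset and vid in present
    )


# Usage with a throwaway directory standing in for the real feature folder.
with tempfile.TemporaryDirectory() as feat_dir:
    for vid in ("v_a", "v_c"):
        open(os.path.join(feat_dir, vid + ".npy"), "w").close()
    db = {
        "v_a": {"subset": "training"},
        "v_b": {"subset": "training"},  # annotated, but no features: dropped
        "v_c": {"subset": "validation"},
    }
    train_ids = available_split(db, feat_dir, "training")
    val_ids = available_split(db, feat_dir, "validation")
```

Reporting `len(train_ids)` and `len(val_ids)` next to the results makes the evaluated split explicit, in the same way as the 9,272 / 4,541 counts given above.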

liming-ai commented 3 years ago

Sorry, I have tried many times, but I still cannot reproduce the results in the paper without the pre-trained model. Could you reproduce them?