JacobChalk / TIM

Codebase for the paper: "TIM: A Time Interval Machine for Audio-Visual Action Recognition"

Regarding the Two Annotation Sources #33

Closed: lin-nie closed this issue 2 months ago

lin-nie commented 2 months ago

Hello, thank you very much for providing such an outstanding and impressive model. I am currently attempting to reproduce your results.

I would like to understand why there are two different sources for obtaining the annotations: 1) one provided through the official channel: https://github.com/epic-kitchens/epic-kitchens-100-annotations, and 2) the other provided in your TIM GitHub project: https://www.dropbox.com/scl/fi/xs6muwf67a5h9ql30jart/annotations.zip?rlkey=iw6b4w9n4brcpvygoksmrvf4n&e=1&st=j6c1exut&dl=0.

Could you please explain the differences between the annotations obtained from these two sources?

Also, when I extracted features using VideoMAE and Omnivore, I used the first source, i.e. the official annotations. I only discovered later that you provide a second set of annotations. Could you please advise how this might affect the extracted features? Should I re-extract the features using the second set of annotations?

Thank you very much for your help; I look forward to your reply!

Nie

JaesungHuh commented 2 months ago

Hi lin-nie,

The annotations in the official channel are the "action recognition" temporal segments, i.e. the segments in which we want to recognise the human action. The annotations in the Dropbox (*_feature_times.pkl) are start/end timestamps generated by a sliding window (stride 0.2 s).
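For intuition, here is a minimal sketch of how such sliding-window timestamps could be generated. The 0.2 s stride is stated above; the 1.0 s window length and the column names are illustrative assumptions, not TIM's exact format:

```python
import pandas as pd

def make_feature_times(video_id: str, video_dur: float,
                       window: float = 1.0, stride: float = 0.2) -> pd.DataFrame:
    # Slide a fixed-length window over the untrimmed video and record one
    # (start, stop) pair per position. The 0.2 s stride is from the comment
    # above; the window length and column names are assumptions.
    n_windows = round((video_dur - window) / stride) + 1
    rows = [{"video_id": video_id,
             "start_sec": round(i * stride, 2),
             "stop_sec": round(i * stride + window, 2)}
            for i in range(n_windows)]
    return pd.DataFrame(rows)

# e.g. a 10-second video yields 46 overlapping 1 s windows at 0.2 s stride
print(make_feature_times("P01_01", 10.0))
```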

To run TIM, you need to extract the features using the *_feature_times.pkl files, since TIM needs dense features over the untrimmed videos. The annotations from the official channel are then used for training TIM.
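If it helps while reproducing, something like the sketch below can be used to inspect the two sources side by side. The pickle filename is a placeholder following the *_feature_times.pkl pattern, and the sketch assumes the pickle stores a pandas DataFrame; EPIC_100_train.csv is from the official annotations repository:

```python
import pandas as pd

# Dense sliding-window times from the Dropbox archive (placeholder filename,
# assuming the pickle holds a pandas DataFrame).
feature_times = pd.read_pickle("EPIC_100_train_feature_times.pkl")

# Labelled action segments from the official annotations repository.
official = pd.read_csv("EPIC_100_train.csv")

print(len(feature_times), "dense windows -> used for feature extraction")
print(len(official), "action segments -> used for training TIM")
print(feature_times.head())
```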

Jaesung

lin-nie commented 2 months ago

Hi, thank you very much for your prompt reply. Let me summarize your answer:

  1. When extracting features, e.g. with the VideoMAE or Omnivore extractors you provide, we need to use the *_feature_times.pkl files. For example, the following two cases:

For TIM-Omnivore backbone: [image]

For TIM-VideoMAE backbone: [image]

  2. When training TIM, we need to use the annotation files provided by the official channel: https://github.com/epic-kitchens/epic-kitchens-100-annotations, which corresponds to the instructions below:

For TIM-recognition: [image]

For TIM-detection: [image]

Do I understand this correctly? Thank you very much, Jaesung!

Nie 2024.08.22

JaesungHuh commented 2 months ago

Yes.

Regarding the second point: for the *_context_pickle arguments, you need to put the path to the *_feature_times.pkl files.
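A trivial sanity check along those lines, for example to catch accidentally passing one of the official annotation files instead (the path is a placeholder):

```python
from pathlib import Path

# Placeholder: whatever path you pass to a *_context_pickle argument.
context_pickle = Path("annotations/EPIC_100_train_feature_times.pkl")

# The context pickle should be the dense *_feature_times.pkl file, not one
# of the official action-annotation files.
assert context_pickle.name.endswith("_feature_times.pkl"), (
    f"{context_pickle.name} does not look like a *_feature_times.pkl file")
```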

lin-nie commented 2 months ago

Thank you for your prompt reply; I see your point now.

Thank you very much!

Jaesung, hope you have a nice day.

Nie 2024.08.22