frostinassiky / gtad

The official implementation of G-TAD: Sub-Graph Localization for Temporal Action Detection
https://www.deepgcns.org/app/g-tad
Apache License 2.0
217 stars 44 forks source link

What does uNet_test.npy do? #37

Open mrlihellohorld opened 3 years ago

mrlihellohorld commented 3 years ago

Many thanks to the author for the excellent open source code. I have learned thatuNet_test.npy is the classification of each test video, but what is the specific function of uNet_test.npy in post-processing?

frostinassiky commented 3 years ago

Hello @mrlihellohorld , please check the post-processing code that uses this file:

https://github.com/frostinassiky/gtad/blob/6deb5b1bc6883b48bd22e0cc593069643c953e3d/gtad_postprocess.py#L106

The video classification scores help to assign class labels and reweight the G-TAD prediction.

mrlihellohorld commented 3 years ago

Hello @mrlihellohorld , please check the post-processing code that uses this file:

https://github.com/frostinassiky/gtad/blob/6deb5b1bc6883b48bd22e0cc593069643c953e3d/gtad_postprocess.py#L106

The video classification scores help to assign class labels and reweight the G-TAD prediction.

Thank you very much for your reply. I am puzzled by this sentence“The video classification scores help to assign class labels “. Because according to this code https://github.com/frostinassiky/gtad/blob/6deb5b1bc6883b48bd22e0cc593069643c953e3d/gtad_postprocess.py#L117 , I understand is to assign the tag of the whole video to the tag of the tmp_proposal,right?

frostinassiky commented 3 years ago

Sorry for the confusing sentence. Your understand is correct. 1, To decide the class label (or tag), we need to find the top-k classification scores. 2, From the indexes of the top-k scores, we can decide the video tag by thumos_class variable.

Please let me know if you have more questions. J

mrlihellohorld commented 3 years ago

Sorry for the confusing sentence. Your understand is correct. 1, To decide the class label (or tag), we need to find the top-k classification scores. 2, From the indexes of the top-k scores, we can decide the video tag by thumos_class variable.

Please let me know if you have more questions. J

Thank you very much for your patient answer. According to your answer, I have described my understanding and hope you can correct me,THX. a. ' From the indexes of the top-k scores, we can decide the video tag by thumos_class variable.' If I understand you correctly, here is the classification of tmp_proposal(snippet video),so the 'video' mean ‘snippet video’? b, In my own data set, each video has multiple actions. According to the data set, I first train a video classification model to get video level scores, and then classify each snippet according to the video scores. Do I understand that right?If correct, is it reasonable to classify snippet video by the whole video (including multiple actions)? I don't understand this part very well. I am looking forward to your reply. thanks!!!

frostinassiky commented 3 years ago

Hello @mrlihellohorld

The video means the untrimmed video that includes the snippet. If your video has multiple actions that belong to different classes, a reasonable setting is classifying those tmp_proposal via a separate branch. A new classification score ( e.g. this line ) can be added on G-TAD to achieve this.

mrlihellohorld commented 3 years ago

I don't quite understand you said. Could you please explain it more clearly? Thank you

frostinassiky commented 3 years ago

Sorry I didn't explain the questions clearly. If you are very interested in G-TAD, we can schedule a time and chat about it. My WeChat ID is FrostXuMengmeng.

ASMIftekhar commented 3 years ago

I am curious, did anyone actually implement action class prediction from another branch in gtad?