HYPJUDY / Decouple-SSAD

Decoupling Localization and Classification in Single Shot Temporal Action Detection
https://arxiv.org/abs/1904.07442
MIT License

Question about extract_feature.py #1

Closed: Sy-Zhang closed this issue 5 years ago

Sy-Zhang commented 5 years ago

Hi yupan, thanks for sharing your code. However, when I load files from spatialDadaXKnetV3, all of them have the same dimension (128, 2048).

When I checked your extract_feature.py, it seemed that the previously saved features are overwritten in the next iteration, so each npy file only saves the last sliding window of the corresponding video. Could you fix this bug?

HYPJUDY commented 5 years ago

Hi Songyang, yes, each sliding window in an npy file has dimension (128, D), because 128 = 512/4: https://github.com/HYPJUDY/Decouple-SSAD/blob/514116dca4746e334defddda68c555c098775c0a/data/extract_feature.py#L64-L65 https://github.com/HYPJUDY/Decouple-SSAD/blob/514116dca4746e334defddda68c555c098775c0a/data/extract_feature.py#L81-L82
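As a rough illustration of that arithmetic (a sketch only, not the code in extract_feature.py): assuming a window of 512 frames sampled every 4 frames, and D = 2048 as reported in the question, the per-window array shape works out as follows. The constant and function names here are hypothetical.

```python
import numpy as np

# Assumed values, inferred from "128 = 512/4" and the (128, 2048) shape above.
WINDOW_SIZE = 512   # frames covered by one sliding window
SAMPLE_STEP = 4     # one feature vector is kept every 4 frames
FEATURE_DIM = 2048  # D, the per-frame feature dimension from the question


def window_feature_shape(window_size=WINDOW_SIZE, step=SAMPLE_STEP, dim=FEATURE_DIM):
    """Return the (time, feature) shape of one sliding window's feature array."""
    return (window_size // step, dim)


# Placeholder array standing in for the features of a single window.
features = np.zeros(window_feature_shape(), dtype=np.float32)
print(features.shape)
```

Every window has this same shape regardless of the video it comes from, which is why all loaded files look identical in size.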

Could you point out which lines of code cause previously saved features to be overwritten in the next iteration? The extraction of spatial or temporal features is driven by the lines of window_infoFile. For example, if a line is 640, video_validation_0000417, then video_spatial_prediction extracts features starting from frame 640 (through 640 + window_size) of video_validation_0000417. Thus, each npy file corresponds to one line of window_infoFile, not to one video (a video usually has several sliding windows and therefore several npy files). Please refer to "Preprocess Data by Yourself" for the full process.
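To make the window-to-file mapping concrete, here is a minimal sketch that parses lines in the "start_frame, video_name" form shown above. The function name, the dictionary keys, and the sample start frames other than 640 are assumptions for illustration; the repository's own parsing may differ.

```python
def parse_window_info(lines, window_size=512):
    """Turn 'start_frame, video_name' lines into one record per sliding window.

    Each record would correspond to one saved feature file, so a video that
    appears on several lines yields several records.
    """
    windows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        start_str, video_name = [part.strip() for part in line.split(",", 1)]
        start = int(start_str)
        windows.append({
            "window_id": len(windows),   # index of the window, not of the video
            "video": video_name,
            "start": start,
            "end": start + window_size,  # exclusive end frame
        })
    return windows


# Hypothetical window list: three windows taken from the same video.
sample_lines = [
    "0, video_validation_0000417",
    "128, video_validation_0000417",
    "640, video_validation_0000417",
]
windows = parse_window_info(sample_lines)
for w in windows:
    print(w["window_id"], w["video"], w["start"], w["end"])
```

Here one video produces three window records, matching the point that file ids refer to windows rather than videos.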

BTW, motion feature extraction does have some overlap in my default settings: each clip is built from 5 consecutive optical flow frames but the step is 4, so adjacent clips overlap by 1 frame. https://github.com/HYPJUDY/Decouple-SSAD/blob/514116dca4746e334defddda68c555c098775c0a/data/extract_feature.py#L121-L123 https://github.com/HYPJUDY/Decouple-SSAD/blob/514116dca4746e334defddda68c555c098775c0a/data/extract_feature.py#L139-L141
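The overlap can be checked with a small sketch, assuming clips of 5 frames taken every 4 frames as described above. The helper names are hypothetical and this is not the repository's implementation.

```python
def clip_ranges(num_frames, clip_len=5, step=4):
    """Return (start, end) frame ranges of clips; end is exclusive."""
    return [(s, s + clip_len) for s in range(0, num_frames - clip_len + 1, step)]


def overlap(a, b):
    """Number of frames shared by two (start, end) ranges."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


clips = clip_ranges(13)
print(clips)                        # consecutive clips over 13 frames
print(overlap(clips[0], clips[1]))  # frames shared by the first two clips
```

With a clip length of 5 and a step of 4, each clip's last frame is also the next clip's first frame, so adjacent clips share exactly one frame.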

Sy-Zhang commented 5 years ago

Got it. I mistakenly thought the npy's id is the video id, but it is actually the window's id. Thanks for your reply.

xinyeCH commented 5 years ago

Hi yupan, thanks for your contribution! I have a question about the variable 'special_idx' in extract_feature.py. What does this variable mean?

HYPJUDY commented 5 years ago

Hi @yinianjimo, I explained this in the comments: https://github.com/HYPJUDY/Decouple-SSAD/blob/514116dca4746e334defddda68c555c098775c0a/data/extract_feature.py#L20-L25 When the video is shorter than the window size, the last frame is used for padding. Since only two videos are in this situation, it should have little influence on overall performance.
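One way to picture the padding (a sketch under the assumption that frame indices are simply clamped to the last available frame; the function name is hypothetical and the actual handling via special_idx may be implemented differently):

```python
def window_frame_indices(start, window_size, video_length):
    """Frame indices for one window, repeating the last frame when the video is too short."""
    last = video_length - 1
    return [min(start + i, last) for i in range(window_size)]


# A 5-frame video read with an 8-frame window: the final frame (index 4) is repeated.
indices = window_frame_indices(start=0, window_size=8, video_length=5)
print(indices)
```

For videos at least as long as the window, the clamp never triggers and the indices are just consecutive frames.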

xinyeCH commented 5 years ago

Sorry, I didn't notice the comment. Thanks for your explanation.