fpv-iplab / rulstm

Code for the Paper: Antonino Furnari and Giovanni Maria Farinella. What Would You Expect? Anticipating Egocentric Actions with Rolling-Unrolling LSTMs and Modality Attention. International Conference on Computer Vision, 2019.
http://iplab.dmi.unict.it/rulstm

annotation issue #27

Closed Fanye12 closed 1 year ago

Fanye12 commented 1 year ago

Thanks for open sourcing such an awesome project, but I have some questions. The ek/100/training.csv you provide is different from the one on the official EPIC-KITCHENS website. Did you relabel it yourself?

[Screenshots: partial views of the two training.csv files]

These two pictures are partial screenshots of the two CSV files. The action labels are the same, but start_frame and stop_frame are different. May I ask why this happens, and which file I should use for the anticipation task? Looking forward to your clarification.

antoninofurnari commented 1 year ago

Hello,

We re-encoded all videos to a fixed 30fps (the original videos in EK-55 are 60fps, while those in EK-100 are a mix of 50fps and 60fps). Due to the re-encoding, the frame numbers are not the same. We also removed some annotations from training and validation which were too close to the beginning of the video (hence we cannot observe the past in those cases).

For reference, row 7 of our file (bottom picture) corresponds to row 8 of the original file (top picture). Indeed, if you multiply our start frame (14556) by 2 (going from 30fps to 60fps), you obtain 29112, which corresponds to 29113 in row 8 of the original dataset. The one-frame difference is probably due to rounding in the calculation of the frame number from the timestamp.
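For the curious, here is a minimal sketch of how such an off-by-one can arise. The flooring of timestamp × fps is an assumption about how frame indices are computed, not the exact code used to produce the annotations, and the timestamp value is hypothetical:

```python
import math

def timestamp_to_frame(timestamp_s: float, fps: int) -> int:
    # Assumed convention: the frame index is the floor of timestamp * fps.
    return math.floor(timestamp_s * fps)

# Illustrative timestamp chosen so the numbers match those discussed above.
t = 485.21667  # seconds (hypothetical value)
frame_30 = timestamp_to_frame(t, 30)  # 14556, as in the re-encoded 30fps file
frame_60 = timestamp_to_frame(t, 60)  # 29113, as in the original 60fps file

# Doubling the 30fps index gives 29112, one frame less than 29113, because
# the floor is taken before the multiplication in one case and after in the other.
print(frame_30 * 2, frame_60)  # 29112 29113
```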

Hope this clarifies the differences between the two files.

Antonino

Fanye12 commented 1 year ago

Thank you for your answer, it is very important to me. I understand the point above, but I still have a question. As stated in your paper, you use mean top-5 recall as an important metric. However, I noticed that in your code you simply take a plain average over all classes instead of a weighted one, even though the number of samples per class varies a lot. Is it reasonable to take the average directly? It seems that each class has the same impact on the result, regardless of how many samples it has.

[Screenshot of the evaluation code]
antoninofurnari commented 1 year ago

That's correct. We take a simple average, not a weighted one. This is how the evaluation measure has been defined (see section 3.2 of https://www.antoninofurnari.it/publications/furnari2018Leveraging.pdf). This follows other evaluation measures such as mAP in object detection, which take a simple average even if the test set may be unbalanced.

You may take a weighted average, but this would be very close to simply computing an accuracy, which may reward models that work well only on the few most frequent classes. This was the main rationale for choosing the simple average in the first place.
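For reference, here is a minimal sketch of the unweighted per-class averaging described above; the function name and exact details are illustrative and may differ from the repository's implementation:

```python
import numpy as np

def mean_topk_recall(scores: np.ndarray, labels: np.ndarray, k: int = 5) -> float:
    """Mean Top-k Recall: top-k recall computed per class, then averaged
    with equal weight over the classes present in the ground truth."""
    # scores: (N, C) class scores; labels: (N,) ground-truth class indices.
    topk = np.argsort(scores, axis=1)[:, -k:]       # top-k predicted classes per sample
    hits = (topk == labels[:, None]).any(axis=1)    # True where the label is in the top-k
    # Simple (unweighted) mean: each class contributes once, regardless of frequency.
    return float(np.mean([hits[labels == c].mean() for c in np.unique(labels)]))
```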

Fanye12 commented 1 year ago

Thank you very much, your answer really cleared up a lot of my doubts. If I manage to publish a paper in this field in the future, I will be sure to thank you in it. Haha, and if I have more questions later, I hope I can ask you again. Thanks again.

antoninofurnari commented 1 year ago

Sure :)