EGO4D / hands-and-objects

MIT License

data preprocessing #4

Closed: idejie closed this issue 2 years ago

idejie commented 2 years ago

Hi, the configuration has many options that seem to refer to preprocessed data, such as positive_clips and pre_pnr_post_frames. How can I process the data to get these splits? Is there more detailed documentation?

Sid2697 commented 2 years ago

Hello @idejie, thank you for reaching out!

positive_clips refers to the clips that contain a state change. Similarly, negative_clips refers to the clips that do not contain a state change. @fuqichen1998 will be able to tell you more about pre_pnr_post_frames.
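As an illustration of the positive/negative distinction described above, here is a minimal sketch that splits clip records by whether they contain a state change. The record layout and the `state_change` field are hypothetical and chosen only for this example; they are not the repository's actual annotation schema.

```python
from typing import Dict, List, Tuple


def split_clips(annotations: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Split clip records into (positive, negative) lists.

    A clip is treated as positive when it contains a state change.
    """
    positive, negative = [], []
    for record in annotations:
        # "state_change" is a hypothetical boolean field used for illustration.
        if record["state_change"]:
            positive.append(record)
        else:
            negative.append(record)
    return positive, negative


# Toy records; real annotations would be loaded from the dataset files.
clips = [
    {"clip_id": "a", "state_change": True},
    {"clip_id": "b", "state_change": False},
    {"clip_id": "c", "state_change": True},
]
positive_clips, negative_clips = split_clips(clips)
print([c["clip_id"] for c in positive_clips])  # ['a', 'c']
print([c["clip_id"] for c in negative_clips])  # ['b']
```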

I suggest having a look at this file (https://github.com/EGO4D/hands-and-objects/blob/main/state-change-localization-classification/i3d-resnet50/configs/defaults.py) for a detailed description of the various configuration terms.

Feel free to reach out to us if anything is unclear!

Regards, Siddhant Bansal

idejie commented 2 years ago

Thanks a lot for your reply, @Sid2697! The repo is awesome! cc @fuqichen1998

I will study the code carefully, and thanks for your advice!

I still have some questions:

  1. In the training stage, does each clip contain only one state change (one PNR frame)?
  2. Is every clip trimmed to a single object state change? (That is, does no clip contain extraneous frames or segments?)
  3. What about the test stage? Does each test clip also contain only one state change? Is each clip also 8 s long? Are the test clips also trimmed to the object state change?
  4. For the PNR localization evaluation, I noticed that the ground truth for the PNR is a single frame (the start of a change in a clip). In the challenge, is the PNR also a single frame, or is it a short duration?
  5. Is audio provided for all clips?

Thanks again!

Sid2697 commented 2 years ago

Thank you @idejie!

Here are answers to your questions:

  1. Each 8-second clip contains one state change.
  2. Sorry, I did not understand the question. Could you please elaborate? Thanks!
  3. Yes! The test data also contains one state change per 8-second clip, and the clip length is also 8 seconds. (I should be able to answer the last part once I understand question 2.)
  4. Yes, the PNR is also only a frame; it is not a short duration. @hyf015 will be able to provide you with more details on the corresponding challenge.
  5. Audio is available for a subset of the data. @ebyrne will be able to guide you on what part contains the audio.
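To make the "PNR is a single frame" point concrete, here is a minimal sketch that maps a PNR timestamp to a frame index inside an 8-second clip and measures how far a predicted frame is from it in seconds. The frame rate of 30 fps, the function names, and the use of absolute temporal error as a score are assumptions made for this illustration, not details confirmed in this thread.

```python
CLIP_LENGTH_S = 8.0  # clip length stated in this thread
FPS = 30.0           # assumed frame rate, for illustration only


def pnr_frame_index(pnr_time_s: float, clip_start_s: float, fps: float = FPS) -> int:
    """Return the PNR frame index relative to the first frame of the clip."""
    return round((pnr_time_s - clip_start_s) * fps)


def absolute_temporal_error_s(pred_frame: int, true_frame: int, fps: float = FPS) -> float:
    """Return the distance between predicted and true PNR frames, in seconds."""
    return abs(pred_frame - true_frame) / fps


num_frames = int(CLIP_LENGTH_S * FPS)
true_frame = pnr_frame_index(pnr_time_s=103.5, clip_start_s=100.0)
error_s = absolute_temporal_error_s(pred_frame=120, true_frame=true_frame)

print(num_frames)  # 240
print(true_frame)  # 105
print(error_s)     # 0.5
```

Because the target is one frame rather than an interval, a single index per clip is enough to describe both the label and the prediction.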

Regards, Siddhant Bansal

idejie commented 2 years ago

Thanks for your answers, @Sid2697! Sorry for the unclear wording of Q2.

For these 8 s clips, I understand that negative samples do not contain any state-change frame. PNR localization aims to find the frame at which an object state change begins in a positive sample, and the state change may be related to an action. This resembles temporal action localization, and one baseline in this repo is based on a temporal action localization method (Boundary Matching Network).

In temporal action localization, a clip may contain several actions, and some methods can find their boundaries (the start and end of each action). For PNR localization, can a positive sample contain other actions whose start is not the PNR?

Best Wishes, Dejie Yang

Sid2697 commented 2 years ago

Thank you @idejie for the detailed description of the question!

For the PNR localisation task, a positive sample contains only one action. The data selection and annotation have been carried out to have one action (and the corresponding PNR frame) per clip.
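Since each positive clip has exactly one PNR frame, a prediction can be a single frame index rather than a set of start/end boundaries. The sketch below picks the highest-scoring frame from per-frame scores. The scores and the `predict_pnr_frame` helper are hypothetical, and this is not the repository's baseline method.

```python
from typing import Sequence


def predict_pnr_frame(frame_scores: Sequence[float]) -> int:
    """Return the index of the frame with the highest PNR score.

    Ties are resolved in favor of the earliest frame.
    """
    if not frame_scores:
        raise ValueError("frame_scores must not be empty")
    return max(range(len(frame_scores)), key=lambda i: frame_scores[i])


# Hypothetical per-frame scores for five sampled frames of one clip.
scores = [0.05, 0.10, 0.60, 0.20, 0.05]
print(predict_pnr_frame(scores))  # 2
```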

Let me know if anything is still unclear!

Regards, Siddhant Bansal

idejie commented 2 years ago

Thanks for your answer, @Sid2697!

I think you have answered all my questions!

Best Wishes, Dejie Yang