cure-lab / SmoothNet

[ECCV 2022] Official implementation of the paper "SmoothNet: A Plug-and-Play Network for Refining Human Poses in Videos"
Apache License 2.0

About the training process #4

Closed jiejiangd closed 2 years ago

jiejiangd commented 2 years ago

Loading dataset (0)......
#############################################################
You are loading the [training set] of dataset [h36m]
You are using pose esimator [fcn]
The type of the data is [3D]
The frame number is [1559752]
The sequence number is [600]
#############################################################
#############################################################
You are loading the [testing set] of dataset [h36m]
You are using pose esimator [fcn]
The type of the data is [3D]
The frame number is [543344]
The sequence number is [236]
#############################################################

The training process seems to be stuck at the data-loading stage. Could you advise what the problem might be?

juxuan27 commented 2 years ago

Hi, @jiejiangd! Thanks for your interest! Personally, I recommend waiting for a while to see whether the training starts later. After the datasets are loaded, there is still model loading and data processing before training begins; this takes about 12 s on a V100, and different machines may need a different amount of time. You can also try breakpoint debugging, or add print statements in the dataloader (or in the dataset class definition, e.g. the Human3.6M dataset definition), to see whether something goes wrong.
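
A minimal sketch of this kind of print-based check, assuming a hypothetical wrapper dataset (the class and field names below are placeholders, not the repository's actual code):

```python
import torch
from torch.utils.data import Dataset, DataLoader


class DebugPoseDataset(Dataset):
    """Wraps a list of pose sequences and prints progress while indexing."""

    def __init__(self, sequences):
        # sequences: list of [frame_num, keypoint_num * dim] arrays
        self.sequences = sequences
        print(f"Loaded {len(self.sequences)} sequences")

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, idx):
        # Printing here shows whether the dataloader is actually iterating
        # or is stuck before the first batch is produced.
        print(f"Fetching sequence {idx}")
        return torch.as_tensor(self.sequences[idx], dtype=torch.float32)


if __name__ == "__main__":
    fake_data = [torch.randn(100, 51) for _ in range(4)]  # 4 dummy sequences
    loader = DataLoader(DebugPoseDataset(fake_data), batch_size=1)
    for batch in loader:
        print("Got batch of shape", tuple(batch.shape))
```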

jiejiangd commented 2 years ago

Training has started; it just took a while to process the data. Also, for the H36M dataset, does the first dimension (600) represent 600 scenes, the second dimension (1149, variable) the window, and the third dimension (51) the number of feature points? This is my first time working in this area; I am very interested in this method and would like to apply it to radar target-track filtering.

juxuan27 commented 2 years ago

For your question: the first dimension means there are 600 sequences in total in this dataset. The second dimension means that this particular sequence has 1149 frames. The third dimension is keypoint_number × dimension: for example, if you represent a human with 17 keypoints in 3D, the third dimension is 17 × 3 = 51. The same holds for 2D poses: with 17 keypoints, the last dimension is 17 × 2 = 34.
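
As a shape illustration only (not SmoothNet code), the data layout described above could be mocked up like this:

```python
import numpy as np

num_keypoints, dims = 17, 3
sequence = np.random.randn(1149, num_keypoints * dims)  # one sequence: [frames, 51]
dataset = [sequence for _ in range(600)]                # 600 such sequences

# Recover per-keypoint 3D coordinates for a single frame:
frame = dataset[0][0].reshape(num_keypoints, dims)      # [17, 3]
print(len(dataset), dataset[0].shape, frame.shape)
```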

jiejiangd commented 2 years ago

Let me ask you again as a layman: training the model does not require image data, only keypoints, right? And is the training of the model already built on top of the outputs of current human pose estimation backbones?

ailingzengzzz commented 2 years ago

Hi @jiejiangd, yes, we have processed the detected pose sequences from images with different pose estimators. You can use these data directly to develop better models. The steps are:

  1. We first obtain the detected pose sequences on different datasets using different pose estimators (we have provided all the processed data).
  2. We use these data to train SmoothNet directly, without training the existing backbones.
  3. Once SmoothNet is trained, it can be used on different data of shape [T, C], where T should stay the same (the temporal window size) and C can be any size; see the sketch after this list.
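
A minimal sliding-window sketch of step 3, assuming a trained temporal filter `smoothnet` that maps a window of shape [T, C] to a refined window of the same shape (T fixed at training time, C arbitrary). The function name, averaging strategy, and radar-track example are illustrative, not the repository's exact interface:

```python
import numpy as np

def refine_sequence(noisy, smoothnet, T):
    """noisy: [N, C] array with N >= T; returns a refined [N, C] array."""
    N, C = noisy.shape
    refined = np.zeros((N, C))
    counts = np.zeros((N, 1))
    for start in range(0, N - T + 1):            # slide the T-frame window
        window = noisy[start:start + T]          # [T, C]
        refined[start:start + T] += smoothnet(window)
        counts[start:start + T] += 1
    return refined / counts                      # average overlapping windows

# Usage with a dummy "model" (identity) on a radar-track-like sequence:
track = np.random.randn(200, 4)                  # e.g. [frames, x/y/vx/vy]
smoothed = refine_sequence(track, lambda w: w, T=32)
print(smoothed.shape)                            # (200, 4)
```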
jiejiangd commented 2 years ago

I see. Thank you very much!