UttaranB127 / STEP

Spatial Temporal Graph Convolutional Networks for Emotion Perception from Gaits
https://gamma.umd.edu/step
MIT License

Question regarding the data for training and testing #19

Closed vigneshrk29 closed 9 months ago

vigneshrk29 commented 1 year ago

Hello,

Thank you for your incredible work. I have a question regarding the data provided for training in the code.

1) From the paper, I understand that both the gait and the affective features are used for emotion classification. But from the code, it seems that only the affective features are split into data_training and data_testing, or am I mistaken?

Thanks

vigneshrk29 commented 1 year ago

Hi,

Sorry, I have a follow-up question to the one above. I tried running the code on videos provided in the following dataset: https://drive.google.com/drive/folders/1wWL0Yc7Oa7AMm2QqQ4lbtTIRYvMW0L2h

I extracted the skeletons using PoseNet, but most of the predicted emotions are 0, which is angry. Do you know why this is happening and how I can fix it?

UttaranB127 commented 1 year ago

Hello, sorry for not following up on your questions sooner. In the dataset, we have "features" files that contain the gait data and "affectiveFeatures" files that contain the affective features.
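For reference, a minimal sketch of reading both files side by side; the file names and the per-sample HDF5 layout below are assumptions, so check them against the actual dataset and the loader code:

```python
# Minimal sketch: reading the gait and affective feature files side by side.
# File names and the per-key layout are assumptions, not the repo's exact format.
import h5py
import numpy as np

def load_h5(path):
    """Return a dict mapping each top-level dataset name to a NumPy array."""
    with h5py.File(path, 'r') as f:
        return {k: np.array(f[k]) for k in f.keys()}

gaits = load_h5('features.h5')               # per-sample joint trajectories
affective = load_h5('affectiveFeatures.h5')  # per-sample affective descriptors
print(len(gaits), len(affective))            # both should be indexed by the same sample keys
```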

The STEP method has some issues with overfitting, which is why we moved on to other approaches. You can check out some of our more recent work using pose-based affective encoders (https://github.com/UttaranB127/speech2affective_gestures). While that work targets pose generation, you can use its affective encoder to obtain latent vectors and train a classifier on them.
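Once you have latent vectors from the encoder, any off-the-shelf classifier can be trained on top of them. A minimal sketch, with random placeholder arrays standing in for the real latent vectors and labels:

```python
# Placeholder data: replace Z with latent vectors from the affective encoder
# and y with the corresponding emotion labels.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

Z = np.random.randn(200, 64)           # (num_samples, latent_dim), placeholder
y = np.random.randint(0, 4, size=200)  # 4 emotion classes, placeholder

Z_train, Z_test, y_train, y_test = train_test_split(Z, y, test_size=0.2, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(Z_train, y_train)
print('test accuracy:', clf.score(Z_test, y_test))
```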

vigneshrk29 commented 1 year ago

Thank you. Just to clarify: 1) The affective features are not calculated at run-time from the gait data but are computed separately? 2) Regarding classifying emotions from the videos: do you mean the videos in the dataset (https://drive.google.com/drive/folders/1wWL0Yc7Oa7AMm2QqQ4lbtTIRYvMW0L2h) do not correspond well with the emotions used in STEP? Mostly, I can only tell that the gait shows some emotion.

My goal is to classify emotion from gait. Additionally, in the dataset https://drive.google.com/drive/folders/1wWL0Yc7Oa7AMm2QqQ4lbtTIRYvMW0L2h:

1) The labels for the Human3.6M and CMU videos do not always correspond with the videos, especially for Human3.6M. Which video is S1_2?

UttaranB127 commented 1 year ago
  1. Yes. We have separate code to compute the affective features (inside the compute_aff_features folder); see the sketch after this list for the general idea.
  2. The emotions in the dataset were acted, which leads to some overfitting during training, but they are still usable.
  3. Unfortunately, I don't have the mapping to videos (folks who collected the original data graduated ~5 years ago). For this code, we only used the features and affectiveFeatures files.
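To give an idea of the kind of descriptors involved (this is illustrative, not the exact feature set in compute_aff_features), here is a small sketch computing two posture features from 3D joint positions; the joint indices are placeholders:

```python
# Illustrative posture descriptors (angles and distances between joints) of the
# kind described in the paper. The joint indices below are placeholders; the
# actual feature definitions live in the compute_aff_features code.
import numpy as np

def joint_angle(a, b, c):
    """Angle at joint b (in radians) formed by the segments b->a and b->c."""
    v1, v2 = a - b, c - b
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    return np.arccos(np.clip(cos, -1.0, 1.0))

def example_features(pose):
    """pose: (num_joints, 3) array of 3D joint positions for one frame."""
    NECK, L_SHOULDER, R_SHOULDER, L_HAND, R_HAND = 1, 2, 5, 4, 7  # placeholder indices
    return np.array([
        joint_angle(pose[L_SHOULDER], pose[NECK], pose[R_SHOULDER]),  # shoulder spread at the neck
        np.linalg.norm(pose[L_HAND] - pose[R_HAND]),                  # distance between the hands
    ])
```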
vigneshrk29 commented 1 year ago
  1. Is this different from the common.get_affective_features function called in the loader class?
  2. Okay, I will try to look at what is going wrong with my code.

One more question: did you use just a detect-and-track approach for pose estimation? And would one-hot encoding improve the results?

UttaranB127 commented 1 year ago

We used the 3D pose tracker from this paper: https://www.cse.iitb.ac.in/~rdabral/docs/multi_person_3dv.pdf, but their code doesn't seem to be available on GitHub anymore. You can look at more recent 3D pose trackers instead.

vigneshrk29 commented 1 year ago

Hi,

Sorry, but are the pose (skeleton) landmarks normalised before inference or training? Or would a skeleton in any coordinate system work (i.e., any magnitude of joint locations)?

Thanks

UttaranB127 commented 1 year ago

We normalize the input before passing it to the network. As long as the inputs are normalized, the network should work with any coordinate system.
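A minimal sketch of this kind of normalization; the root joint and the reference bone below are assumptions, not necessarily the exact scheme used in this repo:

```python
# Root-center each frame and rescale by an average reference bone length so the
# result is independent of the original coordinate system and units.
import numpy as np

def normalize_gait(gait, root_idx=0, ref_pair=(0, 1)):
    """gait: (num_frames, num_joints, 3) array of 3D joint positions."""
    centered = gait - gait[:, root_idx:root_idx + 1, :]  # root-relative coordinates
    ref_len = np.linalg.norm(centered[:, ref_pair[1]] - centered[:, ref_pair[0]], axis=-1)
    scale = np.mean(ref_len) + 1e-8                       # average reference bone length
    return centered / scale
```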