Closed vigneshrk29 closed 9 months ago
Hi,
Sorry, I have a follow-up question to the one above. I tried running the code on videos from the following dataset: https://drive.google.com/drive/folders/1wWL0Yc7Oa7AMm2QqQ4lbtTIRYvMW0L2h
I extracted the skeletons using PoseNet, but most of the emotions are classified as 0, which is "angry". Do you know why this is happening and how I can fix it?
Hello, sorry for not following up on your questions sooner. In the dataset, we have "features" files that contain the gait data and "affectiveFeatures" that contain the affective features.
The STEP method has some issues with overfitting, because of which we moved on to other approaches. You can check out some of our more recent work using pose-based affective encoders (https://github.com/UttaranB127/speech2affective_gestures). While that work is used for pose generation, you can use the affective encoder to obtain latent vectors and train those for classification.
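For anyone looking to follow this suggestion, here is a rough sketch of the latent-vector classification idea. The random linear map below is only a stand-in for the actual pretrained affective encoder from speech2affective_gestures, and the nearest-centroid classifier is just one illustrative choice; all dimensions and names are placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained pose-based affective encoder. In practice you
# would load and freeze the encoder from speech2affective_gestures; here a
# fixed random projection plays its role for illustration only.
W = rng.normal(size=(48, 8))  # e.g. 16 joints x 3 coords -> 8-D latent

def encode(gait):
    """Map a flattened gait feature vector (48,) to a latent vector (8,)."""
    return gait @ W

def fit_centroids(latents, labels):
    """Train a lightweight classifier on the latents: one centroid per class."""
    return {c: latents[labels == c].mean(axis=0) for c in np.unique(labels)}

def predict(centroids, latent):
    """Assign the class whose centroid is nearest to the latent vector."""
    return min(centroids, key=lambda c: np.linalg.norm(latent - centroids[c]))
```

Any classifier (an MLP, SVM, etc.) could replace the nearest-centroid step; the point is only that the frozen encoder's latent vectors become the classifier's input.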
Thank you. Just to clarify: 1) Are the affective features calculated separately, rather than at run-time from the gait data? 2) Regarding video-to-emotion classification: do you mean the videos in the dataset (https://drive.google.com/drive/folders/1wWL0Yc7Oa7AMm2QqQ4lbtTIRYvMW0L2h) do not correspond well with the emotions used in STEP? I mostly just get that the gait shows emotion.
My goal is to classify emotion from gait. Additionally, in the dataset https://drive.google.com/drive/folders/1wWL0Yc7Oa7AMm2QqQ4lbtTIRYvMW0L2h:
1) The labels for the Human3.6M and CMU videos do not always correspond with the videos, especially for Human3.6M. Which video is S1_2?
One more question: did you use just Detect-and-Track for pose estimation? And would one-hot encoding improve the results?
We used the 3D pose tracker from this paper: https://www.cse.iitb.ac.in/~rdabral/docs/multi_person_3dv.pdf, but their code no longer seems to be available on GitHub. You can look at more recent 3D pose trackers instead.
Hi,
Sorry, but are the pose (skeleton) landmarks normalised prior to inference or training? Or would a skeleton in any coordinate system work (i.e., joint locations of any magnitude)?
Thanks
We have normalized the input before passing it to the network. As long as the inputs are normalized, the network should work for any coordinate system.
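For reference, a minimal sketch of one common skeleton-normalization scheme (root-centering plus scaling by a reference bone length). The joint indices are placeholders, and this is not necessarily the exact scheme used in STEP:

```python
import numpy as np

def normalize_skeleton(joints, root_idx=0, ref_pair=(0, 1)):
    """Translate the skeleton so the root joint sits at the origin, then
    scale it so a reference bone (e.g. root-to-spine) has unit length.

    joints: (num_joints, 3) array in any coordinate system.
    root_idx and ref_pair are placeholder indices; pick them to match
    your skeleton layout.
    """
    joints = np.asarray(joints, dtype=float)
    centered = joints - joints[root_idx]          # remove global translation
    a, b = ref_pair
    bone_len = np.linalg.norm(centered[a] - centered[b])
    if bone_len == 0:
        raise ValueError("reference bone has zero length")
    return centered / bone_len                    # remove global scale
```

With this kind of normalization, the same pose expressed in millimetres, metres, or arbitrary PoseNet pixel coordinates maps to the same input, which is why the network becomes coordinate-system agnostic.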
Hello,
Thank you for your incredible work. I have a question regarding the data provided for training (code).
1) From the paper I understand that emotion classification uses the gait + affective features together. But from the code it seems that only the affective features are split into data_training and data_testing, or am I mistaken?
Thanks
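If the intent is to classify on gait + affective features jointly, one way to do it is to concatenate both feature sets per sample before splitting. A minimal sketch (all dimensions and variable names below are placeholders, not the repository's actual ones):

```python
import numpy as np

rng = np.random.default_rng(1)
num_samples = 100

# Placeholder feature matrices: one row per gait sample.
gait = rng.normal(size=(num_samples, 48))        # hypothetical gait features
affective = rng.normal(size=(num_samples, 29))   # hypothetical affective features

# Concatenate per sample, then split into train/test with a shared shuffle
# so each row keeps its gait and affective parts together.
features = np.concatenate([gait, affective], axis=1)
idx = rng.permutation(num_samples)
split = int(0.9 * num_samples)
data_training = features[idx[:split]]
data_testing = features[idx[split:]]
```

The key point is that the concatenation happens before the split, so both feature types end up in data_training and data_testing.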