cosmaadrian / multimodal-depression-from-video

Official source code for the paper: "Reading Between the Frames Multi-Modal Non-Verbal Depression Detection in Videos"

Why do NaN values appear when computing EmoNet embeddings? #88

Open waHAHJIAHAO opened 1 month ago

waHAHJIAHAO commented 1 month ago

I am using a private dataset collected at the mental health college of my university. I processed my original video data the same way the D-vlog data was processed. I finished the landmark extraction and stored the corresponding npz files in the faces directory. When I ran the EmoNet script, the processed result contained NaN values. May I ask why?

[Image: Screenshot 2024-07-26 173912]

david-gimeno commented 1 month ago

I am not sure whether we experienced the same problem. I think this is something you should ask the original authors of the face landmark detector. Perhaps the NaN values correspond to frames where no face was found, but I don't think so, because we handled that situation by zero-filling the feature sequence. If these NaN frames are not very common, I would recommend replacing them with zeros, using this code:

face_landmarks[np.isnan(face_landmarks)] = 0.

You can do this in advance and save the files again, or you can apply the same logic dynamically when defining your Dataset, the object in charge of loading the data for training and evaluation.
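
A minimal sketch of both options, plus a quick check of how common the NaN frames actually are, might look like the following. This is not code from the repository: the function names, the assumption that frames lie along the first axis, and the `key` argument used to pick an array out of the npz file are all hypothetical and would need adapting to your own files.

```python
from pathlib import Path

import numpy as np


def nan_frame_ratio(features: np.ndarray) -> float:
    """Fraction of frames (assumed to lie along axis 0) with at least one NaN."""
    if features.shape[0] == 0:
        return 0.0
    flat = features.reshape(features.shape[0], -1)
    return float(np.isnan(flat).any(axis=1).mean())


def zero_fill_nans(features: np.ndarray) -> np.ndarray:
    """Return a copy with every NaN replaced by 0; other values are untouched."""
    cleaned = features.copy()
    cleaned[np.isnan(cleaned)] = 0.0
    return cleaned


def clean_npz_file(src: Path, dst: Path) -> None:
    """Option 1 (in advance): rewrite an npz file with NaNs zero-filled.

    Every floating-point array in the archive is cleaned, so no particular
    key name is assumed. Non-float arrays are copied unchanged.
    """
    with np.load(src) as archive:
        arrays = {name: archive[name] for name in archive.files}
    cleaned = {
        name: zero_fill_nans(arr) if np.issubdtype(arr.dtype, np.floating) else arr
        for name, arr in arrays.items()
    }
    np.savez(dst, **cleaned)


def load_features(path: Path, key: str) -> np.ndarray:
    """Option 2 (dynamic): zero-fill at load time.

    Could be called from a Dataset's __getitem__; `key` is a hypothetical
    array name inside the npz file.
    """
    with np.load(path) as archive:
        return zero_fill_nans(archive[key])
```

Running `nan_frame_ratio` over a few files first tells you whether the NaN frames are rare enough for zero-filling to be reasonable. If a large share of frames is affected, the cause is more likely upstream (for example in the face crops fed to the embedding script) and replacing the values would only hide it.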