Open Ehsan-Nirjhar opened 2 years ago
Hi Ehsan-Nirjhar,
Many thanks for pointing out this issue. I will update the baseline code. This may change the score as well.
Most of the features array is zero, and the Mel_train.h5 file created is just 65 MB, which seems very small. Please fix this issue so that the baseline results can be reproduced.
Hi @shubhr and @c4dm, features extracted from the 2022 data (saved in `Mel_train.h5`) using the exact code are all zero for ~80% of the audio files. I investigated the issue and found that 4 audio files (and their annotations) in the WMW data folder have a first POS label with Starttime 0. Because the current code applies a margin of 25 ms around onsets and offsets (the `time_2_frame(df,fps)` function in `Feature_extract.py`), Starttime for the first POS label in these files becomes negative. When the timestamps are converted to frames, the frame indices are also negative, and `pcen_patch` (`Feature_extract.py:line 62`) becomes an empty array. This causes all the previous entries of `hf['features'][file_index]` (`Feature_extract.py:line 65`) to hold zeros instead of the actual values. The 4 audio files are XC406576.wav, XC417425.wav, XC440361.wav, and XC483906.wav. This issue is not present with the 2021 training data, as it has no audio files (or annotations) with this case. I am not sure whether the baseline training was done using these zero-valued features, which might create embeddings/prototypes that are not representative of the actual classes and data.

If I add a check at `Feature_extract.py:line 55` that sets all negative frame indices to 0, the issue is avoided. The same check can be applied to the evaluation features to avoid the same problem.
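For illustration, here is a minimal sketch of such a check. It is not the repository's actual code: `times_to_frames` is a hypothetical stand-in for `time_2_frame`, and the `fps` value, the array shapes, and the use of `np.floor` are assumptions. It only shows why a negative start frame yields an empty slice and how clamping at 0 avoids it.

```python
import numpy as np

MARGIN_S = 0.025  # 25 ms margin around onsets/offsets, as described above


def times_to_frames(start_times, end_times, fps, margin=MARGIN_S):
    """Hypothetical stand-in for time_2_frame: pad each event by `margin`
    seconds, convert to frame indices, and clamp negative indices to 0."""
    start = np.asarray(start_times, dtype=float) - margin
    end = np.asarray(end_times, dtype=float) + margin
    start_frames = np.floor(start * fps).astype(int)
    end_frames = np.floor(end * fps).astype(int)
    # The check: negative frame indices are set to 0.
    start_frames = np.maximum(start_frames, 0)
    end_frames = np.maximum(end_frames, 0)
    return start_frames, end_frames


# Assumed values for the demo: 100 frames per second, 1000 x 128 feature matrix.
fps = 100
pcen = np.ones((1000, 128))

# Without the check, a Starttime of 0 gives a start frame of -3; the slice
# then wraps to the end of the array and comes back empty.
unclamped_patch = pcen[-3:52]

# With the check, the start frame is 0 and the slice has the expected rows.
starts, ends = times_to_frames([0.0, 1.0], [0.5, 1.5], fps)
clamped_patch = pcen[starts[0]:ends[0]]

print(unclamped_patch.shape, clamped_patch.shape)
```

Clamping only affects events that begin within the margin of the file start; all other frame indices are unchanged.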