c4dm / dcase-few-shot-bioacoustic

Feature array becoming all zero for 80% audios with current code and 2022 training data #25

Open Ehsan-Nirjhar opened 2 years ago

Ehsan-Nirjhar commented 2 years ago

Hi @shubhr and @c4dm, features extracted from the 2022 data (saved in Mel_train.h5) using the unmodified code are all zero for ~80% of the audio files. I investigated the issue and found that 4 audio files (and their annotations) in the WMW data folder have a first POS label with Starttime 0. Because the current code applies a margin of 25 ms around the onsets and offsets (the 'time_2_frame(df,fps)' function in 'Feature_extract.py'), the Starttime of the first POS label in these files becomes negative. When the timestamps are converted to frames, the frame indices are also negative, and pcen_patch (Feature_extract.py, line 62) becomes an empty array. This causes all earlier entries written to hf['features'][file_index] (Feature_extract.py, line 65) to become zeros instead of the actual values. The 4 audio files are XC406576.wav, XC417425.wav, XC440361.wav and XC483906.wav. The issue does not occur with the 2021 training data, as no audio file (or annotation) in it has this property. I am not sure whether the baseline was trained on these zero-valued features, which could produce embeddings/prototypes that are not representative of the actual classes and data.
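
To illustrate the mechanism described above, here is a minimal sketch in plain Python. It is not the code from Feature_extract.py: the frame rate, segment length, and floor-based conversion are assumptions chosen for illustration, and a list of lists stands in for the PCEN feature matrix. Python lists and NumPy basic slicing treat a negative start index the same way (counting from the end), which is why the slice comes out empty.

```python
import math

FPS = 100.0      # hypothetical frame rate, for illustration only
MARGIN = 0.025   # 25 ms margin around onsets/offsets, as described in the issue
SEG_LEN = 17     # hypothetical segment length in frames


def time_to_frame(t_seconds, fps):
    # Assumed conversion: floor of (time in seconds * frames per second).
    return int(math.floor(t_seconds * fps))


# Stand-in for a feature matrix: 1000 frames with 4 values each.
features = [[float(i)] * 4 for i in range(1000)]

# First POS event starts at 0 s, so subtracting the margin gives a negative time.
start_frame = time_to_frame(0.0 - MARGIN, FPS)

# A negative start index is interpreted relative to the END of the sequence,
# so the slice begins near the last frame and stops near the first one.
patch = features[start_frame:start_frame + SEG_LEN]

print("start_frame:", start_frame)
print("frames in patch:", len(patch))
```

With these assumed values the start index is negative and the patch contains no frames, matching the empty pcen_patch reported above.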

If I add the following check at Feature_extract.py, line 55, the issue is avoided because all negative frame indices are set to 0. The same check can be applied to the evaluation features to avoid the same issue.

if str_ind < 0:
    str_ind = 0

if end_ind < 0:
    end_ind = 0
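
The same check can be sketched as a small helper. This is only an illustration: clamp_frame_range and its n_frames parameter are hypothetical names that do not exist in the repository, and the upper-bound clamp is an extra assumption that goes beyond the fix proposed above.

```python
def clamp_frame_range(str_ind, end_ind, n_frames):
    """Clamp a [start, end) frame range to [0, n_frames].

    Hypothetical helper: the lower bound mirrors the check above;
    the upper bound is an additional assumption for safety.
    """
    str_ind = min(max(str_ind, 0), n_frames)
    end_ind = min(max(end_ind, 0), n_frames)
    return str_ind, end_ind


# A range that starts before the first frame is moved to start at frame 0.
print(clamp_frame_range(-3, 14, 1000))
# A range that runs past the last frame is cut at n_frames.
print(clamp_frame_range(990, 1005, 1000))
```

Clamping the indices once, before any slicing, keeps every later slice non-negative without changing ranges that were already valid.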

shubhrsingh22 commented 2 years ago

Hi Ehsan-Nirjhar,

Many thanks for pointing out this issue. I will update the baseline code. It might change the score as well.

Noumanijaz744 commented 1 year ago

Most of the feature array is zero, and the Mel_train.h5 file that is created is just 65 MB, which seems very small. Please fix this issue so that the baseline results can be reproduced.