Open Sabah98 opened 5 months ago
Hi @Sabah98, sorry, I was a bit busy and only just took a look at this problem. I haven't been able to figure out the root cause yet, since it looks like the issue is within the model, which returns bounding boxes that stretch outside the image. In any case, I will fix this with simple clipping.
Description: We first run the 'eval_nn_detection' function to get the prediction boxes for the 'sample_audio.wav' file. After that, when we index the spectrogram frequency array (spec.freqs) to convert the 'frequency end' (squeak.freq_end) indices to frequency values in Hz, we get an 'IndexError' caused by an out-of-bounds index access. I added the code to reproduce the error below and provided a link to the audio file that was used to test the code.
Code to Reproduce:
Error Message:
Proposed Fix: To resolve this issue, the bounding box coordinates should be adjusted before they are used to index the frequency array. For an image with N frequency bins, the predicted coordinates are floats between 0 and N. They need to be converted to integer indices from 0 to N-1, since the endpoint N is not a valid index into an array of length N. The 'combine_and_filter_predictions' function in 'neural_network.py' needs to be modified with the following line after line 209:
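As a rough illustration of the kind of clipping described above (this is not the actual patch to 'neural_network.py'), here is a minimal sketch assuming NumPy. The helper name 'to_valid_index', the 257-bin frequency axis, and the 125 kHz upper limit are all hypothetical stand-ins chosen for the example; only the idea of flooring and clipping a float coordinate to the range 0 to N-1 comes from the discussion.

```python
import numpy as np


def to_valid_index(coord, n_bins):
    """Convert a float box coordinate to an integer index in [0, n_bins - 1].

    Hypothetical helper: floors the coordinate, then clips it so that
    values below 0 or at/above n_bins cannot index out of bounds.
    """
    return int(np.clip(np.floor(coord), 0, n_bins - 1))


# Synthetic stand-in for spec.freqs (values are made up for this example).
freqs = np.linspace(0.0, 125_000.0, num=257)

# Hypothetical box edge that stretches past the top of the image.
freq_end = 257.3

# freqs[int(freq_end)] would raise IndexError here; the clipped index does not.
idx = to_valid_index(freq_end, len(freqs))
freq_end_hz = freqs[idx]
```

Flooring before clipping keeps in-range coordinates on the bin they fall into, while any overshoot is mapped to the last valid bin.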
Sample Audio file link: https://drive.google.com/file/d/1H_Y2h6lK9GShwiwQyIxz_1ciW5Nwuh8z/view?usp=sharing