JosephTheMoUSE / MoUSE

Toolkit for processing, localisation and classification of rodent ultrasonic squeaks.
MIT License

IndexError in accessing indices in spectrogram frequency from detected prediction boxes #24

Open Sabah98 opened 5 months ago

Sabah98 commented 5 months ago

Description: We first run the 'eval_nn_detection' function to get the prediction boxes for the 'sample_audio.wav' file. When we then index the spectrogram frequency array (spec.freqs) with a box's 'frequency end' index (squeak.freq_end) to convert it to a frequency in Hz, an 'IndexError' is raised because the index is out of bounds. The code to reproduce the error is below, followed by a link to the audio file used to test it.

Code to Reproduce:

from pathlib import Path
import torchaudio
from mouse.nn_detection.neural_network import find_USVs as find_USVs_nn
from mouse.utils.sound_util import spectrogram as generate_spectrogram

def eval_nn_detection(spec, model_name, thresholds=(-1, 0.1)):
    return {
        f'{model_name}({th})':
        find_USVs_nn(spec_data=spec,
                     model_name=model_name,
                     cache_dir=Path('.'),
                     batch_size=1,
                     confidence_threshold=th,
                     tqdm_kwargs=dict(position=1,
                                      leave=False,
                                      desc=f'{model_name}({th})')) for th in thresholds
    }

nn_model = 'f-rcnn-custom'
default_threshold = [0.0]
file = "sample_audio.wav"
waveform, sample_rate = torchaudio.load(file)
spec = generate_spectrogram(waveform, sample_rate=sample_rate)

nn_model_preds_0 = eval_nn_detection(spec=spec,
                                    model_name=nn_model,
                                    thresholds=default_threshold)

for model_str, pred_calls in nn_model_preds_0.items():
    for squeak in pred_calls:
        # Raises IndexError when squeak.freq_end == len(spec.freqs)
        freq_end = spec.freqs[squeak.freq_end]

Error Message:

IndexError: index 257 is out of bounds for axis 0 with size 257
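The same off-by-one can be shown without the model or the audio file. This is only a minimal sketch: the frequency axis here is a hypothetical stand-in for spec.freqs (only its length, 257, is taken from the error message), and the index value assumes a box whose upper edge sits exactly on the top of the image.

```python
import numpy as np

n_freqs = 257  # array size reported in the error message
# Hypothetical frequency axis in Hz; the real values come from spec.freqs.
freqs = np.linspace(0.0, 125_000.0, n_freqs)

# A box whose upper edge touches the top of the image yields index N, not N-1.
freq_end_index = n_freqs

try:
    freqs[freq_end_index]
    raised = False
except IndexError as err:
    raised = True
    message = str(err)

# The largest valid index is N-1.
last_valid = freqs[freq_end_index - 1]
```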

Proposed Fix: The bounding box coordinates should be adjusted before they are used to index the frequency array. For an image with N frequency bins, the predicted coordinates are floats in the closed range 0 to N. They need to be converted to integer indices from 0 to N-1, because the upper endpoint N is not a valid index. This can be done by adding the following line to the 'combine_and_filter_predictions' function in 'neural_network.py', after line 209:

            boxes[:, [1, 3]] = np.digitize(boxes[:, [1, 3]],
                                           np.arange(len(spec_data.freqs)),
                                           right=False) - 1
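To illustrate what the np.digitize call above does to individual coordinates, here is a small standalone check. The sample coordinates are made up for illustration, and N = 257 is taken from the error message. Values in the range 0 to N are floored and capped at N-1; a negative coordinate, if the model ever produced one, would map to -1, which NumPy would silently treat as the last element.

```python
import numpy as np

n_freqs = 257
bin_edges = np.arange(n_freqs)  # 0, 1, ..., 256

# Made-up coordinates: in-range values, the problematic endpoint N, and a negative.
coords = np.array([0.0, 12.7, 256.0, 256.4, 257.0, -0.3])

# digitize returns i such that bin_edges[i-1] <= x < bin_edges[i];
# subtracting 1 gives the floored index, capped at n_freqs - 1.
indices = np.digitize(coords, bin_edges, right=False) - 1
print(indices.tolist())
```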

Sample Audio file link: https://drive.google.com/file/d/1H_Y2h6lK9GShwiwQyIxz_1ciW5Nwuh8z/view?usp=sharing

Zhylkaaa commented 4 months ago

Hi @Sabah98, sorry for the delay, I was a bit busy and have only just taken a look at this problem. I haven't been able to figure out the root cause yet, but it looks like the issue is within the model, which returns bounding boxes that stretch outside the image. Anyway, I will fix this with simple clipping.
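For reference, a minimal sketch of what such clipping could look like. This is not the project's actual fix: the helper name clip_boxes_to_spectrogram and the column layout [t_start, f_start, t_end, f_end] are assumptions (only the use of columns 1 and 3 for frequency follows the snippet proposed earlier in this thread), and the box values are made up.

```python
import numpy as np

def clip_boxes_to_spectrogram(boxes, n_times, n_freqs):
    """Floor box corners to integers and clip them to valid index ranges.

    Assumes columns are [t_start, f_start, t_end, f_end] (hypothetical layout).
    """
    clipped = np.floor(np.asarray(boxes, dtype=float)).astype(int)
    clipped[:, [0, 2]] = np.clip(clipped[:, [0, 2]], 0, n_times - 1)  # time axis
    clipped[:, [1, 3]] = np.clip(clipped[:, [1, 3]], 0, n_freqs - 1)  # frequency axis
    return clipped

# Made-up boxes that stretch outside a 1000 x 257 spectrogram.
boxes = np.array([[10.2, -0.4, 55.9, 257.0],
                  [0.0, 100.5, 1200.0, 200.1]])
clipped = clip_boxes_to_spectrogram(boxes, n_times=1000, n_freqs=257)
print(clipped.tolist())
```

Unlike the digitize approach, clipping with an explicit lower bound also maps negative coordinates to 0 instead of -1.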