RoyChao19477 / SEMamba

This is the official implementation of the SEMamba paper (accepted to IEEE SLT 2024).

Inference speed on small inputs #4

Closed: iissme closed 5 days ago

iissme commented 3 months ago

Hello. Thanks for the code! I tested the model on audio of various lengths and noticed that inference on short audio is slower than on long audio. In my case, the target data ranges from 0.5 to 1 second, but the model processes it quite slowly. Zero-padding the input up to 30 seconds (via a small pad_audio helper; a sketch appears after the code below) speeds things up, but the inference time still doesn't drop below 2 seconds.

Could you explain why this happens, and whether there is anything I can do to speed up inference on short audio?

Below are the timings for different segments taken from the beginning of the same audio file:

1s

INFO:__main__:1 - 0.032488 in seconds.
INFO:__main__:2 - 0.016094 in seconds.
INFO:__main__:3 - 5.410543 in seconds.
INFO:__main__:4 - 0.015493 in seconds.

5s

INFO:__main__:1 - 0.033495 in seconds.
INFO:__main__:2 - 0.014399 in seconds.
INFO:__main__:3 - 4.745659 in seconds.
INFO:__main__:4 - 0.015998 in seconds.

10s

INFO:__main__:1 - 0.034424 in seconds.
INFO:__main__:2 - 0.014369 in seconds.
INFO:__main__:3 - 3.391307 in seconds.
INFO:__main__:4 - 0.017608 in seconds.

20s

INFO:__main__:1 - 0.033065 in seconds.
INFO:__main__:2 - 0.015286 in seconds.
INFO:__main__:3 - 2.050973 in seconds.
INFO:__main__:4 - 0.015754 in seconds.

30s

INFO:__main__:1 - 0.033277 in seconds.
INFO:__main__:2 - 0.014498 in seconds.
INFO:__main__:3 - 2.001130 in seconds.
INFO:__main__:4 - 0.016316 in seconds.

Code used for the measurements above:

import time
import logging

import librosa
import torch

logger = logging.getLogger(__name__)

with torch.no_grad():
    noisy_wav, _ = librosa.load('1.wav', sr=sampling_rate)
    noisy_wav = torch.FloatTensor(noisy_wav).to(device)
    noisy_wav = pad_audio(noisy_wav, 30, sampling_rate)  # zero-pad to 30 s

    # 1: scale factor that normalizes the waveform to unit RMS
    before_time = time.perf_counter()
    norm_factor = torch.sqrt(len(noisy_wav) / torch.sum(noisy_wav ** 2.0)).to(device)
    torch.cuda.synchronize(device)
    logger.info('1 - %f in seconds.', time.perf_counter() - before_time)

    # 2: normalize, then compute the compressed magnitude/phase STFT
    before_time = time.perf_counter()
    noisy_wav = (noisy_wav * norm_factor).unsqueeze(0)
    noisy_amp, noisy_pha, noisy_com = mag_phase_stft(noisy_wav, n_fft, hop_size, win_size, compress_factor)
    torch.cuda.synchronize(device)
    logger.info('2 - %f in seconds.', time.perf_counter() - before_time)

    # 3: model forward pass (the step dominating the timings above)
    before_time = time.perf_counter()
    amp_g, pha_g, com_g = model(noisy_amp, noisy_pha)
    torch.cuda.synchronize(device)
    logger.info('3 - %f in seconds.', time.perf_counter() - before_time)

    # 4: inverse STFT and undo the normalization
    before_time = time.perf_counter()
    audio_g = mag_phase_istft(amp_g, pha_g, n_fft, hop_size, win_size, compress_factor)
    audio_g = audio_g / norm_factor
    torch.cuda.synchronize(device)
    logger.info('4 - %f in seconds.', time.perf_counter() - before_time)
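
For reference, pad_audio is my own helper, not part of the repo. A minimal sketch of what it does, assuming simple right-padding with zeros up to the target duration:

def pad_audio(wav: torch.Tensor, target_seconds: int, sr: int) -> torch.Tensor:
    # Right-pad a 1-D waveform with zeros so it is at least target_seconds long.
    target_len = target_seconds * sr
    pad = target_len - wav.shape[-1]
    if pad <= 0:
        return wav
    return torch.nn.functional.pad(wav, (0, pad))
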
RoyChao19477 commented 3 months ago

Hi iissme,

It's quite unusual for short audio to be processed more slowly than longer audio. I replicated your timing measurements on a single RTX 4090 GPU, and here are my results:

0.5 Second:

INFO - 1 - 0.000331 in seconds.
INFO - 2 - 0.000562 in seconds.
INFO - 3 - 0.014941 in seconds.
INFO - 4 - 0.000750 in seconds.

1 Second:

INFO - 1 - 0.000222 in seconds.
INFO - 2 - 0.000513 in seconds.
INFO - 3 - 0.013630 in seconds.
INFO - 4 - 0.000649 in seconds.

10 Seconds:

INFO - 1 - 0.000251 in seconds.
INFO - 2 - 0.000518 in seconds.
INFO - 3 - 0.108044 in seconds.
INFO - 4 - 0.000846 in seconds.

100 Seconds:

INFO - 1 - 0.000486 in seconds.
INFO - 2 - 0.001104 in seconds.
INFO - 3 - 1.601276 in seconds.
INFO - 4 - 0.000918 in seconds.

Code:

import os
import numpy as np

with torch.no_grad():
    noisy_wav, _ = librosa.load(os.path.join(args.input_folder, fname), sr=sampling_rate)
    noisy_wav = noisy_wav[:16000]       # keep the first second (16 kHz sampling rate)
    noisy_wav = np.tile(noisy_wav, 100) # tile N times to build a longer input

    noisy_wav = torch.FloatTensor(noisy_wav).to(device)

    before_time = time.perf_counter()
    norm_factor = torch.sqrt(len(noisy_wav) / torch.sum(noisy_wav ** 2.0)).to(device)
    torch.cuda.synchronize(device)
    logger.info('1 - %f in seconds.', time.perf_counter() - before_time)

    before_time = time.perf_counter()
    noisy_wav = (noisy_wav * norm_factor).unsqueeze(0)
    noisy_amp, noisy_pha, noisy_com = mag_phase_stft(noisy_wav, n_fft, hop_size, win_size, compress_factor)
    torch.cuda.synchronize(device)
    logger.info('2 - %f in seconds.', time.perf_counter() - before_time)

    before_time = time.perf_counter()
    amp_g, pha_g, com_g = model(noisy_amp, noisy_pha)
    torch.cuda.synchronize(device)
    logger.info('3 - %f in seconds.', time.perf_counter() - before_time)

    before_time = time.perf_counter()
    audio_g = mag_phase_istft(amp_g, pha_g, n_fft, hop_size, win_size, compress_factor)
    audio_g = audio_g / norm_factor
    torch.cuda.synchronize(device)
    logger.info('4 - %f in seconds.', time.perf_counter() - before_time)

Given that Mamba's kernels are implemented in Triton and CUDA, the CUDA version, the GPU model, and the version of the Mamba package can all impact performance. I suspect this issue is related to either the package version or the GPU. It might be helpful to install Mamba directly with pip install . from the mamba_install directory.
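
One more thing worth ruling out, not specific to SEMamba: the first forward pass through Triton/CUDA-backed kernels can include one-time costs such as kernel compilation and CUDA context setup, so timings taken without warm-up can be misleading. A minimal sketch of adding warm-up runs before measuring, reusing the variables from the snippets above:

with torch.no_grad():
    # Run a few untimed forward passes first so one-time costs
    # (CUDA context init, kernel compilation/autotuning) do not
    # land in the measured iteration.
    for _ in range(3):
        model(noisy_amp, noisy_pha)
    torch.cuda.synchronize(device)

    before_time = time.perf_counter()
    amp_g, pha_g, com_g = model(noisy_amp, noisy_pha)
    torch.cuda.synchronize(device)
    logger.info('3 (after warm-up) - %f in seconds.', time.perf_counter() - before_time)
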

Best regards, Roy Chao