chris-calo opened 1 week ago
Hi, thanks for trying to extend it!
I've never tried it on speech data during development, but I always expected it could handle that kind of problem, since AERO does, and there are also some Mamba applications for speech enhancement.
I ran a mini experiment with 2 speakers (1 train, 1 test) and the model does seem able to extend (although badly, obviously :)).
I see two possible reasons for that bug:
1) Your downsampling procedure is not being recognized by the model, so I suggest you use a sox command such as `sox "$input_file" -r freq_samp -c 1 "$output_file"`.
2) The prediction code must be changed in some way to handle this sampling frequency, since all of my settings are for 11.025 -> 44.1 kHz.
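As a sanity check on point 2, you can verify that your downsampling actually produces the sample count the model's ratio assumes. A minimal sketch using scipy's polyphase resampler (an assumption for illustration; the repo itself may rely on sox or torchaudio instead), with the 44.1 kHz -> 11.025 kHz factor-of-4 setup mentioned above:

```python
import numpy as np
from scipy.signal import resample_poly

# The ratio my settings assume: 11.025 kHz low-res -> 44.1 kHz high-res.
sr_in, sr_out = 44100, 11025

# 1 second of a 440 Hz tone as stand-in audio (hypothetical input).
t = np.linspace(0, 1, sr_in, endpoint=False)
x = np.sin(2 * np.pi * 440 * t)

# Downsample by the exact integer factor 4 (44100 / 11025 = 4).
y = resample_poly(x, up=1, down=4)

# One second of audio at the target rate should have exactly sr_out samples.
print(len(y))
```

If the sample count (or the WAV header's rate) doesn't match what the prediction code expects, the model will be fed a mis-scaled signal, which is consistent with getting static out.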
If the results in the output folder are being extended (in my case they are), then it is definitely one of those cases.
Has this been tested with speech yet? I just trained it on 4-16 kHz, per the original Aero recommendations (fixing some bugs along the way), on the VCTK dataset. I downsampled my test audio to 4 kHz for testing, and while I got it running, it only produced static.
Any ideas? I can provide details where needed. I'm guessing the base model may currently be too fitted to music super-resolution.
On the plus side, it's MUCH faster than Aero on my 4090s, which makes it more suitable for my real-time use case.