fakufaku / torchiva

Blind source separation with the independent vector analysis family of algorithms in torch
https://torchiva.readthedocs.io/en/latest/
MIT License
88 stars, 5 forks

torchiva.separation should be torchiva.separate #4

Open StuartIanNaylor opened 1 year ago

StuartIanNaylor commented 1 year ago

The README gives this command:

python -m torchiva.separation INPUT OUTPUT

The module to call is torchiva.separate, so it should be:

python -m torchiva.separate ./examples/samples/mix_reverb/103-1240-0003_1235-135887-0017.wav output.wav
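For anyone scripting this over several files, the `torchiva.separate` invocation above can be wrapped in a few lines of Python. This is only a minimal sketch: the helper names `build_separate_command` and `run_separation` are made up for illustration, and it assumes torchiva is installed in the same environment and that the CLI takes exactly the two positional arguments shown in this issue.

```python
import subprocess
import sys


def build_separate_command(input_wav, output_wav):
    """Build the argument list for `python -m torchiva.separate INPUT OUTPUT`."""
    # sys.executable keeps the call inside the interpreter (and virtualenv)
    # that is running this script.
    return [sys.executable, "-m", "torchiva.separate", str(input_wav), str(output_wav)]


def run_separation(input_wav, output_wav):
    """Run the separation CLI; raises CalledProcessError on a non-zero exit."""
    # Assumption: torchiva is importable by this interpreter.
    subprocess.run(build_separate_command(input_wav, output_wav), check=True)


# Example call, using the sample path from this issue:
# run_separation(
#     "./examples/samples/mix_reverb/103-1240-0003_1235-135887-0017.wav",
#     "output.wav",
# )
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.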

Just wondering whether you are planning to, or would be willing to (for those of us who are not so talented), export a quantised model to, say, tflite or onnx, for realtime processing with, say, a LADSPA plugin.

I was just testing with noise rather than double talk and it works great. As is, it runs at approx x2.3 realtime on an RK3588, but I am wondering how light the load could be if it were quantised and run from C++ in a LADSPA plugin. There is an absence of open-source BSS libs on Linux, so fingers crossed, but this is likely beyond what I am capable of, so I thought I would ask.
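To make a figure like "x2.3 realtime" comparable across machines, it helps to state how it is computed. The sketch below assumes it means seconds of audio processed per second of wall-clock compute (so values above 1 are faster than real time); the function names are illustrative and not part of torchiva.

```python
import time
import wave


def wav_duration_seconds(source):
    """Return the duration in seconds of a PCM WAV file (path or file object)."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def realtime_speedup(audio_seconds, processing_seconds):
    """Seconds of audio handled per second of compute; >1 is faster than real time."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


def time_call(func, *args, **kwargs):
    """Call func and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start
```

For example, a 23 s mixture separated in 10 s gives `realtime_speedup(23.0, 10.0)`, which is 2.3 under this definition. Timing the whole command also counts model loading, so for a streaming plugin the per-frame cost would be the more relevant number.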

fakufaku commented 1 year ago

@StuartIanNaylor Thanks for reporting the typo in the README! Silly of me... 😆

I did not have plans for a quantized model, but that is actually a great idea! I will look into it, but I can't promise any timeline... I'll add it to the repo if I manage to get it to work. I am interested in trying out onnx, for example, but do not have any experience with it. If you have some and would like to help out, that would also be greatly appreciated.

I am also quite interested to hear about experience with data in the wild! I would be super curious to see your results if you would like to share something 🤗

StuartIanNaylor commented 1 year ago

Yeah, I can probably help by pointing you to https://github.com/usefulsensors/openai-whisper, where nyadla-sys has created various examples of exports to onnx and then tflite. There are plenty of templates for the conversion there, and nyadla-sys would probably be a great source of info.

As for noise, I tested it with https://drive.google.com/file/d/1N90DtDjcm-ejbUpqMHsJAkKYIxoy5iMG/view?usp=share_link and it did a great job, producing https://drive.google.com/file/d/1V8Frsa3H7eCx3YuNRQnm09WlC5dMKZwG/view?usp=share_link

Ignore the glitches due to clipping, as that is in the test sample, which should have had AGC.

I have been trying to find a good low-load 2-channel BSS / dereverb for 2 years now. Ultimately it should be targeted-speaker BSS, as I guess that with doubletalk the target channel will be random, but I will tackle that one later.

I will be watching and testing, so just ask, but nyadla-sys is the man, and the examples are in the repo linked above.

StuartIanNaylor commented 1 year ago

@fakufaku https://github.com/wenet-e2e/wesignal might be of interest, given what has been submitted there.