f90 / Wave-U-Net

Implementation of the Wave-U-Net for audio source separation
MIT License

Test it on random song #2

ghost closed this issue 6 years ago

ghost commented 6 years ago

Hello!

I would like to ask how to test it on a random song (test.wav).

Thank you in advance

f90 commented 6 years ago

Hello,

I added an easy-to-use Predict.py script (see the README for exactly how to run it) that loads a pretrained model from a given Tensorflow checkpoint file and produces separated output for any given input file. If I have time, I might add a default pretrained model; until then, you need to train one yourself first.
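For reference, a sketch of how such an invocation could be assembled. The argument names (`model_path`, `input_path`) and the checkpoint path are assumptions for illustration; the README documents the actual command line.

```python
# Sketch: building a Predict.py command line. The flag names and paths
# below are hypothetical; check the README for the real interface.
cmd = [
    "python", "Predict.py",
    "model_path=checkpoints/my_model",  # hypothetical checkpoint location
    "input_path=test.wav",              # the song to separate
]
print(" ".join(cmd))
```

The script would then write the separated sources next to the input file.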

Please tell me if that helps, then I can close the issue.

ghost commented 6 years ago

Hi there. I'm interested in trying this out as well, but I have no clue what to do. I requested access to the 4.7 GB dataset (as instructed) and I have that, but what now? Hmm...

f90 commented 6 years ago

Did you try following the README? Where exactly are you stuck in its instructions? There I explain that you have to set up the Python environment by installing the packages, point the filepaths at your database, and then start training by running a Python script.

f90 commented 6 years ago

I added a pretrained stereo vocal separation model to the repository. It can be checked out if Git-LFS is installed before cloning, and then applied to your own songs simply by running Predict.py (see the README). Please have a go and report back whether everything works on your end. Thanks!

ghost commented 6 years ago

Thank you! I shall try this and will get back to you if everything works as planned. I was most interested in multi-instrument separation, but the vocal one will keep me occupied for now!

ghost commented 6 years ago

Works as it should, though the WAV files are abnormally large for some reason: a bit over 100 MB each. I can fix this by exporting them again through Audacity, which resolves the size issue. It works reasonably well even on very complex tracks. I'd love to try the multi-instrument one, too :+1:
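For context on the size: an uncompressed WAV is just sample_rate × channels × bytes_per_sample × duration, so writing float32 samples instead of the usual 16-bit PCM doubles the file size, which would explain files past 100 MB. A quick sketch (the 5-minute duration is a made-up example):

```python
# Uncompressed WAV size: sample_rate * channels * bytes_per_sample * seconds.
def wav_size_mb(sample_rate, channels, bytes_per_sample, seconds):
    return sample_rate * channels * bytes_per_sample * seconds / 1e6

duration = 5 * 60  # hypothetical 5-minute song
float32_mb = wav_size_mb(44100, 2, 4, duration)  # float32 samples
int16_mb = wav_size_mb(44100, 2, 2, duration)    # standard 16-bit PCM

print(round(float32_mb, 1), round(int16_mb, 1))  # → 105.8 52.9
```

Re-exporting through Audacity as 16-bit PCM would halve the size exactly, matching what's reported above.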

f90 commented 6 years ago

Glad it works!

Good point on the output file size. Would writing OGG by default be a good solution? Or is it important to make the output format customizable? Writing MP3 is a bit problematic due to licensing issues: an external encoder would need to be installed on the system, a dependency I would rather avoid.

Let's see what I can do about the multi-track system. Separation quality is a bit worse there than for vocals only, maybe due to the smaller training dataset plus restricted network capacity. But I could train a bigger version and see whether I can even surpass the paper's results.

ghost commented 6 years ago

If possible, a FLAC writer would be nice since the results would remain lossless (and a bit smaller than a standard WAV).

Well, the results I hear in the examples on the site are very good; I'd be happy with those. (I like to evaluate the various outputs, and I also use spectral editors to refine and clean up any separations that need it.)

f90 commented 6 years ago

I tried to support other file types using the soundfile library, but at least on my system it gives me an error when I try to export to OGG or FLAC. It might work on Python 3, but I don't want to port the whole repository and risk introducing more problems. I guess WAV output has to be enough for now, sorry! It should be fairly easy to append your own conversion step after mine so the output is converted automatically.
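A post-processing sketch along those lines, assuming the soundfile library is available on your system and that the prediction output is a plain WAV; the function names and paths here are hypothetical, not part of the repository:

```python
import os


def flac_name(wav_path):
    # Derive the output name: "mix.wav" -> "mix.flac".
    root, _ = os.path.splitext(wav_path)
    return root + ".flac"


def convert_to_flac(wav_path):
    # soundfile is a third-party package (pip install soundfile); as noted
    # above, FLAC export may fail on some setups, so treat this as a
    # best-effort sketch rather than a guaranteed fix.
    import soundfile as sf
    audio, sr = sf.read(wav_path)
    out_path = flac_name(wav_path)
    sf.write(out_path, audio, sr)  # format inferred from the .flac extension
    return out_path
```

Running such a conversion on each output file would keep the results lossless while cutting the file size considerably, as requested above.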

The results sound low-pass filtered because the pretrained model works on 22 kHz audio, meaning high frequencies are discarded. But you can easily change this and train 44.1 kHz CD-quality separators by changing the expected_sr setting in Config.py before starting to train :)
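A sketch of the kind of change meant here; only the `expected_sr` name comes from the comment above, the surrounding dict layout is an assumption:

```python
# Hypothetical excerpt from Config.py: raising the sample rate the model
# trains on. Only the "expected_sr" key is taken from the discussion; the
# config structure is illustrative.
model_config = {
    "expected_sr": 44100,  # was 22050; 44100 covers the full CD-quality band
}

# Nyquist: the highest frequency the model can represent is half the rate,
# which is why the 22 kHz model sounds low-pass filtered at ~11 kHz.
nyquist_hz = model_config["expected_sr"] // 2
print(nyquist_hz)  # → 22050
```

Note that retraining at the higher rate is required; the pretrained checkpoint stays tied to the rate it was trained on.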

Good to hear you're happy with the results. Closing this issue, as custom-song prediction seems to work now.