f90 / Wave-U-Net

Implementation of the Wave-U-Net for audio source separation
MIT License

Not sure if this is even possible #5

Closed ghost closed 6 years ago

ghost commented 6 years ago

Hello again. I noticed you added the multi-instrument dataset, and I'm very interested in this. However, I cannot get this running on Windows, which is my primary OS. I think it might be impossible: I have every dependency installed on my Python 3.5 installation except scikits.audiolab, and as far as I can tell that one isn't available for Python 3.x. Am I out of luck? I don't have an Ubuntu computer handy, and I can't dual boot, since Windows overwrites the GRUB bootloader every time I boot back into Windows.

f90 commented 6 years ago

Hm, so unfortunately I can't provide a full-fledged solution to this, only the following suggestions:

  1. Are you not able to install Python 2 on your Windows system? That would let you fulfill the requirements for my code (since it is written in Python 2), and you could maybe install scikits.audiolab as well. Staying with Python 3 may also work, but you might have to convert some parts of the code to Python 3 beforehand. I don't expect many changes to be necessary, and there are also conversion tools out there (2to3, if I remember correctly). Still, the conversion could introduce errors.

  2. If you cannot fulfill the scikits.audiolab requirement, and you don't need to read any aiff or TIMIT data for training the model (so you are definitely fine if you just want to use pre-trained models, or if you want to use only the MUSDB and CCMixter datasets), you can remove the scikits.audiolab import statement, which only occurs here:

https://github.com/f90/Wave-U-Net/blob/master/Metadata.py#L4

And then in the get_audio_metadata function, you would simply remove the if clause related to aiff or sphereType files. So it would probably end up being:

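import os

from soundfile import SoundFile  # from the PySoundFile package

# get_mp3_metadata is assumed to be defined elsewhere in Metadata.py

def get_audio_metadata(audioPath, sphereType=False):
    '''
    Returns sampling rate, number of channels and duration of an audio file
    :param audioPath: Path to the audio file
    :param sphereType: No longer used once the scikits.audiolab branch is removed; kept so existing callers still work
    :return: Tuple of (sampling rate, number of channels, duration in seconds)
    '''
    ext = os.path.splitext(audioPath)[1][1:].lower()
    if ext=="mp3": # Use ffmpeg/ffprobe
        sr, channels, duration = get_mp3_metadata(audioPath)
    else:
        # Everything else is read through SoundFile
        snd_file = SoundFile(audioPath, mode='r')
        inf = snd_file._info
        sr = inf.samplerate
        channels = inf.channels
        duration = float(inf.frames) / float(inf.samplerate)
    return int(sr), int(channels), float(duration)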

You can try that and then see if it works!
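As an alternative to deleting the import outright, the dependency could be made optional, so the same file still works where scikits.audiolab is installed. This is only a sketch of that idea and not code from the repository: the helper names `optional_import` and `needs_audiolab` are made up for illustration, and the "aiff or sphereType" condition is taken from the description above.

```python
import importlib
import os


def optional_import(module_name):
    """Return the imported module, or None if it cannot be imported."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# None on setups where scikits.audiolab cannot be installed (e.g. Python 3).
audiolab = optional_import("scikits.audiolab")


def needs_audiolab(audio_path, sphere_type=False):
    """Hypothetical helper: True for the inputs the removed branch handled."""
    ext = os.path.splitext(audio_path)[1][1:].lower()
    return sphere_type or ext == "aiff"


def check_readable(audio_path, sphere_type=False):
    """Fail with a clear message instead of a NameError deep in the loader."""
    if needs_audiolab(audio_path, sphere_type) and audiolab is None:
        raise ImportError(
            "scikits.audiolab is required to read %s but is not installed"
            % audio_path
        )
```

With a guard like this, wav and mp3 inputs pass through untouched, and only aiff or sphere inputs raise an error when the package is missing.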

ghost commented 6 years ago

Well, as I said, I can't use Python 2.7 on Windows, since Tensorflow-gpu isn't supported at all on that version; it is only supported on 3.x.

f90 commented 6 years ago

In that case I would try to follow the second part of my previous comment to get rid of the scikits.audiolab dependency. Then you need to convert the code to Python 3. There are existing converters for this. Then try running it and see if you run into any issues.
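One thing to watch for after an automated conversion: as far as I know, the converters rewrite syntax (print statements, xrange and so on) but leave the `/` operator alone, and in Python 3 it returns a float even for two integers. Any place that computes sample or frame indices this way changes behaviour silently. A small illustration, with made-up function names that do not come from the repository:

```python
def num_hops_unconverted(num_frames, hop_size):
    # Python 2 truncated int / int to an int; Python 3 returns a float here.
    return num_frames / hop_size


def num_hops_ported(num_frames, hop_size):
    # Floor division keeps the Python 2 integer behaviour on Python 3.
    return num_frames // hop_size


if __name__ == "__main__":
    frames, hop = 10, 4
    print(type(num_hops_unconverted(frames, hop)).__name__)
    print(type(num_hops_ported(frames, hop)).__name__)
```

A float result like this typically only shows up later, for example as a TypeError when it is used as a slice index or array shape, so it is worth searching the converted code for `/` by hand.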

ghost commented 6 years ago

Hmm, nope. It complains about OutputLayer. I think I'll just try and get Ubuntu on a spare hard drive which I have...

f90 commented 6 years ago

OK, unfortunately it seems more complicated than expected to make this run in your setup, to the point where it's apparently less work to just set up the required environment in the first place. Therefore I will close the issue here.

If anyone out there wants to port this to Python 3 via a pull request though, be my guest. We could have a separate Python 3 branch as well.