aidv opened 4 years ago
I don't think so. TensorFlow's documentation states that it does not place operations onto multiple GPUs automatically. TensorFlow does not easily share graphs or sessions among multiple processes. There are some blog posts discussing this on towardsdatascience.com.
I assume you have been able to get the training working. What is your set-up?
@stickyninja3 There's something called distributed training that suggests it is possible. From the TensorFlow documentation:

tf.distribute.Strategy is a TensorFlow API to distribute training across multiple GPUs, multiple machines, or TPUs. Using this API, you can distribute your existing models and training code with minimal code changes.
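For context, the multi-GPU strategy described there (tf.distribute.MirroredStrategy) is synchronous data parallelism: each GPU gets a shard of every batch, computes gradients on its shard, and the gradients are averaged before the shared weights are updated. A minimal pure-Python sketch of that idea (no TensorFlow needed; all names and the toy "gradient" are illustrative only):

```python
# Sketch of synchronous data parallelism, the scheme behind
# tf.distribute.MirroredStrategy. Names here are illustrative, not TF API.

def shard_batch(batch, num_replicas):
    """Split one global batch into one shard per replica (GPU)."""
    per_replica = len(batch) // num_replicas
    return [batch[i * per_replica:(i + 1) * per_replica]
            for i in range(num_replicas)]

def all_reduce_mean(per_replica_grads):
    """Average the gradients computed on each replica (the all-reduce step)."""
    num_replicas = len(per_replica_grads)
    return [sum(g) / num_replicas for g in zip(*per_replica_grads)]

def fake_grad(shard):
    """Toy stand-in for a gradient: one value per 'parameter'."""
    return [sum(shard), float(len(shard))]

batch = [1.0, 2.0, 3.0, 4.0]
shards = shard_batch(batch, num_replicas=2)   # [[1.0, 2.0], [3.0, 4.0]]
grads = [fake_grad(s) for s in shards]        # [[3.0, 2.0], [7.0, 2.0]]
avg = all_reduce_mean(grads)                  # [5.0, 2.0]
print(avg)
```

Because the replicas stay in lockstep and average gradients every step, the result is equivalent to training on the full batch on one device, which is why existing model code needs so few changes.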
Also, the Spleeter source code implies that multiple machines can be used to train a model.
What I wonder is why one would use multiple machines before taking full advantage of multiple GPUs in a single machine, unless multiple GPUs are present across multiple machines.
Either way, I'd love for the Spleeter devs to address this, as it would greatly benefit the community.
Hi @aidv
We have no plans to work on this feature at the moment. We don't have much experience with distributed training strategies, and as @stickyninja3 said, it would probably require quite a lot of tuning to make it efficient.
If you feel that it can be achieved with minor changes, feel free to send us a gist of code and we'll look into it.
@mmoussallam thank you for addressing that. I have looked into it a little bit; I don't have much knowledge of anything TensorFlow-related, but I'm learning little by little.
So what about the multiple machines?
In the Spleeter file train.py, at line 95, I can see tf.estimator.train_and_evaluate(...). Tracing the train_and_evaluate function takes me to the file training.py, located in C:\Users\username\Anaconda3\Lib\site-packages\tensorflow_estimator\python\estimator.
Reading some of the comments in that file, I can see a whole bunch of info regarding distributed training. It seems very doable.
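For what it's worth, tf.estimator.train_and_evaluate picks up its multi-machine layout from the TF_CONFIG environment variable: every machine runs the same training script, each with a TF_CONFIG that describes the shared cluster and that machine's own role in it. A hedged sketch of what that variable looks like (host names and ports below are placeholders, not a working cluster):

```python
import json
import os

# Each machine in the cluster runs the identical training script, but with
# its own TF_CONFIG. The "cluster" part is shared by all machines; the
# "task" part names this particular machine. Hosts/ports are made up.
tf_config = {
    "cluster": {
        "chief": ["host0.example.com:2222"],
        "worker": ["host1.example.com:2222", "host2.example.com:2222"],
        "ps": ["host3.example.com:2222"],  # parameter server
    },
    # This machine is the first worker.
    "task": {"type": "worker", "index": 0},
}

os.environ["TF_CONFIG"] = json.dumps(tf_config)
# With TF_CONFIG set, tf.estimator.train_and_evaluate(...) runs in
# distributed mode instead of local mode; unset, it trains locally.
print(os.environ["TF_CONFIG"])
```

This is presumably why the Estimator-based code can support multiple machines with no code changes: the distribution is configured entirely through the environment.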
Hi all,
I think distributed training is easier with TensorFlow 2. The blog posts I read all stated that TensorFlow 1.13/1.14 don't share models across GPUs. It would be interesting to see what improvements could be made, but I can't even get training working on a single GPU; nothing I have tried seems to work. It would be interesting to know the exact environments you use. I have been given my Dad's old work laptop, which has a GTX 1660. Going to reformat and try Ubuntu 18.04 now.
@stickyninja3 I wonder how hard it would be to convert the Spleeter code to use TensorFlow 2 🤔
Are you on Windows or macOS? I'm on Windows and it's actually pretty easy to get it up and running.
Give me your email and I'll send you a message.
Hi aidv,
alecjclarke@live.co.uk is my email. I have tried using Windows but couldn't get training working. I have a laptop to use: 64 GB of memory, a 9th-gen Core i7, and a GeForce GTX 1660 Ti.
It would be great to get this working.
Thanks, Alec
Is this possible?
Can I train using multiple GPUs?