facebookresearch / demucs

Code for the paper Hybrid Spectrogram and Waveform Source Separation
MIT License
8.17k stars 1.03k forks

iOS viability #396

Open wajsic opened 1 year ago

wajsic commented 1 year ago

❓ Questions

Has anyone had any success converting the model to be mobile-optimized? Has anyone had any luck running it on iOS with PyTorch, if that's even possible? I don't have any minimum requirements; I'm just checking whether it is viable or not.

Some things will need to be implemented for sure. I've tried to JIT trace the model, but some constant comparisons are throwing errors under JIT, so I will have to go over all the branches in the model, get it to trace, and then optimize it for mobile.

Mainly, I'm asking whether anyone sees any fundamental issues with the approach I took.

CarlGao4 commented 1 year ago

First of all, you need to install Python on iOS. Then you need to build PyTorch and the other dependencies for it. In fact, the separation can be really slow, so I haven't tried it, though it seems possible to achieve.

KaiVinter commented 1 year ago

> First of all, you need to install Python on iOS. Then you need to build PyTorch and the other dependencies for it. In fact, the separation can be really slow, so I haven't tried it, though it seems possible to achieve.

What about on Android? I've tried installing it via Termux, but with no success. Is there a way, either with PyTorch or something similar?

acosmicflamingo commented 1 year ago

I imagine that, because the pre-trained models are licensed for scientific/educational use only, the license would make it illegal for any iOS app in the App Store to use the Demucs music separation technology, even if it were technically possible for Apple's coremltools model generator to generate a .coreml variant of Demucs. Is that correct?

wajsic commented 1 year ago

> I imagine that, because the pre-trained models are licensed for scientific/educational use only, the license would make it illegal for any iOS app in the App Store to use the Demucs music separation technology, even if it were technically possible for Apple's coremltools model generator to generate a .coreml variant of Demucs. Is that correct?

Could you point me to where it says that the pretrained models are licensed that way? Even if so, there is a training doc, so you can train it on your own.

CoreML is not possible (for now), since it doesn't support complex numbers, which are needed for the STFT.

Demucs itself is licensed under MIT, so I'm not sure why it would be illegal to use it in the App Store.
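For readers wondering what the complex-number limitation means in practice: a common workaround when a runtime lacks a complex dtype is to carry the real and imaginary parts of the transform as two separate real-valued arrays. The snippet below is only a minimal, unoptimized sketch of that idea for a single DFT frame in plain Python. The function names `dft_real_imag` and `magnitude` are made up for illustration, and nothing here is taken from Demucs, coremltools, or anyone's actual port in this thread.

```python
import math


def dft_real_imag(frame):
    """DFT of a real-valued frame using only real arithmetic.

    Returns two lists (real parts, imaginary parts) instead of complex
    numbers, which is the representation a runtime without a complex
    dtype would have to work with. O(n^2); for illustration only.
    """
    n = len(frame)
    real, imag = [], []
    for k in range(n):
        # X[k] = sum_t x[t] * (cos(2*pi*k*t/n) - i*sin(2*pi*k*t/n))
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        real.append(re)
        imag.append(im)
    return real, imag


def magnitude(real, imag):
    """Per-bin magnitude computed from the two real-valued arrays."""
    return [math.hypot(r, i) for r, i in zip(real, imag)]


if __name__ == "__main__":
    re, im = dft_real_imag([1.0, 0.0, -1.0, 0.0])
    print(magnitude(re, im))
```

An STFT is this transform applied to windowed, overlapping frames, so the same two-array trick would apply frame by frame. Whether a given converter accepts the resulting graph is a separate question that this sketch does not answer.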

wajsic commented 1 year ago

> First of all, you need to install Python on iOS. Then you need to build PyTorch and the other dependencies for it. In fact, the separation can be really slow, so I haven't tried it, though it seems possible to achieve.

Python on iOS? I am using LibTorch (the C++ version of PyTorch). I am now rewriting all of the layers in C++ (HDemucs, Demucs, Encoders/Decoders), so we will see how far I get. Why not use a JITed model? Because I want to debug it step by step against the Python code, to see if there are any discrepancies. I was mainly checking whether someone else had done this and could see issues with my approach.

wajsic commented 1 year ago

I raised this question hoping it would be seen by someone who knows a lot about mobile ML and could tell me whether my approach is fundamentally flawed, and why.

cewatkins commented 1 year ago

Uh, I would ask if your rewrite is available. Torch is supposed to run on Android, but since you must compile it from source, I can tell you that the demucs -d options (torch) aren't there, and compiling torch from source is something to take a vacation for, because it takes longer than writing the code. That being said, if you can get -d vulkan to work, I'd call that a mobile enhancement, as -d cpu basically takes as long as it takes to listen to the song. The code and backend are all there; what's left to be had is the pretty front end, with pictures and a track database, but vendors have already stuck it in DAWs and it wasn't so pretty.

AntonZN commented 1 year ago

@wajsic Did you manage to do it on iOS?

wajsic commented 1 year ago

Yes and no. It is (was?) not that easy to convert Demucs to CoreML, because at the time it didn't have complex number support or FFT ops. So I rewrote the whole model as an MPS graph. Since STFT isn't an MPS op either, it was a hybrid between GPU and CPU. I stopped at the last stage of the decoders, since I couldn't get transposed Conv2D to behave as Conv1D because of a bug in the MPS params (or documentation). I think that is fixed now, so I might revisit this in the near future, but around 80-90% of the v5 model (the one without transformers) is working really fast on iOS (faster than real time if you set the chunking right).
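To make the "chunking" remark concrete for other readers: the usual idea is to split a long signal into overlapping chunks, run the model on each chunk, and blend the overlapping outputs with weights so chunk boundaries do not click. The following is a generic overlap-add sketch in plain Python, not the MPS graph implementation described above. `split_into_chunks`, `triangle_weights`, and `process_chunked` are hypothetical names, and `model` stands in for any function that maps a chunk to an output of the same length.

```python
def split_into_chunks(num_samples, chunk, overlap):
    """Return (start, end) index pairs that cover num_samples with overlap."""
    if not 0 <= overlap < chunk:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk")
    stride = chunk - overlap
    spans = []
    start = 0
    while True:
        end = min(start + chunk, num_samples)
        spans.append((start, end))
        if end == num_samples:
            break
        start += stride
    return spans


def triangle_weights(length):
    """Weights that peak mid-chunk and stay positive at the edges."""
    return [float(min(i + 1, length - i)) for i in range(length)]


def process_chunked(signal, model, chunk, overlap):
    """Run model on overlapping chunks and blend outputs by weighted average."""
    out = [0.0] * len(signal)
    weight_sum = [0.0] * len(signal)
    for start, end in split_into_chunks(len(signal), chunk, overlap):
        piece = model(signal[start:end])  # assumed: same length as its input
        for i, (y, w) in enumerate(zip(piece, triangle_weights(end - start))):
            out[start + i] += y * w
            weight_sum[start + i] += w
    return [o / s for o, s in zip(out, weight_sum)]


if __name__ == "__main__":
    samples = [float(i) for i in range(10)]
    # With an identity "model" the blended output should reproduce the input.
    print(process_chunked(samples, lambda x: x, chunk=4, overlap=2))
```

Chunk length and overlap trade latency and peak memory against boundary quality, so the right values for a phone would have to be found by measurement.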

alexvoina commented 1 year ago

interested

acosmicflamingo commented 1 year ago

Even if you create your own weights, you're training them on a database of music whose licenses prevent you from using it commercially:

"The data from musdb18 is composed of several different sources:

100 tracks are taken from the DSD100 dataset, which is itself derived from The 'Mixing Secrets' Free Multitrack Download Library. Please refer to this original resource for any question regarding your rights on your use of the DSD100 data. 46 tracks are taken from the MedleyDB licensed under Creative Commons (BY-NC-SA 4.0). 2 tracks were kindly provided by Native Instruments originally part of their stems pack. 2 tracks are from the Canadian rock band The Easton Ellises as part of the heise stems remix competition, licensed under Creative Commons (BY-NC-SA 3.0)."

I imagine the only way Demucs technology could ever be used in a commercial iOS app is if you personally paid a bunch of musicians to create songs for the sole purpose of stem extraction, and then used that data. Something tells me that that's what Moises did, since they have released their own database that they used for training.

acosmicflamingo commented 1 year ago

@wajsic NICE! Can you share the CoreML file with us? 🤣 just kidding; very happy for you (and your studio)!

I have actually been playing devil's advocate because I wanted to be proven wrong, since I would love to use this in my own music app. Unfortunately for me, it looks like the only way I will ever (as an indie developer) be able to utilize the technology is if a company licenses their CoreML models to me for some cut of the app's profits. Oh well 😄

alexvoina commented 1 year ago

@acosmicflamingo I'm still waiting for you to be proven wrong by someone from the facebook research team. It seems somewhat ridiculous that the only reason this amazing piece of technology cannot be used by someone in a commercial application is because 10 songs used to train the algorithm are provided under a restrictive license :)) Also, I would've expected a disclaimer somewhere in this repo about this legal liability. When you see the MIT license you can be led astray. Imagine investing the time to build a product on top of this just to find out you can no longer sell it, until you retrain the model.

acosmicflamingo commented 1 year ago

@alexvoina Yeah, it's certainly hard to see what we are and are not allowed to do. That's why I was doing some due diligence, to make sure I don't spend a ton of time developing an app based on a technology that's meant to be used for research purposes. It almost seemed too good to be true that I could develop an app with Demucs and pocket 100% of the huge revenue stream, while the Facebook research scientists and engineers who developed the technology, and the musicians whose work was used to generate the model, would never receive a single penny from it. If anyone is entitled to profit, it would be them, and not the person who just ran "coremltools demucs.py" and integrated the .coreml output file into an app.

@wajsic oh wow that was very generous of you to share (even if it's 5-10%) 😄 a few years ago I was converting some Demucs Python code line-by-line so it'd be something that coremltools could handle. That was until I hit the (imaginary numbers) wall LOL

alexvoina commented 1 year ago

@acosmicflamingo I encourage you to go ahead, create that app and start making money. You'll find out that it's not that easy. The Facebook engineers got paid really well, I'm sure. The musicians made some money out of donations, or these projects were funded by governments. In the end it is Facebook's decision whether or not they want to capitalise on their research directly. And I'm pretty sure they have a long-term revenue strategy; that's why they MIT their work. And remember, they can at any time change that MIT into GPL3 :D

acosmicflamingo commented 1 year ago

@alexvoina I don't think that explanation will sway the jury 🤣

alexvoina commented 1 year ago

@acosmicflamingo :))) probably not

naticio commented 11 months ago

Can Demucs run on a phone even if it is converted to CoreML? I mean, it uses 8 GB of RAM, and some old iPhones have 1 GB of RAM!

CarlGao4 commented 11 months ago

If we limit the system to iOS 15 or newer, then there will be at least 3 GiB of RAM. And if we limit it to iPad (Pro) only, then there will be at least 4 GiB (8 GiB) of RAM.
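As a rough way to reason about the RAM question, here is a back-of-the-envelope sketch of how much memory the raw audio buffers alone would take for a given chunk length. Every default is an assumption made for illustration (44.1 kHz, stereo, 4 output stems, 32-bit floats), and `audio_buffer_bytes` is a hypothetical helper. It deliberately ignores model weights and intermediate activations, which are likely to dominate real usage, so it gives a lower bound and not a prediction.

```python
def audio_buffer_bytes(seconds, sample_rate=44_100, channels=2,
                       stems=4, bytes_per_sample=4):
    """Bytes for one input chunk plus its separated outputs.

    All defaults are illustrative assumptions, not measured Demucs values.
    Weights and activations are NOT included.
    """
    samples = int(seconds * sample_rate)
    input_bytes = samples * channels * bytes_per_sample
    output_bytes = input_bytes * stems  # one buffer of input size per stem
    return input_bytes + output_bytes


def to_mib(num_bytes):
    """Convert bytes to MiB."""
    return num_bytes / (1024 ** 2)


if __name__ == "__main__":
    for seconds in (10, 60, 240):
        print(seconds, "s:", round(to_mib(audio_buffer_bytes(seconds)), 1), "MiB")
```

Under these assumptions the buffers grow linearly with chunk length, which is one reason processing in short chunks matters on devices with little RAM.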