facebookresearch / audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
MIT License

Official MPS support for MAGNeT? #396

Open cocktailpeanut opened 5 months ago

cocktailpeanut commented 5 months ago

Hi thank you for this amazing work. I had to update some code on my end to get it to work on Mac machines (without using xformers). See the update here: https://github.com/peanutcocktail/audiocraft/commit/370e15c0ef3e0c1e786dd02b3c25957714e95a25

I'm not sending a PR because my "fix" is super ad-hoc (basically using environment variables to avoid the xformers-related code; I was just focused on getting it to work).

Anyway, I was OK with AudioCraft MusicGen/AudioGen being slow, but this time I think MAGNeT is totally missing out by NOT supporting MPS directly. Looks like it's using the CPU mode currently.

Because the main value of MAGNeT is the speed, I would even go as far as to say this could be the StableDiffusion moment of audio, if we get this to work on ALL platforms (starting with MPS). I wrote about this here: https://x.com/cocktailpeanut/status/1747290801527226394?s=20 It would be a game changer if this works on M1/M2/M3 properly with MPS support.

Any plans to add MPS specific logic to the MAGNeT code?

jet3004 commented 5 months ago

Yes! Please consider adding MPS – a lot of people at Meta with Macs would benefit from this.

robertcedwards commented 5 months ago

Thumbs up to this one, please do!

01dx commented 5 months ago

MPS please.

MrBerkley commented 5 months ago

PLEASE

AaronWard commented 5 months ago

Commenting to keep updated on this. I want to run this on my M1 without waiting until next week.

flenser commented 5 months ago

Another vote for this, I want to have faster generation on Mac.

theadamsabra commented 5 months ago

@cocktailpeanut - I don't mind working on this, however I'm not aware of the limitations/nuances of MPS backend.

From your changes, it seems like just an "override" of xformers to enable torch MPS by ignoring IGNORE_MEMORY_EFFICIENT. Am I correct in this?

Can you elaborate on what this env variable does?

damian0815 commented 2 months ago

@theadamsabra I've taken a deeper dive into MPS compatibility with MAGNeT, based off of @cocktailpeanut's changes. A lot of what I've done is just extending `if self.device.type == 'cpu'` to `if self.device.type == 'cpu' or self.device.type == 'mps'`, but there is also a workaround required to get audio decode to work.

Basically, in https://github.com/facebookresearch/audiocraft/compare/main...damian0815:audiocraft:mps_fixes, the following problems have been addressed to make MAGNeT work with MPS:

  1. xformers doesn't work with MPS; the IGNORE_MEMORY_EFFICIENT env var is @cocktailpeanut's fix for that.
  2. Autocast isn't supported on MPS. The existing code checks for this with `if self.device.type == 'cpu'`, but the check needs to be extended to MPS as well.
  3. PyTorch bug https://github.com/pytorch/pytorch/issues/124834 means that ELU is broken on MPS, so the token-to-audio decode produces garbled output. This can be worked around by forcing the ELU operation to run on the CPU.
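
The workaround in item 3 could look roughly like the helper below. This is a minimal sketch, not the code in the linked branch: `run_on_cpu` is a hypothetical name, and it only assumes the input behaves like a `torch.Tensor` (it has a `.device` attribute and a `.to()` method).

```python
from typing import Any, Callable


def run_on_cpu(op: Callable[[Any], Any], x: Any) -> Any:
    """Run `op` on a CPU copy of `x`, then move the result back.

    Hypothetical helper: `x` is expected to behave like a torch.Tensor,
    i.e. expose `.device` and `.to(device)`.
    """
    original_device = x.device
    # Move the input to the CPU, where the op is assumed to be correct.
    result = op(x.to("cpu"))
    # Return the result on the device the caller started from.
    return result.to(original_device)
```

With PyTorch this would be called as `run_on_cpu(torch.nn.functional.elu, x)` for a tensor `x` on the `mps` device. The round trip costs two device transfers per call, so it trades some speed for correct output.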

I think the correct way to deal with these is to

  1. abstract out the memory-efficient parts of the logic so they are not dependent upon xformers (which I believe has been largely superseded by torch's internal memory-efficient attention logic anyway), and
  2. either invert the logic for disabling autocast on CPU so that it only enables autocast on CUDA, or add some global autocast-disabling flag so that the logic can be `if not self.disable_autocast:` instead of `if self.device.type == 'cpu':`
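
The two suggestions above can be sketched as plain decision functions. Everything here is an assumption for illustration: the function names, the backend strings `"xformers"` and `"torch"`, and the `disable_autocast` parameter are hypothetical and are not taken from the AudioCraft code or the linked branch.

```python
import importlib.util


def select_attention_backend(device_type: str) -> str:
    """Pick an attention backend name (hypothetical: "xformers" or "torch")."""
    # xformers is only considered on CUDA and only when it is installed;
    # every other case falls back to PyTorch's own attention.
    has_xformers = importlib.util.find_spec("xformers") is not None
    if device_type == "cuda" and has_xformers:
        return "xformers"
    return "torch"


def autocast_enabled(device_type: str, disable_autocast: bool = False) -> bool:
    """Inverted check: autocast is on only for CUDA, and a flag can turn it off."""
    return device_type == "cuda" and not disable_autocast
```

The boolean could be passed as the `enabled` argument of `torch.autocast`, and the `"torch"` branch would map to `torch.nn.functional.scaled_dot_product_attention` (available since PyTorch 2.0). Allow-listing CUDA means a new device type such as `mps` gets the safe behaviour without another `or` clause.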