facebookresearch / audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

Feature Request: Support Apple Silicon #31

Open recursionbane opened 1 year ago

recursionbane commented 1 year ago

What would it take to run this on an Apple M1 or M2 chip with 16+GB of unified CPU/GPU memory?

ashleykleynhans commented 1 year ago

> What would it take to run this on an Apple M1 or M2 chip with 16+GB of unified CPU/GPU memory?

Most of these AI applications require raw GPU power, which Apple Silicon simply does not provide. The applications that do run on Apple Silicon are EXTREMELY slow compared to running on a proper GPU.

I have an Apple Silicon machine myself, but I don't bother trying to run any AI-based applications on it; I use RunPod for that.

You may want to check out this RunPod template I made, which is a Docker image to run Audiocraft on a RunPod GPU:

https://runpod.io/gsc?template=ks0mgazj0m&ref=w18gds2n

nutheory commented 1 year ago

Just figured I'd post this here for people with Apple Silicon: https://developer.apple.com/metal/pytorch/ ... not that it fully solves this issue.
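
For anyone following that link, a quick sanity check that a Metal-enabled PyTorch build is actually active (torch.backends.mps is stock PyTorch API):

import torch

# Both should print True on a working Metal-enabled PyTorch install.
print(torch.backends.mps.is_built())      # compiled with MPS support?
print(torch.backends.mps.is_available())  # Metal device usable at runtime?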

cmauget commented 1 year ago

You can already use it on Apple Silicon; you just have to use model = MusicGen.get_pretrained('<size>', device='cpu'), replacing <size> with the model size you want. Performance is meh, but I'm working on using the Metal backend (mps), which should improve it a lot.
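
A minimal end-to-end sketch of that CPU path (facebook/musicgen-small and the prompt are just example placeholders):

# Sketch: run MusicGen entirely on CPU on Apple Silicon. Slow, but it works.
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained('facebook/musicgen-small', device='cpu')
model.set_generation_params(duration=8)  # seconds of audio to generate
wav = model.generate(['lo-fi beat with mellow piano'])  # -> [batch, channels, time]
audio_write('output', wav[0].cpu(), model.sample_rate, strategy='loudness')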

faraday commented 1 year ago

MPS utilization should be possible once this PR for PyTorch is released; you could try a custom build of that branch: https://github.com/pytorch/pytorch/pull/99272

adefossez commented 1 year ago

At the moment I think autocast is not really working on Apple Silicon, which would completely mess up the memory usage and speed.

cmauget commented 1 year ago

You're right, and it is in fact refusing to work, but the branch pytorch/pytorch#99272 is trying to solve this issue.

trizko commented 1 year ago

I was able to get the model to run on MPS. Unfortunately, it just spits out garbled sounds for some reason. Using CPU with the same settings works fine but is 3x slower.

(EDIT) If anyone has any ideas on how I could get this working, here is the fork with the progress I've made so far: https://github.com/trizko/audiocraft

0xdevalias commented 1 year ago

Duplicate of https://github.com/facebookresearch/audiocraft/issues/43

johnrichardrinehart commented 1 year ago

> MPS utilization should be possible once this PR for PyTorch is released; you could try a custom build of that branch: pytorch/pytorch#99272

I tried using this and it seems pretty broken.

EbaraKoji commented 1 year ago

I also added configs for mps (code) and generated outputs without errors, but the sound generated with mps was bad, while cpu worked as expected.

Then I inspected the gen_tokens and model parameters. The code and outputs can be seen here.

Using mps (with use_sampling set to False) produced exactly the same gen_tokens values as cpu, but the decoded outputs were different. I then decoded mps_gen_tokens with cpu_model, and the output sound quality was as good as cpu's. So there seems to be something wrong with mps_model.compression_model.decoder. I also inspected the mps_decoder parameters, but the values were exactly the same as cpu_decoder's, so it's still not clear why mps_decoder fails to decode properly.

In summary, generating tokens with mps and decoding them with cpu does work. Mixing mps and cpu during generation may not be the best solution, but falling back to cpu only for decoding does not cause big performance problems.
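
A rough sketch of that experiment (musicgen-small and the prompt are placeholders, and the return_tokens flag is assumed; it may not exist in older audiocraft versions):

# Same tokens from mps and cpu, but only the cpu compression decoder
# turns them into clean audio.
import torch
from audiocraft.models import MusicGen

mps_model = MusicGen.get_pretrained('facebook/musicgen-small', device='mps')
cpu_model = MusicGen.get_pretrained('facebook/musicgen-small', device='cpu')
for m in (mps_model, cpu_model):
    m.set_generation_params(duration=4, use_sampling=False)  # deterministic

_, mps_gen_tokens = mps_model.generate(['ambient pad'], return_tokens=True)

# Decoding the mps tokens with the cpu compression model yields clean audio,
# which points the problem at the decoder rather than the LM.
with torch.no_grad():
    wav = cpu_model.compression_model.decode(mps_gen_tokens.to('cpu'))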

diffractometer commented 1 year ago

@EbaraKoji your solution seems as close as I can find anywhere, but I'm having some trouble building the model on mps in the first step:

mps_model = AudioGen.get_pretrained('facebook/audiogen-medium', device='mps')

which fails with the error:

RuntimeError: User specified an unsupported autocast device_type 'mps'

Any chance I'm missing something obvious here? I pulled down the dev audiogen... ty
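
For context, that RuntimeError comes straight from torch.autocast, which at the time only accepted 'cuda' or 'cpu' as device_type. A tiny repro, independent of audiocraft (newer PyTorch builds with MPS autocast support will no longer raise):

# Reproduce the autocast error with plain PyTorch, no audiocraft involved.
import torch

try:
    with torch.autocast(device_type='mps'):
        pass
except RuntimeError as e:
    print(e)  # -> User specified an unsupported autocast device_type 'mps'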

diffractometer commented 1 year ago

> I was able to get the model to run on MPS. Unfortunately, it just spits out garbled sounds for some reason. Using CPU with the same settings works fine but is 3x slower.
>
> (EDIT) If anyone has any ideas on how I could get this working, here is the fork with the progress I've made so far: https://github.com/trizko/audiocraft

@trizko did you get your PR working? I tried, but no dice, still garbled.

trizko commented 1 year ago

> I was able to get the model to run on MPS. Unfortunately, it just spits out garbled sounds for some reason. Using CPU with the same settings works fine but is 3x slower.
>
> (EDIT) If anyone has any ideas on how I could get this working, here is the fork with the progress I've made so far: https://github.com/trizko/audiocraft

> @trizko did you get your PR working? I tried, but no dice, still garbled.

Unfortunately, I did not get it working in my fork of the code. @EbaraKoji's solution above seems to be the best right now.

EbaraKoji commented 1 year ago

@diffractometer I made the necessary changes not on the main branch but on the develoopment/mps branch.

Please confirm that torch.has_mps returns True on your local machine and that the forked audiocraft checkout is on develoopment/mps. I had no errors without disabling autocast on mps on my machine, but if you still have any problems with autocast, please pull the latest commit (3f6ef) on my fork.

As I mentioned above, the core problem is the mps decoder, so I guess this tweak will also work on @trizko's fork.

# audiocraft/models/encodec.py

class EncodecModel(CompressionModel):
    def decode(self, codes: torch.Tensor, scale: tp.Optional[torch.Tensor] = None):
        ...
        emb = self.decode_latent(codes)
-       out = self.decoder(emb)
+       if emb.device.type == 'mps':
+           # XXX: the mps decoder does not work, so run the decoder on cpu instead
+           out = self.decoder.to('cpu')(emb.to('cpu')).to('mps')
+       else:
+           out = self.decoder(emb)
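
(One caveat with this patch: nn.Module.to() moves a module's parameters in place, so after the first call the decoder simply stays on cpu; only the activation tensors hop between devices on each decode.)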
diffractometer commented 1 year ago

@EbaraKoji worked for me on facebook/musicgen-small. Thank you!

Any insights on the continuation issue? Should your branch work for both generate and generate_continuation?

EbaraKoji commented 1 year ago

@diffractometer Thank you for reproducing this and reporting back! I also tried generate_continuation and it produced garbled sound.

I found that compression_model.encoder had a similar problem to the decoder (debug outputs). So the following fix worked.

# audiocraft/models/encodec.py

class EncodecModel(CompressionModel):
    def encode(self, x: torch.Tensor) -> tp.Tuple[torch.Tensor, tp.Optional[torch.Tensor]]:
        ...
        x, scale = self.preprocess(x)
-       emb = self.encoder(x)
+       if x.device.type == 'mps':
+           # XXX: the mps encoder does not work, so run the encoder on cpu instead
+           emb = self.encoder.to('cpu')(x.to('cpu')).to('mps')
+       else:
+           emb = self.encoder(x)

The difference from generate is that generate_continuation sets the prompt in _prepare_tokens_and_attributes and then runs compression_model.encode(prompt) on it. That's why generate worked even before the encoder was fixed, but generate_continuation did not.
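
For reference, a minimal continuation call looks roughly like this (the prompt is placeholder noise; a real [batch, channels, samples] waveform goes through compression_model.encode exactly as described):

# Sketch: generate_continuation runs the prompt through compression_model.encode,
# which is why the encoder patch above is needed on mps.
import torch
from audiocraft.models import MusicGen

model = MusicGen.get_pretrained('facebook/musicgen-small', device='mps')
model.set_generation_params(duration=8)
prompt = torch.randn(1, 1, 2 * 32000)  # 2 s of placeholder mono audio at 32 kHz
wav = model.generate_continuation(prompt.to('mps'), prompt_sample_rate=32000,
                                  descriptions=['soft piano'])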

I have only checked a few generation scenarios, so there may be other issues when generating with mps. However, as long as the problem doesn't lie in the heavy tensor calculations, the workaround of partially falling back to cpu can be considered a valid solution in many situations, though not the best one.

diffractometer commented 11 months ago

@EbaraKoji Well, it's been quite some time! I've been quite busy lately, but I had a little time to tinker with audiocraft over the holidays, and I'm happy to report that, picking up where I left off, self.encoder.to('cpu')(x.to('cpu')).to('mps') has allowed me to generate continuations with MPS while doing the non-tensor-heavy work on CPU. So, awesome! Thanks again. I'll try running many different generation scenarios.