recursionbane opened 1 year ago
What would it take to run this on an Apple M1 or M2 chip with 16+GB of unified CPU/GPU memory?
Most of these AI applications require raw GPU power, which Apple Silicon simply does not provide. The applications that do run on Apple Silicon are EXTREMELY slow compared to running with proper GPU power.
I have an Apple Silicon machine myself, but I don't bother trying to run any AI-based applications on it; I use RunPod for that.
You may want to check out this RunPod template I made, which is a Docker image to run Audiocraft on a RunPod GPU:
Just figured I'd post this here for people with silicon ... https://developer.apple.com/metal/pytorch/ ... not that it fully solves this issue.
You can already use it on Apple silicon; you just have to use model = MusicGen.get_pretrained('size', device="cpu") (replace 'size' with the model size you want). Performance is meh, but I'm working on using the Metal backend (mps), which should improve it a lot.
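For instance, a minimal end-to-end run on cpu might look like this (a sketch; the checkpoint name and prompt are just placeholders):

import torch
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

# Load the small checkpoint on cpu; swap in whichever size your RAM allows.
model = MusicGen.get_pretrained('facebook/musicgen-small', device='cpu')
model.set_generation_params(duration=8)        # seconds of audio to generate
wav = model.generate(['ambient synth pad'])    # -> [batch, channels, time]
audio_write('out', wav[0].cpu(), model.sample_rate, strategy='loudness')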
MPS utilization is possible upon the release of this PR for pytorch, you could try with a custom build with that branch: https://github.com/pytorch/pytorch/pull/99272
At the moment I think autocast is not really working with Apple Silicon, which completely messes up memory usage and speed.
You're right, and it is in fact refusing to work, but the branch pytorch/pytorch#99272 is trying to solve this issue.
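For anyone who wants to verify this on their own machine, here is a quick probe (a sketch; the exact error message may vary by PyTorch version):

import torch

try:
    # On builds that predate pytorch/pytorch#99272, this raises immediately.
    with torch.autocast(device_type='mps', dtype=torch.float16):
        pass
    print('mps autocast is supported on this build')
except RuntimeError as err:
    print(err)  # e.g. "User specified an unsupported autocast device_type 'mps'"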
I was able to get the model to run on MPS. Unfortunately, it just spits out garbled sounds for some reason. Using CPU with the same settings works fine but is 3x slower.
(EDIT) If anyone has any ideas on how I could get this working, here is the fork with the progress I've made so far: https://github.com/trizko/audiocraft
MPS utilization is possible upon the release of this PR for pytorch, you could try with a custom build with that branch: pytorch/pytorch#99272
I tried using this and it seems pretty broken.
I also added configs for mps (code) and generated outputs without errors, but the generated sound using mps was bad while using cpu worked, as expected.
Then I inspected the gen_tokens and model parameters. The code and outputs can be seen here.
Using mps (and setting use_sampling to False) resulted in exactly the same gen_tokens values as using cpu, but the decoded outputs were different. Then I decoded mps_gen_tokens with the cpu_model, and the output sound quality was as good as cpu. So there seems to be something wrong with mps_model.compression_model.decoder. I also inspected the mps_decoder parameters, but the values were exactly the same as cpu_decoder's, so why the mps_decoder fails to decode properly remains unclear.
In summary, generating tokens with mps and decoding them with cpu does work. Mixing mps and cpu during generation may not be the best solution, but falling back to cpu only for decoding does not cause big performance problems.
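For reference, a minimal sketch of that hybrid approach (assuming the return_tokens flag that newer audiocraft releases expose on generate; checkpoint and prompt are placeholders):

import torch
from audiocraft.models import MusicGen

mps_model = MusicGen.get_pretrained('facebook/musicgen-small', device='mps')
cpu_model = MusicGen.get_pretrained('facebook/musicgen-small', device='cpu')
mps_model.set_generation_params(duration=8)

with torch.no_grad():
    # Token generation (the heavy part) runs on mps; the mps-decoded audio
    # returned alongside the tokens is garbled, so we discard it.
    _, gen_tokens = mps_model.generate(['ambient synth pad'], return_tokens=True)
    # Decode the same tokens with the cpu compression model instead.
    wav = cpu_model.compression_model.decode(gen_tokens.cpu())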
@EbaraKoji your solution seems the closest I can find anywhere, but I'm having some trouble building the gpu model on the first step:
mps_model = AudioGen.get_pretrained('facebook/audiogen-medium', device='mps')
with the error
RuntimeError: User specified an unsupported autocast device_type 'mps'
Any chance I'm missing something obvious here? I pulled down the dev audiogen... ty
@trizko did you get your PR working? I tried but no dice, still garbled.
Unfortunately, I did not get it working in my fork of the code. @EbaraKoji's solution above seems to be the best right now.
@diffractometer I made the necessary changes not on the main branch but on the development/mps branch.
Please confirm that torch.has_mps returns True on your local machine and that the forked audiocraft checkout is on the development/mps branch.
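For the record, a quick way to check (the first flag is the legacy one mentioned above; the backends calls are what newer PyTorch versions recommend):

import torch

print(torch.has_mps)                      # legacy flag referenced above
print(torch.backends.mps.is_available())  # preferred availability check
print(torch.backends.mps.is_built())      # whether this build includes MPS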
I had no errors without disabling autocast in mps on my machine, but if you still have any problems with autocast, please pull the latest commit (3f6ef) on my fork.
As I mentioned above, the core problem is the mps decoder, so I guess this tweak will also work on @trizko's fork.
# audiocraft/models/encodec.py
class EncodecModel(CompressionModel):
def decode(self, codes: torch.Tensor, scale: tp.Optional[torch.Tensor] = None):
...
emb = self.decode_latent(codes)
- out = self.decoder(emb)
+ if emb.device.type == 'mps':
+ # XXX: Since mps-decoder does not work, cpu-decoder is used instead
+ out = self.decoder.to('cpu')(emb.to('cpu')).to('mps')
+ else:
+ out = self.decoder(emb)
@EbaraKoji worked for me on facebook/musicgen-small. Thank you! I did have to make one change on the development/mps branch in conditioners.py:
self.autocast = TorchAutocast(enabled=False, device_type=self.device, dtype=dtype)
no idea why ¯\_(ツ)_/¯. Any insights on the continuation issue? Should your branch work for both generate and generate_continuation?
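For what it's worth, TorchAutocast (audiocraft/utils/autocast.py) only constructs torch.autocast when enabled=True, which would explain why enabled=False sidesteps the crash entirely. If you'd rather not hard-code it, a device-aware guard could keep autocast on for cuda only (a sketch; the helper name is made up):

import torch
from audiocraft.utils.autocast import TorchAutocast

def make_autocast(device: str, dtype: torch.dtype = torch.float16) -> TorchAutocast:
    # cuda autocast is well supported; mps autocast raised at the time of
    # this thread, and cpu autocast brings little benefit for this model.
    return TorchAutocast(enabled=(device == 'cuda'), device_type=device, dtype=dtype)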
@diffractometer Thank you for the reproduction and reports!
I also tried generate_continuation and it produced garbled sound.
I found that compression_model.encoder had a similar problem to the decoder (debug outputs). So the following fix worked.
# audiocraft/models/encodec.py
class EncodecModel(CompressionModel):
def encode(self, x: torch.Tensor) -> tp.Tuple[torch.Tensor, tp.Optional[torch.Tensor]]:
...
x, scale = self.preprocess(x)
- emb = self.encoder(x)
+ if x.device.type == 'mps':
+ # XXX: Since mps-encoder does not work, cpu-encoder is used instead
+ emb = self.encoder.to('cpu')(x.to('cpu')).to('mps')
+ else:
+ emb = self.encoder(x)
The difference from generate is that the prompt is set in _prepare_tokens_and_attributes and then compression_model.encode(prompt) is executed. That's why generate_continuation did not work but generate did, even before the encoder was fixed.
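For anyone following along, a minimal continuation call looks roughly like this (a sketch; the input file and description are placeholders):

import torchaudio
from audiocraft.models import MusicGen

model = MusicGen.get_pretrained('facebook/musicgen-small', device='mps')
model.set_generation_params(duration=12)

prompt, sr = torchaudio.load('prompt.wav')  # hypothetical clip, [channels, time]
# generate_continuation converts the prompt to the model's sample rate and
# channel count, and accepts either [C, T] or [B, C, T].
wav = model.generate_continuation(prompt, sr, descriptions=['same groove, add strings'])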
I have checked only a few generation scenarios, so there may be other issues when generating with mps. However, as long as the problems don't lie in the heavy tensor calculations, the workaround of partially falling back to cpu can be considered a valid solution in many situations, though not the best one.
@EbaraKoji Well, it's been quite some time! I've been quite busy lately, but I had a little time to tinker with audiocraft over the holidays, and I'm happy to report that picking up where I left off with self.encoder.to('cpu')(x.to('cpu')).to('mps') has allowed me to encode continuations with MPS and do the non-tensor-heavy calculations on CPU, so, awesome! Thanks again. I'll try running many different generation scenarios.