collabora / WhisperSpeech

An Open Source text-to-speech system built by inverting Whisper.
https://collabora.github.io/WhisperSpeech/
MIT License
4k stars · 218 forks

CPU + MPS Support #56

Closed fakerybakery closed 9 months ago

fakerybakery commented 10 months ago

Hi! Do you know if CPU and MPS support is on the roadmap? Thanks!

jpc commented 10 months ago

CPU could be supported through whisper.cpp/llama.cpp but we are not working on that right now. MPS should work with minimal tweaks (there may be some hardcoded “cuda” settings).

fakerybakery commented 10 months ago

Nice, thanks. Do you know how much work it would take to get WhisperSpeech working with whisper.cpp?

DePasqualeOrg commented 10 months ago

Adding my vote for MPS support: I'd love to use this on Macs and iOS devices.

fakerybakery commented 10 months ago

Not sure if you can run Python on iOS w/o iSH

jpc commented 10 months ago

@fakerybakery You can try and report back how difficult it is :)

I don't have this on my roadmap right now (I am mostly focused on improving quality and language coverage) but, if someone needs this, a consulting contract is a very effective way to make sure it happens.

Grzegosz commented 10 months ago

It would be great if someone added MPS support. I can't run this on a Mac, and Macs are quite often used with LLMs now.

BBC-Esq commented 10 months ago

CPU could be supported through whisper.cpp/llama.cpp but we are not working on that right now. MPS should work with minimal tweaks (there may be some hardcoded “cuda” settings).

I might take this one on...but first please see my recent issue about pull requests and whether you're open to source code modifications without me using a Jupyter Notebook...unless someone wants to show me how.

Basically, I'd be considering tackling:

1) ensuring AMD GPU acceleration on Linux via ROCm (unfortunately, PyTorch doesn't support AMD GPUs on Windows). This should involve minimal changes since ROCm uses the "cuda" device within the PyTorch framework, so it'd mostly be a matter of double-checking the code for minor changes.

2) ensuring MPS support, which, again, involves minor changes (adding "mps" as a viable device within PyTorch).

3) likely adding source-code-wide changes to use "cuda," "mps" or "cpu" as the default compute device depending on a user's system (see the sketch below).
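
For item 3, here is a rough sketch of the kind of helper I have in mind (just an illustration of the idea, not code taken from or tested against this repo):

    import torch

    def get_compute_device() -> str:
        # Prefer CUDA, then Apple's MPS backend, then plain CPU.
        if torch.cuda.is_available():
            return "cuda"
        if torch.backends.mps.is_available():
            return "mps"
        return "cpu"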

zoq commented 10 months ago

Just left a response on https://github.com/collabora/WhisperSpeech/issues/73; it would be great to have MPS support.

jpc commented 10 months ago

@BBC-Esq we are using nbdev. It allows you to edit either the notebooks or the .py files and later synchronize the changes.

I am on holiday next week, but afterwards I am happy to either help you set up nbdev or, if you make a PR, merge your changes back into the notebooks.

akorzh commented 9 months ago

Modifying WhisperSpeech to run on the torch MPS backend was not so hard: I just replaced .cuda() with .to("mps"), added map_location='mps' to a couple of torch.load calls, and removed the 'with sdp_kernel' lines. But I hit a problem with the vocoder: MPS doesn't have real x complex GEMMs (an assert fires) and complex.out is not implemented for MPS, so I need a little bit of help there.
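
Roughly, the substitutions looked like this (illustrative lines, not the exact diff):

    import torch

    device = "mps" if torch.backends.mps.is_available() else "cpu"  # I simply hardcoded "mps" in my hack

    # .cuda() calls became .to(device), e.g. when building an attention mask:
    mask = torch.empty(8, 8).fill_(-torch.inf).triu_(1).to(device)

    # and checkpoint loading needed an explicit map_location, e.g.:
    # spec = torch.load(local_filename, map_location=device)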

BBC-Esq commented 9 months ago

Here's the pull request I did as well. Want to work together on this? https://github.com/collabora/WhisperSpeech/pull/77 I'm not that familiar with GitHub, but I think there's a way to collaborate on a pull request?

akorzh commented 9 months ago

Did you get it working? I made more changes and still wasn't able to run the inference example notebook. BTW, all those .py files are generated from notebooks, so the notebooks need to be modified as well.

BBC-Esq commented 9 months ago

No, the pull request was simply to show an example of choosing between "cuda," "mps" or "cpu" based on the get_compute_device function within utils.py. I was hoping to get feedback on that approach in general (a function that dynamically determines the compute device) before modifying the other scripts. Basically, if the developer approves this approach, multiple other scripts will need to be modified to set the appropriate compute device dynamically.

Also, now we're aware of the issue you raised above regarding the vocoder. I was hoping to get the "go ahead" beforehand, basically. If you want to work on this together, I'm assuming we'd work on the branch I created (the one the pull request came from)? Kind of new to GitHub...

BBC-Esq commented 9 months ago

@jpc What did you think of the draft pull request? Am I on the right track, and do you want me to work on modifying the other scripts as well?

jpc commented 9 months ago

Regarding Vocos and MPS maybe it would be worth raising an issue on their GitHub and see what the author says? I was using this model as-is so I am unfortunately not familiar with its internals.

If this does not help I can try looking into this next week.

jpc commented 9 months ago

The sdp_kernel calls are kind of important for performance on CUDA, so we'd have to figure out how to make them transparent for MPS. Maybe make a new context manager that wraps the one from PyTorch?
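
Something along these lines, perhaps (a rough sketch of the wrapper idea; the name is made up):

    import contextlib
    import torch

    @contextlib.contextmanager
    def maybe_sdp_kernel(device: str, **flags):
        # Keep the CUDA-specific scaled-dot-product attention kernel hints,
        # but turn the context manager into a no-op on MPS and CPU.
        if device == "cuda":
            with torch.backends.cuda.sdp_kernel(**flags):
                yield
        else:
            yield

Call sites would then use this wrapper instead of torch.backends.cuda.sdp_kernel directly.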

BBC-Esq commented 9 months ago

I'll do what I can on the draft pull request, but others will likely have to help since I don't have macOS to test on... I can at least get the overall framework in place in terms of dynamically choosing the compute device across all scripts...

akorzh commented 9 months ago

OK, I got it to work on a Mac, but I had to move the vocoder and encoder to CPU. MPS lacks support for these operators: 'aten::complex.out' is not currently implemented for the MPS device, and 'aten::_fft_r2c' is not currently implemented for the MPS device.

BBC-Esq commented 9 months ago

Excellent, so we've whittled it down. Can you send a screenshot of what happens when you try to run it on MPS anyway? That way I can see what the error says and try to troubleshoot. But with my revised scripts (i.e. the draft pull request), MPS works for everything except the vocoder? Thanks.

BBC-Esq commented 9 months ago

I was able to find this. https://qqaatw.dev/pytorch-mps-ops-coverage/ I couldn't find fft_r2c on there though.

akorzh commented 9 months ago

Sorry, I didn't use your pull request, just some hacked-together code (which is quite similar, but changes more places). I thought we needed to have something working first. Haven't you tried running on MPS yourself? I posted a couple of requests to https://github.com/pytorch/pytorch/issues/77764

BBC-Esq commented 9 months ago

Unfortunately I don't have an Apple computer... nor Linux for that matter. That's an extreme challenge when trying to write code that works with all three platforms. I was able to find these links, however:

https://github.com/pytorch/pytorch/pull/116630
https://developer.apple.com/documentation/metal/metal_sample_code_library/customizing_a_pytorch_operation
https://github.com/neuraloperator/neuraloperator

Not sure if they'll help.

My draft pull request has all the basic infrastructure there, though. I suppose we could modify it to exclude the vocoder from being loaded on MPS, but I'd like to hear back from the repository owner so he can confirm what you've said and we know for certain, you know?

jpc commented 9 months ago

I was thinking about writing to the Vocos author since I believe sometimes the offending operations can be changed to something a little bit different that works out of the box on MPS.

BBC-Esq commented 9 months ago

Do it! @akorzh do you have the script you used? Might help me troubleshoot.

BBC-Esq commented 9 months ago

@jpc A few possible workarounds if we can't find a way to get vocos working on MPS out of the box...

1) Manually implement the GEMMs or specific FFT operations using MPS primitives.

2) Decompose the unsupported operations into smaller supported operations.

3) Context manager to automatically move operations to CPU/MPS when appropriate, to ensure that as much as possible runs on MPS (see the sketch after this list).

4) Write custom kernels in the metal shading language and invoke them from python with PyObjC.

5) Evaluate how MPS Graph within Core ML might help.

6) Possibly use SYCL and DPC++ to write code that is portable across different GPU architectures, including potentially targeting Metal through an abstraction layer. It's primarily designed for CUDA and OpenCL, but could potentially be adapted to generate MSL code that runs on MPS.

7) Using OpenCL/GL instead of MPS as a fallback rather than falling back to the CPU.
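
For option 3, here is a rough sketch of what I mean (a hypothetical wrapper; I can't test it since I don't have a Mac): keep the pipeline on MPS but run an unsupported submodule, such as the vocoder, on CPU and move tensors across the boundary.

    import torch

    class CPUFallback(torch.nn.Module):
        # Hypothetical wrapper: run `module` on CPU, return its outputs on `device`.
        def __init__(self, module: torch.nn.Module, device: str):
            super().__init__()
            self.module = module.to("cpu")
            self.device = device

        def forward(self, *args):
            cpu_args = [a.to("cpu") if torch.is_tensor(a) else a for a in args]
            out = self.module(*cpu_args)
            return out.to(self.device) if torch.is_tensor(out) else out

    # e.g. wrap whichever submodule hits unsupported MPS operators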

Thoughts anyone?

BBC-Esq commented 9 months ago

Another option might be to use Vulkan. Llama.cpp just implemented a Vulkan backend, one version from GPT4All and another from another contributor (I forget his name). This would also allow GPU acceleration with AMD GPUs on Windows and, according to the following link, on macOS as well:

https://github.com/KhronosGroup/MoltenVK

BBC-Esq commented 9 months ago

https://github.com/KhronosGroup/MoltenVK/issues/2154

BBC-Esq commented 9 months ago

@jpc and @akorzh I think I may have found a solution: MLX for macOS? Here are the operations it supports:

[screenshot: list of FFT operations supported by MLX]

Here are the links:

https://ml-explore.github.io/mlx/build/html/python/fft.html
https://github.com/ml-explore/mlx

Take it with a grain of salt, but here's what GPT-4 says... so there might already be an option optimized for Apple... I leave it to your expertise:

[screenshot: GPT-4's response]

SEE ALSO HERE FOR MORE DETAIL:

https://ml-explore.github.io/mlx/build/html/python/_autosummary/mlx.core.fft.rfft.html#mlx.core.fft.rfft

GPT-4 says they're the same... I also ran the PyTorch description through GPT; it's here:

https://pytorch.org/cppdocs/api/function_namespaceat_1aaea819b1367e99c6ef062ac8335edba2.html

signalprime commented 9 months ago

hey crew, I spent a few hours last night and today working on both CPU and MPS updates to this codebase. I ran into the same results as @akorzh, except that I didn't get it to run. Instead, attempting to keep everything on the CPU, I ran into the "addmm_impl_cpu_" not implemented for 'Half' message inside the MultiHeadAttention.forward call. Perhaps it has to do with my environment running PyTorch version '2.1.1' at the time of testing.
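
One thing I haven't verified here, but which usually clears that particular error: half-precision matmuls aren't implemented on CPU, so the weights (and inputs) would need to be cast up to float32 when falling back to "cpu", roughly like this:

    import torch

    def prepare_for_cpu(model: torch.nn.Module) -> torch.nn.Module:
        # Cast half-precision weights to float32 before running on CPU to avoid
        # "addmm_impl_cpu_" not implemented for 'Half'.
        return model.to(device="cpu", dtype=torch.float32)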

I spent time with the [sdp_kernel](https://github.com/collabora/WhisperSpeech/blob/80b268b74900b2f7ca7a36a3c789607a3f4cd912/whisperspeech/s2a_delar_mup_wds_mlang.py#L500) line without a solution yet. To my understanding PyTorch hasn't implemented Flash Attention for MPS, but there is an implementation at https://github.com/philipturner/metal-flash-attention.

Moving past that, I think if we can use functions from the Vulkan or MLX libraries, like @BBC-Esq pointed out, that would be best. I haven't worked with these projects yet, so a lot is unfamiliar.

akorzh commented 9 months ago

patch.txt: here is my patch, which works on a Mac (runs on MPS, with CPU for the rest)

fakerybakery commented 9 months ago

@akorzh nice! any plans for a PR?

signalprime commented 9 months ago

thanks @akorzh , can confirm those updates worked here too

BBC-Esq commented 9 months ago

patch.txt: here is my patch, which works on a Mac (runs on MPS, with CPU for the rest)

@akorzh Below is a summary of the locations where you changed lines from cuda to mps. With your permission, I'd like to modify my pull request to add these changes, but make them dynamic. In other words, the new function in utils.py would determine the "compute_device". If "cuda" is available, all of the locations would continue to use "cuda" dynamically. If "mps" is the available compute device, "mps" would be used everywhere except the locations that require CPU; specifically:

A2WAV.PY

self.vocos = Vocos.from_pretrained(repo_id).cuda()

PIPELINE.PY

run_opts={"device": "cuda"})

This would enable dynamically choosing the appropriate compute device for both CUDA and MPS... and we could also add CPU for all of them if we want to include that as an option for people... It's my understanding that torch.set_default_device() doesn't accept "cpu" because CPU is the default under PyTorch... Anyway, here's the outline of the lines I'd focus on (with a sketch of the dynamic pattern after the outline):

benchmark.py
    Original: - torch.cuda.synchronize()
    Modified: + torch.mps.synchronize()
    Original: - pipe.t2s.decoder.mask = torch.empty(t2s_ctx_n, t2s_ctx_n).fill_(-torch.inf).triu_(1).cuda()
    Modified: + pipe.t2s.decoder.mask = torch.empty(t2s_ctx_n, t2s_ctx_n).fill_(-torch.inf).triu_(1).to("mps")
    Original: - pipe.s2a.decoder.mask = torch.empty(s2a_ctx_n, s2a_ctx_n).fill_(-torch.inf).triu_(1).cuda()
    Modified: + pipe.s2a.decoder.mask = torch.empty(s2a_ctx_n, s2a_ctx_n).fill_(-torch.inf).triu_(1).to("mps")

extract_acoustic.py
    Original: - return _tform(x).cuda().unsqueeze(0)
    Modified: + return _tform(x).to("mps").unsqueeze(0)
    Original: - model.cuda().eval();
    Modified: + model.to("mps").eval();

extract_spk_emb.py
    Original: - run_opts={"device": "cuda"})
    Modified: + run_opts={"device": "mps"})

extract_stoks.py
    Original: - vq_model = vq_stoks.RQBottleneckTransformer.load_model(vq_model).cuda()
    Modified: + vq_model = vq_stoks.RQBottleneckTransformer.load_model(vq_model).to("mps")
    Original: - run_opts={"device": "cuda"})
    Modified: + run_opts={"device": "mps"})
    Original: - samples16k = samples16k.cuda().to(torch.float16)
    Modified: + samples16k = samples16k.to("mps").to(torch.float16)

pipeline.py
    Original: - self.t2s = TSARTransformer.load_model(**args).cuda()
    Modified: + self.t2s = TSARTransformer.load_model(**args).to("mps")
    Original: - self.s2a = SADelARTransformer.load_model(**args).cuda()
    Modified: + self.s2a = SADelARTransformer.load_model(**args).to("mps")

prepare_s2a_atoks.py
    Original: - csamples = samples.cuda().unsqueeze(1)
    Modified: + csamples = samples.to("mps").unsqueeze(1)

prepare_t2s_txts.py
    Original: - model_size, "cuda", compute_type="float16", language=lang,
    Modified: + model_size, "mps", compute_type="float16", language=lang,
    Original: - csamples = samples.cuda()
    Modified: + csamples = samples.to("mps")

s2a_delar_mup_wds_mlang.py
    Original: - self.register_buffer('val_true', torch.zeros(self.quantizers).cuda())
    Modified: + self.register_buffer('val_true', torch.zeros(self.quantizers).to("mps"))
    Original: - self.register_buffer('val_total', torch.zeros(self.quantizers).cuda())
    Modified: + self.register_buffer('val_total', torch.zeros(self.quantizers).to("mps"))
    Original: - spec = torch.load(local_filename)
    Modified: + spec = torch.load(local_filename,map_location='mps')

t2s_up_wds_mlang_enclm.py
    Original: - spec = torch.load(local_filename)
    Modified: + spec = torch.load(local_filename,map_location='mps')

vad.py
    Original: - vad_model = whisperx.vad.load_vad_model('cuda')
    Modified: + vad_model = whisperx.vad.load_vad_model('mps')

vq_stoks.py
    Original: - self.register_buffer('val_true', torch.zeros(1).cuda())
    Modified: + self.register_buffer('val_true', torch.zeros(1).to("mps"))
    Original: - self.register_buffer('val_total', torch.zeros(1).cuda())
    Modified: + self.register_buffer('val_total', torch.zeros(1).to("mps"))

wh_transcribe.py
    Original: - embs = whmodel.encoder(whisper.log_mel_spectrogram(samples).cuda())
    Modified: + embs = whmodel.encoder(whisper.log_mel_spectrogram(samples).to("mps"))
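
In the pull request, these would not be hardcoded to "mps"; each location would read the device from the new helper in utils.py. Roughly like this (a sketch of the pattern, not the exact PR code):

    import torch
    from whisperspeech.utils import get_compute_device  # the new helper proposed for utils.py

    compute_device = get_compute_device()

    # model = SomeModel.load_model(...).to(compute_device)            # was .cuda()
    # spec = torch.load(local_filename, map_location=compute_device)  # was torch.load(local_filename)

    def synchronize(device: str = compute_device):
        # was torch.cuda.synchronize()
        if device == "cuda":
            torch.cuda.synchronize()
        elif device == "mps":
            torch.mps.synchronize()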

akorzh commented 9 months ago

Oh yeah, totally, I don't mind. Since I did this as a hack just to see whether it runs at all, I wasn't making it nice enough to be an MR, so you are welcome to use any of it for the good of the community.

BBC-Esq commented 9 months ago

@signalprime

Just FYI, MLX currently shows some regression on M3 chips, but that's likely due to how new it is, and it's constantly improving, so I would be shocked if it's not better than MPS on any and all silicon in the very near future. Also, it requires a model to be converted to MLX, so... If we were to implement it, here's some info; we'll see how this exciting technology develops.

https://towardsdatascience.com/how-fast-is-mlx-a-comprehensive-benchmark-on-8-apple-silicon-chips-and-4-cuda-gpus-378a0ae356a0

BBC-Esq commented 9 months ago

I marked the pull request as ready for review: https://github.com/collabora/WhisperSpeech/pull/77

If one or two people who tested MPS previously could test the pull request, that would help out. A speedy review from @jpc would be appreciated as well.

Also, I'm open to learning Jupyter notebooks as @jpc offered, but please don't make me redo this pull request using them... I'll try to use them in the future if it helps people on here. ;-)

signalprime commented 9 months ago

I'm just seeing this, seems like it's been pushed already. I'll keep my eyes open in case I can help out

fakerybakery commented 9 months ago

Since #89 (successor to #77) is merged, I'm going to close this issue now

cocktailpeanut commented 9 months ago

Has this been tested? I just tried it and I had to change some parts to get it to launch. https://github.com/collabora/WhisperSpeech/pull/92

Even after these changes, it fails with a RuntimeError: Placeholder storage has not been allocated on MPS device! on an MPS device. Maybe I'm missing something.

For the record, I tested as if I would be using the package in a regular application:

  1. Have a separate python app that depends on whisperspeech.
  2. Instead of just installing whisperspeech from pip, I installed it directly from the forked git repo
  3. Ran the app.

Has anyone tested? Let me know if anyone got the current MAIN branch to work on their MPS machine.

BBC-Esq commented 9 months ago

To clarify, after the modifications in your pull request, did you get it to work just like before the major pull request that I did (based on others' insights about models/tensors)... or was there still something lacking? Unfortunately, I don't have macOS to test things, which is why I asked for two testers... Glad someone did.

Do you have any logs or print statements you can share, with personal info redacted if you choose, of course?

cocktailpeanut commented 9 months ago

I never touched the codebase until today. I only checked back when I heard from someone that this is now implemented and merged, so I never got to try the code before. Even if I had, MPS wasn't working anyway, so I couldn't have tested it.

Before the changes I made in my PR, the app that was using whisperspeech was failing whenever it tried to use the utils.py module.

Also, I feel like this is not supposed to work outside a dev environment unless we move webdataset from dev_requirements to requirements: https://github.com/collabora/WhisperSpeech/pull/92/files#diff-f247846b0ab6c196467cd7e8e41027c7f27eb2b74b2e9a33d54c59a6fe9f00b0R39 (It wasn't working, which is how I found out.)

Here's the full error log (app.py is the app that uses whisperspeech)

$ source /Users/x/pinokio/api/whisperspeech/env/bin/activate /Users/x/pinokio/api/whisperspeech/env && python app.py
/Users/x/pinokio/api/whisperspeech/env/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py:28: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
['  This is the first demo of Whisper Speech, a fully open source text-to-speech model trained by Collabora and Lion on the Juwels supercomputer.  '] ['en']
/Users/x/pinokio/api/whisperspeech/env/lib/python3.10/site-packages/torch/backends/cuda/__init__.py:342: FutureWarning: torch.backends.cuda.sdp_kernel() is deprecated. In the future, this context manager will be removed. Please see, torch.nn.attention.sdpa_kernel() for the new context manager, with updated signature.
  warnings.warn(
Traceback (most recent call last):████████████████████████████████████████████████████████████████████| 100.00% [752/752 00:58<00:00]
  File "/Users/x/pinokio/api/whisperspeech/env/lib/python3.10/site-packages/gradio/queueing.py", line 495, in call_prediction
    output = await route_utils.call_process_api(
  File "/Users/x/pinokio/api/whisperspeech/env/lib/python3.10/site-packages/gradio/route_utils.py", line 232, in call_process_api
    output = await app.get_blocks().process_api(
  File "/Users/x/pinokio/api/whisperspeech/env/lib/python3.10/site-packages/gradio/blocks.py", line 1561, in process_api
    result = await self.call_function(
  File "/Users/x/pinokio/api/whisperspeech/env/lib/python3.10/site-packages/gradio/blocks.py", line 1179, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/Users/x/pinokio/api/whisperspeech/env/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/Users/x/pinokio/api/whisperspeech/env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2134, in run_sync_in_worker_thread
    return await future
  File "/Users/x/pinokio/api/whisperspeech/env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "/Users/x/pinokio/api/whisperspeech/env/lib/python3.10/site-packages/gradio/utils.py", line 678, in wrapper
    response = f(*args, **kwargs)
  File "/Users/x/pinokio/api/whisperspeech/app.py", line 47, in whisper_speech_demo
    audio = generate_audio(pipe, segments, speaker_audio, speaker_url, cps)
  File "/Users/x/pinokio/api/whisperspeech/app.py", line 38, in generate_audio
    audio = pipe.vocoder.decode(atoks)
  File "/Users/x/pinokio/api/whisperspeech/env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Users/x/pinokio/api/whisperspeech/env/lib/python3.10/site-packages/whisperspeech/a2wav.py", line 42, in decode
    return self.vocos.decode(features, bandwidth_id=bandwidth_id)
  File "/Users/x/pinokio/api/whisperspeech/env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Users/x/pinokio/api/whisperspeech/env/lib/python3.10/site-packages/vocos/pretrained.py", line 112, in decode
    x = self.backbone(features_input, **kwargs)
  File "/Users/x/pinokio/api/whisperspeech/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1529, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/x/pinokio/api/whisperspeech/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/x/pinokio/api/whisperspeech/env/lib/python3.10/site-packages/vocos/models.py", line 82, in forward
    x = self.norm(x.transpose(1, 2), cond_embedding_id=bandwidth_id)
  File "/Users/x/pinokio/api/whisperspeech/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1529, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/x/pinokio/api/whisperspeech/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/x/pinokio/api/whisperspeech/env/lib/python3.10/site-packages/vocos/modules.py", line 82, in forward
    scale = self.scale(cond_embedding_id)
  File "/Users/x/pinokio/api/whisperspeech/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1529, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/x/pinokio/api/whisperspeech/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/x/pinokio/api/whisperspeech/env/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 163, in forward
    return F.embedding(
  File "/Users/x/pinokio/api/whisperspeech/env/lib/python3.10/site-packages/torch/nn/functional.py", line 2264, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Placeholder storage has not been allocated on MPS device!

BBC-Esq commented 9 months ago

Thanks, I'll try to take a look and do some research this weekend, especially because this was primarily my pull request. In my own defense, however, I was basing it off of what other people said about MPS supporting certain devices... and additionally, I don't have an Apple computer, so it's very difficult to troubleshoot since I can't test at all.

Do me a favor: run the prior codebase from before the recent pull request and let me know if you get any errors with it. Again, I don't have macOS, but if I recall, it's supposed to fall back to using CPU for everything.

Can you also verify which versions of PyTorch and the other libraries you have pip installed? I'll do my best. Thanks!

cocktailpeanut commented 9 months ago

@BBC-Esq wouldn't the code prior to your PR NOT have MPS support and therefore not run on my Mac?

I've tried both the default Mac installation and the nightly one. Hope this helps.

pip3 install --pre torch torchvision torchaudio
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu

if I recall, it's supposed to fall back to using CPU for everything.

Wait, does this mean this is technically a "CPU support" and not "MPS support"? I did see mentions of MPS in the new code so I just assumed this made use of MPS. Doesn't it?

BBC-Esq commented 9 months ago

You are correct. I thought I had edited my message before clicking "Comment"... guess I didn't. Yes, the pre-pull-request version did not support MPS (or attempt to support MPS). You'd have to double-check the code to see if it even supported "cpu" when "cuda" wasn't available, because I only ever used "cuda," but I don't think CPU was even used as a fallback. Sorry for the confusion.

Which version of Python are you running? I will send you the links to the PyTorch wheels to pip install, just to make sure it's using the right ones. I personally had issues recently with PyTorch's index of wheels not giving the right one, so they're on my naughty list...

Here is an example, but I'll find the correct wheels for torch, torchaudio, and torchvision all for you: [screenshot]

BBC-Esq commented 9 months ago

Wait, does this mean this is technically a "CPU support" and not "MPS support"? I did see mentions of MPS in the new code so I just assumed this made use of MPS. Doesn't it?

"mps" and "cpu" are different devices in pytorch terminology, and the pull request is supposed to support both compute devices based on a user's setup. If a user has "mps," it should use mps for everything except "vocoder" and "encoder," which should be apparent by looking at the pull request. If a user doesn't have "cuda" or "mps" everything should be placed on "cpu." Sorry for the mundane explanation but hope that clarifies...

MPS is much faster than CPU but far behind "cuda"; still, at least it's an improvement for macOS users...

Also, I noticed that WhisperSpeech automatically installs the three pytorch libraries through one of its dependencies "Speechbrain", which, on my system (again I use CUDA), it installed the CPU version. I had to manually uninstall torch, torchvision, and torchaudio and run the proper pip install commands.

In theory, pip should overwrite the older version when you explicitly pip install another version, but after I get you the specific wheels please pip uninstall those three libraries first...then pip install the 3 wheels I give you. Then I can help you as much as I can...At least we will have a baseline for any troubleshooting steps involving print statements and the like...

Thanks again!

cocktailpeanut commented 9 months ago

Also, I noticed that WhisperSpeech automatically installs the three pytorch libraries through one of its dependencies "Speechbrain", which, on my system (again I use CUDA), it installed the CPU version. I had to manually uninstall torch, torchvision, and torchaudio and run the proper pip install commands.

Yes, I do exactly that, and have confirmed the correct versions are installed. Also, I install into a venv so everything is isolated. The venv is Python 3.10.

I know I might be missing something since I didn't read through all the code, but just based on the error message, doesn't it look like it's not from the torch install but from the code not explicitly applying MPS somewhere? I am very familiar with MPS errors when things fail to run because of a torch version mismatch, but I've never seen this kind of error come from a torch version mismatch.

Also, one important clarification. Can you take a look at this line https://github.com/collabora/WhisperSpeech/pull/89/files#diff-ba9e2bb34cdd77f3f053d7980195bbddbbb742c3c9311ccc5427ccaaf4fc785aR72 and let me know if this change is wrong? Because I am running this codebase right now (without this change it won't even run; it says Vocoder doesn't have a .to() method). I am going to assume you had a reason to add that to the code, and since I am operating with code that doesn't have that, maybe that's what's causing the problem.

BBC-Esq commented 9 months ago

I will look at that specific ".to" issue next, but for my own sanity, can you please try pip uninstalling all three and then pip installing these three wheels first? That way I can at least rule it out in my own mind... it helps me...

pip install https://download.pytorch.org/whl/cpu/torch-2.1.2-cp310-none-macosx_10_9_x86_64.whl#sha256=d9b535cad0df3d13997dbe8bd68ac33e0e3ae5377639c9881948e40794a61403
pip install https://download.pytorch.org/whl/cpu/torchaudio-2.1.2-cp310-cp310-macosx_10_13_x86_64.whl#sha256=06f8c02814e6cdd78626bbf44ad2bb8afa5b39ab650c6af18328a32311461058
pip install https://download.pytorch.org/whl/cpu/torchvision-0.16.2-cp310-cp310-macosx_10_13_x86_64.whl#sha256=bc86f2800cb2c0c1a09c581409cdd6bff66e62f103dc83fc63f73346264c3756

Also, we'll be testing pytorch 2.1.2 not the latest 2.2.2, i.e. the version that I've tested my system on and I believe another macos user has...

BBC-Esq commented 9 months ago

Maybe @akorzh can chime in, but if he doesn't: he reportedly got it working on MPS with the things that needed CPU moved to CPU instead of MPS. And the goal of my pull request was merely to reflect the changes he made in his modification, the only difference being to "dynamically" choose the appropriate device based on a user's system, so... I feel we're close.
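
Also, for what it's worth, my understanding (untested, since I can't run MPS myself) is that "Placeholder storage has not been allocated on MPS device!" usually means the model's weights and its inputs ended up on different devices. A quick sanity check before the vocoder call might confirm that:

    import torch

    def report_devices(module: torch.nn.Module, *tensors: torch.Tensor):
        # List every device used by the module's parameters and by the inputs;
        # seeing more than one device here would explain the MPS placeholder error.
        devices = {p.device for p in module.parameters()}
        devices |= {t.device for t in tensors if torch.is_tensor(t)}
        print("devices in play:", devices)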

signalprime commented 9 months ago

hey team, I manually applied the changes and confirmed it works; however, not all operations are performed on the "mps" device due to a lack of compatibility with torch. I've tested it with torch version '2.1.1'.

I will review the PR in discussion and edit this message afterwards. EDIT: there were numerous small updates required, so I created a new PR with the updates, along with the appropriate credits to @akorzh and @BBC-Esq for their valuable contributions.

BBC-Esq commented 9 months ago

@jpc I think it's safe to close this now? If we decide to revisit the macOS issue later, I'm assuming we can open another issue regarding MLX or Vulkan or whatnot...