Open n8henrie opened 1 year ago
Very cool! cc maintainers @tscholak @teh @thoughtpolice. Also cc @samuela.
Curious if `python3Packages.torch-bin` works?
@samuela `torch-bin` gives me unsupported system (`cuda_nvtx-11.7.50`).
Hmmm, that's odd. aarch64-darwin is a supported platform, and there are srcs available here:

What version of Python are you using?
I made `nix` and `nix-gpu` branches at https://github.com/n8henrie/whisper; `nix-gpu` uses the wheel (and I've confirmed it enables the GPU to run whisper).
However, I'm not seeing any speed difference with a small test file (full of particularly witty content):
```
$ nix shell github:n8henrie/whisper/nix -c time whisper 20220922\ 084923.m4a
/nix/store/0fyl0qi21ljiswg05qz5p2bpnl016k0l-python3.10-whisper/lib/python3.10/site-packages/whisper/transcribe.py:114: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
Detecting language using up to the first 30 seconds. Use `--language` to specify the language
Detected language: English
[00:00.000 --> 00:04.640] This is an example of voice recording. Something might actually say if I
[00:04.640 --> 00:09.440] recording something on my watch. I think this is how it would turn out.
19.93user 11.08system 0:14.23elapsed 217%CPU (0avgtext+0avgdata 2062800maxresident)k
0inputs+0outputs (3246major+212332minor)pagefaults 0swaps
$
$ nix shell github:n8henrie/whisper/nix-gpu -c time whisper 20220922\ 084923.m4a
/nix/store/fvpkcahvngmsssm3yfchvgw11hdic6sx-python3.10-whisper/lib/python3.10/site-packages/whisper/transcribe.py:114: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
Detecting language using up to the first 30 seconds. Use `--language` to specify the language
Detected language: English
[00:00.000 --> 00:04.640] This is an example of voice recording. Something might actually say if I
[00:04.640 --> 00:09.440] recording something on my watch. I think this is how it would turn out.
19.81user 11.16system 0:14.02elapsed 220%CPU (0avgtext+0avgdata 2177600maxresident)k
0inputs+0outputs (3955major+212007minor)pagefaults 0swaps
```
Not sure if that's a training vs. inference issue (or, more likely, whisper isn't configured to use the GPU; it has a `--device` flag but seems to only look for CUDA devices by default).

Huh, looks like I just need to pass `--device mps` (https://github.com/openai/whisper/pull/382), but no luck:
```
$ nix shell github:n8henrie/whisper/nix-gpu -c time whisper --device mps 20220922\ 084923.m4a
Traceback (most recent call last):
  File "/nix/store/fvpkcahvngmsssm3yfchvgw11hdic6sx-python3.10-whisper/bin/.whisper-wrapped", line 9, in <module>
    sys.exit(cli())
  File "/nix/store/fvpkcahvngmsssm3yfchvgw11hdic6sx-python3.10-whisper/lib/python3.10/site-packages/whisper/transcribe.py", line 444, in cli
    model = load_model(model_name, device=device, download_root=model_dir)
  File "/nix/store/fvpkcahvngmsssm3yfchvgw11hdic6sx-python3.10-whisper/lib/python3.10/site-packages/whisper/__init__.py", line 154, in load_model
    return model.to(device)
  File "/nix/store/0y2cidaighjvzwlw31k6fkcragba3jki-python3.10-torch-2.0.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1145, in to
    return self._apply(convert)
  File "/nix/store/0y2cidaighjvzwlw31k6fkcragba3jki-python3.10-torch-2.0.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 844, in _apply
    self._buffers[key] = fn(buf)
  File "/nix/store/0y2cidaighjvzwlw31k6fkcragba3jki-python3.10-torch-2.0.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1143, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
NotImplementedError: Could not run 'aten::empty.memory_format' with arguments from the 'SparseMPS' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::empty.memory_format' is only available for these backends: [CPU, MPS, Meta, QuantizedCPU, QuantizedMeta, MkldnnCPU, SparseCPU, SparseMeta, SparseCsrCPU, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMeta, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradNestedTensor, Tracer, AutocastCPU, AutocastCUDA, FuncTorchBatched, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PythonDispatcher].
[per-backend kernel registration dump elided]
Command exited with non-zero status 1
3.87user 1.23system 0:03.39elapsed 150%CPU (0avgtext+0avgdata 1210224maxresident)k
0inputs+0outputs (19major+112713minor)pagefaults 0swaps
```
Timing-based benchmarks are tricky to debug... does `import torch; print(torch.backends.mps.is_available())` work for you in this flake? Also, out of curiosity, what's the equivalent of `nvtop` for the MPS backend?
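The one-liner above can be wrapped in a slightly more defensive check. This is just a sketch (the `getattr` guard is my addition, not from the thread); it also degrades gracefully on torch builds where the `mps` attribute doesn't exist, or when torch isn't installed at all:

```python
# Check whether this torch build can see the Apple GPU (MPS backend).
try:
    import torch

    mps_available = (
        getattr(torch.backends, "mps", None) is not None
        and torch.backends.mps.is_available()
    )
except ImportError:
    mps_available = None  # torch not installed in this environment

print(f"MPS available: {mps_available}")
```

A `True` here only means torch can dispatch to MPS; it says nothing about whether a given program (like whisper) actually selects that device.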
> NotImplementedError: Could not run 'aten::empty.memory_format' with arguments from the 'SparseMPS' backend. [...]
Yeah, this error looks like an issue with whisper as opposed to nix's packaging of torch/torch-bin.

We can leave this open as a tracking issue for getting MPS support into our source build of torch, however.
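The `SparseMPS` in the traceback is the tell: a *sparse* tensor (whisper registers its alignment-heads mask as a sparse buffer, if I recall correctly) is being moved to MPS, and sparse ops aren't implemented for that backend. A hedged illustration of the failure mode and the usual densify-first workaround; the `Toy` module is mine, not whisper's code:

```python
try:
    import torch

    class Toy(torch.nn.Module):
        def __init__(self):
            super().__init__()
            # A sparse buffer, analogous to whisper's alignment-heads mask;
            # calling .to("mps") on the module would hit the SparseMPS error.
            self.register_buffer("mask", torch.eye(2).to_sparse())

    m = Toy()
    m.mask = m.mask.to_dense()  # densify before moving the module to MPS
    result = m.mask.is_sparse   # False: the module can now be moved safely
except ImportError:
    result = None  # torch not installed in this environment

print(result)
```

The real fix belongs in whisper (or upstream torch sparse-MPS support), but this shows why the error is whisper's to solve rather than a packaging problem.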
> What version of Python are you using?

Tried 3.9 through 3.11. I'm pinned to 23.05; I wonder if it's different on unstable?
Yup, that was it!
```
$ nix-shell -I nixpkgs=channel:nixpkgs-unstable -p 'python39.withPackages (ps: with ps; [ pytorch-bin ])' --command "python -c 'import torch; print(torch.backends.mps.is_available())'"
True
```
> `import torch; print(torch.backends.mps.is_available())`

@samuela yes, I just patched this into the whisper code (in the `cli()` function) and confirmed that my `nix` branch shows `False` and my `nix-gpu` branch shows the device as expected.
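For reference, the device-detection logic described here can be sketched roughly like this. `pick_device` is an illustrative name I made up, not whisper's actual API; per the discussion above, whisper's own `cli()` only probed for CUDA by default, which is why `--device mps` had to be passed explicitly:

```python
def pick_device(requested=None):
    """Return an explicit device if given, else auto-detect cuda -> mps -> cpu."""
    if requested:  # honor an explicit --device flag
        return requested
    try:
        import torch
    except ImportError:  # no torch in this environment
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"

print(pick_device("mps"))  # explicit override -> mps
print(pick_device())       # auto-detected for this machine
```

Printing the result of the auto-detected branch inside `cli()` is essentially the patch used to confirm which branch of the flake actually enables the GPU.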
Yup, that makes for two TODOs, including MPS support in `python3Packages.torch`.
I found that `torchvision-bin` on aarch64-darwin effectively prevented MPS from being usable. Submitted https://github.com/NixOS/nixpkgs/pull/244716 to fix.
I can confirm that MPS support works with `torch-bin` and `torchvision-bin` now.
I'm still trying to get it working. I tried adding the `USE_MPS` flag, but the build still ends up without MPS support; I think the build script uses `xcrun` to sort out some SDK path business, fails to find the MPSGraph framework, and disables MPS:

```
bash: line 1: xcrun: command not found
-- MPS: unable to get MacOS sdk version
-- MPSGraph framework not found
```

https://github.com/pytorch/pytorch/blob/1da41157028ee8224e456f6fab18bc22fa2637fe/CMakeLists.txt#L99
Can any of the aarch64-darwin folks help me build this locally via `nix develop`? It currently builds fine with `nix build --offline` (so I don't think it's a substituter issue) and `nix shell`.
Steps I'm taking / have tried:

```
nix develop .#python310Packages.pytorch
source $stdenv/setup
genericBuild
```

It looks like `/usr/bin/xcrun` bleeds into the environment, sets `USE_MPS` to true, and the build fails to find the relevant SDK libraries (fixing this would hopefully help me close this issue, but I'd like to at least get it to build first).

```
nix develop -i .#python310Packages.pytorch --command bash --norc
source $stdenv/setup
genericBuild
```

`USE_MPS` is set to false, but the build fails looking for `#include <google/protobuf/implicit_weak_message.h>`.

After reading this issue, it looks like some of these are functions and some are variables, so I might need to e.g. `eval "${buildPhase}"` instead of just `buildPhase`.

```
nix develop -i .#python310Packages.pytorch --command bash --norc
source $stdenv/setup
unpackPhase
eval $patchPhase
eval $preConfigurePhases
eval $configurePhase
cd source
eval $buildPhase
```

Here it fails with the same errors about protobuf.
I think these are relevant convos (for my future reference):
Can anyone point out what I'm getting wrong here? `nix develop` should be able to build it if `nix build` can, right? TIA for any help for a noob.
This issue has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/nix-develop-fails-with-command-bash-norc/31896/1
According to the PyTorch CMakeLists, MPS support requires macOS SDK >= 12.3; currently we only have:

```
$ nix eval --json --apply 'builtins.attrNames' github:nixos/nixpkgs/master#darwin | jq -r '.[] | select(contains("sdk"))'
apple_sdk
apple_sdk_10_12
apple_sdk_11_0
```
I've tried anyway, adding the MetalPerformanceShaders and MetalPerformanceShadersGraph frameworks to the build inputs and substituting `xcrun` for `xcbuild.xcrun`, with no luck. I suspect we might just be blocked on a newer SDK?
Hmmm, interesting... what's the process for adding a new `apple_sdk` version? cc @NixOS/darwin-maintainers
@samuela It's a long task with side-quests. See https://github.com/NixOS/nixpkgs/issues/242666 for some discussion - MoltenVK is also dependent on SDK 12. The work is ongoing on various aspects and the SDKs themselves have been proposed here: https://github.com/NixOS/nixpkgs/pull/229210
People looking for a workaround might find this flake helpful. It's based on @n8henrie's wheel approach:

```
nix run "github:david-r-cox/pytorch-darwin-env#verificationScript"
```
Not sure if this is a "packaging request", but it didn't seem like a bug report; I just wanted a tracking issue. I might try to hack on this eventually.
PyTorch on aarch64-darwin now supports the GPU via MPS: https://pytorch.org/docs/stable/notes/mps.html
The currently packaged pytorch runs but doesn't use my Mac's GPU; in contrast, the pytorch-provided wheel does.
cc @NixOS/darwin-maintainers