UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0

[BUG] ROCm/WSL "HIP error: no kernel image is available for execution on the device" #2903

Open unclemusclez opened 3 months ago

unclemusclez commented 3 months ago

Attempting to use autotrain-advanced with PyTorch 2.4.0 ROCm on WSL2 (Ubuntu 22.04) under Windows 11.

INFO     | 2024-08-22 02:25:11 | __main__:train:203 - Setting up trainer...
When using the Trainer, CodeCarbonCallback requires the `codecarbon` package, which is not compatible with AMD ROCm (https://github.com/mlco2/codecarbon/pull/490). Automatically disabling the codecarbon callback. Reference: https://huggingface.co/docs/transformers/v4.39.3/en/main_classes/trainer#transformers.TrainingArguments.report_to.
The dataset `id` 'skratos115/opendevin_data_devinator' does not exist on the Hub. Setting the `id` to None.
INFO     | 2024-08-22 02:25:14 | __main__:train:212 - Starting training...
[2024-08-22 02:25:14,363] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/deepspeed/runtime/zero/linear.py:47: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  @autocast_custom_fwd
/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/deepspeed/runtime/zero/linear.py:66: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  @autocast_custom_bwd
INFO     | 2024-08-22 02:25:15 | autotrain.trainers.common:on_train_begin:230 - Starting to train...

  0%|          | 0/28704 [00:00<?, ?it/s]ERROR    | 2024-08-22 02:25:32 | autotrain.trainers.common:wrapper:120 - train has failed due to an exception: Traceback (most recent call last):
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/autotrain/trainers/common.py", line 117, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/autotrain/trainers/sent_transformers/__main__.py", line 213, in train
    trainer.train()
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/transformers/trainer.py", line 1955, in train
    return inner_training_loop(
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/transformers/trainer.py", line 2296, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/transformers/trainer.py", line 3380, in training_step
    loss = self.compute_loss(model, inputs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/sentence_transformers/trainer.py", line 329, in compute_loss
    loss = loss_fn(features, labels)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/sentence_transformers/losses/CoSENTLoss.py", line 79, in forward
    embeddings = [self.model(sentence_feature)["sentence_embedding"] for sentence_feature in sentence_features]
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/sentence_transformers/losses/CoSENTLoss.py", line 79, in <listcomp>
    embeddings = [self.model(sentence_feature)["sentence_embedding"] for sentence_feature in sentence_features]
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/nn/modules/container.py", line 219, in forward
    input = module(input)
            ^^^^^^^^^^^^^
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/sentence_transformers/models/Pooling.py", line 153, in forward
    sum_embeddings = torch.sum(token_embeddings * input_mask_expanded, 1)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: HIP error: no kernel image is available for execution on the device
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing AMD_SERIALIZE_KERNEL=3
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.

ERROR    | 2024-08-22 02:25:32 | autotrain.trainers.common:wrapper:121 - HIP error: no kernel image is available for execution on the device
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing AMD_SERIALIZE_KERNEL=3
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.

terminate called after throwing an instance of 'c10::Error'
  what():  invalid device pointer: 0x5400000
Exception raised from free at ../c10/hip/HIPCachingAllocator.cpp:2994 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7f2293e29096 in /home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f2293dd7de0 in /home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/lib/libc10.so)
frame #2: <unknown function> + 0x1d15e (0x7f22d99fe15e in /home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/lib/libc10_hip.so)
frame #3: <unknown function> + 0x5de6d0 (0x7f2364e826d0 in /home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/lib/libtorch_python.so)
frame #4: <unknown function> + 0x6bcef (0x7f2293e0ccef in /home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/lib/libc10.so)
frame #5: c10::TensorImpl::~TensorImpl() + 0x21b (0x7f2293e05d4b in /home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/lib/libc10.so)
frame #6: c10::TensorImpl::~TensorImpl() + 0x9 (0x7f2293e05ef9 in /home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/lib/libc10.so)
frame #7: torch::autograd::SavedVariable::reset_data() + 0xec (0x7f23513b369c in /home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so)
frame #8: <unknown function> + 0x46714d1 (0x7f23506ee4d1 in /home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so)
frame #9: <unknown function> + 0x52f7812 (0x7f2351374812 in /home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so)
frame #10: torch::autograd::deleteNode(torch::autograd::Node*) + 0x7f (0x7f23513748af in /home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so)
frame #11: std::_Sp_counted_deleter<torch::autograd::generated::MulBackward0*, void (*)(torch::autograd::Node*), std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() + 0xe (0x7f235082585e in /home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so)
frame #12: <unknown function> + 0x52d7db0 (0x7f2351354db0 in /home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so)
frame #13: c10::TensorImpl::~TensorImpl() + 0x212 (0x7f2293e05d42 in /home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/lib/libc10.so)
frame #14: c10::TensorImpl::~TensorImpl() + 0x9 (0x7f2293e05ef9 in /home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/lib/libc10.so)
frame #15: <unknown function> + 0x89a5f8 (0x7f236513e5f8 in /home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/lib/libtorch_python.so)
frame #16: THPVariable_subclass_dealloc(_object*) + 0x2c6 (0x7f236513e946 in /home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/lib/libtorch_python.so)
frame #17: /home/musclez/ComfyUI/.venv/bin/python() [0x4e4efe]
frame #18: /home/musclez/ComfyUI/.venv/bin/python() [0x4d394d]
frame #19: /home/musclez/ComfyUI/.venv/bin/python() [0x53dd0e]
frame #20: /home/musclez/ComfyUI/.venv/bin/python() [0x53bc79]
frame #21: /home/musclez/ComfyUI/.venv/bin/python() [0x53bc5e]
frame #22: /home/musclez/ComfyUI/.venv/bin/python() [0x53bc5e]
frame #23: /home/musclez/ComfyUI/.venv/bin/python() [0x53bc5e]
frame #24: /home/musclez/ComfyUI/.venv/bin/python() [0x53bc5e]
frame #25: /home/musclez/ComfyUI/.venv/bin/python() [0x53bc5e]
frame #26: /home/musclez/ComfyUI/.venv/bin/python() [0x5508da]
frame #27: _PyEval_EvalFrameDefault + 0x7df9 (0x502659 in /home/musclez/ComfyUI/.venv/bin/python)
frame #28: /home/musclez/ComfyUI/.venv/bin/python() [0x62e1b4]
frame #29: PyEval_EvalCode + 0x97 (0x4f3a67 in /home/musclez/ComfyUI/.venv/bin/python)
frame #30: /home/musclez/ComfyUI/.venv/bin/python() [0x569033]
frame #31: /home/musclez/ComfyUI/.venv/bin/python() [0x50d977]
frame #32: PyObject_Vectorcall + 0x35 (0x50d745 in /home/musclez/ComfyUI/.venv/bin/python)
frame #33: _PyEval_EvalFrameDefault + 0x8f2 (0x4fb152 in /home/musclez/ComfyUI/.venv/bin/python)
frame #34: _PyFunction_Vectorcall + 0x173 (0x531823 in /home/musclez/ComfyUI/.venv/bin/python)
frame #35: /home/musclez/ComfyUI/.venv/bin/python() [0x64fd94]
frame #36: Py_RunMain + 0x142 (0x64f5a2 in /home/musclez/ComfyUI/.venv/bin/python)
frame #37: Py_BytesMain + 0x2d (0x61ee0d in /home/musclez/ComfyUI/.venv/bin/python)
frame #38: <unknown function> + 0x29d90 (0x7f2366e17d90 in /lib/x86_64-linux-gnu/libc.so.6)
frame #39: __libc_start_main + 0x80 (0x7f2366e17e40 in /lib/x86_64-linux-gnu/libc.so.6)
frame #40: _start + 0x25 (0x61ec95 in /home/musclez/ComfyUI/.venv/bin/python)

Traceback (most recent call last):
  File "/home/musclez/ComfyUI/.venv/bin/accelerate", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/accelerate/commands/launch.py", line 1106, in launch_command
    simple_launcher(args)
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/accelerate/commands/launch.py", line 704, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/musclez/ComfyUI/.venv/bin/python', '-m', 'autotrain.trainers.sent_transformers', '--training_config', 'gemma-2-2b-it-oh-bf16-8192-hip/training_params.json']' died with <Signals.SIGABRT: 6>.
tomaarsen commented 3 months ago

Hello!

I'm not very familiar with AMD GPUs and their kernel images, etc. However, the operation you're failing on (torch.sum(token_embeddings * input_mask_expanded, 1)) is rather simple, so this strikes me as either 1) an incompatibility of some kind between your software components (e.g. WSL2 on Windows with the ROCm image) or 2) an installation issue.

If you run a very simple torch script on your AMD GPU, does that work correctly? E.g.:

import torch

device = torch.device("cuda")
matrix = torch.randn(3, 2, device=device) @ torch.randn(2, 5, device=device)
sum = matrix.sum()
print(sum)
# => tensor(3.7824, device='cuda:0')
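
If that also fails, it may help to compare the gfx targets the installed wheel ships kernels for against the device PyTorch actually detects. A small diagnostic sketch (standard torch.cuda introspection calls, not specific to this issue; output will vary by build):

import torch

# HIP version the wheel was built against (None on CUDA/CPU-only builds)
print("HIP version:", torch.version.hip)

print("Device available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device name:", torch.cuda.get_device_name(0))

# GPU architectures (gfx targets on ROCm) the binary ships kernels for.
# If your card's gfx target is missing from this list, operations fail with
# "no kernel image is available for execution on the device".
print("Compiled arch list:", torch.cuda.get_arch_list())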
unclemusclez commented 3 months ago
$ HIP_VISIBLE_DEVICES=0 python test.py
/home/musclez/test.py:4: UserWarning: Attempting to use hipBLASLt on an unsupported architecture! Overriding blas backend to hipblas (Triggered internally at ../aten/src/ATen/Context.cpp:288.)
  matrix = torch.randn(3, 2, device=device) @ torch.randn(2, 5, device=device)
tensor(3.5258, device='cuda:0')

I think we are getting warmer. This may work fine on the MI300's CDNA3 architecture, which I believe is AMD's priority. I've had issues in the past with hipBLASLt working on gfx1100 (Navi 31, Radeon 7900).
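
For reference, one thing worth trying is steering the BLAS backend and the reported gfx target via environment variables before torch initializes HIP. A minimal sketch, assuming a gfx1100 card and that this PyTorch/ROCm build honors TORCH_BLAS_PREFER_HIPBLASLT and HSA_OVERRIDE_GFX_VERSION (not verified under WSL2 here):

import os

# Must be set before the first `import torch`, since the HIP runtime and
# torch's BLAS selection read these at initialization time.
os.environ["TORCH_BLAS_PREFER_HIPBLASLT"] = "0"               # prefer hipBLAS over hipBLASLt
os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "11.0.0")   # assumption: 11.0.0 matches a gfx1100 card
os.environ.setdefault("HIP_VISIBLE_DEVICES", "0")

import torch

device = torch.device("cuda")  # ROCm builds expose the AMD GPU as "cuda"
print(torch.randn(3, 2, device=device) @ torch.randn(2, 5, device=device))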

unclemusclez commented 3 months ago

with "device = torch.device("hip")"

import torch

device = torch.device("hip")
matrix = torch.randn(3, 2, device=device) @ torch.randn(2, 5, device=device)
sum = matrix.sum()
print(sum)
# => tensor(3.7824, device='cuda:0')
$ HIP_VISIBLE_DEVICES=0 python test.py
Traceback (most recent call last):
  File "/home/musclez/test.py", line 4, in <module>
    matrix = torch.randn(3, 2, device=device) @ torch.randn(2, 5, device=device)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
NotImplementedError: Could not run 'aten::empty.memory_format' with arguments from the 'HIP' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::empty.memory_format' is only available for these backends: [CPU, CUDA, Meta, QuantizedCPU, QuantizedCUDA, QuantizedMeta, MkldnnCPU, SparseCPU, SparseCUDA, SparseMeta, SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradMeta, AutogradNestedTensor, Tracer, AutocastCPU, AutocastXPU, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].

CPU: registered at aten/src/ATen/RegisterCPU.cpp:30476 [kernel]
CUDA: registered at aten/src/ATen/RegisterCUDA.cpp:44679 [kernel]
Meta: registered at aten/src/ATen/RegisterMeta.cpp:26996 [kernel]
QuantizedCPU: registered at aten/src/ATen/RegisterQuantizedCPU.cpp:954 [kernel]
QuantizedCUDA: registered at aten/src/ATen/RegisterQuantizedCUDA.cpp:462 [kernel]
QuantizedMeta: registered at aten/src/ATen/RegisterQuantizedMeta.cpp:108 [kernel]
MkldnnCPU: registered at aten/src/ATen/RegisterMkldnnCPU.cpp:534 [kernel]
SparseCPU: registered at aten/src/ATen/RegisterSparseCPU.cpp:1406 [kernel]
SparseCUDA: registered at aten/src/ATen/RegisterSparseCUDA.cpp:1576 [kernel]
SparseMeta: registered at aten/src/ATen/RegisterSparseMeta.cpp:290 [kernel]
SparseCsrCPU: registered at aten/src/ATen/RegisterSparseCsrCPU.cpp:1154 [kernel]
SparseCsrCUDA: registered at aten/src/ATen/RegisterSparseCsrCUDA.cpp:1279 [kernel]
SparseCsrMeta: registered at aten/src/ATen/RegisterSparseCsrMeta.cpp:1068 [kernel]
BackendSelect: registered at aten/src/ATen/RegisterBackendSelect.cpp:792 [kernel]
Python: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:153 [backend fallback]
FuncTorchDynamicLayerBackMode: registered at ../aten/src/ATen/functorch/DynamicLayer.cpp:497 [backend fallback]
Functionalize: registered at ../aten/src/ATen/FunctionalizeFallbackKernel.cpp:349 [backend fallback]
Named: registered at ../aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
Conjugate: fallthrough registered at ../aten/src/ATen/ConjugateFallback.cpp:21 [kernel]
Negative: fallthrough registered at ../aten/src/ATen/native/NegateFallback.cpp:22 [kernel]
ZeroTensor: fallthrough registered at ../aten/src/ATen/ZeroTensorFallback.cpp:90 [kernel]
ADInplaceOrView: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:96 [backend fallback]
AutogradOther: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:19981 [autograd kernel]
AutogradCPU: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:19981 [autograd kernel]
AutogradCUDA: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:19981 [autograd kernel]
AutogradHIP: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:19981 [autograd kernel]
AutogradXLA: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:19981 [autograd kernel]
AutogradMPS: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:19981 [autograd kernel]
AutogradIPU: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:19981 [autograd kernel]
AutogradXPU: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:19981 [autograd kernel]
AutogradHPU: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:19981 [autograd kernel]
AutogradVE: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:19981 [autograd kernel]
AutogradLazy: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:19981 [autograd kernel]
AutogradMTIA: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:19981 [autograd kernel]
AutogradPrivateUse1: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:19981 [autograd kernel]
AutogradPrivateUse2: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:19981 [autograd kernel]
AutogradPrivateUse3: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:19981 [autograd kernel]
AutogradMeta: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:19981 [autograd kernel]
AutogradNestedTensor: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:19981 [autograd kernel]
Tracer: registered at ../torch/csrc/autograd/generated/TraceType_2.cpp:17715 [kernel]
AutocastCPU: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:209 [backend fallback]
AutocastXPU: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:351 [backend fallback]
AutocastCUDA: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:165 [backend fallback]
FuncTorchBatched: registered at ../aten/src/ATen/functorch/LegacyBatchingRegistrations.cpp:731 [backend fallback]
BatchedNestedTensor: registered at ../aten/src/ATen/functorch/LegacyBatchingRegistrations.cpp:758 [backend fallback]
FuncTorchVmapMode: fallthrough registered at ../aten/src/ATen/functorch/VmapModeRegistrations.cpp:27 [backend fallback]
Batched: registered at ../aten/src/ATen/LegacyBatchingRegistrations.cpp:1075 [backend fallback]
VmapMode: fallthrough registered at ../aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
FuncTorchGradWrapper: registered at ../aten/src/ATen/functorch/TensorWrapper.cpp:207 [backend fallback]
PythonTLSSnapshot: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:161 [backend fallback]
FuncTorchDynamicLayerFrontMode: registered at ../aten/src/ATen/functorch/DynamicLayer.cpp:493 [backend fallback]
PreDispatch: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:165 [backend fallback]
PythonDispatcher: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:157 [backend fallback]
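
For what it's worth, PyTorch's ROCm builds expose AMD GPUs through the "cuda" device type (HIP is used internally but masquerades as CUDA), so torch.device("hip") is not expected to work for user code; the earlier snippet with torch.device("cuda") is the correct form. A minimal sketch:

import torch

# On a ROCm build, "cuda" maps to the HIP/AMD device; torch.version.hip is set
# and torch.version.cuda is None.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.randn(3, 2, device=device) @ torch.randn(2, 5, device=device)
print(x.sum(), "| HIP build:", torch.version.hip)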