ROCm / bitsandbytes

8-bit CUDA functions for PyTorch

7900 XTX: Error invalid device function at line 679 in file /bitsandbytes/csrc/ops.hip #29

Open PatchouliPatch opened 3 months ago

PatchouliPatch commented 3 months ago

System Info

Kernel: 6.5.0-28-generic
Distributor ID: Ubuntu
Description: Ubuntu 22.04.4 LTS
Release: 22.04
Codename: jammy
GPU: Sapphire Pulse RX 7900 XTX
ROCm version: 6.0.2
CPU: Ryzen 7 7700X
Motherboard: Gigabyte Aorus Elite AX B650 (BIOS: F24c)
Torch version: torch==2.3.0+rocm6.0
Python version: 3.10.14

Reproduction

I'm on the rocm_enabled branch (attempting to compile the ROCm 6.2 testing branch results in errors). Running the following code produces the error shown below:

# Hugging Face Transformers
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "microsoft/Phi-3-mini-128k-instruct"
bnb_config = BitsAndBytesConfig(
    load_in_8bit=True,
    # load_in_4bit=True,
    # bnb_4bit_quant_type="nf4",
    # bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="cuda",
    trust_remote_code=True,
    attn_implementation="eager",
    quantization_config=bnb_config,  # un-comment to quantize your model; only supports Nvidia GPUs
)

Attached here is ops.hip: ops.hip.zip

Expected behavior

After running that piece of code, I get the following error:

Error invalid device function at line 679 in file /home/$USER/bitsandbytes/csrc/ops.hip.

Nothing else prints to my terminal.

pnunna93 commented 3 months ago

Hi @PatchouliPatch, I need more details to review this. Could you please run the script with AMD_LOG_LEVEL=3 and share its output?

AMD_LOG_LEVEL=3 HIP_VISIBLE_DEVICES=0 python3 check_for_possibility.py

Please also share the outputs of rocminfo and hipconfig --version.

PatchouliPatch commented 3 months ago

Here's the terminal output with AMD_LOG_LEVEL: log_level3.txt

rocminfo: rocminfo.txt

hipconfig --version: 6.0.32831-204d35d16

I know that we were advised to disable the iGPU on the CPU, but for some reason Gigabyte's BIOS fails to do so even when I tell it to disable it.

pnunna93 commented 3 months ago

Could you try with ROCm 6.1? You can use the rocm/pytorch:latest Docker image.

If you have to use 6.0, please try with rocm/pytorch:rocm6.0.2_ubuntu22.04_py3.10_pytorch_2.1.2

Please make sure to select the gfx1100 GPU with the container.
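
For reference, a typical way to start the container with GPU access looks something like the following (flags per AMD's ROCm Docker documentation; adjust the image tag as needed):

docker run -it --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --security-opt seccomp=unconfined rocm/pytorch:latest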

PatchouliPatch commented 3 months ago

Alright, will try it out. I built my previous one with gfx1101. What's the Navi 31 GFX name supposed to be anyway? Is it gfx1100?

PatchouliPatch commented 3 months ago

Gave it a try today.

I installed the latest ROCm version of 6.1.1 after uninstalling 6.0.2.

I pulled the latest version of the repo and did the following:

git checkout rocm_enabled
git pull
pip install -r requirements-dev.txt
cmake -DCOMPUTE_BACKEND=hip -S . -DBNB_ROCM_ARCH="gfx1100" -DCMAKE_HIP_COMPILER=/opt/rocm-6.1.1/llvm/bin/clang++
make
pip install .

The program compiled but gave me warnings.

After rerunning it with the same Python script, I still get the same error.

Here's the output when I run with AMD_LOG_LEVEL=3 now: log_level3_new.txt

And here's rocminfo: rocminfo_new.txt

hipconfig --version: 6.1.40092-038397aaa

pnunna93 commented 3 months ago

Please set HSA_OVERRIDE_GFX_VERSION=11.0.0 and retry. It's an environment variable; you can export it or set it while running the script. It will target the gfx1100 architecture.
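
For example, using the same script as before:

HSA_OVERRIDE_GFX_VERSION=11.0.0 python3 check_for_possibility.py

or exported for the whole shell session:

export HSA_OVERRIDE_GFX_VERSION=11.0.0
python3 check_for_possibility.py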

vasicvuk commented 2 months ago

I had the same error. Adding HSA_OVERRIDE_GFX_VERSION=11.0.0 seems to fix it, but unfortunately now I get:

rocblaslt warning: No paths matched /opt/rocm/lib/hipblaslt/library/*gfx1100*co. Make sure that HIPBLASLT_TENSILE_LIBPATH is set correctly.
A: torch.Size([1984, 3200]), B: torch.Size([3200, 3200]), C: (1984, 3200); (lda, ldb, ldc): (c_int(1984), c_int(3200), c_int(1984)); (m, n, k): (c_int(1984), c_int(3200), c_int(3200))
error detectedTraceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/Projects/axolotl/src/axolotl/cli/train.py", line 70, in <module>
    fire.Fire(do_cli)
  File "/opt/Projects/venv/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/opt/Projects/venv/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/opt/Projects/venv/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/opt/Projects/axolotl/src/axolotl/cli/train.py", line 38, in do_cli
    return do_train(parsed_cfg, parsed_cli_args)
  File "/opt/Projects/axolotl/src/axolotl/cli/train.py", line 66, in do_train
    return train(cfg=cfg, cli_args=cli_args, dataset_meta=dataset_meta)
  File "/opt/Projects/axolotl/src/axolotl/train.py", line 170, in train
    trainer.train(resume_from_checkpoint=resume_from_checkpoint)
  File "/opt/Projects/venv/lib/python3.10/site-packages/transformers/trainer.py", line 1885, in train
    return inner_training_loop(
  File "/opt/Projects/venv/lib/python3.10/site-packages/transformers/trainer.py", line 2216, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/transformers/trainer.py", line 3238, in training_step
    loss = self.compute_loss(model, inputs)
  File "/opt/Projects/axolotl/src/axolotl/core/trainer_builder.py", line 539, in compute_loss
    return super().compute_loss(model, inputs, return_outputs=return_outputs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/transformers/trainer.py", line 3264, in compute_loss
    outputs = model(**inputs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 822, in forward
    return model_forward(*args, **kwargs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 810, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/opt/Projects/venv/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/peft/peft_model.py", line 1430, in forward
    return self.base_model(
  File "/opt/Projects/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 179, in forward
    return self.model.forward(*args, **kwargs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1164, in forward
    outputs = self.model(
  File "/opt/Projects/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/opt/Projects/axolotl/src/axolotl/monkeypatch/llama_attn_hijack_flash.py", line 814, in llama_model_forward
    layer_outputs = torch.utils.checkpoint.checkpoint(
  File "/opt/Projects/venv/lib/python3.10/site-packages/torch/_compile.py", line 24, in inner
    return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 328, in _fn
    return fn(*args, **kwargs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 17, in inner
    return fn(*args, **kwargs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 451, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/opt/Projects/venv/lib/python3.10/site-packages/torch/autograd/function.py", line 539, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/opt/Projects/venv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 230, in forward
    outputs = run_function(*args)
  File "/opt/Projects/axolotl/src/axolotl/monkeypatch/llama_attn_hijack_flash.py", line 808, in custom_forward
    return module(
  File "/opt/Projects/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/opt/Projects/axolotl/src/axolotl/monkeypatch/llama_attn_hijack_flash.py", line 907, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/opt/Projects/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/opt/Projects/axolotl/src/axolotl/monkeypatch/llama_attn_hijack_flash.py", line 422, in flashattn_forward
    query_states = self.q_proj(hidden_states)
  File "/opt/Projects/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/peft/tuners/lora/bnb.py", line 217, in forward
    result = self.base_layer(x, *args, **kwargs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 801, in forward
    out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
  File "/opt/Projects/venv/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 559, in matmul
    return MatMul8bitLt.apply(A, B, out, bias, state)
  File "/opt/Projects/venv/lib/python3.10/site-packages/torch/autograd/function.py", line 539, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/opt/Projects/venv/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 398, in forward
    out32, Sout32 = F.igemmlt(C32A, state.CxB, SA, state.SB)
  File "/opt/Projects/venv/lib/python3.10/site-packages/bitsandbytes/functional.py", line 2388, in igemmlt
    raise Exception("cublasLt ran into an error!")
Exception: cublasLt ran into an error!

PatchouliPatch commented 2 months ago

I suggest moving over to the alpha test of the upstream bitsandbytes library. You can use the multi_backend_refactor branch; it works on my 7900 XTX.
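
A rough sketch of that path, assuming the upstream repo at https://github.com/bitsandbytes-foundation/bitsandbytes (the branch may be spelled multi-backend-refactor there) and reusing the same HIP build steps as above:

git clone https://github.com/bitsandbytes-foundation/bitsandbytes.git
cd bitsandbytes
git checkout multi-backend-refactor
cmake -DCOMPUTE_BACKEND=hip -DBNB_ROCM_ARCH="gfx1100" -S .
make
pip install .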

farshadghodsian commented 2 months ago

I must have missed this issue, as I opened a separate one to report a similar problem on my Radeon Pro W7900 (also gfx1100) when loading a model in 8-bit. It should be noted that while loading a model in 8-bit does not work with bitsandbytes on Radeon GPUs, I did get loading models in 4-bit to work. Not sure if this will help your use case, but using load_in_4bit=True instead of load_in_8bit=True worked for me.
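
Adapting the reproduction script from above, the 4-bit variant would look roughly like this (the compute dtype here is my own choice, not something confirmed in this thread):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "microsoft/Phi-3-mini-128k-instruct"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # assumption: bfloat16 should also work here
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
    trust_remote_code=True,
    attn_implementation="eager",
    quantization_config=bnb_config,
)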

Note that, as of newer versions of PyTorch, there is an upstream issue where PyTorch force-loads hipBLASLt for all AMD GPUs, even though it is not supported on Radeon GPUs. You will also need to set TORCH_BLAS_PREFER_HIPBLASLT=0 for it to work.
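
For example, combined with the override from earlier in the thread:

TORCH_BLAS_PREFER_HIPBLASLT=0 HSA_OVERRIDE_GFX_VERSION=11.0.0 python3 check_for_possibility.py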