NanoCode012 opened this issue 1 year ago
I have the same issue. It occurs when running an 8-bit model in the following Docker container:
FROM nvidia/cuda:11.7.0-cudnn8-devel-ubuntu22.04
RUN apt update
RUN apt install git -y
RUN apt install wget -y
RUN apt install python3 python3-pip -y
# Install dependencies (one-by-one for better caching)
#RUN pip install --upgrade pip
RUN pip install torch
RUN pip install transformers
RUN pip install datasets
RUN pip install evaluate
RUN pip install xformers
RUN pip install wandb
RUN pip install peft
RUN pip install trl
RUN pip install scipy
RUN pip install accelerate
RUN pip install scikit-learn
RUN pip install pandas
RUN pip install bleurt@https://github.com/google-research/bleurt/archive/b610120347ef22b494b6d69b4316e303f5932516.zip#egg=bleurt
RUN git clone https://github.com/EleutherAI/lm-evaluation-harness
RUN pip install -e lm-evaluation-harness
RUN git clone https://github.com/timdettmers/bitsandbytes.git
# CUDA_VERSIONS in {110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120}
# make argument in {cuda110, cuda11x, cuda12x}
# if you do not know what CUDA you have, try looking at the output of: python -m bitsandbytes
ENV CUDA_VERSION=117
RUN cd bitsandbytes && git checkout ac5550a0238286377ee3f58a85aeba1c40493e17
RUN cd bitsandbytes && make cuda11x
RUN cd bitsandbytes && python3 setup.py install
#RUN pip install bitsandbytes
#RUN python3 check_bnb_install.py
# Init wandb
#COPY ./wandb /wandb
ENV WANDB_CONFIG_DIR=/wandb
ENV HF_DATASETS_CACHE="/hf_cache/datasets"
ENV HUGGINGFACE_HUB_CACHE="/hf_cache/hub"
# Copy the code
COPY . /code
# Set the working directory
WORKDIR /code
# Download a useful helper to check the bitsandbytes installation. Only works at runtime (the build step has no GPU access).
RUN wget https://gist.githubusercontent.com/TimDettmers/1f5188c6ee6ed69d211b7fe4e381e713/raw/4d17c3d09ccdb57e9ab7eca0171f2ace6e4d2858/check_bnb_install.py
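To actually run the check, the container needs GPU access, which is not available during docker build. A hypothetical invocation (the image tag is a placeholder):
docker run --rm --gpus all my-bnb-image python3 check_bnb_install.py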
+1ing this. I notice it with local conda on an H100 on Lambda Labs, although I'm unsure whether this is a bitsandbytes error or something to do with CUDA for the H100s.
+1
This is the same error as #533. The problem was that I forgot to compile the CUDA 11.8 binaries for sm_90, the compute capability of H100 GPUs. The error message basically says that the code is not compiled for your GPU. I will fix this soon. Please continue the discussion in issue #533 until I have fixed this issue.
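For anyone who wants to confirm the mismatch locally, a minimal check with plain PyTorch (nothing bitsandbytes-specific) looks like this:

import torch

# H100 reports compute capability (9, 0), i.e. sm_90.
major, minor = torch.cuda.get_device_capability(0)
print(f"GPU: {torch.cuda.get_device_name(0)} (sm_{major}{minor})")

# Architectures the PyTorch build itself was compiled for. Note that
# bitsandbytes ships its own CUDA binary, so this is informational only.
print(torch.cuda.get_arch_list())

If sm_90 is missing from the architectures your bitsandbytes binary was built for, you will hit exactly this kind of error.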
Trying to run today on an H100 instance, with a confirmed installation of 0.40.1 (which I saw was supposed to work with this GPU now), I still get:
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
<ipython-input-23-3435b262f1ae> in <module>
----> 1 trainer.train()
~/.local/lib/python3.8/site-packages/transformers/trainer.py in train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
1643 self._inner_training_loop, self._train_batch_size, args.auto_find_batch_size
1644 )
-> 1645 return inner_training_loop(
1646 args=args,
1647 resume_from_checkpoint=resume_from_checkpoint,
~/.local/lib/python3.8/site-packages/transformers/trainer.py in _inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
1936
1937 with self.accelerator.accumulate(model):
-> 1938 tr_loss_step = self.training_step(model, inputs)
1939
1940 if (
~/.local/lib/python3.8/site-packages/transformers/trainer.py in training_step(self, model, inputs)
2757
2758 with self.compute_loss_context_manager():
-> 2759 loss = self.compute_loss(model, inputs)
2760
2761 if self.args.n_gpu > 1:
~/.local/lib/python3.8/site-packages/transformers/trainer.py in compute_loss(self, model, inputs, return_outputs)
2782 else:
2783 labels = None
-> 2784 outputs = model(**inputs)
2785 # Save past state if it exists
2786 # TODO: this needs to be fixed and made cleaner later.
/usr/lib/python3/dist-packages/torch/nn/modules/module.py in _call_impl(self, *args, **kwargs)
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
~/.local/lib/python3.8/site-packages/accelerate/utils/operations.py in forward(*args, **kwargs)
579
580 def forward(*args, **kwargs):
--> 581 return model_forward(*args, **kwargs)
582
583 # To act like a decorator so that it can be popped when doing `extract_model_from_parallel`
~/.local/lib/python3.8/site-packages/accelerate/utils/operations.py in __call__(self, *args, **kwargs)
567
568 def __call__(self, *args, **kwargs):
--> 569 return convert_to_fp32(self.model_forward(*args, **kwargs))
570
571 def __getstate__(self):
/usr/lib/python3/dist-packages/torch/amp/autocast_mode.py in decorate_autocast(*args, **kwargs)
12 def decorate_autocast(*args, **kwargs):
13 with autocast_instance:
---> 14 return func(*args, **kwargs)
15 decorate_autocast.__script_unsupported = '@autocast() decorator is not supported in script mode' # type: ignore[attr-defined]
16 return decorate_autocast
~/.local/lib/python3.8/site-packages/peft/peft_model.py in forward(self, *args, **kwargs)
413 Forward pass of the model.
414 """
--> 415 return self.get_base_model()(*args, **kwargs)
416
417 def _get_base_model_class(self, is_prompt_tuning=False):
/usr/lib/python3/dist-packages/torch/nn/modules/module.py in _call_impl(self, *args, **kwargs)
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
~/.local/lib/python3.8/site-packages/accelerate/hooks.py in new_forward(*args, **kwargs)
163 output = old_forward(*args, **kwargs)
164 else:
--> 165 output = old_forward(*args, **kwargs)
166 return module._hf_hook.post_forward(module, output)
167
~/.local/lib/python3.8/site-packages/transformers/models/whisper/modeling_whisper.py in forward(self, input_features, attention_mask, decoder_input_ids, decoder_attention_mask, head_mask, decoder_head_mask, cross_attn_head_mask, encoder_outputs, past_key_values, decoder_inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, return_dict)
1417 )
1418
-> 1419 outputs = self.model(
1420 input_features,
1421 attention_mask=attention_mask,
/usr/lib/python3/dist-packages/torch/nn/modules/module.py in _call_impl(self, *args, **kwargs)
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
~/.local/lib/python3.8/site-packages/accelerate/hooks.py in new_forward(*args, **kwargs)
163 output = old_forward(*args, **kwargs)
164 else:
--> 165 output = old_forward(*args, **kwargs)
166 return module._hf_hook.post_forward(module, output)
167
~/.local/lib/python3.8/site-packages/transformers/models/whisper/modeling_whisper.py in forward(self, input_features, attention_mask, decoder_input_ids, decoder_attention_mask, head_mask, decoder_head_mask, cross_attn_head_mask, encoder_outputs, past_key_values, decoder_inputs_embeds, use_cache, output_attentions, output_hidden_states, return_dict)
1266 input_features = self._mask_input_features(input_features, attention_mask=attention_mask)
1267
-> 1268 encoder_outputs = self.encoder(
1269 input_features,
1270 head_mask=head_mask,
/usr/lib/python3/dist-packages/torch/nn/modules/module.py in _call_impl(self, *args, **kwargs)
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
~/.local/lib/python3.8/site-packages/accelerate/hooks.py in new_forward(*args, **kwargs)
163 output = old_forward(*args, **kwargs)
164 else:
--> 165 output = old_forward(*args, **kwargs)
166 return module._hf_hook.post_forward(module, output)
167
~/.local/lib/python3.8/site-packages/transformers/models/whisper/modeling_whisper.py in forward(self, input_features, attention_mask, head_mask, output_attentions, output_hidden_states, return_dict)
854 return custom_forward
855
--> 856 layer_outputs = torch.utils.checkpoint.checkpoint(
857 create_custom_forward(encoder_layer),
858 hidden_states,
/usr/lib/python3/dist-packages/torch/utils/checkpoint.py in checkpoint(function, use_reentrant, *args, **kwargs)
247
248 if use_reentrant:
--> 249 return CheckpointFunction.apply(function, preserve, *args)
250 else:
251 return _checkpoint_without_reentrant(
/usr/lib/python3/dist-packages/torch/autograd/function.py in apply(cls, *args, **kwargs)
504 # See NOTE: [functorch vjp and autograd interaction]
505 args = _functorch.utils.unwrap_dead_wrappers(args)
--> 506 return super().apply(*args, **kwargs) # type: ignore[misc]
507
508 if cls.setup_context == _SingleLevelFunction.setup_context:
/usr/lib/python3/dist-packages/torch/utils/checkpoint.py in forward(ctx, run_function, preserve_rng_state, *args)
105
106 with torch.no_grad():
--> 107 outputs = run_function(*args)
108 return outputs
109
~/.local/lib/python3.8/site-packages/transformers/models/whisper/modeling_whisper.py in custom_forward(*inputs)
850 def create_custom_forward(module):
851 def custom_forward(*inputs):
--> 852 return module(*inputs, output_attentions)
853
854 return custom_forward
/usr/lib/python3/dist-packages/torch/nn/modules/module.py in _call_impl(self, *args, **kwargs)
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
~/.local/lib/python3.8/site-packages/accelerate/hooks.py in new_forward(*args, **kwargs)
163 output = old_forward(*args, **kwargs)
164 else:
--> 165 output = old_forward(*args, **kwargs)
166 return module._hf_hook.post_forward(module, output)
167
~/.local/lib/python3.8/site-packages/transformers/models/whisper/modeling_whisper.py in forward(self, hidden_states, attention_mask, layer_head_mask, output_attentions)
429 residual = hidden_states
430 hidden_states = self.self_attn_layer_norm(hidden_states)
--> 431 hidden_states, attn_weights, _ = self.self_attn(
432 hidden_states=hidden_states,
433 attention_mask=attention_mask,
/usr/lib/python3/dist-packages/torch/nn/modules/module.py in _call_impl(self, *args, **kwargs)
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
~/.local/lib/python3.8/site-packages/accelerate/hooks.py in new_forward(*args, **kwargs)
163 output = old_forward(*args, **kwargs)
164 else:
--> 165 output = old_forward(*args, **kwargs)
166 return module._hf_hook.post_forward(module, output)
167
~/.local/lib/python3.8/site-packages/transformers/models/whisper/modeling_whisper.py in forward(self, hidden_states, key_value_states, past_key_value, attention_mask, layer_head_mask, output_attentions)
288
289 # get query proj
--> 290 query_states = self.q_proj(hidden_states) * self.scaling
291 # get key, value proj
292 # `past_key_value[0].shape[2] == key_value_states.shape[1]`
/usr/lib/python3/dist-packages/torch/nn/modules/module.py in _call_impl(self, *args, **kwargs)
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
~/.local/lib/python3.8/site-packages/peft/tuners/lora.py in forward(self, x)
1052
1053 def forward(self, x: torch.Tensor):
-> 1054 result = super().forward(x)
1055
1056 if self.disable_adapters or self.active_adapter not in self.lora_A.keys():
~/.local/lib/python3.8/site-packages/bitsandbytes/nn/modules.py in forward(self, x)
412 self.bias.data = self.bias.data.to(x.dtype)
413
--> 414 out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
415
416 if not self.state.has_fp16_weights:
~/.local/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py in matmul(A, B, out, state, threshold, bias)
561 if threshold > 0.0:
562 state.threshold = threshold
--> 563 return MatMul8bitLt.apply(A, B, out, bias, state)
564
565
/usr/lib/python3/dist-packages/torch/autograd/function.py in apply(cls, *args, **kwargs)
504 # See NOTE: [functorch vjp and autograd interaction]
505 args = _functorch.utils.unwrap_dead_wrappers(args)
--> 506 return super().apply(*args, **kwargs) # type: ignore[misc]
507
508 if cls.setup_context == _SingleLevelFunction.setup_context:
~/.local/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py in forward(ctx, A, B, out, bias, state)
399 if using_igemmlt:
400 C32A, SA = F.transform(CA, "col32")
--> 401 out32, Sout32 = F.igemmlt(C32A, state.CxB, SA, state.SB)
402 if bias is None or bias.dtype == torch.float16:
403 # we apply the fused bias here
~/.local/lib/python3.8/site-packages/bitsandbytes/functional.py in igemmlt(A, B, SA, SB, out, Sout, dtype)
1790 if has_error == 1:
1791 print(f'A: {shapeA}, B: {shapeB}, C: {Sout[0]}; (lda, ldb, ldc): {(lda, ldb, ldc)}; (m, n, k): {(m, n, k)}')
-> 1792 raise Exception('cublasLt ran into an error!')
1793
1794 torch.cuda.set_device(prev_device)
Exception: cublasLt ran into an error!
So frustrating...
Please help, and thank you for the great work!
Same error for me
Hello,
any news? Same error here; I cannot find anything useful on getting 8-bit quantization to work on the H100 GPUs.
This is the same error as #533. The problem was that I forgot to compile the CUDA 11.8 binaries for sm_90, the compute capability of H100 GPUs. The error message basically says that the code is not compiled for your GPU. I will fix this soon. Please continue the discussion in issue #533 until I have fixed this issue.
Hi @TimDettmers Do we have the fix yet?
Hello,
any news? Same error here; I cannot find anything useful on getting 8-bit quantization to work on the H100 GPUs.
@basteran Did you find the fix? @TimDettmers Any updates?
Are there any updates here? Am I missing something, or did they just "forget" to support H100 GPUs, with no fix even months later? Has anyone found a workaround? @TimDettmers?
This is actually a more complicated issue. The 8-bit implementation uses cuBLASLt, which uses special formats for 8-bit matrix multiplication. There are special formats for Ampere, Turing, and now Hopper GPUs. Hopper GPUs do not support the Ampere or Turing formats. This means multiple CUDA kernels and the cuBLASLt integration need to be implemented to make 8-bit work on Hopper GPUs.
I think for now, the more realistic thing is to throw an error to let the user know that this feature is currently not supported.
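Such a guard could look roughly like this (a hypothetical sketch, not actual bitsandbytes code; the function name is made up):

import torch

def assert_int8_supported() -> None:
    # Hypothetical fail-fast check: raise a clear error on Hopper (sm_90+)
    # instead of letting cuBLASLt fail deep inside igemmlt.
    major, _ = torch.cuda.get_device_capability()
    if major >= 9:
        raise NotImplementedError(
            "LLM.int8() matmul is not yet supported on Hopper (sm_90+) GPUs; "
            "use 4-bit quantization (nf4/fp4) instead."
        )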
Bitsandbytes did not support Windows before, but this method can add Windows support. (yuhuang)
1. Open the folder J:\StableDiffusion\sdwebui, click the folder's address bar and enter CMD (or press WIN+R, type CMD, press Enter, then run cd /d J:\StableDiffusion\sdwebui)
2. J:\StableDiffusion\sdwebui\py310\python.exe -m pip uninstall bitsandbytes
3. J:\StableDiffusion\sdwebui\py310\python.exe -m pip uninstall bitsandbytes-windows
4. J:\StableDiffusion\sdwebui\py310\python.exe -m pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.41.1-py3-none-win_amd64.whl
Replace J:\StableDiffusion\sdwebui\py310 with your SD venv directory (the folder containing python.exe).
Or, if you are on a Linux distribution (Ubuntu, macOS, etc.) with CUDA version 11.x, bitsandbytes can support Ubuntu as well. (yuhuang) Follow the same uninstall steps 1-3 above, then install this wheel instead:
J:\StableDiffusion\sdwebui\py310\python.exe -m pip install https://github.com/TimDettmers/bitsandbytes/releases/download/0.41.0/bitsandbytes-0.41.0-py3-none-any.whl
Replace your SD venv directory (the folder containing python.exe) as above.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Can we please keep this issue (or #383 or #599 ) open? I still want to see this issue resolved, if possible.
This is actually a more complicated issue. The 8-bit implementation uses cuBLASLt, which uses special formats for 8-bit matrix multiplication. There are special formats for Ampere, Turing, and now Hopper GPUs. Hopper GPUs do not support the Ampere or Turing formats. This means multiple CUDA kernels and the cuBLASLt integration need to be implemented to make 8-bit work on Hopper GPUs.
I think for now, the more realistic thing is to throw an error to let the user know that this feature is currently not supported.
@TimDettmers could you use https://github.com/NVIDIA/TransformerEngine?
At first sight, the exposed API seems too high-level for your needs, but their building blocks are tailored for the Hopper (H100) and Ada (RTX 4090) architectures, e.g. https://github.com/NVIDIA/TransformerEngine/blob/main/transformer_engine/common/gemm/cublaslt_gemm.cu
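For context, TransformerEngine's high-level FP8 path looks roughly like this (a sketch based on their quickstart; the recipe settings are assumptions, and only the building blocks underneath would be relevant to bitsandbytes):

import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# FP8 GEMMs dispatched through cuBLASLt on Hopper/Ada.
fp8_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.HYBRID)
layer = te.Linear(768, 768, bias=True).cuda()
x = torch.randn(32, 768, device="cuda")
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)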
+1ing this. I notice it with local conda on an H100 on Lambda Labs, although I'm unsure whether this is a bitsandbytes error or something to do with CUDA for the H100s.
This error is related to the H100: I tried loading the model on an H100 and got the error, while the same load_in_8bit worked fine on an A100.
Anyone able to resolve this?
Is it still not available on H100 GPU instances?
Not yet unfortunately
Do you guys have any solution for this?
Observing the same issue with H100, too.
Also with H800.
This is actually a more complicated issue. The 8-bit implementation uses cuBLASLt, which uses special formats for 8-bit matrix multiplication. There are special formats for Ampere, Turing, and now Hopper GPUs. Hopper GPUs do not support the Ampere or Turing formats. This means multiple CUDA kernels and the cuBLASLt integration need to be implemented to make 8-bit work on Hopper GPUs.
I think for now, the more realistic thing is to throw an error to let the user know that this feature is currently not supported.
Any plan to fix this?
The same problem occurs on the H20.
The same with H800
Hi all,
I will keep this issue open, but please be aware that, for now, 8-bit is not supported in bitsandbytes on Hopper. It is recommended to use nf4 or fp4 instead.
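For reference, switching from 8-bit to 4-bit in transformers looks roughly like this (a sketch; the model id is a placeholder and the compute dtype depends on your setup):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # or "fp4"
    bnb_4bit_compute_dtype=torch.bfloat16, # H100 supports bf16 natively
)
model = AutoModelForCausalLM.from_pretrained(
    "openlm-research/open_llama_3b",       # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",
)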
Just want to add to this thread: tried on an H100 and it's not working. I really hope the bitsandbytes team can support this feature, given that more and more people are going to switch to newer-generation GPUs.
Same for me. It does not work after changing to bf16, fp16, fp4, or anything else.
Having the same issue with an H100E.
Same problem
The same with H800 and H100
Still having the same issue
Still having the same issue on H100
Still having the same issue on H100.
Well, just came here to say I also ran into this issue using 8-bit on an H100. It would be very useful to have this working!
Hi all! We are currently working on LLM.int8 support for Hopper in PR #1401. I cannot give an accurate ETA for a release at the moment, but it will be supported soon!
Same problem occurred here.
It would be much appreciated to have this working on the H100.
Still getting the same problem with an H100.
Problem
Hello, I'm getting this weird cublasLt error on a Lambda Labs H100 with CUDA 11.8, PyTorch 2.0.1, and Python 3.10 (Miniconda) while trying to fine-tune a 3B-parameter open-llama model using LoRA with 8-bit loading. This only happens if we turn on 8-bit loading; LoRA alone or 4-bit loading (QLoRA) works.
The same commands did work 2 weeks ago and stopped working a week ago.
I've tried bitsandbytes versions 0.39.0 and 0.39.1, as prior versions don't work with the H100. Building from source gives me a different issue, as mentioned in the Env section.
Expected
No error
Reproduce
Set up Miniconda, then follow the readme of https://github.com/OpenAccess-AI-Collective/axolotl on Lambda Labs and run the default open-llama LoRA config.
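The failing path boils down to something like this (a hypothetical minimal sketch, not the axolotl code; the model id and LoRA target modules are placeholders):

import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder model id; any model loaded with load_in_8bit=True hits the same path.
model = AutoModelForCausalLM.from_pretrained(
    "openlm-research/open_llama_3b",
    load_in_8bit=True,
    device_map="auto",
)
lora = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

ids = torch.randint(0, model.config.vocab_size, (1, 16), device="cuda")
# The forward pass reaches bnb.matmul -> igemmlt -> cuBLASLt, which errors on sm_90.
out = model(input_ids=ids, labels=ids)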
Trace
0.39.0
Env
python -m bitsandbytes
on the main branch: I get the same error as in https://github.com/TimDettmers/bitsandbytes/issues/382
on 0.39.0
Misc
All related issues:
Also tried installing cudatoolkit via conda.