Closed phalexo closed 1 year ago
What is the error?

Edit: it runs fine on my setup with T5Embedder(device='cuda:0'). The model defaults to bfloat16, so maybe try specifying a different dtype: t5 = T5Embedder(device='cuda:0', torch_dtype=torch.float16). Can you post the result of python -m torch.utils.collect_env to see if there are any issues with your install?
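For reference, a minimal sketch of the float16 override (the deepfloyd_if import path here is an assumption about the package layout):

```python
import torch
from deepfloyd_if.modules import T5Embedder  # import path is an assumption

# bfloat16 is the default dtype; overriding it to float16 avoids bf16
# GEMM paths that older GPUs/cuBLAS versions may not support.
t5 = T5Embedder(device='cuda:0', torch_dtype=torch.float16)
```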
    166     return module._hf_hook.post_forward(module, output)

File ~/.local/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py:530, in T5Attention.forward(self, hidden_states, mask, key_value_states, position_bias, past_key_value, layer_head_mask, query_length, use_cache, output_attentions)
    525 value_states = project(
    526     hidden_states, self.v, key_value_states, past_key_value[1] if past_key_value is not None else None
    527 )
    529 # compute scores
--> 530 scores = torch.matmul(
    531     query_states, key_states.transpose(3, 2)
    532 )  # equivalent of torch.einsum("bnqd,bnkd->bnqk", query_states, key_states), compatible with onnx op>9
    534 if position_bias is None:
    535     if not self.has_relative_attention_bias:

RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasGemmStridedBatchedExFix(handle, opa, opb, (int)m, (int)n, (int)k, (void*)&falpha, a, CUDA_R_16BF, (int)lda, stridea, b, CUDA_R_16BF, (int)ldb, strideb, (void*)&fbeta, c, CUDA_R_16BF, (int)ldc, stridec, (int)num_batches, CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP)
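A minimal repro of the failing op, stripped down from the traceback above (shapes are illustrative, not the model's actual dimensions):

```python
import torch

# The traceback fails inside a batched bf16 matmul. On a GPU/cuBLAS
# combination without bf16 GEMM support this raises the same
# CUBLAS_STATUS_NOT_SUPPORTED error; with dtype=torch.float16 it passes.
query_states = torch.randn(1, 8, 64, 64, dtype=torch.bfloat16, device='cuda:0')
key_states = torch.randn(1, 8, 64, 64, dtype=torch.bfloat16, device='cuda:0')
scores = torch.matmul(query_states, key_states.transpose(3, 2))
print(scores.shape, scores.dtype)
```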
Have you tried manually installing libcublas? How much RAM/VRAM do you have? According to this thread for a different model, this error can be a disguised OOM error.
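A few quick checks to separate "no bf16 support" from "out of memory", using standard torch.cuda utilities:

```python
import torch

# bf16 GEMMs generally need an Ampere-or-newer GPU (compute capability >= 8.0).
print(torch.cuda.get_device_name(0))
print(torch.cuda.get_device_capability(0))  # (8, 0)+ needed for bf16 tensor cores
print(torch.cuda.is_bf16_supported())       # False on pre-Ampere cards
print(torch.cuda.mem_get_info(0))           # (free, total) bytes on device 0
```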
If you are trying to run the encoder and all 3 stages on GPU, you would need ~23 GB for the models alone, plus the memory required to actually run them, so a GPU with 24GB probably isn't enough. With the diffusers implementation you should be able to run it just fine, though, if you enable CPU offload, assuming you have enough (~32GB) system memory; see the sketch below.
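A minimal sketch of the diffusers route with CPU offload, assuming the DeepFloyd/IF-I-XL-v1.0 stage-I checkpoint (the same pattern applies to the other stages):

```python
import torch
from diffusers import DiffusionPipeline

# enable_model_cpu_offload() keeps only the active sub-model on the GPU
# and parks the rest in system RAM, trading VRAM for host memory.
pipe = DiffusionPipeline.from_pretrained(
    "DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()
```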
I am running the pipeline over 3 or 4 GPUs, each with 12GB. Looking at VRAM utilization, it seems OK.
Changing the data type to float16 fixed it.
Thanks for the suggestion.
I have tried all kinds of combinations: torch 1.13.1 and 2.0.0, CUDA 11.3 and CUDA 11.8. torch.matmul fails on the GPU.