deep-floyd / IF

cuBLAS issue. #72

Closed phalexo closed 1 year ago

phalexo commented 1 year ago

I have freshly installed CUDA toolkit 11.8 both on the host and inside a Docker container. Within the container I run "jupyter notebook".

Previously I got the same error with CUDA 11.3.

My understanding is that cuBLAS is part of the CUDA toolkit, and therefore should be available.
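One thing worth checking here: PyTorch wheels installed from pip or conda usually ship their own CUDA runtime and cuBLAS, so the toolkit installed on the host or in the container is not necessarily the one PyTorch calls into. A minimal sketch for seeing what PyTorch itself reports (the helper name `cuda_build_info` is made up for illustration):

```python
def cuda_build_info():
    """Return the CUDA-related versions PyTorch reports, or {} if torch is absent."""
    try:
        import torch
    except ImportError:
        return {}
    return {
        "torch": torch.__version__,
        # CUDA version PyTorch was built against; None for CPU-only builds
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(cuda_build_info())
```

If `built_with_cuda` differs from the installed toolkit version, reinstalling the toolkit is unlikely to change which cuBLAS gets used.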

import os
import torch
os.environ['FORCE_MEM_EFFICIENT_ATTN'] = "1"
import sys
from deepfloyd_if.modules import IFStageI, IFStageII, StableStageIII
from deepfloyd_if.modules.t5 import T5Embedder
from deepfloyd_if.pipelines import dream, style_transfer, super_resolution, inpainting
import torch.nn.functional as F
import random
import torchvision.transforms as T
import numpy as np
import requests
from PIL import Image
import torch
import re
print("Loaded modules")

if_I = IFStageI('IF-I-XL-v1.0', device='cuda:0')
if_II = IFStageII('IF-II-L-v1.0', device='cuda:1')
if_III = StableStageIII('stable-diffusion-x4-upscaler', device='cuda:2')
t5 = T5Embedder(device='cuda:3')

prompt = 'lush garden'
count = 4

result = dream(
    t5=t5, if_I=if_I, if_II=if_II, if_III=if_III,
    prompt=[prompt]*count,
    seed=42,
    if_I_kwargs={
        "guidance_scale": 7.0,
        "sample_timestep_respacing": "smart100",
    },
    if_II_kwargs={
        "guidance_scale": 4.0,
        "sample_timestep_respacing": "smart50",
    },
)
if_I.show(result['I'], size=3)
if_I.show(result['II'], size=6)
if_I.show(result['III'], size=14)

166 return module._hf_hook.post_forward(module, output)

File ~/.local/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py:530, in T5Attention.forward(self, hidden_states, mask, key_value_states, position_bias, past_key_value, layer_head_mask, query_length, use_cache, output_attentions)
    525 value_states = project(
    526     hidden_states, self.v, key_value_states, past_key_value[1] if past_key_value is not None else None
    527 )
    529 # compute scores
--> 530 scores = torch.matmul(
    531     query_states, key_states.transpose(3, 2)
    532 )  # equivalent of torch.einsum("bnqd,bnkd->bnqk", query_states, key_states), compatible with onnx op>9
    534 if position_bias is None:
    535     if not self.has_relative_attention_bias:

RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasGemmStridedBatchedExFix(handle, opa, opb, (int)m, (int)n, (int)k, (void*)&falpha, a, CUDA_R_16BF, (int)lda, stridea, b, CUDA_R_16BF, (int)ldb, strideb, (void*)&fbeta, c, CUDA_R_16BF, (int)ldc, stridec, (int)num_batches, CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP)
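The `CUDA_R_16BF` arguments in this message indicate that the failing call is a bfloat16 batched matrix multiply. One possible cause, which is an assumption and not something confirmed in this thread, is that the GPU holding the T5 encoder predates Ampere (compute capability below 8.0) and so lacks native bfloat16 support. A rough diagnostic sketch, with hypothetical helper names, that lists each visible GPU and its compute capability:

```python
def bf16_expected(capability):
    """Return True if a (major, minor) compute capability is 8.0 or newer.

    Assumption: native bfloat16 GEMMs require Ampere-class hardware or later.
    """
    major, _minor = capability
    return major >= 8


def report_devices():
    """Return (index, name, capability, bf16_expected) for each visible GPU."""
    try:
        import torch
    except ImportError:
        return []  # PyTorch not installed, nothing to report
    if not torch.cuda.is_available():
        return []
    rows = []
    for idx in range(torch.cuda.device_count()):
        cap = torch.cuda.get_device_capability(idx)
        rows.append((idx, torch.cuda.get_device_name(idx), cap, bf16_expected(cap)))
    return rows


if __name__ == "__main__":
    for idx, name, cap, ok in report_devices():
        print(f"cuda:{idx} {name} capability={cap} bf16_expected={ok}")
```

If `cuda:3` reports a capability below 8.0, that would be consistent with the error above, though it does not prove it is the cause.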

phalexo commented 1 year ago

I have found one working combination. If I set count = 1 and t5 = T5Embedder(device='cpu'), it generates at least one picture. In that case I only run if_I.show(result['III'], size=14). Perhaps a BLAS function is defined for the CPU but not for the GPU?
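The CPU workaround can be made conditional instead of hard-coded. The sketch below is only an illustration: `choose_t5_device` is a hypothetical helper, and the threshold of compute capability 8.0 for bfloat16 matmuls on the GPU is an assumption, not something established in this thread.

```python
def choose_t5_device(capability, gpu_device="cuda:3"):
    """Pick where to place the T5 encoder.

    capability: (major, minor) tuple for the candidate GPU, or None if no GPU.
    Assumption: bfloat16 matmuls need compute capability >= 8.0 on the GPU.
    """
    if capability is not None and capability[0] >= 8:
        return gpu_device
    return "cpu"  # slower, but avoids the cuBLAS bfloat16 path entirely


print(choose_t5_device((7, 0)))  # cpu
print(choose_t5_device((8, 6)))  # cuda:3
print(choose_t5_device(None))    # cpu
```

The returned string could then be passed as `T5Embedder(device=...)`, with the capability taken from `torch.cuda.get_device_capability(3)`. Running T5 on the CPU is slower, so this trades speed for avoiding the failing call.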