OpenBMB / BMInf

Efficient Inference for Big Models
Apache License 2.0
572 stars 67 forks source link

[BUG] GPU RTX4090 report errors #68

Open zc3r opened 1 year ago

zc3r commented 1 year ago

ERROR in app: Exception on /api/fillblank [POST] Traceback (most recent call last): File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 2070, in wsgi_app response = self.full_dispatch_request() File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1515, in full_dispatch_request rv = self.handle_user_exception(e) File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1513, in full_dispatch_request rv = self.dispatch_request() File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1499, in dispatch_request return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args) File "main.py", line 66, in fillBlank result = fillblank.fillBlank(model) File "/app/controller/fill_blank_controller.py", line 18, in fillBlank presence_penalty = presence_penalty1) File "/usr/local/lib/python3.6/dist-packages/bminf/models/cpm2.py", line 151, in fill_blank frequency_penalty, presence_penalty, 0) File "/usr/local/lib/python3.6/dist-packages/bminf/models/cpm2.py", line 103, in pre_processing ctx = self.encode(np.array([idx], dtype=np.int64), [input_length]) File "/usr/local/lib/python3.6/dist-packages/bminf/arch/t5/model.py", line 238, in encode True File "/usr/local/lib/python3.6/dist-packages/bminf/layers/transformer_block.py", line 42, in forward x = self.self_attention.forward(allocator, x, attention_mask, self_attn_position_bias) File "/usr/local/lib/python3.6/dist-packages/bminf/layers/attention.py", line 63, in forward qkv_i32 File "/usr/local/lib/python3.6/dist-packages/bminf/functions/gemm.py", line 86, in igemm _igemm(allocator, a, aT, b, bT, c, device, stream) File "/usr/local/lib/python3.6/dist-packages/bminf/functions/gemm.py", line 180, in _igemm cublasLt.checkCublasStatus( cublasLt.cublasLtMatrixTransform(lthandle, transform_desc_b, ctypes.byref(v1), b.data.ptr, layout_b, ctypes.byref(v0), 0, 0, trans_b.ptr, layout_trans_b, stream.ptr) ) File "/usr/local/lib/python3.6/dist-packages/bminf/backend/cublaslt.py", line 101, in checkCublasStatus raise RuntimeError("cublas error: %s" % cublas_errors[cublas_status]) RuntimeError: cublas error: CUBLAS_STATUS_NOT_SUPPORTED