bytedance / lightseq

LightSeq: A High Performance Library for Sequence Processing and Generation

Trained a transformer model, but cannot infer (CUDA ERROR: invalid device function) #194

Open VincentChen95 opened 2 years ago

VincentChen95 commented 2 years ago

Hello team, I trained a model using the lightseq-train command. Then I converted it to the "pb" format. Finally, I ran inference with this code:

import sys

import lightseq.inference as lsi

def main():
    file_name = sys.argv[1]                       # path to the exported transformer.pb
    model = lsi.Transformer(file_name, 8)         # load model, max batch size 8
    output = model.infer([[1, 2, 3], [4, 5, 6]])  # batch of two token-id sequences

main()

Received this error:

Parsing protobuf: lightseq/examples/inference/python/export/transformer.pb
Finish loading src_emb_wei from host to device
Finish loading trg_emb_wei from host to device
Finish loading enc_wei from host to device
Finish loading dec_wei from host to device
Finish loading all weight from host to device
***model config***
encoder layers: 4
decoder layers: 4
hidden size: 512
inner size: 1024
head number: 8
dim per head: 64
src vocab size: 176
trg vocab size: 168
is_post_ln: 0
no_scale_embedding: 0
use_gelu: 0
start_id: 2
end_id: 6
padding_id: 2
is_multilingual: 0

***generator config***
beam size: 4
extra decode length(max decode length - src input length): 50
length penalty: 0.6
diverse lambda: 0
sampling method: beam_search
topk: 1
topp: 0.75
Allocated 576MB GPU buffer for transformer
decoder buffer init start
Traceback (most recent call last):
  File "infer.py", line 12, in <module>
    main()
  File "infer.py", line 8, in main
    model = lsi.Transformer(file_name, 8)
RuntimeError: [CUDA][ERROR] /tmp/build-via-sdist-lcobild4/lightseq-2.1.4/lightseq/inference/model/decoder.cc.cu(205): invalid device function

could you please help me take a look? thank you!

VincentChen95 commented 2 years ago

My configuration:

- lightseq 2.1.4
- ninja 1.10.2.1
- torch 1.7.1
- torchaudio 0.7.0a0+a853dff
- torchvision 0.8.2

VincentChen95 commented 2 years ago

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.01    Driver Version: 418.87.01    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   47C    P0    36W / 250W |      0MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...  Off  | 00000000:00:05.0 Off |                    0 |
| N/A   51C    P0    38W / 250W |      0MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla P100-PCIE...  Off  | 00000000:00:06.0 Off |                    0 |
| N/A   47C    P0    34W / 250W |      0MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla P100-PCIE...  Off  | 00000000:00:07.0 Off |                    0 |
| N/A   50C    P0    30W / 250W |      0MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Taka152 commented 2 years ago

The pre-built PyPI version only supports the 61, 70, and 75 compute architectures. Since the P100 is arch 60, you could change this line https://github.com/bytedance/lightseq/blob/4d370305656fe602f8628fc3bc8aa7b81464075b/CMakeLists.txt#L4 and build from source.
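To illustrate the mismatch described above: CUDA kernels are compiled for specific compute architectures, and launching one on a GPU whose architecture is not in the binary raises "invalid device function". Here is a minimal sketch of that check (the helper name and arch set are illustrative, not part of LightSeq; the arch list comes from the comment above):

```python
# Hypothetical helper (not part of LightSeq): check whether a GPU's
# compute capability is among the architectures a binary was built for.

PREBUILT_ARCHS = {61, 70, 75}  # archs in the prebuilt PyPI wheel, per the comment above

def is_supported(major: int, minor: int, archs=PREBUILT_ARCHS) -> bool:
    """Return True if sm_<major><minor> kernels exist in the binary.

    Launching a kernel on an arch that is missing from the binary
    fails with 'invalid device function', as in the traceback above.
    """
    return major * 10 + minor in archs

# Tesla P100 is compute capability 6.0 (sm_60), so it is not covered
print(is_supported(6, 0))  # False -> explains the CUDA error on P100
print(is_supported(7, 0))  # True  -> e.g. a V100 (sm_70) would work
```

On a real machine you could get the capability pair from `torch.cuda.get_device_capability()`; adding 60 to the arch list in CMakeLists.txt and rebuilding makes the P100 supported.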

VincentChen95 commented 2 years ago

Hello @Taka152, thank you for your reply. How could I build from source? Do you mean git clone first, make the change, then run "pip install -e ."? Thank you!

Taka152 commented 2 years ago

Yes, you can follow this doc.