NVIDIA / FasterTransformer

Transformer-related optimization, including BERT, GPT
Apache License 2.0

RuntimeError: Tried to instantiate class 'FasterTransformer.ParallelGptOp', but it does not exist! Ensure that it is registered via torch::class_ #518

Closed · Siyuan011 closed this 1 year ago

Siyuan011 commented 1 year ago

Branch/Tag/Commit

main

Docker Image Version

nvcr.io/nvidia/tensorflow:22.09-tf1-py3

GPU name

A100

CUDA Driver

515.65.01

Reproduced Steps

Hi,
When I was using the NVIDIA docker image to run this, I hit the following problem:
root@edc6395a3639:/workspace/FasterTransformer/build# python ../examples/pytorch/gpt/multi_gpu_gpt_example.py

=================== Arguments ===================
layer_num.....................: 12
input_len.....................: 1
output_len....................: 32
head_num......................: 16
size_per_head.................: 64
vocab_size....................: 50304
beam_width....................: 1
top_k.........................: 1
top_p.........................: 0.0
temperature...................: 1.0
len_penalty...................: 0.0
beam_search_diversity_rate....: 0.0
tensor_para_size..............: 1
pipeline_para_size............: 1
ckpt_path.....................: ../models/openai-gpt-models/c-model/124m/1-gpu
lib_path......................: ./lib/libtf_gpt.so
vocab_file....................: ../models/gpt2-vocab.json
merges_file...................: ../models/gpt2-merges.txt
start_id......................: 50256
end_id........................: 50256
max_batch_size................: 8
repetition_penalty............: 1.0
presence_penalty..............: 0.0
min_length....................: 0
max_seq_len...................: 768
inference_data_type...........: fp32
time..........................: False
sample_input_file.............: None
sample_output_file............: None
enable_random_seed............: False
skip_end_tokens...............: False
detokenize....................: True
use_jieba_tokenizer...........: False
int8_mode.....................: 0
weights_data_type.............: fp32
return_cum_log_probs..........: 0
shared_contexts_ratio.........: 1.0
banned_words..................: 
use_gpt_decoder_ops...........: False
=================================================

2023-03-23 06:03:26.606606: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
[INFO] WARNING: Have initialized the process group
Traceback (most recent call last):
  File "../examples/pytorch/gpt/multi_gpu_gpt_example.py", line 371, in <module>
    main()
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "../examples/pytorch/gpt/multi_gpu_gpt_example.py", line 260, in main
    if not gpt.load(ckpt_path=args.ckpt_path):
  File "/workspace/FasterTransformer/examples/pytorch/gpt/../../../examples/pytorch/gpt/utils/gpt.py", line 542, in load
    self.cuda()
  File "/workspace/FasterTransformer/examples/pytorch/gpt/../../../examples/pytorch/gpt/utils/parallel_gpt.py", line 30, in cuda
    self.model = torch.classes.FasterTransformer.ParallelGptOp(
  File "/usr/local/lib/python3.8/dist-packages/torch/_classes.py", line 12, in __getattr__
    proxy = torch._C._get_custom_class_python_wrapper(self.name, attr)
RuntimeError: Tried to instantiate class 'FasterTransformer.ParallelGptOp', but it does not exist! Ensure that it is registered via torch::class_
byshiue commented 1 year ago

Can you provide the script you used to compile the code? And why are you using a TensorFlow docker image to run a PyTorch example?

Siyuan011 commented 1 year ago

Sorry, this problem was solved when I switched to the PyTorch image. Thank you.

vjeronymo2 commented 1 year ago

I also got this error while trying to reproduce examples/pytorch/gpt/opt_summarization.py. EDIT: I rebuilt everything from scratch and it worked.

gilljon commented 1 year ago

Getting this error when trying to run Pythia 12B using FT:

root@1547c4b20ea6:/workspace/FasterTransformer/build# python ../examples/pytorch/gptneox/gptneox_example.py --ckpt_path="pythia-12b-ft/1-gpu" --tokenizer_path="pythia-12b-hf" --lib_path="./lib/libtransformer-shared.so"

=============== Arguments ===============
output_len: 32
beam_width: 1
top_k: 1
top_p: 0.0
temperature: 1.0
len_penalty: 0.0
beam_search_diversity_rate: 0.0
tensor_para_size: 1
pipeline_para_size: 1
ckpt_path: pythia-12b-ft/1-gpu
tokenizer_path: pythia-12b-hf
lib_path: ./lib/libtransformer-shared.so
sample_input_file: None
max_batch_size: 8
repetition_penalty: 1.0
max_seq_len: 1024
inference_data_type: fp16
time: False
enable_random_seed: False
=========================================

[INFO] batch size: 8
Traceback (most recent call last):
  File "../examples/pytorch/gptneox/gptneox_example.py", line 226, in <module>
    main()
  File "../examples/pytorch/gptneox/gptneox_example.py", line 152, in main
    if not gpt.load(ckpt_path=ckpt_path):
  File "/workspace/FasterTransformer/examples/pytorch/gptneox/../../../examples/pytorch/gptneox/utils/gptneox.py", line 240, in load
    self.cuda()
  File "/workspace/FasterTransformer/examples/pytorch/gptneox/../../../examples/pytorch/gptneox/utils/gptneox.py", line 254, in cuda
    self.model = torch.classes.FasterTransformer.GptNeoXOp(self.head_num, self.size_per_head, self.inter_size,
  File "/opt/conda/lib/python3.8/site-packages/torch/_classes.py", line 12, in __getattr__
    proxy = torch._C._get_custom_class_python_wrapper(self.name, attr)
RuntimeError: Tried to instantiate class 'FasterTransformer.GptNeoXOp', but it does not exist! Ensure that it is registered via torch::class_
puyuanOT commented 1 year ago

Got the same error... what's the solution?

gilljon commented 1 year ago

Make sure you run:

cmake -DSM=xx -DCMAKE_BUILD_TYPE=Release -DBUILD_MULTI_GPU=ON ..
make -j12

AND

cmake -DSM=xx -DCMAKE_BUILD_TYPE=Release -DBUILD_PYT=ON -DBUILD_MULTI_GPU=ON ..
make -j12

That fixed it for me :)
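
For anyone still stuck: -DSM=xx is your GPU's compute capability (e.g. 80 for A100, 86 for A10). Below is a minimal sketch to verify that the rebuilt library actually registers the op; it assumes the default build layout and that you run it from the build/ directory:

import torch

# Loading the op library is what registers the FasterTransformer torch
# classes; this is the same file the examples take via --lib_path.
torch.classes.load_library("./lib/libth_transformer.so")

# If registration succeeded this resolves to the class wrapper; otherwise
# it raises the same "does not exist" RuntimeError as in the traceback.
print(torch.classes.FasterTransformer.ParallelGptOp)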

puyuanOT commented 1 year ago

Thanks! Unfortunately it didn't work for me :(

gilljon commented 1 year ago

Can you send me what command you're running? And what the error is? What GPU are you running on? In Docker, right?

puyuanOT commented 1 year ago

I am running on an A10. I wasn't using Docker; I built PyTorch + MPI from source and then built FasterTransformer from source.

I am running:

python examples/pytorch/gptneox/gptneox_example.py --output_len 100 --top_p 0.9 --temperature 0.7 --ckpt_path converted_pythia_2.8b/1-gpu --tokenizer_path pythia-2.8b --lib_path build/lib/libtransformer-shared.so --max_batch_size 5 --inference_data_type fp16 --time

gilljon commented 1 year ago

Check if there is a libth_transformer.so in the build/lib/ folder first... that should be the --lib_path, not libtransformer-shared.so!
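
A quick sanity check before rerunning, sketched with the repo root as the working directory (adjust paths to your setup):

import os
import torch

# The torch op library is only produced when CMake is run with -DBUILD_PYT=ON.
lib = "build/lib/libth_transformer.so"
assert os.path.exists(lib), "libth_transformer.so missing; rebuild with -DBUILD_PYT=ON"

# Loading it registers the torch classes; loading libtransformer-shared.so
# instead leaves them unregistered and reproduces the RuntimeError above.
torch.classes.load_library(lib)
print(torch.classes.FasterTransformer.GptNeoXOp)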

puyuanOT commented 1 year ago

Thanks a lot! I do indeed have this file. After changing the command to

python examples/pytorch/gptneox/gptneox_example.py --output_len 100 --top_p 0.9 --temperature 0.7 --ckpt_path converted_pythia_2.8b/1-gpu --tokenizer_path pythia-2.8b --lib_path build/lib/libth_transformer.so --max_batch_size 5 --inference_data_type fp16 --time

it worked!