Siyuan011 closed this issue 1 year ago
Can you provide the scripts you used to compile the code? And why are you using a TensorFlow Docker image to run a PyTorch example?
Sorry, this problem was solved when I switched to the PyTorch image. Thank you.
I also got this error while trying to reproduce examples/pytorch/gpt/opt_summarization.py
EDIT: I set everything up again from scratch and it worked.
Getting this error when trying to run Pythia 12B using FT...
=============== Arguments ===============
output_len: 32
beam_width: 1
top_k: 1
top_p: 0.0
temperature: 1.0
len_penalty: 0.0
beam_search_diversity_rate: 0.0
tensor_para_size: 1
pipeline_para_size: 1
ckpt_path: pythia-12b-ft/1-gpu
tokenizer_path: pythia-12b-hf
lib_path: ./lib/libtransformer-shared.so
sample_input_file: None
max_batch_size: 8
repetition_penalty: 1.0
max_seq_len: 1024
inference_data_type: fp16
time: False
enable_random_seed: False
=========================================
[INFO] batch size: 8
Traceback (most recent call last):
File "../examples/pytorch/gptneox/gptneox_example.py", line 226, in <module>
main()
File "../examples/pytorch/gptneox/gptneox_example.py", line 152, in main
if not gpt.load(ckpt_path=ckpt_path):
File "/workspace/FasterTransformer/examples/pytorch/gptneox/../../../examples/pytorch/gptneox/utils/gptneox.py", line 240, in load
self.cuda()
File "/workspace/FasterTransformer/examples/pytorch/gptneox/../../../examples/pytorch/gptneox/utils/gptneox.py", line 254, in cuda
self.model = torch.classes.FasterTransformer.GptNeoXOp(self.head_num, self.size_per_head, self.inter_size,
File "/opt/conda/lib/python3.8/site-packages/torch/_classes.py", line 12, in __getattr__
proxy = torch._C._get_custom_class_python_wrapper(self.name, attr)
RuntimeError: Tried to instantiate class 'FasterTransformer.GptNeoXOp', but it does not exist! Ensure that it is registered via torch::class_
root@1547c4b20ea6:/workspace/FasterTransformer/build# python ../examples/pytorch/gptneox/gptneox_example.py --ckpt_path="pythia-12b-ft/1-gpu" --tokenizer_path="pythia-12b-hf" --lib_path="./lib/libtransformer-shared.so"
Got the same error.... what's the solution?
Make sure you run:
cmake -DSM=xx -DCMAKE_BUILD_TYPE=Release -DBUILD_MULTI_GPU=ON ..
make -j12
AND
cmake -DSM=xx -DCMAKE_BUILD_TYPE=Release -DBUILD_PYT=ON -DBUILD_MULTI_GPU=ON ..
make -j12
That fixed it for me :)
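For anyone else who rebuilds and still isn't sure the second pass took: a quick way to check is to load the extension library from Python and look up the class the example needs. A minimal sketch, run from the build/ directory, assuming the -DBUILD_PYT=ON pass put the PyTorch extension at ./lib/libth_transformer.so (check build/lib/ for the exact file name on your setup):

import torch

# Load the PyTorch extension produced by the -DBUILD_PYT=ON build
# (path is an assumption; adjust to whatever landed in build/lib/).
torch.classes.load_library("./lib/libth_transformer.so")

# Attribute lookup is enough: if the op wasn't compiled in, this raises the same
# "Tried to instantiate class 'FasterTransformer.GptNeoXOp'" RuntimeError as above.
print(torch.classes.FasterTransformer.GptNeoXOp)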
Thanks! Unfortunately it didn't work for me :(
Can you send me what command you're running? And what the error is? What GPU are you running on? In Docker, right?
I am running on A10. I wasn't using docker; I built pytorch + mpi from source and then built FasterTransformer from source.
I am using:
python examples/pytorch/gptneox/gptneox_example.py --output_len 100 --top_p 0.9 --temperature 0.7 --ckpt_path converted_pythia_2.8b/1-gpu --tokenizer_path pythia-2.8b --lib_path build/lib/libtransformer-shared.so --max_batch_size 5 --inference_data_type fp16 --time
Check if there is libth_transformer.so in the build/lib/ folder first... that should be the --lib_path, not libtransformer-shared.so!
Thanks a lot! I indeed have this file. After I changed the command to
python examples/pytorch/gptneox/gptneox_example.py --output_len 100 --top_p 0.9 --temperature 0.7 --ckpt_path converted_pythia_2.8b/1-gpu --tokenizer_path pythia-2.8b --lib_path build/lib/libth_transformer.so --max_batch_size 5 --inference_data_type fp16 --time
it worked!
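For anyone else who hits the wrong-library variant of this: a small preflight check along the lines of the advice above. Just a sketch, assuming you launch from the repository root (adjust lib_dir if you run from build/):

from pathlib import Path

# List the shared libraries the build produced and make sure the PyTorch
# extension is among them before passing it as --lib_path.
lib_dir = Path("build/lib")
print(sorted(p.name for p in lib_dir.glob("*.so")))
assert (lib_dir / "libth_transformer.so").exists(), \
    "libth_transformer.so not found; rebuild with -DBUILD_PYT=ON"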
Branch/Tag/Commit: main
Docker Image Version: nvcr.io/nvidia/tensorflow:22.09-tf1-py3
GPU name: A100
CUDA Driver: 515.65.01
Reproduced Steps: