codefuse-ai / FasterTransformer4CodeFuse

High-performance LLM inference based on our optimized version of FasterTransformer

Error when trying to run the model on 2 GPUs #5

Closed · horcruxen closed this issue 9 months ago

horcruxen commented 9 months ago

Branch/Tag/Commit

main

Docker Image Version

nvcr.io/nvidia/pytorch:22.09-py3

GPU name

A6000

CUDA Driver

525.60.11

Reproduced Steps

0. download the model from https://modelscope.cn/models/codefuse-ai/CodeFuse-CodeLlama-34B/files
1. git clone https://github.com/codefuse-ai/FasterTransformer4CodeFuse.git
2. docker-compose up -d pytorch2209
3. pip install --no-cache-dir pybind11==2.6.2 transformers accelerate sentencepiece

echo "export pybind11_DIR=/opt/conda/lib/python3.8/site-packages/pybind11/share/cmake/pybind11/" >> ~/.bashrc
export pybind11_DIR=/opt/conda/lib/python3.8/site-packages/pybind11/share/cmake/pybind11/
4. mkdir build ; cd build
export TORCH_PYTHON_LIBRARIES=/opt/conda/lib/python3.8/site-packages/torch/lib/libtorch_python.so
cmake -DCMAKE_BUILD_TYPE=Release -DSM="80;75" -DBUILD_PYT=ON -DSPARSITY_SUPPORT=OFF -DMEASURE_BUILD_TIME=ON \
      -DBUILD_CUTLASS_MIXED_GEMM=ON -DBUILD_MULTI_GPU=ON -DBUILD_TRT=OFF \
      -DENABLE_FP8=OFF -DBUILD_PYBIND=ON -DTORCH_PYTHON_LIBRARIES=${TORCH_PYTHON_LIBRARIES} ..
make -j"$(grep -c ^processor /proc/cpuinfo)"
5. export MODEL_NAME=codefuse
export TENSOR_PARA_SIZE=2

python ../examples/pytorch/codefuse/huggingface_convert.py \
       -o ../models/${MODEL_NAME}/fastertransformer \
       -i ../models/${MODEL_NAME}/transformers \
       -infer_gpu_num ${TENSOR_PARA_SIZE} \
       -processes 20 \
       -weight_data_type fp16 \
       -model_name gptneox
6. # fp16, 2 GPUs
torchrun --nproc_per_node 2 ../examples/pytorch/codefuse/codefuse_example.py \
         --world_size 2 \
         --ckpt_path ../models/${MODEL_NAME}/fastertransformer/2-gpu \
         --tokenizer_path ../models/${MODEL_NAME}/transformers

7. Observed output:
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[INFO] WARNING: Have initialized the process group
[INFO] WARNING: Have initialized the process group
2023-12-07 12:50:00,317 - root - ERROR - head_num, size_per_head, vocab_size, and max_seq_len must be the same as the ones during training (idx: 288 expected shape: torch.Size([8192, 16384]) got shape: torch.Size([90177536])).
Traceback (most recent call last):
  File "../examples/pytorch/codefuse/codefuse_example.py", line 377, in load
    self.w[i] = w[i].reshape(self.w[i].shape)
RuntimeError: shape '[8192, 16384]' is invalid for input of size 90177536

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "../examples/pytorch/codefuse/codefuse_example.py", line 821, in __init__
    model, tokenizer, trie = init_model_and_tokenizer(lib_path=lib_path,
  File "../examples/pytorch/codefuse/codefuse_example.py", line 655, in init_model_and_tokenizer
    if not gpt.load(ckpt_path=ckpt_path):
  File "../examples/pytorch/codefuse/codefuse_example.py", line 507, in load
    is_load = self.weights.load(ckpt_path, tensor_para_rank=self.tensor_para_rank,
  File "../examples/pytorch/codefuse/codefuse_example.py", line 382, in load
    raise RuntimeError(
RuntimeError: head_num, size_per_head, vocab_size, and max_seq_len must be the same as the ones during training (idx: 288 expected shape: torch.Size([8192, 16384]) got shape: torch.Size([90177536])).
local_rank: 0
2023-12-07 12:50:49,199 - root - ERROR - head_num, size_per_head, vocab_size, and max_seq_len must be the same as the ones during training (idx: 288 expected shape: torch.Size([8192, 16384]) got shape: torch.Size([90177536])).
Traceback (most recent call last):
  File "../examples/pytorch/codefuse/codefuse_example.py", line 377, in load
    self.w[i] = w[i].reshape(self.w[i].shape)
RuntimeError: shape '[8192, 16384]' is invalid for input of size 90177536

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "../examples/pytorch/codefuse/codefuse_example.py", line 821, in __init__
    model, tokenizer, trie = init_model_and_tokenizer(lib_path=lib_path,
  File "../examples/pytorch/codefuse/codefuse_example.py", line 655, in init_model_and_tokenizer
    if not gpt.load(ckpt_path=ckpt_path):
  File "../examples/pytorch/codefuse/codefuse_example.py", line 507, in load
    is_load = self.weights.load(ckpt_path, tensor_para_rank=self.tensor_para_rank,
  File "../examples/pytorch/codefuse/codefuse_example.py", line 382, in load
    raise RuntimeError(
RuntimeError: head_num, size_per_head, vocab_size, and max_seq_len must be the same as the ones during training (idx: 288 expected shape: torch.Size([8192, 16384]) got shape: torch.Size([90177536])).
local_rank: 1
2023-12-07 12:50:49,990 - root - ERROR - call back init error: 'CodeFuseHandler' object has no attribute 'model'
Traceback (most recent call last):
  File "../examples/pytorch/codefuse/codefuse_example.py", line 874, in predict
    result, lengths, cum_log_probs, latency = generate(self.model, self.tokenizer,
AttributeError: 'CodeFuseHandler' object has no attribute 'model'

Traceback (most recent call last):
  File "../examples/pytorch/codefuse/codefuse_example.py", line 976, in <module>
    raise RuntimeError()
RuntimeError
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 447 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 446) of binary: /opt/conda/bin/python
Traceback (most recent call last):
  File "/opt/conda/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==1.13.0a0+d0d6b1f', 'console_scripts', 'torchrun')())
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/run.py", line 762, in main
    run(args)
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/run.py", line 753, in run
    elastic_launch(
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
../examples/pytorch/codefuse/codefuse_example.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-12-07_12:50:50
  host      : 65bcd94c4d64
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 446)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
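
For anyone who hits the same reshape error: a quick look at the source model's config shows whether its architecture matches what the gptneox converter and the example loader expect. A minimal sketch (not part of the repository's tooling; the path follows the steps above and assumes a standard Hugging Face config.json):

# Sketch: compare the source model's MLP width against the GPT-NeoX-style
# assumption (4 * hidden_size) used by the gptneox conversion path.
import json

with open("../models/codefuse/transformers/config.json") as f:
    cfg = json.load(f)

hidden = cfg.get("hidden_size") or cfg.get("n_embd")
inter = cfg.get("intermediate_size")  # set explicitly in Llama-style configs

print(f"hidden_size={hidden}, intermediate_size={inter}")
if inter is not None and inter != 4 * hidden:
    print("MLP width is not 4 * hidden_size; the converted weights will not match "
          "the shapes the example loader tries to reshape to.")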
teacher-cow commented 9 months ago

cmd: make -j"$(grep -c ^processor /proc/cpuinfo)"
When I execute this command, it raises an error: "[ 1%] Built target layernorm_kernels; make: *** [Makefile:136: all] Error 2". Have you encountered this error?

horcruxen commented 9 months ago

cmd: make -j"$(grep -c ^processor /proc/cpuinfo)". When I execute this command, it raises an error: "[ 1%] Built target layernorm_kernels; make: *** [Makefile:136: all] Error 2". Have you encountered this error?

No. I run it in the Docker image nvcr.io/nvidia/pytorch:22.09-py3.

teacher-cow commented 9 months ago

cmd: make -j"$(grep -c ^processor /proc/cpuinfo)". When I execute this command, it raises an error: "[ 1%] Built target layernorm_kernels; make: *** [Makefile:136: all] Error 2". Have you encountered this error?

No. I run it in the Docker image nvcr.io/nvidia/pytorch:22.09-py3.

Can I add you as a friend? This is my WeChat: "hlhaxb". I have some questions about this project that I would like to consult you on.

zhang-ge-hao commented 9 months ago

@horcruxen

Sorry for the late reply. This repository is not for CodeLlama; it mainly targets CodeFuse-13B.

Although we have implemented support for the Llama structure internally, it is not yet open-sourced.

Using FasterTransformer might not currently be the most efficient approach; you could use other open-source frameworks for inference instead. Perhaps vLLM or TensorRT-LLM?
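
For what it's worth, the numbers in the error above are consistent with this. The gptneox loader expects the MLP weight shard to be hidden_size x (4 * hidden_size / tensor_para_size), i.e. 8192 x 16384 per GPU, while a Llama-style model such as CodeLlama-34B uses a SwiGLU MLP with an explicit intermediate_size (22016 according to its public Hugging Face config; taken as an assumption here), which yields a different shard size. A quick check:

# Sketch: why the checkpoint shard cannot be reshaped to [8192, 16384].
hidden_size = 8192                 # CodeLlama-34B hidden size
tp = 2                             # TENSOR_PARA_SIZE used above
gptneox_ffn = 4 * hidden_size      # MLP width assumed by the gptneox path
llama_ffn = 22016                  # CodeLlama-34B intermediate_size (assumed from the HF config)

expected = hidden_size * (gptneox_ffn // tp)   # 8192 * 16384 = 134217728
actual = hidden_size * (llama_ffn // tp)       # 8192 * 11008 = 90177536

print(expected, actual)  # 134217728 90177536 (matches "invalid for input of size 90177536")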