NVIDIA / FasterTransformer

Transformer related optimization, including BERT, GPT
Apache License 2.0

AssertionError: tensor_para_size * pipeline_para_size must be equal to world_size. world_size always equals to -1 #639

Open starlitsky2010 opened 1 year ago

starlitsky2010 commented 1 year ago

Branch/Tag/Commit

main

Docker Image Version

nvcr.io/nvidia/pytorch:23.04-py3

GPU name

3090

CUDA Driver

530.30.02

Reproduced Steps

# Pull docker image
sudo nvidia-docker pull nvcr.io/nvidia/pytorch:23.04-py3
sudo nvidia-docker run --gpus '"device=0,1"' -itd --rm --network=host --volume="$PWD:/workspace" --name=faster_transformer_23.04_0 nvcr.io/nvidia/pytorch:23.04-py3
docker exec -it 611e085e2a60c20172c95f942cc75a948991f126a364d81a20ba41d486045f39 /bin/bash

# Setup FasterTransformer
git clone https://github.com/NVIDIA/FasterTransformer.git
mkdir -p FasterTransformer/build
cd FasterTransformer/build
git submodule init && git submodule update

According to https://developer.nvidia.com/cuda-gpus, the 3090's compute capability is 8.6, so it should be built with -DSM=86.

cmake -DSM=86 -DCMAKE_BUILD_TYPE=Release -DBUILD_PYT=ON -DBUILD_MULTI_GPU=ON ..
make -j40
pip install -r ../examples/pytorch/gpt/requirement.txt

Then, try the Bloom model following the steps in:
https://github.com/NVIDIA/FasterTransformer/blob/main/docs/gpt_guide.md

git clone https://huggingface.co/bigscience/bloom-560m
python ../examples/pytorch/gpt/utils/huggingface_bloom_convert.py \
    --input-dir bloom-560m \
    --output-dir bloom-560m/c-model \
    -tp 1 -p 4 -v
wget https://github.com/cybertronai/bflm/raw/master/lambada_test.jsonl -P ../datasets/lambada
# Run HF benchmark
python ../examples/pytorch/gpt/bloom_lambada.py \
    --tokenizer-path bloom-560m \
    --dataset-path ../datasets/lambada/lambada_test.jsonl \
    --test-hf --show-progress
It runs normally.

# Run FT benchmark
python ../examples/pytorch/gpt/bloom_lambada.py \
    --checkpoint-path bloom-560m/c-model/1-gpu \
    --tokenizer-path bloom-560m \
    --dataset-path ../datasets/lambada/lambada_test.jsonl \
    --show-progress
But the FT run fails with the traceback below:
Traceback (most recent call last):
  File "../examples/pytorch/gpt/bloom_lambada.py", line 409, in <module>
    main()
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "../examples/pytorch/gpt/bloom_lambada.py", line 304, in main
    model, tokenizer = get_model_and_tokenizer(args)
  File "../examples/pytorch/gpt/bloom_lambada.py", line 265, in get_model_and_tokenizer
    model = bloom.Bloom(**model_args)
  File "/workspace/FasterTransformer/examples/pytorch/gpt/utils/bloom.py", line 245, in __init__
    super().__init__(
  File "/workspace/FasterTransformer/examples/pytorch/gpt/utils/gpt.py", line 535, in __init__
    assert world_size == tensor_para_size * pipeline_para_size, "tensor_para_size * pipeline_para_size must be equal to world_size."
AssertionError: tensor_para_size * pipeline_para_size must be equal to world_size.

I've printed the world_size value and it is -1.
Could you give me some tips on how to solve this problem?

Thanks
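For context, an assertion like this typically fires because no launcher populated the distributed environment, so the world size stays at a -1 sentinel. A minimal sketch of that pattern (illustrative only; `get_world_size` here is a hypothetical helper, not FasterTransformer's actual code):

```python
import os

def get_world_size() -> int:
    """Return the launcher-provided world size, or the sentinel -1.

    Launchers such as torchrun export WORLD_SIZE, and Open MPI's mpirun
    exports OMPI_COMM_WORLD_SIZE; a plain `python script.py` invocation
    sets neither, so the -1 fallback survives.
    """
    for var in ("WORLD_SIZE", "OMPI_COMM_WORLD_SIZE"):
        if var in os.environ:
            return int(os.environ[var])
    return -1
```

If the script is started directly rather than via mpirun/torchrun, this kind of lookup yields -1 while tensor_para_size * pipeline_para_size is 1, which would trip exactly the assertion in the traceback.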
starlitsky2010 commented 1 year ago

It works well when I use the nvcr.io/nvidia/pytorch:22.09-py3 docker container.

But another bug still exists: world_size is always -1 for the Bloom model when doing inference.

gangmul12 commented 7 months ago

Although it's been a while and support seems to have been dropped, I got a similar problem when running bert_example.py, and in my case it turned out to be a PyTorch 2.0.0 bug. Maybe your problem is a similar one? https://github.com/pytorch/pytorch/issues/97507
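For anyone hitting this, a quick stdlib-only way to gate on the affected release is to compare version tuples before relying on torch.distributed. A hedged sketch (the affected version 2.0.0 is taken from the linked PyTorch issue; `is_affected` is a hypothetical helper, not part of either project):

```python
def version_tuple(v: str) -> tuple:
    """Parse a dotted version like '2.0.0' or '2.0.0+cu118' into a
    comparable tuple of ints, ignoring any local/prerelease suffix."""
    core = v.split("+", 1)[0]
    return tuple(int(p) for p in core.split(".") if p.isdigit())

# Release the linked issue (pytorch/pytorch#97507) was reported against.
AFFECTED = version_tuple("2.0.0")

def is_affected(torch_version: str) -> bool:
    """True if torch_version is a 2.0.0-based build."""
    return version_tuple(torch_version)[:3] == AFFECTED
```

For example, `is_affected(torch.__version__)` would flag a 2.0.0-based build such as the one shipped in newer NGC containers, while an older 1.13-era build would pass.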