NVIDIA / Megatron-LM

Ongoing research training transformer models at scale
https://docs.nvidia.com/megatron-core/developer-guide/latest/user-guide/index.html#quick-start

Question with forward_backward_pipelining_without_interleaving in Megatron-LM Pipeline #830

Open Hongjie1Chu opened 4 months ago

Hongjie1Chu commented 4 months ago

I encountered a problem when using the Megatron pipeline. The function I am using is forward_backward_pipelining_without_interleaving. In this pipeline function, each pipeline stage calls forward_step for the forward pass:

```python
output_tensor = forward_step(forward_step_func, data_iterator, model, input_tensor, losses_reduced)
```
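To make the data flow concrete, my simplified reading of the warmup phase of forward_backward_pipelining_without_interleaving is roughly the loop below (a paraphrase, not the actual Megatron-LM source; the point-to-point helpers are stubbed so the sketch is self-contained, and real signatures differ between versions):

```python
# Paraphrase of the warmup loop; recv_forward/send_forward stand in for the
# p2p communication helpers and are stubbed here so the sketch runs on its own.

def recv_forward():
    """Activations from the previous stage; None on the first pipeline stage."""
    return None  # stub

def send_forward(output_tensor):
    """Ship activations to the next stage; a no-op on the last stage."""
    pass  # stub

def run_warmup(num_warmup_microbatches, forward_step, forward_step_func,
               data_iterator, model, losses_reduced):
    input_tensors, output_tensors = [], []
    for _ in range(num_warmup_microbatches):
        input_tensor = recv_forward()                     # previous stage's output
        output_tensor = forward_step(forward_step_func, data_iterator, model,
                                     input_tensor, losses_reduced)
        send_forward(output_tensor)                       # feed the next stage
        input_tensors.append(input_tensor)                # kept for the backward pass
        output_tensors.append(output_tensor)
    return input_tensors, output_tensors
```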

The input for the forward pass should be the output from the previous stage. However, in the megatron/schedule.py file, forward_step drives the model like this:

```python
unwrapped_model.set_input_tensor(input_tensor)
output_tensor, loss_func = forward_step_func(data_iterator, model)
```

This implies that each stage in the forward pass still gets data from the dataset and processes it, which seems to contradict the concept of pipelining. Could you please explain the rationale behind this design?
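To make the question concrete, here is how I currently picture set_input_tensor being consumed inside the model (a minimal sketch, not the actual Megatron-LM code; ToyStage and its members are made-up names):

```python
import torch.nn as nn

class ToyStage(nn.Module):
    """One pipeline stage; only the first stage owns the embedding."""

    def __init__(self, is_first_stage, hidden_size=1024, vocab_size=50304):
        super().__init__()
        self.is_first_stage = is_first_stage
        self.input_tensor = None  # filled by forward_step before each forward pass
        self.embed = nn.Embedding(vocab_size, hidden_size) if is_first_stage else None
        self.block = nn.Sequential(nn.Linear(hidden_size, hidden_size), nn.GELU())

    def set_input_tensor(self, input_tensor):
        # Receives the activations sent by the previous pipeline stage
        # (None on the first stage, which starts from token ids instead).
        self.input_tensor = input_tensor

    def forward(self, tokens):
        if self.is_first_stage:
            hidden = self.embed(tokens)   # only stage 0 actually uses the token ids
        else:
            hidden = self.input_tensor    # later stages ignore the batch's tokens
        return self.block(hidden)
```

If that picture is right, the batch pulled from data_iterator only drives the computation on the first stage (and supplies labels for the loss on the last stage), so I would like to confirm why forward_step_func still reads from the dataset on every stage, and whether that is just bookkeeping rather than redundant compute.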

Code in pretrain_gpt.py:

[screenshot of the relevant pretrain_gpt.py code]

Here are my results: [screenshot of the run output]

My configuration:

```bash
GPUS_PER_NODE=4

# Change for multinode config
MASTER_ADDR=172.20.20.220
MASTER_PORT=6000
NNODES=1
NODE_RANK=0
WORLD_SIZE=$(($GPUS_PER_NODE*$NNODES))

DATA_PATH=data/my-gpt2_text_document
CHECKPOINT_PATH=model/model_optim_rng.pt
MODEL_PATH=model/output/pp

DISTRIBUTED_ARGS="--nproc_per_node $GPUS_PER_NODE --nnodes $NNODES --node_rank $NODE_RANK --master_addr $MASTER_ADDR --master_port $MASTER_PORT"

python -m torch.distributed.launch $DISTRIBUTED_ARGS \
    pretrain_gpt.py \
    --tensor-model-parallel-size 1 \
    --pipeline-model-parallel-size 4 \
    --num-layers 12 \
    --hidden-size 1024 \
    --num-attention-heads 16 \
    --micro-batch-size 16 \
    --global-batch-size 64 \
    --seq-length 1024 \
    --max-position-embeddings 1024 \
    --train-iters 1
```
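For reference, this is the microbatch bookkeeping I expect from this configuration (a rough calculation assuming the usual Megatron-LM definitions; please correct me if the formulas are off):

```python
# Assumed relations: data_parallel = world_size / (tensor_parallel * pipeline_parallel)
# and num_microbatches = global_batch_size / (micro_batch_size * data_parallel).
world_size = 4          # GPUS_PER_NODE * NNODES
tensor_parallel = 1     # --tensor-model-parallel-size
pipeline_parallel = 4   # --pipeline-model-parallel-size
data_parallel = world_size // (tensor_parallel * pipeline_parallel)          # 1

global_batch_size = 64  # --global-batch-size
micro_batch_size = 16   # --micro-batch-size
num_microbatches = global_batch_size // (micro_batch_size * data_parallel)   # 4

print(data_parallel, num_microbatches)  # 1 4
```

So each training iteration should push 4 microbatches of 16 samples through the 4 pipeline stages.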


github-actions[bot] commented 2 months ago

Marking as stale. No activity in 60 days.