PaddlePaddle / Paddle

PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the PaddlePaddle core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)
http://www.paddlepaddle.org/
Apache License 2.0

Paddle/PaddleNLP llama 7B pretrain has a memory leak #68336

Open shang-mt opened 5 hours ago

shang-mt commented 5 hours ago

Describe the Bug

paddlepaddle-gpu 2.6.0.post117
paddlenlp: https://github.com/ZHUI/PaddleNLP, branch: sci/benckmark, commit id 20fe363530c0e3868414f65ec394124ffac6b9b2

Running a 4-GPU llama 7B pretrain on A100 with the versions above, there is a memory leak.

[screenshot attachment: host memory usage over time]

The configuration file llama/pretrain-llama_13b-pp4tp2sd2_stage1.json is as follows:

```json
{
  "model_name_or_path": "facebook/llama-7b",
  "tokenizer_name_or_path": "facebook/llama-7b",
  "input_dir": "/workspace",
  "output_dir": "/root/llama-7b",
  "per_device_train_batch_size": 2,
  "gradient_accumulation_steps": 256,
  "per_device_eval_batch_size": 64,
  "tensor_parallel_degree": 2,
  "pipeline_parallel_degree": 2,
  "pipeline_parallel_config": "disable_partial_send_recv",
  "sharding_parallel_degree": -1,
  "virtual_pp_degree": 1,
  "sharding": "stage1",
  "sequence_parallel": 1,
  "adam_beta1": 0.9,
  "adam_beta2": 0.95,
  "use_flash_attention": true,
  "use_fused_rms_norm": true,
  "use_fused_rope": true,
  "max_seq_length": 2048,
  "learning_rate": 1e-04,
  "initializer_range": 0.002,
  "min_learning_rate": 1e-05,
  "warmup_steps": 2000,
  "logging_steps": 1,
  "max_steps": 200000,
  "save_steps": 2000,
  "eval_steps": 2000,
  "weight_decay": 0.1,
  "max_grad_norm": 1.0,
  "amp_master_grad": 1,
  "fp16": true,
  "fp16_opt_level": "O2",
  "dataloader_num_workers": 1,
  "continue_training": 0,
  "do_train": true,
  "do_eval": true,
  "do_predict": true,
  "disable_tqdm": true,
  "recompute": false,
  "distributed_dataloader": 0,
  "recompute_granularity": "full",
  "save_total_limit": 2,
  "eval_accumulation_steps": 16
}
```
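One way to check whether host memory really keeps growing across steps is to log the training process's RSS periodically. Below is a minimal sketch, assuming `psutil` is installed; the hook point into the training loop and the helper name are hypothetical, not part of the PaddleNLP trainer:

```python
# Minimal sketch, not part of PaddleNLP: periodically log the training process's
# resident set size (RSS) to confirm whether host memory keeps growing.
import psutil

def log_host_rss(step: int) -> None:
    rss_gib = psutil.Process().memory_info().rss / 1024**3
    print(f"step {step}: host RSS = {rss_gib:.2f} GiB")

# Illustrative usage inside a training loop (hypothetical hook point):
# for step in range(max_steps):
#     train_step()
#     if step % logging_steps == 0:
#         log_host_rss(step)
```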

Additional Supplementary Information

No response

ZHUI commented 5 hours ago

Hi, the pretraining data is read via mmap, so a gradual increase in memory usage is expected; it will not exceed the total size of the dataset.
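For reference, a minimal sketch of the behavior being described, not PaddleNLP code; the file path and dtype are placeholders for a tokenized pretraining file:

```python
# Minimal sketch (not PaddleNLP code) of why resident memory grows with mmap-backed data:
# np.memmap maps the file lazily, so memory usage rises as more of the file is touched,
# but those pages are reclaimable page cache and are bounded by the file size.
import numpy as np

# "train_data.bin" and the dtype are placeholders.
data = np.memmap("train_data.bin", dtype=np.uint16, mode="r")

total = 0
for start in range(0, len(data), 4096):
    # Each slice touched here pulls new pages of the file into memory.
    total += int(data[start:start + 2048].sum())
```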

shang-mt commented 4 hours ago

> Hi, the pretraining data is read via mmap, so a gradual increase in memory usage is expected; it will not exceed the total size of the dataset.

OK, we will keep testing for a while longer. On our own cards, the 1000 GB of host memory does get exhausted.
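To help tell reclaimable mmap page cache apart from a genuine leak, here is a minimal Linux-only sketch (not PaddleNLP code; the field names come from /proc/meminfo):

```python
# Minimal sketch (Linux only): read /proc/meminfo to see how much of the "used" memory
# is reclaimable page cache versus truly unavailable. If MemAvailable keeps shrinking
# while Cached stays flat, that points to a real leak rather than mmap page cache.
def meminfo_gib(*keys: str) -> dict:
    values = {}
    with open("/proc/meminfo") as f:
        for line in f:
            name, rest = line.split(":", 1)
            if name in keys:
                values[name] = int(rest.split()[0]) / 1024**2  # kB -> GiB
    return values

print(meminfo_gib("MemTotal", "MemAvailable", "Cached"))
```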