zyan97 closed this issue 3 years ago
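My guess is that you have run out of memory (RAM). The second stage needs a large amount of memory, so check your memory usage.
Thanks for the reply. Could you tell me exactly what configuration you used for fine-tuning? Thanks.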
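One way to check memory usage from Python is sketched below. It assumes a Linux host (the paths in the log suggest one) and reads `/proc/meminfo`; the helper names `parse_meminfo` and `available_gib` are made up for this illustration and are not part of this repository.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: integer value}.

    Most fields are reported in kB (1024-byte units); a few are plain counts.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def available_gib(info):
    """Return MemAvailable converted to GiB, or None if the field is absent."""
    kib = info.get("MemAvailable")
    return None if kib is None else kib / (1024 ** 2)


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    # /proc/meminfo only exists on Linux; elsewhere this block is skipped.
    with open("/proc/meminfo") as f:
        current = parse_meminfo(f.read())
    print("MemTotal (GiB):", current.get("MemTotal", 0) / (1024 ** 2))
    print("MemAvailable (GiB):", available_gib(current))
```

Running this just before the second stage starts, and again while it is allocating, gives a rough picture of how much headroom the machine has.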
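Hardware configurations:
1 GPU: 1080 Ti with 11 GB of VRAM, 64 GB of RAM
4 GPUs: 2080 Ti with 11 GB of VRAM each, 128 GB of RAM; multi-GPU data-parallel training needs even more RAM
finetune_lm.py in this repository does not optimize data loading: all of the data is loaded into memory at once.
For the training parameters, see discussion #6.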
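If loading everything at once is what exhausts RAM, one possible workaround is to index the file and read examples on demand. The sketch below is not the repository's code: `LazyLineDataset` is a hypothetical class, and it assumes one training example per line in a UTF-8 text file, which may not match the format finetune_lm.py actually reads.

```python
class LazyLineDataset:
    """Map-style dataset that keeps only byte offsets in memory.

    Assumption: one example per non-empty line of a UTF-8 text file.
    """

    def __init__(self, path):
        self.path = path
        self.offsets = []
        # Single pass to record where each non-empty line starts.
        # The line contents themselves are discarded.
        with open(path, "rb") as f:
            pos = 0
            for line in f:
                if line.strip():
                    self.offsets.append(pos)
                pos += len(line)

    def __len__(self):
        return len(self.offsets)

    def __getitem__(self, idx):
        # Seek to the recorded offset and read just that one line.
        with open(self.path, "rb") as f:
            f.seek(self.offsets[idx])
            return f.readline().decode("utf-8").rstrip("\r\n")
```

Because the class only defines `__len__` and `__getitem__`, it should be usable wherever a map-style dataset is accepted; tokenization would then happen per item or in a collate function rather than up front. Reopening the file on every access trades speed for simplicity.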
Hello, during the second stage of fine-tuning I get the messages below and then the run stops. How can I fix this?
Some weights of GPT2LMHeadModel were not initialized from the model checkpoint at /content/drive/MyDrive/Text_Generation/mega-clue-tok and are newly initialized: ['lm_head.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Using /root/.cache/torch_extensions as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module cpu_adam...
Time to load cpu_adam op: 1.7764122486114502 seconds
Adam Optimizer #0 is created with AVX512 arithmetic capability.
Config: alpha=0.000000, betas=(0.900000, 0.999000), weight_decay=0.010000, adam_w=1
[2020-12-19 11:11:43,772] [INFO] [logging.py:60:log_dist] [Rank 0] DeepSpeed info: version=0.3.7, git-hash=unknown, git-branch=unknown
[2020-12-19 11:11:43,801] [INFO] [engine.py:71:_initialize_parameter_parallel_groups] data_parallel_size: 1, parameter_parallel_size: 1
[2020-12-19 11:11:43,853] [INFO] [engine.py:588:_configure_optimizer] Using client Optimizer as basic optimizer
[2020-12-19 11:11:43,854] [INFO] [engine.py:597:_configure_optimizer] DeepSpeed Basic Optimizer = DeepSpeedCPUAdam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    eps: 1e-08
    initial_lr: 5e-08
    lr: 0.0
    weight_decay: 0.01
)
Checking ZeRO support for optimizer=DeepSpeedCPUAdam type=<class 'deepspeed.ops.adam.cpu_adam.DeepSpeedCPUAdam'>
[2020-12-19 11:11:43,854] [INFO] [engine.py:715:_configure_zero_optimizer] Creating fp16 ZeRO stage 2 optimizer
Using /root/.cache/torch_extensions as PyTorch extensions root...
Emitting ninja build file /root/.cache/torch_extensions/utils/build.ninja...
Building extension module utils...
Allowing ninja to set a default number of workers...
(overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module utils...
Time to load utils op: 1.1326243877410889 seconds
[2020-12-19 11:11:44,987] [INFO] [stage2.py:130:init] Reduce bucket size 3000000
[2020-12-19 11:11:44,987] [INFO] [stage2.py:131:init] Allgather bucket size 3000000
[2020-12-19 11:11:44,987] [INFO] [stage2.py:132:init] CPU Offload: True
tcmalloc: large alloc 2747621376 bytes == 0x7f99bf444000 @ 0x7fa10a734b6b 0x7fa10a754379 0x7fa0ae9a974e 0x7fa0ae9ab7b6 0x7fa0e9415d53 0x7fa0e8d9054a 0x7fa0e90eac0a 0x7fa0e9112803 0x7fa0e9298b14 0x7fa0e93d54ee 0x7fa0e8e2e976 0x7fa0e8e2fb30 0x7fa0e90ecb09 0x7fa0e896b249 0x7fa0e9285ae8 0x7fa0e91918a5 0x7fa0e8e3141b 0x7fa0e93217d8 0x7fa0e896b249 0x7fa0e9285ae8 0x7fa0e91919f5 0x7fa0ea765997 0x7fa0e896b249 0x7fa0e9285ae8 0x7fa0e91919f5 0x7fa0f8fab30e 0x50a4a5 0x50cc96 0x5095c8 0x50a2fd 0x50beb4
tcmalloc: large alloc 2747621376 bytes == 0x7f99bf444000 @ 0x7fa10a734b6b 0x7fa10a754379 0x7fa0ae9a974e 0x7fa0ae9ab7b6 0x7fa0e9415d53 0x7fa0e8e008cf 0x7fa0e9117cac 0x7fa0e90c331b 0x7fa0e90e2135 0x7fa0e90bdb4b 0x7fa0e90c331b 0x7fa0e90e2135 0x7fa0e91ac2be 0x7fa0ea6bed6e 0x7fa0e90c331b 0x7fa0e90e2135 0x7fa0e90bdb4b 0x7fa0e90c331b 0x7fa0e90e2135 0x7fa0e91ac2be 0x7fa0e8dee910 0x7fa0e93673f3 0x7fa0e896ba68 0x7fa0e90e9643 0x7fa0e93e7ff9 0x7fa0f8d53902 0x7fa0f8e59c8b 0x50a4a5 0x50beb4 0x507be4 0x508ec2
tcmalloc: large alloc 2747621376 bytes == 0x7f98763aa000 @ 0x7fa10a734b6b 0x7fa10a754379 0x7fa0ae9a974e 0x7fa0ae9ab7b6 0x7fa0e9415d53 0x7fa0e8e008cf 0x7fa0e9117cac 0x7fa0e90c331b 0x7fa0e90e2135 0x7fa0e90bdb4b 0x7fa0e90c331b 0x7fa0e90e2135 0x7fa0e91ac2be 0x7fa0e8dffdf1 0x7fa0e90ebaa1 0x7fa0e9112d6e 0x7fa0e92c9001 0x7fa0e91e6a68 0x7fa0ea7e07e5 0x7fa0e9112d6e 0x7fa0e92c9001 0x7fa0e93e6218 0x7fa0f8e17471 0x50a4a5 0x50beb4 0x507be4 0x508ec2 0x594a01 0x549e8f 0x5515c1 0x5a9dac
tcmalloc: large alloc 5495242752 bytes == 0x7f972eafe000 @ 0x7fa10a734b6b 0x7fa10a754379 0x7fa0ae9a974e 0x7fa0ae9ab7b6 0x7fa0e9415d53 0x7fa0e8e008cf 0x7fa0e9117cac 0x7fa0e90c331b 0x7fa0e90e2135 0x7fa0e90bdb4b 0x7fa0e90c331b 0x7fa0e90e2135 0x7fa0e91ac2be 0x7fa0ea6bed6e 0x7fa0e90c331b 0x7fa0e90e2135 0x7fa0e90bdb4b 0x7fa0e90c331b 0x7fa0e90e2135 0x7fa0e91ac2be 0x7fa0e8dec800 0x7fa0e93197ea 0x7fa0e896b081 0x7fa0e9405c76 0x7fa0e93e9810 0x7fa0f8d5223d 0x7fa0f8e5cd33 0x7fa0f8e68c20 0x50a4a5 0x50beb4 0x507be4
tcmalloc: large alloc 5495242752 bytes == 0x7f95e0754000 @ 0x7fa10a734b6b 0x7fa10a754379 0x7fa0ae9a974e 0x7fa0ae9ab7b6 0x7fa0e8dfffa2 0x7fa0e90eabd3 0x7fa0e90c2207 0x7fa0e90dd2dc 0x7fa0e90b978a 0x7fa0e90c2207 0x7fa0e90dd2dc 0x7fa0e91a90dd 0x7fa0e8df6a25 0x7fa0e931cc97 0x7fa0e936b525 0x7fa0e89cd0ce 0x7fa0e90e66f3 0x7fa0e90bffa2 0x7fa0e89cd0ce 0x7fa0e90e66f3 0x7fa0e91ccc06 0x7fa0f8e8a5a7 0x50a4a5 0x50cc96 0x5095c8 0x50a2fd 0x50beb4 0x507be4 0x508ec2 0x594a01 0x549e8f
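To put the `tcmalloc: large alloc` sizes in the log above into perspective, the sketch below converts them to GiB. The byte counts are copied from the log; the reading of the two sizes as 2-byte (fp16) and 4-byte (fp32) buffers over the same number of elements is only an assumption based on the arithmetic, and the log does not show whether all five buffers are alive at the same time.

```python
# Sizes copied from the five "tcmalloc: large alloc" lines in the log.
ALLOC_BYTES = [2747621376, 2747621376, 2747621376, 5495242752, 5495242752]

GIB = 1024 ** 3


def to_gib(n_bytes):
    """Convert a byte count to GiB."""
    return n_bytes / GIB


sizes_gib = [to_gib(n) for n in ALLOC_BYTES]
total_gib = to_gib(sum(ALLOC_BYTES))

# Assumption: the smaller buffers hold 2-byte elements and the larger ones
# hold 4-byte elements. Under that reading both cover the same element count.
elements_if_fp16 = ALLOC_BYTES[0] // 2
elements_if_fp32 = ALLOC_BYTES[3] // 4

for n, g in zip(ALLOC_BYTES, sizes_gib):
    print(f"{n} bytes = {g:.2f} GiB")
print(f"sum of logged requests = {total_gib:.2f} GiB")
print(f"elements (2-byte reading) = {elements_if_fp16}")
print(f"elements (4-byte reading) = {elements_if_fp32}")
```

The requests alone add up to roughly 18 GiB of host memory, which fits the diagnosis that the process stops because RAM runs out once CPU offload starts allocating.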