Closed: dontyougetthere closed this issue 3 months ago
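When I run test.py with Distributed Data Parallel (DDP) across 4 GPUs, it reports "process 2 terminated with signal SIGKILL". The log shows: memory+swap: usage 117964800kB, limit 117964800kB. How much memory does running test.py require?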
Hello, I'm experiencing the same problem: the model won't run on more than one GPU. Did you solve it, and if so, how? Thanks!
When running test.py with Distributed Data Parallel (DDP) across 4 GPUs, the error "process 2 terminated with signal SIGKILL" was reported. The log shows: memory+swap: usage 117964800kB, limit 117964800kB. How much memory is needed to run test.py?
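For scale, the limit in the quoted log line can be converted to more familiar units. The sketch below is only arithmetic on the numbers from the post. It assumes the `kB` in the log means 1024 bytes (as is usual for Linux kernel memory reports) and that host memory is split evenly across the 4 DDP processes, which may not hold in practice. The names `kb_to_gib`, `LIMIT_KB`, and `WORLD_SIZE` are illustrative and do not come from test.py.

```python
def kb_to_gib(kib: int) -> float:
    """Convert a kernel-style kB value (assumed to be 1024-byte units) to GiB."""
    return kib / (1024 ** 2)


# Values taken from the log line in the post.
LIMIT_KB = 117964800   # "limit 117964800kB"
WORLD_SIZE = 4         # number of DDP processes (one per GPU)

limit_gib = kb_to_gib(LIMIT_KB)
# Assumption: an even split across ranks; real usage can be uneven.
per_process_gib = limit_gib / WORLD_SIZE

print(f"memory+swap limit: {limit_gib:.1f} GiB")
print(f"even share per DDP process: {per_process_gib:.3f} GiB")
```

Since usage equals the limit in the log, the SIGKILL is consistent with the job hitting its host memory cap rather than running out of GPU memory. Each DDP rank is a separate process that holds its own copy of whatever it loads into host RAM, so the total tends to grow with the number of GPUs. The log does not say how much test.py actually needs.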