EdwardPooh / douzero-resnet-2.0

GNU General Public License v3.0

Error when running #10

Closed cobe110 closed 2 weeks ago

cobe110 commented 2 months ago

root@cobe:~/douzero/douzero-resnet-2.0/Douzero_Resnet# python3 train.py --gpu_devices --num_actor_devices 3 --num_actors 21 --training_device 0 --actor_device_cpu
Found log directory: douzero_checkpoints/douzero
Saving arguments to douzero_checkpoints/douzero/meta.json
Path to meta file already exists. Not overriding meta.
Saving messages to douzero_checkpoints/douzero/out.log
Path to message file already exists. New data will be appended.
Saving logs data to douzero_checkpoints/douzero/logs.csv
Saving logs' fields to douzero_checkpoints/douzero/fields.csv
Traceback (most recent call last):
  File "train.py", line 9, in <module>
  File "/root/douzero/douzero-resnet-2.0/Douzero_Resnet/douzero/dmc/dmc.py", line 105, in train
  File "/root/douzero/douzero-resnet-2.0/Douzero_Resnet/douzero/dmc/models.py", line 275, in share_memory
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 1790, in share_memory
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 570, in _apply
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 570, in _apply
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 570, in _apply
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 593, in _apply
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 1790, in <lambda>
  File "/usr/local/lib/python3.6/dist-packages/torch/_tensor.py", line 426, in share_memory_
  File "/usr/local/lib/python3.6/dist-packages/torch/storage.py", line 145, in share_memory_
RuntimeError: false INTERNAL ASSERT FAILED at "../aten/src/ATen/MapAllocator.cpp":263, please report a bug to PyTorch. unable to open shared memory object </torch_4691_1013> in read-write mode

EdwardPooh commented 2 months ago

This is probably because your system limits the number of files each process can open. You need to raise that system limit.
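
As a rough sketch of what raising that limit can look like from inside Python, the snippet below uses the standard-library `resource` module to inspect the per-process open-file limit (`RLIMIT_NOFILE`) and raise the soft limit toward a target. The helper name `raise_open_file_limit` and the target value 4096 are assumptions for illustration and are not part of this repository. It assumes Linux or another Unix-like system. The shell equivalent is `ulimit -n`.

```python
import resource


def raise_open_file_limit(target=4096):
    """Try to raise the soft open-file limit to `target`.

    Hypothetical helper, not part of douzero-resnet-2.0.
    Returns the (soft, hard) limits in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    # An unlimited soft limit cannot be raised any further.
    if soft == resource.RLIM_INFINITY:
        return soft, hard

    # An unprivileged process may raise its soft limit only up to the hard limit.
    if hard == resource.RLIM_INFINITY:
        new_soft = target
    else:
        new_soft = min(target, hard)

    if new_soft > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
        except (ValueError, OSError):
            # The OS refused the change; keep the existing limit.
            pass

    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print("open-file limit (soft, hard):", raise_open_file_limit())
```

If something like this is used, it would need to run before the model is moved to shared memory and before the actor processes are started, so that they inherit the new limit. If the hard limit itself is too low, it has to be raised at the system level (for example by an administrator), which this sketch does not attempt.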