PaddlePaddle / Paddle3D

A 3D computer vision development toolkit based on PaddlePaddle. It supports point-cloud object detection, segmentation, and monocular 3D object detection models.
Apache License 2.0

RuntimeError: (Unavailable) Memory map failed when create shared memory. #326

Open kamiLight opened 1 year ago

kamiLight commented 1 year ago

Error message:
Process Process-6:
Traceback (most recent call last):
  File "/home/yanjiaxing/anaconda3/envs/paddle_env/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/yanjiaxing/anaconda3/envs/paddle_env/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/yanjiaxing/anaconda3/envs/paddle_env/lib/python3.8/site-packages/paddle/fluid/dataloader/worker.py", line 371, in _worker_loop
    six.reraise(*sys.exc_info())
  File "/home/yanjiaxing/anaconda3/envs/paddle_env/lib/python3.8/site-packages/six.py", line 719, in reraise
    raise value
  File "/home/yanjiaxing/anaconda3/envs/paddle_env/lib/python3.8/site-packages/paddle/fluid/dataloader/worker.py", line 358, in _worker_loop
    tensor_list = [
  File "/home/yanjiaxing/anaconda3/envs/paddle_env/lib/python3.8/site-packages/paddle/fluid/dataloader/worker.py", line 359, in <listcomp>
    core._array_to_share_memory_tensor(b)
RuntimeError: (Unavailable) Memory map failed when create shared memory.
  [Hint: Expected ptr != ((void *) -1), but received ptr:0xffffffffffffffff == ((void *) -1):0xffffffffffffffff.] (at /paddle/paddle/fluid/memory/allocation/mmap_allocator.cc:246)

Traceback (most recent call last):
  File "tools/train.py", line 202, in <module>
    main(args)
  File "tools/train.py", line 197, in main
    trainer.train()
  File "/home/yanjiaxing/baidu/paddle3D/Paddle3D/paddle3d/apis/trainer.py", line 284, in train
    output = training_step(
  File "/home/yanjiaxing/baidu/paddle3D/Paddle3D/paddle3d/apis/pipeline.py", line 68, in training_step
    loss.backward()
  File "/home/yanjiaxing/anaconda3/envs/paddle_env/lib/python3.8/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/home/yanjiaxing/anaconda3/envs/paddle_env/lib/python3.8/site-packages/paddle/fluid/wrapped_decorator.py", line 26, in __impl__
    return wrapped_func(*args, **kwargs)
  File "/home/yanjiaxing/anaconda3/envs/paddle_env/lib/python3.8/site-packages/paddle/fluid/framework.py", line 534, in __impl__
    return func(*args, **kwargs)
  File "/home/yanjiaxing/anaconda3/envs/paddle_env/lib/python3.8/site-packages/paddle/fluid/dygraph/varbase_patch_methods.py", line 297, in backward
    core.eager.run_backward([self], grad_tensor, retain_graph)
  File "/home/yanjiaxing/anaconda3/envs/paddle_env/lib/python3.8/site-packages/paddle/fluid/multiprocess_utils.py", line 135, in __handler__
    core._throw_error_if_process_failed()
SystemError: (Fatal) DataLoader process (pid 12512) exited unexpectedly with code 1. Error detailed are lost due to multiprocessing. Rerunning with:

  1. If run DataLoader by DataLoader.from_generator(...), run with DataLoader.from_generator(..., use_multiprocess=False) may give better error trace.
  2. If run DataLoader by DataLoader(dataset, ...), run with DataLoader(dataset, ..., num_workers=0) may give better error trace (at /paddle/paddle/fluid/imperative/data_loader.cc:161)
LielinJiang commented 5 months ago

This is most likely a shortage of shared memory. Check how much space is available under /dev/shm.
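A minimal Python sketch of that check, assuming a Linux host where /dev/shm is a tmpfs mount. The helpers `shm_usage_mib` and `shm_is_tight` are hypothetical names for illustration, and the 1024 MiB threshold is an arbitrary assumption, not a Paddle requirement; the right value depends on batch size and `num_workers`.

```python
import os
import shutil


def shm_usage_mib(path: str = "/dev/shm") -> dict:
    """Report total/used/free space of the filesystem at `path`, in MiB.

    On Linux, /dev/shm is the tmpfs that typically backs POSIX shared memory,
    which the worker traceback above (_array_to_share_memory_tensor) suggests
    the DataLoader workers were trying to allocate.
    """
    usage = shutil.disk_usage(path)  # named tuple in bytes: total, used, free
    mib = 1024 * 1024
    return {
        "total_mib": usage.total // mib,
        "used_mib": usage.used // mib,
        "free_mib": usage.free // mib,
    }


def shm_is_tight(usage: dict, min_free_mib: int = 1024) -> bool:
    """Hypothetical heuristic: True when less than `min_free_mib` MiB is free.

    The 1024 MiB default is an arbitrary assumption, not a Paddle limit.
    """
    return usage["free_mib"] < min_free_mib


if __name__ == "__main__":
    if os.path.isdir("/dev/shm"):
        info = shm_usage_mib()
        print(info)
        if shm_is_tight(info):
            print("Little free shared memory: consider enlarging /dev/shm "
                  "or lowering num_workers / batch size.")
    else:
        print("/dev/shm not found; this check assumes a Linux host.")
```

From a shell, `df -h /dev/shm` reports the same numbers. If training runs inside a Docker container, its /dev/shm defaults to 64 MB and can be enlarged with `docker run --shm-size=8g <image>` (8g is only an example value). Where enlarging it is not possible, passing `use_shared_memory=False` to `paddle.io.DataLoader` stops worker processes from placing batches in shared memory, at some cost in loading speed.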