PaddlePaddle / Paddle3D

A 3D computer vision development toolkit based on PaddlePaddle. It supports point-cloud object detection, segmentation, and monocular 3D object detection models.
Apache License 2.0

Error when the dataloader's num_workers is greater than 0 and the loaded data is empty #94

Open rkotimi opened 2 years ago

rkotimi commented 2 years ago

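I am using Paddle3D's SMOKE model on a different dataset and wrote a dataset class modeled on KittiDetDataset. Partway through training I hit a very strange error. With num_workers set to 0 the error does not occur.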

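After a long time debugging I narrowed it down: the error is raised whenever load_annotation returns np.array([]). To confirm this, I found that simply commenting out lines 172 and 176 of KittiDetDataset produces a similar error.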
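A possible stopgap, not confirmed by the maintainers, is to drop samples with empty annotations before they reach the DataLoader workers. On Linux a zero-length mmap fails with EINVAL, which may be why an empty array triggers the shared-memory error, though that link is a guess. The sketch below uses a hypothetical helper, keep_non_empty, and a plain dict standing in for the dataset's load_annotation; it is not Paddle3D code.

```python
from typing import Callable, Hashable, Iterable, List, Sized


def keep_non_empty(sample_ids: Iterable[Hashable],
                   load_annotation: Callable[[Hashable], Sized]) -> List[Hashable]:
    """Return the ids whose annotation has at least one entry.

    Hypothetical helper: `load_annotation` is any callable returning something
    with a length (a list, or a NumPy array such as np.array([])).
    """
    kept = []
    for sample_id in sample_ids:
        if len(load_annotation(sample_id)) > 0:
            kept.append(sample_id)
    return kept


# Stand-in annotations; a real dataset would read label files instead.
fake_labels = {"000001": [[0.5, 1.0]], "000002": [], "000003": [[2.0, 3.0]]}
usable_ids = keep_non_empty(fake_labels, fake_labels.__getitem__)
print(usable_ids)
```

Filtering removes images without objects from training, which changes the data distribution, so this is a diagnostic aid more than a fix.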

I believe this is a bug in Paddle3D or PaddlePaddle and would appreciate a solution.

Command:

python ./tools/train.py --config ./configs/smoke/smoke_dla34_no_dcn_kitti.yml --save_dir ./checkpoints/test --keep_checkpoint_max 20 --save_interval 1 --num_workers 2

Error output:

2022-09-22 17:11:29,882 -     INFO - 
------------Environment Information-------------
platform:
    Linux-4.4.0-31-generic-x86_64-with-glibc2.17
    gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
    Python - 3.8.13 (default, Mar 28 2022, 11:38:47)  [GCC 7.5.0]

Science Toolkits:
    cv2 - 4.6.0
    numpy - 1.23.1
    numba - 0.56.2
    pandas - 1.4.4
    pillow - 8.3.2
    skimage - 0.19.3

PaddlePaddle:
    paddle(gpu) - 2.3.2
    paddle3d - 0.5.0
    paddleseg - 2.6.0
    FLAGS_cudnn_deterministic - Not set.
    FLAGS_cudnn_exhaustive_search - Not set.

CUDA:
    cudnn - 7605
    nvcc - Cuda compilation tools, release 10.2, V10.2.89

GPUs:
------------------------------------------------
2022-09-22 17:11:29,894 -     INFO - 
---------------Config Information---------------
batch_size: 8
iters: 70000
lr_scheduler:
  learning_rate: 0.000125
  milestones:
  - 36000
  - 55000
  type: MultiStepDecay
model:
  backbone:
    type: DLA34
  depth_ref:
  - 28.01
  - 16.32
  dim_ref:
  - - 3.88
    - 1.63
    - 1.53
  - - 1.78
    - 1.7
    - 0.58
  - - 0.88
    - 1.73
    - 0.67
  head:
    in_channels: 64
    norm_type: gn
    num_chanels: 256
    num_classes: 3
    reg_channels:
    - 1
    - 2
    - 3
    - 2
    - 2
    type: SMOKEPredictor
  max_detection: 50
  pred_2d: true
  type: SMOKE
optimizer:
  type: Adam
train_dataset:
  dataset_root: /data1/LZW/code/mmdetection3d/data/kitti/
  mode: train
  transforms:
  - reader: pillow
    to_chw: false
    type: LoadImage
  - input_size:
    - 1280
    - 384
    mode: train
    num_classes: 3
    type: Gt2SmokeTarget
  - mean:
    - 0.485
    - 0.456
    - 0.406
    std:
    - 0.229
    - 0.224
    - 0.225
    type: Normalize
  type: KittiMonoDataset
val_dataset:
  dataset_root: /data1/LZW/code/mmdetection3d/data/kitti/
  mode: val
  transforms:
  - reader: pillow
    to_chw: false
    type: LoadImage
  - input_size:
    - 1280
    - 384
    mode: val
    num_classes: 3
    type: Gt2SmokeTarget
  - mean:
    - 0.485
    - 0.456
    - 0.406
    std:
    - 0.229
    - 0.224
    - 0.225
    type: Normalize
  type: KittiMonoDataset
------------------------------------------------
W0922 17:11:29.896494 179757 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 10.2, Runtime API Version: 10.2
W0922 17:11:29.896520 179757 gpu_resources.cc:91] device: 0, cuDNN Version: 7.6.
Process Process-1:
Traceback (most recent call last):
  File "/data1/LZW/anaconda3/envs/paddle3d/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/data1/LZW/anaconda3/envs/paddle3d/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/data1/LZW/anaconda3/envs/paddle3d/lib/python3.8/site-packages/paddle/fluid/dataloader/worker.py", line 368, in _worker_loop
    six.reraise(*sys.exc_info())
  File "/data1/LZW/anaconda3/envs/paddle3d/lib/python3.8/site-packages/six.py", line 719, in reraise
    raise value
  File "/data1/LZW/anaconda3/envs/paddle3d/lib/python3.8/site-packages/paddle/fluid/dataloader/worker.py", line 355, in _worker_loop
    tensor_list = [
  File "/data1/LZW/anaconda3/envs/paddle3d/lib/python3.8/site-packages/paddle/fluid/dataloader/worker.py", line 356, in <listcomp>
    core._array_to_share_memory_tensor(b)
RuntimeError: (Unavailable) Memory map failed when create shared memory.
  [Hint: Expected ptr != ((void *) -1), but received ptr:0xffffffffffffffff == ((void *) -1):0xffffffffffffffff.] (at /paddle/paddle/fluid/memory/allocation/mmap_allocator.cc:230)

Process Process-2:
Traceback (most recent call last):
  File "/data1/LZW/anaconda3/envs/paddle3d/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/data1/LZW/anaconda3/envs/paddle3d/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/data1/LZW/anaconda3/envs/paddle3d/lib/python3.8/site-packages/paddle/fluid/dataloader/worker.py", line 368, in _worker_loop
    six.reraise(*sys.exc_info())
  File "/data1/LZW/anaconda3/envs/paddle3d/lib/python3.8/site-packages/six.py", line 719, in reraise
    raise value
  File "/data1/LZW/anaconda3/envs/paddle3d/lib/python3.8/site-packages/paddle/fluid/dataloader/worker.py", line 355, in _worker_loop
    tensor_list = [
  File "/data1/LZW/anaconda3/envs/paddle3d/lib/python3.8/site-packages/paddle/fluid/dataloader/worker.py", line 356, in <listcomp>
    core._array_to_share_memory_tensor(b)
RuntimeError: (Unavailable) Memory map failed when create shared memory.
  [Hint: Expected ptr != ((void *) -1), but received ptr:0xffffffffffffffff == ((void *) -1):0xffffffffffffffff.] (at /paddle/paddle/fluid/memory/allocation/mmap_allocator.cc:230)

Exception in thread Thread-2:
Traceback (most recent call last):
  File "/data1/LZW/anaconda3/envs/paddle3d/lib/python3.8/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 620, in _get_data
    data = self._data_queue.get(timeout=self._timeout)
  File "/data1/LZW/anaconda3/envs/paddle3d/lib/python3.8/multiprocessing/queues.py", line 108, in get
Traceback (most recent call last):
  File "./tools/train.py", line 181, in <module>
    raise Empty
_queue.Empty

During handling of the above exception, another exception occurred:

    main(args)
Traceback (most recent call last):
  File "./tools/train.py", line 176, in main
  File "/data1/LZW/anaconda3/envs/paddle3d/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    trainer.train()
  File "/data1/LZW/code/rs_mono_3d_object_detection/paddle3d/apis/trainer.py", line 208, in train
    for sample in self.train_dataloader:
  File "/data1/LZW/anaconda3/envs/paddle3d/lib/python3.8/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 746, in __next__
    self.run()
  File "/data1/LZW/anaconda3/envs/paddle3d/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
    data = self._reader.read_next_var_list()
  File "/data1/LZW/anaconda3/envs/paddle3d/lib/python3.8/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 534, in _thread_loop
SystemError: (Fatal) Blocking queue is killed because the data reader raises an exception.
  [Hint: Expected killed_ != true, but received killed_:1 == true:1.] (at /paddle/paddle/fluid/operators/reader/blocking_queue.h:166)

    batch = self._get_data()
  File "/data1/LZW/anaconda3/envs/paddle3d/lib/python3.8/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 635, in _get_data
    raise RuntimeError("DataLoader {} workers exit unexpectedly, " \
RuntimeError: DataLoader 2 workers exit unexpectedly, pids: 179796, 179797
rkotimi commented 2 years ago

One more question while I'm here: does PaddlePaddle support mixed-precision training on DCU?

nepeplwu commented 2 years ago

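@rkotimi Judging from the error message, the problem is insufficient shared memory. Two questions:

  1. Are you training on a GPU? The log does not list any GPU devices, so it looks like training is running on the CPU.
  2. How much shared memory does your machine have?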
nepeplwu commented 2 years ago

Also, mixed-precision training on DCU is supported.

rkotimi commented 2 years ago

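@rkotimi Judging from the error message, the problem is insufficient shared memory. Two questions:

  1. Are you training on a GPU? The log does not list any GPU devices, so it looks like training is running on the CPU.
  2. How much shared memory does your machine have?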

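Thanks for your reply.

  1. I am training on GPUs, 8 cards, and when num_workers > 0 GPU utilization stays above 80%.
  2. Sorry, I'm not familiar with the concept of shared memory. Running free -m gives: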
              total        used        free      shared  buff/cache   available
    Mem:         257581       39613       31850       28317      186117      187894
    Swap:          7811        4251        3560

    Running ipcs -m gives:

    
    ------ Shared Memory Segments --------
    key        shmid      owner      perms      bytes      nattch     status      
    0x00000000 65536      root       600        524288     2          dest         
    0x00000000 98305      root       600        4194304    2          dest         
    0x00000000 393218     pioneer1_5 600        393216     2          dest         
    0x00000000 491523     pioneer1_5 600        524288     2          dest         
    0x00000000 589828     pioneer1_5 600        524288     2          dest         
    0x00000000 688133     pioneer1_5 600        524288     2          dest         
    0x00000000 786438     pioneer1_5 600        393216     2          dest         
    0x00000000 884743     pioneer1_5 600        393216     2          dest         
    0x00000000 1114120    pioneer1_5 600        393216     2          dest         
    0x00000000 1146889    pioneer1_5 600        393216     2          dest         
    0x00000000 1245194    pioneer1_5 600        393216     2          dest         
    0x00000000 1540107    pioneer1_5 600        2097152    2          dest         
    0x00000000 1572876    pioneer1_5 600        524288     2          dest         
    0x00000000 1605645    pioneer1_5 600        393216     2          dest         
    0x00000000 1180827662 pioneer1_5 600        524288     2          dest         
    0x00000000 1736719    pioneer1_5 600        524288     2          dest         
    0x00000000 1835024    pioneer1_5 600        524288     2          dest         
    0x00000000 1959362577 pioneer1_2 600        393216     2          dest         
    0x00000000 1959460882 pioneer1_2 600        524288     2          dest         
    0x00000000 1959559187 pioneer1_2 600        393216     2          dest         
    0x00000000 1959657492 pioneer1_2 600        524288     2          dest         
    0x00000000 1959755797 pioneer1_2 600        524288     2          dest         
    0x00000000 1959788566 pioneer1_2 600        393216     2          dest         
    0x00000000 1960017943 pioneer1_2 600        393216     2          dest         
    0x00000000 1960050712 pioneer1_2 600        393216     2          dest         
    0x00000000 1960083481 pioneer1_2 600        393216     2          dest         
    0x00000000 169345050  pioneer1_8 600        13967360   2          dest         
    0x00000000 770277403  pioneer1_4 606        11301120   2          dest         
    0x00000000 980484124  pioneer1_8 600        524288     2          dest         
    0x00000000 1960902685 pioneer1_2 600        393216     2          dest         
    0x00000000 73236510   pioneer1_2 600        20480      2          dest         
    0x00000000 54984735   pioneer1_4 600        393216     2          dest         
    0x00000000 6520864    pioneer1_8 600        393216     2          dest         
    0x00000000 6619169    pioneer1_8 600        524288     2          dest         
    0x00000000 6717474    pioneer1_8 600        393216     2          dest         
    0x00000000 6815779    pioneer1_8 600        393216     2          dest         
    0x00000000 6979620    pioneer1_8 600        524288     2          dest         
    0x00000000 7012389    pioneer1_8 600        524288     2          dest         
    0x00000000 7176230    pioneer1_8 600        393216     2          dest         
    0x00000000 247988263  pioneer1_8 600        393216     2          dest         
    0x00000000 7307304    pioneer1_8 600        393216     2          dest         
    0x00000000 104300585  pioneer1_7 600        393216     2          dest         
    0x00000000 7569450    pioneer1_8 600        524288     2          dest         
    0x00000000 666173483  pioneer1_5 600        524288     2          dest         
    0x00000000 8159276    pioneer1_8 600        524288     2          dest         
    0x00000000 237830189  pioneer1_8 600        294912     2          dest         
    0x00000000 7897134    pioneer1_8 600        524288     2          dest         
    0x00000000 8323119    pioneer1_8 600        7168000    2          dest         
    0x00000000 169312304  pioneer1_8 600        13967360   2          dest         
    0x00000000 237862961  pioneer1_8 600        294912     2          dest         
    0x00000000 8290354    pioneer1_8 600        7168000    2          dest         
    0x00000000 8749107    pioneer1_8 600        524288     2          dest         
    0x00000000 93356084   pioneer1_8 600        8089600    2          dest         
    0x00000000 151814197  pioneer1_8 600        237568     2          dest         
    0x00000000 287342646  pioneer1_8 600        20480      2          dest         
    0x00000000 9470007    pioneer1_8 600        393216     2          dest         
    0x00000000 9338936    pioneer1_8 600        1118208    2          dest         
    0x00000000 980680761  pioneer1_4 600        524288     2          dest         
    0x00000000 16973882   pioneer1_8 600        393216     2          dest         
    0x00000000 817135675  pioneer1_2 600        13967360   2          dest         
    0x00000000 42139708   pioneer1_4 600        393216     2          dest         
    0x00000000 94634045   pioneer1_8 600        139264     2          dest         
    0x00000000 19923006   pioneer1_8 600        1118208    2          dest         
    0x00000000 42238015   pioneer1_4 600        524288     2          dest         
    0x00000000 42336320   pioneer1_4 600        524288     2          dest         
    0x00000000 42434625   pioneer1_4 600        524288     2          dest         
    0x00000000 42598466   pioneer1_4 600        393216     2          dest         
    0x00000000 42631235   pioneer1_4 600        393216     2          dest         
    0x00000000 42795076   pioneer1_4 600        393216     2          dest         
    0x00000000 667680837  pioneer1_4 600        393216     2          dest         
    0x00000000 42860614   pioneer1_4 600        393216     2          dest         
    0x00000000 770244679  pioneer1_4 606        11301120   2          dest         
    0x00000000 813465672  pioneer1_4 600        393216     2          dest         
    0x00000000 43352137   pioneer1_4 600        393216     2          dest         
    0x00000000 43974730   pioneer1_4 600        524288     2          dest         
    0x00000000 43548747   pioneer1_4 600        524288     2          dest         
    0x00000000 43647052   pioneer1_4 600        524288     2          dest         
    0x00000000 1961132109 pioneer1_2 600        524288     2          dest         
    0x00000000 899022926  pioneer1_4 600        393216     2          dest         
    0x00000000 169279567  pioneer1_8 600        4194304    2          dest         
    0x00000000 44073040   pioneer1_4 606        2880000    2          dest         
    0x00000000 44105809   pioneer1_4 606        2880000    2          dest         
    0x00000000 55017554   pioneer1_4 600        524288     2          dest         
    0x00000000 44597331   pioneer1_4 600        524288     2          dest         
    0x00000000 238518356  pioneer1_8 600        524288     2          dest         
    0x00000000 88735829   pioneer1_7 600        524288     2          dest         
    0x00000000 316145750  amax       600        524288     2          dest         
    0x00000000 241926231  pioneer1_8 600        524288     2          dest         
    0x00000000 102400088  pioneer1_7 600        393216     2          dest         
    0x00000000 75333721   amax       600        393216     2          dest         
    0x00000000 75432026   amax       600        524288     2          dest         
    0x00000000 75530331   amax       600        524288     2          dest         
    0x00000000 75628636   amax       600        524288     2          dest         
    0x00000000 75726941   amax       600        393216     2          dest         
    0x00000000 75890782   amax       600        393216     2          dest         
    0x00000000 75989087   amax       600        393216     2          dest         
    0x00000000 76021856   amax       600        393216     2          dest         
    0x00000000 76054625   amax       600        393216     2          dest         
    0x00000000 77693026   amax       600        524288     2          dest         
    0x00000000 76382307   amax       600        393216     2          dest         
    0x00000000 76415076   amax       600        524288     2          dest         
    0x00000000 76873829   amax       600        524288     2          dest         
    0x00000000 76972134   amax       600        393216     2          dest         
    0x00000000 77922407   amax       606        10840320   2          dest         
    0x00000000 76841064   amax       600        524288     2          dest         
    0x00000000 77070441   amax       600        524288     2          dest         
    0x00000000 77201514   amax       600        524288     2          dest         
    0x00000000 77791339   amax       600        524288     2          dest         
    0x00000000 77955180   amax       606        10840320   2          dest         
    0x00000000 77987949   amax       606        2880000    2          dest         
    0x00000000 77889646   amax       600        2097152    2          dest         
    0x00000000 78020719   amax       606        2880000    2          dest         
    0x00000000 1961164912 pioneer1_2 600        393216     2          dest         
    0x00000000 287375473  pioneer1_8 600        20480      2          dest         
    0x00000000 1960706162 pioneer1_2 600        524288     2          dest         
    0x00000000 78446707   amax       600        393216     2          dest         
    0x00000000 93323380   pioneer1_8 600        8089600    2          dest         
    0x00000000 1960738933 pioneer1_2 600        393216     2          dest         
    0x00000000 1961197686 pioneer1_2 600        524288     2          dest         
    0x00000000 238551159  pioneer1_8 606        4718592    2          dest         
    0x00000000 93454456   pioneer1_8 600        90112      2          dest         
    0x00000000 93487225   pioneer1_8 600        90112      2          dest         
    0x00000000 1554153594 pioneer1_8 600        61440      2          dest         
    0x00000000 93847675   pioneer1_8 600        53248      2          dest         
    0x00000000 238583932  pioneer1_8 606        4718592    2          dest         
    0x00000000 93880445   pioneer1_8 600        53248      2          dest         
    0x00000000 102498430  pioneer1_7 600        524288     2          dest         
    0x00000000 102596735  pioneer1_7 600        393216     2          dest         
    0x00000000 102695040  pioneer1_7 600        524288     2          dest         
    0x00000000 94601345   pioneer1_8 600        139264     2          dest         
    0x00000000 102793346  pioneer1_7 600        524288     2          dest         
    0x00000000 151781507  pioneer1_8 600        237568     2          dest         
    0x00000000 102957188  pioneer1_7 600        393216     2          dest         
    0x00000000 102989957  pioneer1_7 600        393216     2          dest         
    0x00000000 103088262  pioneer1_7 600        393216     2          dest         
    0x00000000 238616711  pioneer1_8 606        2880000    2          dest         
    0x00000000 103219336  pioneer1_7 600        393216     2          dest         
    0x00000000 238649481  pioneer1_8 606        2880000    2          dest         
    0x00000000 2039611530 pioneer1_5 600        524288     2          dest         
    0x00000000 820412555  pioneer1_2 600        151552     2          dest         
    0x00000000 103678092  pioneer1_7 600        524288     2          dest         
    0x00000000 103841933  pioneer1_7 600        524288     2          dest         
    0x00000000 103874702  pioneer1_7 600        524288     2          dest         
    0x00000000 1180565647 pioneer1_5 600        393216     2          dest         
    0x00000000 292814992  pioneer1_8 600        229376     2          dest         
    0x00000000 288194705  pioneer1_8 600        425984     2          dest         
    0x00000000 288227474  pioneer1_8 600        425984     2          dest         
    0x00000000 248250515  pioneer1_8 600        393216     2          dest         
    0x00000000 74907796   pioneer1_2 600        139264     2          dest         
    0x00000000 2054324373 pioneer1_4 600        1048576    2          dest         
    0x00000000 1961296022 pioneer1_2 600        4194304    2          dest         
    0x00000000 1915093143 pioneer1_4 600        524288     2          dest         
    0x00000000 819363992  pioneer1_7 600        151552     2          dest         
    0x00000000 288489625  pioneer1_8 600        45056      2          dest         
    0x00000000 930185370  pioneer1_7 600        524288     2          dest         
    0x00000000 288456859  pioneer1_8 600        45056      2          dest         
    0x00000000 292782236  pioneer1_8 600        229376     2          dest         
    0x00000000 103088285  pioneer1_2 600        524288     2          dest         
    0x00000000 172163230  pioneer1_8 600        524288     2          dest         
    0x00000000 899154079  pioneer1_4 600        393216     2          dest         
    0x00000000 173441184  pioneer1_7 600        24576      2          dest         
    0x00000000 173768865  pioneer1_8 600        524288     2          dest         
    0x00000000 164495522  pioneer1_8 600        393216     2          dest         
    0x00000000 629833891  pioneer1_4 600        393216     2          dest         
    0x00000000 667943076  pioneer1_4 600        393216     2          dest         
    0x00000000 1198882981 pioneer1_5 600        1048576    2          dest         
    0x00000000 511049894  pioneer1_7 600        16384      2          dest         
    0x00000000 819396775  pioneer1_7 600        151552     2          dest         
    0x00000000 980582568  pioneer1_7 600        524288     2          dest         
    0x00000000 548372649  pioneer1_4 600        393216     2          dest         
    0x00000000 225083562  pioneer1_4 600        16777216   2          dest         
    0x00000000 88834219   pioneer1_7 600        4194304    2          dest         
    0x00000000 173473964  pioneer1_7 600        24576      2          dest         
    0x00000000 174129325  pioneer1_7 600        139264     2          dest         
    0x00000000 316178606  amax       600        4194304    2          dest         
    0x00000000 199491759  pioneer1_4 600        4194304    2          dest         
    0x00000000 820445360  pioneer1_2 600        151552     2          dest         
    0x00000000 373162161  pioneer1_7 600        13967360   2          dest         
    0x00000000 1963819186 pioneer1_2 600        524288     2          dest         
    0x00000000 1581711539 pioneer1_8 600        225280     2          dest         
    0x00000000 1538621620 pioneer1_8 600        24576      2          dest         
    0x00000000 2114584757 pioneer1_5 600        46312      2          dest         
    0x00000000 1358725302 pioneer1_8 600        393216     2          dest         
    0x00000000 1581678775 pioneer1_8 600        225280     2          dest         
    0x00000000 1558151352 pioneer1_8 600        151552     2          dest         
    0x00000000 233963705  pioneer1_8 600        18128      2          dest         
    0x00000000 1538654394 pioneer1_8 600        24576      2          dest         
    0x00000000 73269435   pioneer1_2 600        20480      2          dest         
    0x00000000 1557266620 pioneer1_8 600        28672      2          dest         
    0x00000000 1554186429 pioneer1_8 600        61440      2          dest         
    0x00000000 169705662  pioneer1_8 600        16384      2          dest         
    0x00000000 1555628223 pioneer1_8 600        1118208    2          dest         
    0x00000000 1555660992 pioneer1_8 600        1118208    2          dest         
    0x00000000 1557299393 pioneer1_8 600        28672      2          dest         
    0x00000000 1558118594 pioneer1_8 600        151552     2          dest         
    0x00000000 174162115  pioneer1_7 600        139264     2          dest         
    0x00000000 73400516   pioneer1_2 600        32768      2          dest         
    0x00000000 73433285   pioneer1_2 600        32768      2          dest         
    0x00000000 73564358   pioneer1_2 600        417792     2          dest         
    0x00000000 73597127   pioneer1_2 600        425984     2          dest         
    0x00000000 548405448  pioneer1_4 600        12288      2          dest         
    0x00000000 548438217  pioneer1_4 600        393216     2          dest         
    0x00000000 174817482  pioneer1_7 600        139264     2          dest         
    0x00000000 74875083   pioneer1_2 600        139264     2          dest         
    0x00000000 95158476   pioneer1_2 600        53248      2          dest         
    0x00000000 91324621   pioneer1_2 600        110592     2          dest         
    0x00000000 169083086  pioneer1_8 600        180224     2          dest         
    0x00000000 168886479  pioneer1_8 600        425984     2          dest         
    0x00000000 511082704  pioneer1_7 600        16384      2          dest         
    0x00000000 548536529  pioneer1_4 600        524288     2          dest         
    0x00000000 813433042  pioneer1_4 600        524288     2          dest         
    0x00000000 169672915  pioneer1_8 600        16384      2          dest         
    0x00000000 95191252   pioneer1_2 600        53248      2          dest         
    0x00000000 168919253  pioneer1_8 600        425984     2          dest         
    0x00000000 89325782   pioneer1_2 600        106496     2          dest         
    0x00000000 89358551   pioneer1_2 600        106496     2          dest         
    0x00000000 169050328  pioneer1_8 600        180224     2          dest         
    0x00000000 91291865   pioneer1_2 600        110592     2          dest         
    0x00000000 979828954  pioneer1_2 600        62464      2          dest         
    0x00000000 174784731  pioneer1_7 600        139264     2          dest         
    0x00000000 980779228  pioneer1_5 600        524288     2          dest         
    0x00000000 813564125  pioneer1_4 600        393216     2          dest         
    0x00000000 821526750  pioneer1_2 600        524288     2          dest         
    0x00000000 980386015  amax       600        524288     2          dest         
    0x00000000 509608160  pioneer1_2 600        24576      2          dest         
    0x00000000 509640929  pioneer1_2 600        24576      2          dest         
    0x00000000 817201378  pioneer1_2 600        13967360   2          dest         
    0x00000000 819069155  pioneer1_7 600        13967360   2          dest         
    0x00000000 965050597  pioneer1_2 600        524288     2          dest         
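The listing above is long, so a short script can total it per user. This is a minimal sketch, assuming the column layout shown above; total_bytes_by_owner is a hypothetical helper, and the sample holds only the first three rows of the listing. Note that ipcs -m reports System V segments only; POSIX shared memory objects on Linux normally live under /dev/shm and do not appear here.

```python
from collections import defaultdict
from typing import Dict


def total_bytes_by_owner(ipcs_output: str) -> Dict[str, int]:
    """Sum the `bytes` column of `ipcs -m` output per owner."""
    totals: Dict[str, int] = defaultdict(int)
    for line in ipcs_output.splitlines():
        fields = line.split()
        # Data rows start with a hex key and have at least 6 columns:
        # key, shmid, owner, perms, bytes, nattch[, status]
        if len(fields) >= 6 and fields[0].startswith("0x"):
            totals[fields[2]] += int(fields[4])
    return dict(totals)


sample = """
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 65536      root       600        524288     2          dest
0x00000000 98305      root       600        4194304    2          dest
0x00000000 393218     pioneer1_5 600        393216     2          dest
"""
totals = total_bytes_by_owner(sample)
print(totals)
```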
nepeplwu commented 2 years ago

@rkotimi Try running df -h and post the output.
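The same figure can also be read from Python. This is a minimal sketch, assuming a Linux host where POSIX shared memory is backed by the /dev/shm mount; free_mib is a hypothetical helper name, not part of PaddlePaddle.

```python
import shutil
import tempfile


def free_mib(path: str = "/dev/shm") -> float:
    """Return the free space of the filesystem holding `path`, in MiB."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.free / (1024 * 1024)


# Demonstrated on the temp directory so the snippet runs on any OS;
# on the training machine, call free_mib() with the default path.
print(f"{free_mib(tempfile.gettempdir()):.0f} MiB free")
```

If the value for /dev/shm is small compared with the size of one batch times the number of workers, that would be consistent with the shared-memory diagnosis above.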

Luna2199 commented 1 year ago

I run into the same problem with num_workers > 0; training only works with num_workers=0.
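One setting that may be worth trying before falling back to num_workers=0 is the use_shared_memory parameter of paddle.io.DataLoader, which controls whether worker processes hand batches over through shared memory. The sketch below only builds the keyword arguments, so it runs without PaddlePaddle installed; loader_kwargs is a hypothetical helper, and whether Paddle3D's trainer exposes this option is not established in this thread.

```python
from typing import Any, Dict


def loader_kwargs(batch_size: int, num_workers: int,
                  use_shared_memory: bool) -> Dict[str, Any]:
    """Build keyword arguments for paddle.io.DataLoader.

    Hypothetical helper; the keys are DataLoader parameter names.
    """
    return {
        "batch_size": batch_size,
        "num_workers": num_workers,
        # Only consulted when num_workers > 0.
        "use_shared_memory": use_shared_memory,
    }


kwargs = loader_kwargs(batch_size=8, num_workers=2, use_shared_memory=False)
print(kwargs)
# With PaddlePaddle installed and a dataset object at hand:
#   loader = paddle.io.DataLoader(dataset, **kwargs)
```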

nepeplwu commented 7 months ago

@Luna2199 Could you post the command you ran and the error log?