train error in aistudio project with quick start steps

unseenme commented 2 years ago

command

!python tools/train.py --config configs/smoke/smoke_dla34_no_dcn_kitti.yml --iters 100 --log_interval 10 --save_interval 20

log

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/__init__.py:107: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import MutableMapping
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/rcsetup.py:20: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Iterable, Mapping
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/colors.py:53: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Sized
2022-09-23 11:26:08,720 -     INFO - 
------------Environment Information-------------
platform:
    Linux-4.15.0-140-generic-x86_64-with-debian-stretch-sid
    gcc (Ubuntu 7.5.0-3ubuntu1~16.04) 7.5.0
    Python - 3.7.4 (default, Aug 13 2019, 20:35:49)  [GCC 7.3.0]

Science Toolkits:
    cv2 - 4.1.1
    numpy - 1.19.5
    numba - 0.48.0
    pandas - 1.1.5
    pillow - 8.2.0
    skimage - 0.19.3

PaddlePaddle:
    paddle(gpu) - 2.3.2
    paddle3d - 0.5.0
    paddleseg - 2.6.0
    FLAGS_cudnn_deterministic - Not set.
    FLAGS_cudnn_exhaustive_search - Not set.

CUDA:
    cudnn - 8200
    nvcc - Build cuda_11.2.r11.2/compiler.29618528_0

GPUs:
------------------------------------------------
2022-09-23 11:26:08,728 -     INFO - 
---------------Config Information---------------
batch_size: 8
iters: 100
lr_scheduler:
  learning_rate: 0.000125
  milestones:
  - 36000
  - 55000
  type: MultiStepDecay
model:
  backbone:
    pretrained: https://bj.bcebos.com/paddle3d/pretrained/dla34.pdparams
    type: DLA34
  depth_ref:
  - 28.01
  - 16.32
  dim_ref:
  - - 3.88
    - 1.63
    - 1.53
  - - 1.78
    - 1.7
    - 0.58
  - - 0.88
    - 1.73
    - 0.67
  head:
    in_channels: 64
    norm_type: gn
    num_chanels: 256
    num_classes: 3
    reg_channels:
    - 1
    - 2
    - 3
    - 2
    - 2
    type: SMOKEPredictor
  max_detection: 50
  pred_2d: true
  type: SMOKE
optimizer:
  type: Adam
train_dataset:
  dataset_root: datasets/KITTI
  mode: train
  transforms:
  - reader: pillow
    to_chw: false
    type: LoadImage
  - input_size:
    - 1280
    - 384
    mode: train
    num_classes: 3
    type: Gt2SmokeTarget
  - mean:
    - 0.485
    - 0.456
    - 0.406
    std:
    - 0.229
    - 0.224
    - 0.225
    type: Normalize
  type: KittiMonoDataset
val_dataset:
  dataset_root: datasets/KITTI
  mode: val
  transforms:
  - reader: pillow
    to_chw: false
    type: LoadImage
  - input_size:
    - 1280
    - 384
    mode: val
    num_classes: 3
    type: Gt2SmokeTarget
  - mean:
    - 0.485
    - 0.456
    - 0.406
    std:
    - 0.229
    - 0.224
    - 0.225
    type: Normalize
  type: KittiMonoDataset
------------------------------------------------
W0923 11:26:08.731779  3072 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 11.2
W0923 11:26:08.731842  3072 gpu_resources.cc:91] device: 0, cuDNN Version: 8.2.
2022-09-23 11:26:10,082 -     INFO - download pretrained model from https://bj.bcebos.com/paddle3d/pretrained/dla34.pdparams
2022-09-23 11:26:12,115 -     INFO - ###############################################] 100.00%
2022-09-23 11:26:12,287 -     INFO - There are 189/189 variables loaded into DLA.

--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
No stack trace in paddle, may be caused by external reasons.

----------------------
Error Message Summary:
----------------------
FatalError: `Segmentation fault` is detected by the operating system.
  [TimeInfo: *** Aborted at 1663903574 (unix time) try "date -d @1663903574" if you are using GNU date ***]
  [SignalInfo: *** SIGSEGV (@0x0) received by PID 3072 (TID 0x7f6807a9c700) from PID 0 ***]

Paddle3D version: 3bb36db804bb65a0dd8ab3b5dac4cffb9d370d18

nepeplwu commented 2 years ago

@unseenme Thanks for the feedback. Which dataset are you using? Could you share the aistudio tutorial with us so we can take a look?

unseenme commented 2 years ago

@nepeplwu Thanks for the reply. Dataset: KITTI selection (kitti精选) https://aistudio.baidu.com/aistudio/datasetdetail/165771 Tutorial: https://github.com/PaddlePaddle/Paddle3D/blob/develop/docs/quickstart.md

nepeplwu commented 2 years ago

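@unseenme I created a project from the tutorial on aistudio and it runs normally on my side, so I suspect the problem is in your environment setup:

My project: https://aistudio.baidu.com/aistudio/projectdetail/4613493?sUid=40015&shared=1&ts=1664350835870 (you need to put kitti300frame under the datasets/KITTI directory yourself)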

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/__init__.py:107: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import MutableMapping
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/rcsetup.py:20: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Iterable, Mapping
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/colors.py:53: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Sized
2022-09-28 15:33:10,514 -     INFO - 
------------Environment Information-------------
platform:
    Linux-4.15.0-140-generic-x86_64-with-debian-stretch-sid
    gcc (Ubuntu 7.5.0-3ubuntu1~16.04) 7.5.0
    Python - 3.7.4 (default, Aug 13 2019, 20:35:49)  [GCC 7.3.0]

Science Toolkits:
    cv2 - 4.1.1
    numpy - 1.19.5
    numba - 0.48.0
    pandas - 1.1.5
    pillow - 8.2.0
    skimage - 0.19.3

PaddlePaddle:
    paddle(gpu) - 2.3.2
    paddle3d - 0.5.0
    paddleseg - 2.6.0
    FLAGS_cudnn_deterministic - Not set.
    FLAGS_cudnn_exhaustive_search - Not set.

CUDA:
    cudnn - 8200
    nvcc - Build cuda_11.2.r11.2/compiler.29618528_0

GPUs:
------------------------------------------------
2022-09-28 15:33:10,519 -     INFO - 
---------------Config Information---------------
batch_size: 8
iters: 100
lr_scheduler:
  learning_rate: 0.000125
  milestones:
  - 36000
  - 55000
  type: MultiStepDecay
model:
  backbone:
    pretrained: https://bj.bcebos.com/paddle3d/pretrained/dla34.pdparams
    type: DLA34
  depth_ref:
  - 28.01
  - 16.32
  dim_ref:
  - - 3.88
    - 1.63
    - 1.53
  - - 1.78
    - 1.7
    - 0.58
  - - 0.88
    - 1.73
    - 0.67
  head:
    in_channels: 64
    norm_type: gn
    num_chanels: 256
    num_classes: 3
    reg_channels:
    - 1
    - 2
    - 3
    - 2
    - 2
    type: SMOKEPredictor
  max_detection: 50
  pred_2d: true
  type: SMOKE
optimizer:
  type: Adam
train_dataset:
  dataset_root: datasets/KITTI
  mode: train
  transforms:
  - reader: pillow
    to_chw: false
    type: LoadImage
  - input_size:
    - 1280
    - 384
    mode: train
    num_classes: 3
    type: Gt2SmokeTarget
  - mean:
    - 0.485
    - 0.456
    - 0.406
    std:
    - 0.229
    - 0.224
    - 0.225
    type: Normalize
  type: KittiMonoDataset
val_dataset:
  dataset_root: datasets/KITTI
  mode: val
  transforms:
  - reader: pillow
    to_chw: false
    type: LoadImage
  - input_size:
    - 1280
    - 384
    mode: val
    num_classes: 3
    type: Gt2SmokeTarget
  - mean:
    - 0.485
    - 0.456
    - 0.406
    std:
    - 0.229
    - 0.224
    - 0.225
    type: Normalize
  type: KittiMonoDataset
------------------------------------------------
W0928 15:33:10.521813  2643 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 11.2
W0928 15:33:10.521857  2643 gpu_resources.cc:91] device: 0, cuDNN Version: 8.2.
2022-09-28 15:33:11,661 -     INFO - download pretrained model from https://bj.bcebos.com/paddle3d/pretrained/dla34.pdparams
2022-09-28 15:33:13,981 -     INFO - ###############################################] 100.00%
2022-09-28 15:33:14,152 -     INFO - There are 189/189 variables loaded into DLA.
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/math_op_patch.py:278: UserWarning: The dtype of left and right variables are not the same, left dtype is paddle.float32, but right dtype is paddle.int32, the right dtype will convert to paddle.float32
  format(lhs_dtype, rhs_dtype, lhs_dtype))
2022-09-28 15:33:27,329 -     INFO - [TRAIN] epoch=1/4, iter=10/100, loss=39.264753, lr=0.000125 | ETA 00:01:41
2022-09-28 15:33:38,640 -     INFO - [TRAIN] epoch=1/4, iter=20/100, loss=13.589402, lr=0.000125 | ETA 00:01:43
2022-09-28 15:33:41,061 -     INFO - Push model to checkpoint ./output/iter_20
2022-09-28 15:33:48,497 -     INFO - [TRAIN] epoch=1/4, iter=30/100, loss=15.200662, lr=0.000125 | ETA 00:01:02
2022-09-28 15:33:58,729 -     INFO - [TRAIN] epoch=2/4, iter=40/100, loss=13.075685, lr=0.000125 | ETA 00:00:48
2022-09-28 15:34:00,844 -     INFO - Push model to checkpoint ./output/iter_40
2022-09-28 15:34:08,940 -     INFO - [TRAIN] epoch=2/4, iter=50/100, loss=14.198670, lr=0.000125 | ETA 00:00:40
2022-09-28 15:34:18,737 -     INFO - [TRAIN] epoch=2/4, iter=60/100, loss=11.373871, lr=0.000125 | ETA 00:00:34
2022-09-28 15:34:19,895 -     INFO - Push model to checkpoint ./output/iter_60
2022-09-28 15:34:29,327 -     INFO - [TRAIN] epoch=3/4, iter=70/100, loss=10.898486, lr=0.000125 | ETA 00:00:24
2022-09-28 15:34:38,547 -     INFO - [TRAIN] epoch=3/4, iter=80/100, loss=10.830395, lr=0.000125 | ETA 00:00:16
2022-09-28 15:34:40,651 -     INFO - Push model to checkpoint ./output/iter_80
2022-09-28 15:34:50,533 -     INFO - [TRAIN] epoch=3/4, iter=90/100, loss=11.886550, lr=0.000125 | ETA 00:00:10
2022-09-28 15:35:02,145 -     INFO - [TRAIN] epoch=4/4, iter=100/100, loss=9.577602, lr=0.000125 | ETA 00:00:00
2022-09-28 15:35:03,841 -     INFO - Push model to checkpoint ./output/iter_100
2022-09-28 15:35:03,848 -     INFO - Training is complete.
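
Since the working project depends on the dataset being placed under `datasets/KITTI` by hand, a quick directory check before training can rule out a misplaced dataset. The sketch below is only illustrative: the helper `missing_subdirs` is hypothetical, and the sub-directory names are an assumption taken from the standard KITTI object-detection layout, not confirmed by this thread, so adjust them to what the Paddle3D docs list for your version.

```python
from pathlib import Path

# ASSUMPTION: these names follow the standard KITTI object-detection layout.
# They are not taken from this thread; adjust them to match the Paddle3D docs.
ASSUMED_SUBDIRS = ("training/image_2", "training/label_2", "training/calib")


def missing_subdirs(root, expected=ASSUMED_SUBDIRS):
    """Return the expected sub-directories that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    # dataset_root comes from the config printed in the logs above.
    missing = missing_subdirs("datasets/KITTI")
    if missing:
        print("Missing under datasets/KITTI:", ", ".join(missing))
    else:
        print("All assumed sub-directories are present.")
```

An empty result only means the assumed folders exist; it does not validate the files inside them.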

unseenme commented 2 years ago

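@nepeplwu Thanks for sharing the aistudio project. I forked it and training now runs, but evaluation fails with an error.

Shared link to the project "Paddle3D smoke_副本" (valid for three days): https://aistudio.baidu.com/studio/project/partial/verify/4624706/6f6fd7c74d7047c5a2d690675591b8b2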

!wget https://paddle3d.bj.bcebos.com/models/smoke/smoke_dla34_no_dcn_kitti/model.pdparams
!python tools/evaluate.py --config configs/smoke/smoke_dla34_no_dcn_kitti.yml --model model.pdparams --batch_size 1

--2022-09-30 01:11:06--  https://paddle3d.bj.bcebos.com/models/smoke/smoke_dla34_no_dcn_kitti/model.pdparams
Resolving paddle3d.bj.bcebos.com (paddle3d.bj.bcebos.com)... 220.181.33.44, 220.181.33.43, 2409:8c04:1001:1002:0:ff:b001:368a
Connecting to paddle3d.bj.bcebos.com (paddle3d.bj.bcebos.com)|220.181.33.44|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75512268 (72M) [application/octet-stream]
Saving to: “model.pdparams”

model.pdparams      100%[===================>]  72.01M  39.5MB/s    in 1.8s    

2022-09-30 01:11:08 (39.5 MB/s) - “model.pdparams” saved [75512268/75512268]

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/__init__.py:107: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import MutableMapping
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/rcsetup.py:20: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Iterable, Mapping
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/colors.py:53: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Sized
W0930 01:11:11.897845   570 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 11.2
W0930 01:11:11.902846   570 gpu_resources.cc:91] device: 0, cuDNN Version: 8.2.
2022-09-30 01:11:13,074 -  WARNING - There is a file with the same name locally, we directly load the local file
2022-09-30 01:11:13,265 -     INFO - There are 189/189 variables loaded into DLA.
2022-09-30 01:11:13,403 -     INFO - There are 201/201 variables loaded into SMOKE.
2022-09-30 01:11:13,503 -     INFO - evaluate on validate dataset
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/math_op_patch.py:278: UserWarning: The dtype of left and right variables are not the same, left dtype is paddle.int64, but right dtype is paddle.int32, the right dtype will convert to paddle.int64
  format(lhs_dtype, rhs_dtype, lhs_dtype))
2022-09-30 01:11:23,275 -     INFO - ###############################################] 100.00%
Traceback (most recent call last):
  File "tools/evaluate.py", line 99, in <module>
    main(args)
  File "tools/evaluate.py", line 94, in main
    trainer.evaluate()
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle3d/apis/trainer.py", line 316, in evaluate
    metrics = metric_obj.compute(verbose=True)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle3d/datasets/kitti/kitti_metric.py", line 160, in compute
    metric_types=["bbox", "bev", "3d"])
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle3d/thirdparty/__init__.py", line 11, in kitti_eval
    from paddle3d.thirdparty.kitti_object_eval_python.eval import get_official_eval_result
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle3d/thirdparty/kitti_object_eval_python/eval.py", line 9, in <module>
    from .rotate_iou import rotate_iou_gpu_eval
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle3d/thirdparty/kitti_object_eval_python/rotate_iou.py", line 262, in <module>
    def rotate_iou_kernel_eval(N, K, dev_boxes, dev_query_boxes, dev_iou, criterion=-1):
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/numba/cuda/decorators.py", line 101, in kernel_jit
    kernel.bind()
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/numba/cuda/compiler.py", line 548, in bind
    self._func.get()
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/numba/cuda/compiler.py", line 426, in get
    ptx = self.ptx.get()
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/numba/cuda/compiler.py", line 397, in get
    **self._extra_options)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/numba/cuda/cudadrv/nvvm.py", line 496, in llvm_to_ptx
    ptx = cu.compile(**opts)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/numba/cuda/cudadrv/nvvm.py", line 233, in compile
    self._try_error(err, 'Failed to compile\n')
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/numba/cuda/cudadrv/nvvm.py", line 251, in _try_error
    self.driver.check_error(err, "%s\n%s" % (msg, self.get_log()))
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/numba/cuda/cudadrv/nvvm.py", line 141, in check_error
    raise exc
numba.cuda.cudadrv.error.NvvmError: Failed to compile

<unnamed> (66, 23): parse expected comma after load's type
NVVM_ERROR_COMPILATION

nepeplwu commented 9 months ago

@unseenme This problem can be solved by upgrading numba:

python -m pip install -U numba
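
To confirm the upgrade took effect, one option is to print the installed numba version and try compiling a trivial CUDA kernel, since the evaluation traceback above failed inside numba's NVVM compile step (the environment log shows numba 0.48.0 with CUDA 11.2). This is a minimal sketch, not part of Paddle3D: `numba_cuda_report` and the `_noop` kernel are hypothetical names, and a successful run only suggests that numba can compile kernels in this environment.

```python
import importlib


def numba_cuda_report():
    """Report the numba version and whether a trivial CUDA kernel compiles and runs."""
    report = {
        "numba_version": None,
        "cuda_available": None,
        "kernel_compiles": None,
        "error": None,
    }
    try:
        numba = importlib.import_module("numba")
    except ImportError as exc:
        report["error"] = repr(exc)
        return report
    report["numba_version"] = numba.__version__

    try:
        import numpy as np
        from numba import cuda

        report["cuda_available"] = bool(cuda.is_available())
        if report["cuda_available"]:

            @cuda.jit
            def _noop(arr):
                # Hypothetical kernel: add 1 to every element.
                i = cuda.grid(1)
                if i < arr.size:
                    arr[i] += 1

            data = np.zeros(8, dtype=np.float32)
            _noop[1, 8](data)  # the first launch triggers compilation
            report["kernel_compiles"] = True
    except Exception as exc:  # e.g. NvvmError, as in the traceback above
        report["kernel_compiles"] = False
        report["error"] = repr(exc)
    return report


if __name__ == "__main__":
    for key, value in numba_cuda_report().items():
        print(f"{key}: {value}")
```

If `kernel_compiles` is still `False` after upgrading, the `error` field should show whether it is the same `NvvmError` or a different problem.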

PaddlePaddle / Paddle3D

train error in aistudio project with quick start steps #97