PaddlePaddle / PaddleX

All-in-One Development Tool based on PaddlePaddle(飞桨一站式全流程开发工具)
Apache License 2.0
4.77k stars 938 forks source link

训练碰到的问题 #1662

Open Phoenix8215 opened 1 year ago

Phoenix8215 commented 1 year ago

Checklist:

  1. 查找历史相关issue寻求解答
  2. 翻阅FAQ常见问题汇总和答疑
  3. 确认bug是否在新版本里还未修复
  4. 翻阅PaddleX API文档说明

描述问题

按照教程尝试训练报错

root@autodl-container-14dc11ad52-d244ef51:~/autodl-tmp/Person/DATA# python3 demo.py 
WARNING: OMP_NUM_THREADS set to 12, not 1. The computation speed will not be optimized if you use data parallel. It will fail if this PaddlePaddle binary is compiled with OpenBlas since OpenBlas does not support multi-threads.
PLEASE USE OMP_NUM_THREADS WISELY.
/root/miniconda3/lib/python3.8/site-packages/paddle/vision/transforms/functional_pil.py:36: DeprecationWarning: NEAREST is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.NEAREST or Dither.NONE instead.
  'nearest': Image.NEAREST,
/root/miniconda3/lib/python3.8/site-packages/paddle/vision/transforms/functional_pil.py:37: DeprecationWarning: BILINEAR is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BILINEAR instead.
  'bilinear': Image.BILINEAR,
/root/miniconda3/lib/python3.8/site-packages/paddle/vision/transforms/functional_pil.py:38: DeprecationWarning: BICUBIC is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BICUBIC instead.
  'bicubic': Image.BICUBIC,
/root/miniconda3/lib/python3.8/site-packages/paddle/vision/transforms/functional_pil.py:39: DeprecationWarning: BOX is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BOX instead.
  'box': Image.BOX,
/root/miniconda3/lib/python3.8/site-packages/paddle/vision/transforms/functional_pil.py:40: DeprecationWarning: LANCZOS is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.LANCZOS instead.
  'lanczos': Image.LANCZOS,
/root/miniconda3/lib/python3.8/site-packages/paddle/vision/transforms/functional_pil.py:41: DeprecationWarning: HAMMING is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.HAMMING instead.
  'hamming': Image.HAMMING
/root/miniconda3/lib/python3.8/site-packages/paddlex/paddleseg/models/losses/decoupledsegnet_relax_boundary_loss.py:19: DeprecationWarning: Please use `shift` from the `scipy.ndimage` namespace, the `scipy.ndimage.interpolation` namespace is deprecated.
  from scipy.ndimage.interpolation import shift
/root/miniconda3/lib/python3.8/site-packages/paddlex/paddleseg/transforms/functional.py:18: DeprecationWarning: Please use `distance_transform_edt` from the `scipy.ndimage` namespace, the `scipy.ndimage.morphology` namespace is deprecated.
  from scipy.ndimage.morphology import distance_transform_edt
/root/miniconda3/lib/python3.8/site-packages/sklearn/utils/multiclass.py:13: DeprecationWarning: Please use `spmatrix` from the `scipy.sparse` namespace, the `scipy.sparse.base` namespace is deprecated.
  from scipy.sparse.base import spmatrix
/root/miniconda3/lib/python3.8/site-packages/paddlex/ppcls/data/preprocess/ops/timm_autoaugment.py:38: DeprecationWarning: BILINEAR is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BILINEAR instead.
  _RANDOM_INTERPOLATION = (Image.BILINEAR, Image.BICUBIC)
/root/miniconda3/lib/python3.8/site-packages/paddlex/ppcls/data/preprocess/ops/timm_autoaugment.py:38: DeprecationWarning: BICUBIC is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BICUBIC instead.
  _RANDOM_INTERPOLATION = (Image.BILINEAR, Image.BICUBIC)
/root/miniconda3/lib/python3.8/site-packages/paddle/tensor/creation.py:130: DeprecationWarning: `np.object` is a deprecated alias for the builtin `object`. To silence this warning, use `object` by itself. Doing this will not modify any behavior and is safe. 
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  if data.dtype == np.object:
loading annotations into memory...
Done (t=5.85s)
creating index...
index created!
2023-02-08 21:45:33 [INFO]      Starting to read file list from dataset...
2023-02-08 21:45:36 [INFO]      19404 samples in file ./instances_train.json, including 19404 positive samples and 0 negative samples.
loading annotations into memory...
Done (t=0.83s)
creating index...
index created!
2023-02-08 21:45:36 [INFO]      Starting to read file list from dataset...
2023-02-08 21:45:37 [INFO]      4000 samples in file ./instances_val.json, including 4000 positive samples and 0 negative samples.
W0208 21:45:37.300204 149630 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.6, Runtime API Version: 10.2
W0208 21:45:37.304487 149630 device_context.cc:465] device: 0, cuDNN Version: 8.1.

--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
No stack trace in paddle, may be caused by external reasons.

----------------------
Error Message Summary:
----------------------
FatalError: `Segmentation fault` is detected by the operating system.
  [TimeInfo: *** Aborted at 1675863963 (unix time) try "date -d @1675863963" if you are using GNU date ***]
  [SignalInfo: *** SIGSEGV (@0x0) received by PID 149630 (TID 0x7f8fb2d3b3c0) from PID 0 ***]

其中demo.py内容如下

import paddlex as pdx
from paddlex import transforms as T

# 定义训练和验证时的transforms
# API说明:https://github.com/PaddlePaddle/PaddleX/blob/release/2.0-rc/paddlex/cv/transforms/operators.py
train_transforms = T.Compose([
    T.MixupImage(mixup_epoch=-1), T.RandomDistort(),
    T.RandomExpand(im_padding_value=[123.675, 116.28, 103.53]), T.RandomCrop(),
    T.RandomHorizontalFlip(), T.BatchRandomResize(
        target_sizes=[640, 672, 704,736, 768,800,832,864,896,928,960,992],
        interp='RANDOM'), T.Normalize(
            mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

eval_transforms = T.Compose([
    T.Resize(
        992, interp='CUBIC'), T.Normalize(
            mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# 下载和解压表计检测数据集,如果已经预先下载,可注释掉下面两行

# 定义训练和验证所用的数据集
# API说明:https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/cv/datasets/coco.py#L26
train_dataset = pdx.datasets.CocoDetection(
    data_dir='./train/images',
    ann_file='./instances_train.json',
    transforms=train_transforms,
    shuffle=True)
eval_dataset = pdx.datasets.CocoDetection(
    data_dir='./test/images',
    ann_file='./instances_val.json',
    transforms=eval_transforms)

# 初始化模型,并进行训练
# 可使用VisualDL查看训练指标,参考https://github.com/PaddlePaddle/PaddleX/tree/release/2.0-rc/tutorials/train#visualdl可视化训练指标
num_classes = len(train_dataset.labels)
model = pdx.det.PPYOLOv2(num_classes=num_classes, backbone='ResNet50_vd_dcn')

# API说明:https://github.com/PaddlePaddle/PaddleX/blob/release/2.0-rc/paddlex/cv/models/detector.py#L155
# 各参数介绍与调整说明:https://paddlex.readthedocs.io/zh_CN/develop/appendix/parameters.html
model.train(
    num_epochs=300,
    train_dataset=train_dataset,
    train_batch_size=8,
    eval_dataset=eval_dataset,
    pretrain_weights=None,
    learning_rate=0.005 / 12,
    warmup_steps=1000,
    warmup_start_lr=0.0,
    lr_decay_epochs=[105, 135, 150,200],
    save_interval_epochs=60,
    save_dir='output/ppyolov2_r50vd_dcn',
    early_stop="True",
    early_stop_patience=40)

复现

  1. 您是否已经正常运行我们提供的教程

  2. 您是否在教程的基础上修改代码内容?还请您提供运行的代码

  3. 您使用的数据集是?

  4. 请提供您出现的报错信息及相关log

环境

  1. 请提供您使用的PaddlePaddle和PaddleX的版本号 paddlex 2.1.0
  2. 请提供您使用的操作系统信息,如Linux/Windows/MacOS Linux
  3. 请问您使用的Python版本是? Python 3.8.10 (default, Jun 4 2021, 15:09:15)
  4. 请问您使用的CUDA/cuDNN的版本号是? 11.2
lailuboy commented 1 year ago

paddlepaddle版本是多少,还有请发一下nvidia-smi结果