I0918 23:46:05.051712 973133 tcp_utils.cc:130] Successfully connected to 127.0.0.1:52457
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
W0918 23:46:24.526521 973133 dygraph_functions.cc:83150] got different data type, run type promotion automatically, this may cause data type been changed.
FatalError: Segmentation fault is detected by the operating system.
[TimeInfo: Aborted at 1726674458 (unix time) try "date -d @1726674458" if you are using GNU date ]
LAUNCH INFO 2024-09-18 23:47:48,695 Exit code -11
[SignalInfo: SIGSEGV (@0xed94d) received by PID 973133 (TID 0xffffa00bae90) from PID 973133 ]
Traceback (most recent call last):
File "/work/workspace/PaddleX/paddlex/utils/result_saver.py", line 30, in wrap
result = func(self, *args, **kwargs)
File "/work/workspace/PaddleX/paddlex/engine.py", line 42, in run
trainer.train()
File "/work/workspace/PaddleX/paddlex/modules/base/trainer/trainer.py", line 61, in train
train_result = self.pdx_model.train(**self.get_train_kwargs())
File "/work/workspace/PaddleX/paddlex/repo_apis/PaddleDetection_api/object_det/model.py", line 109, in train
return self.runner.train(
File "/work/workspace/PaddleX/paddlex/repo_apis/PaddleDetection_api/object_det/runner.py", line 54, in train
return self.run_cmd(
File "/work/workspace/PaddleX/paddlex/repo_apis/base/runner.py", line 359, in run_cmd
raise CalledProcessError(
paddlex.utils.errors.others.CalledProcessError: Command ['/usr/bin/python', '-m', 'paddle.distributed.launch', '--devices', '0,1,2,3', '--log_dir', '/work/workspace/PaddleX/ppyolo_plus_s_output/distributed_train_logs', 'tools/train.py', '--eval', '--config', '/root/.paddlex/tmp99soy5_c/detmodel_PP-YOLOE_plus-S.yml', '--use_vdl', 'True', '--vdl_log_dir', '/work/workspace/PaddleX/ppyolo_plus_s_output'] returned non-zero exit status 245.
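The numeric fields in this log can be cross-checked with Python's standard library. The sketch below is only a reading aid, not part of PaddleX. It assumes the machine's local clock is UTC+8, which the `LAUNCH INFO` timestamp suggests but does not state. It converts the abort time (the Python equivalent of `date -d @1726674458`), maps exit code -11 to its signal name, and checks that exit status 245 is -11 reduced modulo 256, i.e. the same SIGSEGV surfacing through the wrapper.

```python
import signal
from datetime import datetime, timedelta, timezone

# Values copied from the log above.
ABORT_UNIX_TIME = 1726674458  # "[TimeInfo: Aborted at 1726674458 (unix time) ...]"
LAUNCH_EXIT_CODE = -11        # "LAUNCH INFO ... Exit code -11"
WRAPPER_EXIT_STATUS = 245     # "... returned non-zero exit status 245."


def abort_time(unix_time: int, utc_offset_hours: int = 0) -> datetime:
    """Python equivalent of `date -d @<unix_time>` for a fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(unix_time, tz=tz)


def signal_from_exit_code(code: int) -> str:
    """A negative child exit code means the child was killed by signal -code."""
    return signal.Signals(-code).name


def as_shell_status(code: int) -> int:
    """POSIX exit statuses are 8 bits wide, so -11 is observed as 245."""
    return code % 256


if __name__ == "__main__":
    print(abort_time(ABORT_UNIX_TIME).isoformat())     # UTC
    print(abort_time(ABORT_UNIX_TIME, 8).isoformat())  # assumed UTC+8 host clock
    print(signal_from_exit_code(LAUNCH_EXIT_CODE))
    print(as_shell_status(LAUNCH_EXIT_CODE) == WRAPPER_EXIT_STATUS)
```

Under the UTC+8 assumption the abort time comes out as 23:47:38, ten seconds before the `LAUNCH INFO` line, so the segfault and the launcher exit appear to be the same event.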
Checklist:
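Describe the problem
PaddleX can validate a dataset to make sure its format meets PaddleX's requirements. During validation it also analyzes the dataset and reports basic statistics about it.
python main.py -c paddlex/configs/object_detection/PP-YOLOE_plus-S.yaml \
    -o Global.mode=check_dataset \
    -o Global.dataset_dir=./dataset/det_coco_examples
This step succeeded.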
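To make "validation" concrete, here is a minimal sketch of the kind of structural check such a step can perform on COCO-style detection annotations. It is illustrative only and is not PaddleX's `check_dataset` implementation; the function names are hypothetical, and the annotation path in the usage note is an assumption about the demo dataset layout.

```python
import json


def check_coco_annotations(ann: dict) -> dict:
    """Minimal structural check of a COCO-style detection annotation dict.

    Illustrative sketch only, not PaddleX's check_dataset implementation.
    Raises ValueError on a structural problem; otherwise returns basic statistics.
    """
    for key in ("images", "annotations", "categories"):
        if key not in ann:
            raise ValueError(f"missing top-level key: {key!r}")

    image_ids = {img["id"] for img in ann["images"]}
    id_to_name = {cat["id"]: cat["name"] for cat in ann["categories"]}
    per_category = {name: 0 for name in id_to_name.values()}

    for obj in ann["annotations"]:
        if obj["image_id"] not in image_ids:
            raise ValueError(f"annotation {obj['id']}: unknown image_id {obj['image_id']}")
        if obj["category_id"] not in id_to_name:
            raise ValueError(f"annotation {obj['id']}: unknown category_id {obj['category_id']}")
        _x, _y, w, h = obj["bbox"]  # COCO boxes are [x, y, width, height]
        if w <= 0 or h <= 0:
            raise ValueError(f"annotation {obj['id']}: degenerate bbox {obj['bbox']}")
        per_category[id_to_name[obj["category_id"]]] += 1

    return {
        "num_images": len(image_ids),
        "num_annotations": len(ann["annotations"]),
        "per_category": per_category,
    }


def check_coco_file(path: str) -> dict:
    """Load a COCO JSON file from disk and run the checks above."""
    with open(path, encoding="utf-8") as f:
        return check_coco_annotations(json.load(f))
```

For example, `check_coco_file("./dataset/det_coco_examples/annotations/instance_train.json")`, assuming that is where the demo dataset keeps its training annotations.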
Reproduction
Have you successfully run the tutorial we provide?
Did you modify the code from the tutorial? If so, please provide the code you ran:
python main.py -c paddlex/configs/object_detection/PP-YOLOE_plus-S.yaml \
    -o Global.mode=train \
    -o Global.dataset_dir=./dataset/det_coco_examples \
    -o Global.output=ppyolo_plus_s_output \
    -o Global.device="npu:0,1,2,3"
Which dataset are you using?
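The training command above does not reach the crash directly: PaddleX turns it into the `paddle.distributed.launch` command printed at the end of the Python traceback. As a hypothetical reconstruction (the helper names are made up, and the argument layout is copied from the log rather than from PaddleX's source), the mapping looks like this; note that `npu:0,1,2,3` only contributes the card IDs to `--devices`.

```python
from typing import List, Tuple


def parse_device(device: str) -> Tuple[str, List[int]]:
    """Split a device string, e.g. "npu:0,1,2,3" -> ("npu", [0, 1, 2, 3])."""
    kind, _, ids = device.partition(":")
    card_ids = [int(i) for i in ids.split(",")] if ids else []
    return kind, card_ids


def build_launch_cmd(device: str, config_path: str, output_dir: str,
                     python: str = "/usr/bin/python") -> List[str]:
    """Hypothetical reconstruction of the launcher command seen in the log.

    Not PaddleX's code: it only reproduces the argument layout printed there.
    The device type ("npu") does not appear in these arguments, so it is dropped.
    """
    _kind, card_ids = parse_device(device)
    return [
        python, "-m", "paddle.distributed.launch",
        "--devices", ",".join(str(i) for i in card_ids),
        "--log_dir", f"{output_dir}/distributed_train_logs",
        "tools/train.py", "--eval",
        "--config", config_path,
        "--use_vdl", "True",
        "--vdl_log_dir", output_dir,
    ]
```

Under the same assumption, `Global.device="npu:0"` would yield `--devices 0`; rerunning on a single card may help show whether the segfault depends on multi-card training.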
Please provide the error message you encountered and the related log:
======================= Modified FLAGS detected =======================
FLAGS(name='FLAGS_use_stride_kernel', current_value=False, default_value=True)
C++ Traceback (most recent call last):
0 egr::Backward(std::vector<paddle::Tensor, std::allocator<paddle::Tensor> > const&, std::vector<paddle::Tensor, std::allocator<paddle::Tensor> > const&, bool)
1 egr::RunBackward(std::vector<paddle::Tensor, std::allocator<paddle::Tensor> > const&, std::vector<paddle::Tensor, std::allocator<paddle::Tensor> > const&, bool, bool, std::vector<paddle::Tensor, std::allocator<paddle::Tensor> > const&, bool, std::vector<paddle::Tensor, std::allocator<paddle::Tensor> > const&)
2 Conv2dGradNodeFinal::operator()(paddle::small_vector<std::vector<paddle::Tensor, std::allocator<paddle::Tensor> >, 15u>&, bool, bool)
3 paddle::experimental::conv2d_grad(paddle::Tensor const&, paddle::Tensor const&, paddle::Tensor const&, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, std::string const&, std::vector<int, std::allocator<int> > const&, int, std::string const&, paddle::Tensor*, paddle::Tensor*)
4 void custom_kernel::Conv2DGradKernel<float, phi::CustomContext>(phi::CustomContext const&, phi::DenseTensor const&, phi::DenseTensor const&, phi::DenseTensor const&, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, std::string const&, std::vector<int, std::allocator<int> > const&, int, std::string const&, phi::DenseTensor*, phi::DenseTensor*)
5 aclnnConvolutionBackward
6 InitL2Phase2Context(char const*, aclOpExecutor*)
7 GetOpExecCacheFromExecutor(aclOpExecutor*)
Error Message Summary:
FatalError: Segmentation fault is detected by the operating system.
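The C++ frames are printed most recent call last, so the fault occurs at frame 7. The sketch below is a reading aid with hypothetical helper names: it parses the frame lines and splits them where the trace leaves Paddle for the Ascend operator API. Using the `aclnn` prefix as that boundary is an assumption based on `aclnnConvolutionBackward` in this trace. Applied to this trace, the last Paddle frame is the float32 `Conv2DGradKernel` (presumably from the custom-device plugin), and the crash happens inside `aclnnConvolutionBackward` and its executor-cache helpers.

```python
import re

# "N symbol" lines as printed under "C++ Traceback (most recent call last):"
FRAME_RE = re.compile(r"^\s*(\d+)\s+(\S.*)$")


def parse_cpp_frames(trace: str) -> list:
    """Return (index, symbol) pairs in printed order (outermost call first)."""
    frames = []
    for line in trace.splitlines():
        match = FRAME_RE.match(line)
        if match:
            frames.append((int(match.group(1)), match.group(2).strip()))
    return frames


def split_at_vendor(frames: list, vendor_prefix: str = "aclnn") -> tuple:
    """Split frames at the first symbol starting with vendor_prefix.

    Assumption: "aclnn" marks the entry into the Ascend operator library,
    as with aclnnConvolutionBackward in this trace.
    """
    for pos, (_, symbol) in enumerate(frames):
        if symbol.startswith(vendor_prefix):
            return frames[:pos], frames[pos:]
    return frames, []
```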
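Environment
Please provide the PaddlePaddle and PaddleX versions you are using: 3.0-beta
Please provide your operating system, e.g. Linux/Windows/MacOS:
Which Python version are you using?
Which CUDA/cuDNN versions are you using?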
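The environment questions can be answered with a short standard-library script. The distribution names in `PACKAGES` are assumptions; in particular, the name of the NPU plugin package depends on how it was installed. Since this run targets Ascend NPUs (`Global.device="npu:..."`), the CUDA/cuDNN question probably does not apply, and the installed CANN version is likely the relevant counterpart.

```python
import platform
from importlib import metadata

# Assumed distribution names; adjust them to match the actual installation.
PACKAGES = ("paddlepaddle", "paddlex", "paddle-custom-npu")


def collect_env(packages=PACKAGES) -> dict:
    """Gather Python, OS and package versions for an issue report."""
    info = {
        "python": platform.python_version(),
        "os": platform.platform(),
        "machine": platform.machine(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```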