paddlepaddle-gpu 2.0.0rc1 reports FatalError: `Segmentation fault` is detected by the operating system. #1637

Closed. xiulianzw closed this issue 2 years ago.

xiulianzw commented 3 years ago

I'm using the latest PaddleOCR from GitHub. Running python tools/infer/predict_system.py fails with the following error:

C++ Traceback (most recent call last):
0   paddle::framework::SignalHandle(char const*, int)
1   paddle::platform::GetCurrentTraceBackString()
Error Message Summary:
FatalError: `Segmentation fault` is detected by the operating system.
  [TimeInfo: *** Aborted at 1609724467 (unix time) try "date -d @1609724467" if you are using GNU date ***]
  [SignalInfo: *** SIGSEGV (@0x0) received by PID 127353 (TID 0x7f4aa7f1d700) from PID 0 ***]
Segmentation fault (core dumped)


import paddle
paddle.utils.run_check()
Running verify PaddlePaddle program ...
W0104 09:50:08.441300 127586] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 10.2, Runtime API Version: 10.0
W0104 09:50:08.444324 127586] device: 0, cuDNN Version: 8.0.
PaddlePaddle works well on 1 GPU.
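The warning lines printed by paddle.utils.run_check() already contain the version numbers the rest of this thread keeps asking about. Below is a minimal sketch that pulls them out of the log text with regular expressions. The helper name parse_run_check_log is hypothetical, and the patterns assume the exact wording shown in the output above.

```python
import re


def parse_run_check_log(log: str) -> dict:
    """Pull version fields out of paddle.utils.run_check() warning lines.

    Assumes the wording seen in this issue, e.g.
    "Driver API Version: 10.2, Runtime API Version: 10.0" and "cuDNN Version: 8.0."
    """
    number = r"(\d+(?:\.\d+)*)"  # digits and dots, without a trailing period
    patterns = {
        "compute_capability": r"GPU Compute Capability: " + number,
        "driver_api": r"Driver API Version: " + number,
        "runtime_api": r"Runtime API Version: " + number,
        "cudnn": r"cuDNN Version: " + number,
    }
    info = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, log)
        if match:
            info[key] = match.group(1)
    return info


log = (
    "W0104 09:50:08.441300 127586] Please NOTE: device: 0, GPU Compute Capability: 6.1, "
    "Driver API Version: 10.2, Runtime API Version: 10.0\n"
    "W0104 09:50:08.444324 127586] device: 0, cuDNN Version: 8.0.\n"
)
print(parse_run_check_log(log))
# {'compute_capability': '6.1', 'driver_api': '10.2', 'runtime_api': '10.0', 'cudnn': '8.0'}
```

For this log, the output shows a CUDA 10.0 runtime paired with cuDNN 8.0. That pairing fits the later reports in this thread that cuDNN 8 fails with these wheels.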

Environment: Python 3.8.5. I also tested Python 3.7 and got the same error.


duohaoxue commented 3 years ago


xiulianzw commented 3 years ago

The CPU version works. Alternatively, use an earlier release: installing 1.8.5 also works. It's probably a bug in the latest version. @duohaoxue

yxd117 commented 3 years ago

InvalidArgumentError: The input tensor's dimension should be equal to the axis's size. But received input tensor's dimension is 4, axis's size is 3
  [Hint: Expected x_rank == axis_size, but received x_rank:4 != axis_size:3.] (at /paddle/paddle/fluid/operators/
  [Hint: If you need C++ stacktraces for debugging, please set FLAGS_call_stack_level=2.]

I have the same problem. After adding the option --use_gpu=False, I get the error above.
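This error compares the rank of the input (x_rank) against the length of an axis list (axis_size). That is the check a transpose-style axis permutation performs. Which operator actually failed is an assumption, because the path in the message is cut off. The plain-Python sketch below reproduces the condition without calling Paddle. It uses a hypothetical helper, apply_perm, to show how a batched 4-D tensor trips a permutation written for a single 3-D H x W x C image.

```python
def apply_perm(shape, perm):
    """Return the shape after permuting axes, mimicking the rank check in the error.

    Hypothetical helper: it only reproduces the condition x_rank == axis_size;
    it does not call Paddle.
    """
    if len(perm) != len(shape):
        raise ValueError(f"x_rank:{len(shape)} != axis_size:{len(perm)}")
    if sorted(perm) != list(range(len(shape))):
        raise ValueError(f"perm {perm} is not a permutation of the axes")
    return tuple(shape[axis] for axis in perm)


hwc_to_chw = (2, 0, 1)  # a 3-element permutation meant for one H x W x C image
print(apply_perm((32, 100, 3), hwc_to_chw))  # (3, 32, 100)

try:
    apply_perm((1, 32, 100, 3), hwc_to_chw)  # a batched 4-D input
except ValueError as err:
    print(err)  # x_rank:4 != axis_size:3
```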

WenmuZhou commented 3 years ago

@xiulianzw What are your CUDA and cuDNN versions? Are you running the dynamic-graph version?

xiulianzw commented 3 years ago

Did you install the CPU version? @yxd117

xiulianzw commented 3 years ago

The GPU version. I tested the CPU version and it works fine. @WenmuZhou

yxd117 commented 3 years ago

Did you install the CPU version? @yxd117

The GPU version, '2.0.0-rc1'. It should be the same problem.

yangy996 commented 3 years ago


xiulianzw commented 3 years ago

Print the installation info of your paddlepaddle-gpu and check whether the cuDNN version is 7.6.5. @YY007H
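One way to do this check from Python: Paddle 2.x documents paddle.get_cudnn_version(), which returns an integer such as 7605 for cuDNN 7.6.5. Below is a minimal sketch that decodes that integer. The helper name is hypothetical. The decoding assumes the major * 1000 + minor * 100 + patch encoding used by cuDNN 7 and 8. The live call is guarded, since the function may be missing in some builds and is assumed here to return None on CPU-only builds.

```python
def decode_cudnn_version(version: int) -> tuple:
    """Split a cuDNN 7.x/8.x version integer (e.g. 7605) into (major, minor, patch).

    Assumes the encoding major * 1000 + minor * 100 + patch used by cuDNN 7 and 8.
    """
    major, rest = divmod(version, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


if __name__ == "__main__":
    try:
        import paddle  # only needed for the live check

        raw = paddle.get_cudnn_version()  # assumed to return None without CUDA
    except (ImportError, AttributeError):
        raw = None
    if raw:
        print("cuDNN %d.%d.%d" % decode_cudnn_version(raw))
    else:
        print("no cuDNN version reported")
    print(decode_cudnn_version(7605))  # (7, 6, 5)
```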

yxd117 commented 3 years ago

Print the installation info of your paddlepaddle-gpu and check whether the cuDNN version is 7.6.5. @YY007H

Thanks, man! I downgraded my cuDNN from 8.0.5 to 7.6.5 and the error went away.

yangy996 commented 3 years ago


yxd117 commented 3 years ago

My CUDA 10.2 + cuDNN 8.0.5 doesn't work, but CUDA 10.2 + cuDNN 7.6.5 has no problem. I wonder whether I installed cuDNN 8 wrong somewhere.
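Several reports here follow the same pattern: with CUDA 10.2, cuDNN 8.0.5 fails and 7.6.5 works. That suggests the installed cuDNN major version has to match the one the wheel was built against. Below is a hedged sketch of that comparison. The function name is hypothetical, and the rule is a heuristic drawn from this thread, not an official compatibility statement. The wheel's cuDNN version would come from its install info.

```python
def same_cudnn_major(installed: str, wheel_cudnn: str) -> bool:
    """Return True when the installed cuDNN has the same major version as the wheel's.

    Heuristic from the reports in this issue, not an official compatibility rule.
    """
    return installed.split(".")[0] == wheel_cudnn.split(".")[0]


print(same_cudnn_major("8.0.5", "7.6.5"))  # False: the failing setup
print(same_cudnn_major("7.6.5", "7.6.5"))  # True: the working setup
```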

wa3926 commented 3 years ago

@YY007H Mine is CUDA 11.0 + cuDNN 8.0 and it doesn't work. What operating system are you on?

yangy996 commented 3 years ago



wa3926 commented 3 years ago

@YY007H I'm on CentOS 7.9, and the server's GPU driver version is 11.2. I don't know whether it's a driver problem; I've been fighting this bug for several days. An old server with CUDA 10.2 + cuDNN 7.6 has no problem. Is there any other way to get around this error?

C++ Traceback (most recent call last):
0   paddle::framework::SignalHandle(char const*, int)
1   paddle::platform::GetCurrentTraceBackString[abi:cxx11]()
Error Message Summary:
FatalError: `Segmentation fault` is detected by the operating system.
  [TimeInfo: *** Aborted at 1611540174 (unix time) try "date -d @1611540174" if you are using GNU date ***]
  [SignalInfo: *** SIGSEGV (@0x0) received by PID 3564 (TID 0x7f8c1e82f740) from PID 0 ***]

If I can't fix this, I'll probably have to reinstall the OS.
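On the driver question: as a general CUDA rule, the driver needs to support a CUDA version at least as new as the runtime the program uses. A driver reporting 11.2 is therefore unlikely, on its own, to be the problem with CUDA 11.0 or 10.2. Below is a minimal sketch of that conservative check, using hypothetical helper names. It compares versions numerically, because plain string comparison would rank "9.0" above "11.2".

```python
def as_tuple(version: str) -> tuple:
    """'11.2' -> (11, 2), so versions compare numerically, not as strings."""
    return tuple(int(part) for part in version.split("."))


def driver_covers_runtime(driver_api: str, runtime_api: str) -> bool:
    """Conservative rule of thumb: the driver's CUDA version should be >= the runtime's."""
    return as_tuple(driver_api) >= as_tuple(runtime_api)


print(driver_covers_runtime("11.2", "11.0"))  # True: an 11.2 driver with the CUDA 11.0 runtime
print(driver_covers_runtime("10.2", "11.0"))  # False: driver older than the runtime
```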

yangy996 commented 3 years ago



jey07 commented 3 years ago

I am also getting a similar error with the versions below:

W0217 12:22:39.872664  1972] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 11.2, Runtime API Version: 10.2
W0217 12:22:40.391552  1972] device: 0, cuDNN Version: 8.1.
C++ Traceback (most recent call last):
0   paddle::framework::SignalHandle(char const*, int)
1   paddle::platform::GetCurrentTraceBackString[abi:cxx11]()
Error Message Summary:
FatalError: `Segmentation fault` is detected by the operating system.
  [TimeInfo: *** Aborted at 1613564641 (unix time) try "date -d @1613564641" if you are using GNU date ***]
  [SignalInfo: *** SIGSEGV (@0x0) received by PID 1972 (TID 0x7f344ad72740) from PID 0 ***]
Segmentation fault (core dumped)
xiulianzw commented 3 years ago


jey07 commented 3 years ago

I am running on a Google Cloud VM instance, so I am not sure if I can change the cuDNN version.

xiulianzw commented 3 years ago


jey07 commented 3 years ago

I am not deploying; I am trying to train the model. With CPU, everything works fine. Any idea how long training on images takes with CPU?

speaknowpotato commented 3 years ago

CUDA 10.2 + libcudnn works.

thunder95 commented 3 years ago

I ran into the same problem with CUDA 10.2 + libcudnn 8. How did you all solve it?

jey07 commented 3 years ago

So, paddlepaddle-ocr supports CUDA 10.2 with cuDNN 7.6.5.

hebo1982 commented 3 years ago


Installing the matching paddlepaddle build fixes it.

huanli2012 commented 3 years ago

My CUDA 10.2 + cuDNN 8.0.5 doesn't work, but CUDA 10.2 + cuDNN 7.6.5 has no problem. I wonder whether I installed cuDNN 8 wrong somewhere.

CUDA 10.2 + cuDNN 7.6.5 still gives me the error.

D-DanielYang commented 3 years ago



thongvhoang commented 3 years ago
FatalError: `Segmentation fault` is detected by the operating system.
  [TimeInfo: *** Aborted at 1628103340 (unix time) try "date -d @1628103340" if you are using GNU date ***]
  [SignalInfo: *** SIGSEGV (@0x58564a3239) received by PID 721 (TID 0x7f7e29429780) from PID 1447703097 ***]

I have the same problem. How can I fix this? I used this command:

!python3 tools/ -c configs/det/det_r50_vd_east.yml -o Global.infer_img=$public_dataset_dir \
    Global.pretrained_model="/content/drive/My Drive/Colab_Notebook/text_scence_detection/PaddleOCR/output/det_r50_vd_east_v2.0_train/best_accuracy"
Evezerest commented 3 years ago


yanzheng636 commented 3 years ago

I upgraded CUDA to 11.0 and cuDNN to 8.0, and then it worked...

Hi, that is exactly my environment, but I still get this error.

HuAndrew commented 2 years ago


python -m pip install paddlepaddle-gpu==2.2.1.post110 -f
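The .postXXX suffix in this command appears to encode the CUDA version the wheel was built for. Here post110 corresponds to CUDA 11.0, and post112 is used for CUDA 11.2 further down. Per the last comment, plain paddlepaddle-gpu defaults to the CUDA 10.2 build. Below is a hypothetical helper that assembles the requirement string under those assumptions. The -f index URL is left out because it is elided in the command above.

```python
def paddle_gpu_requirement(paddle_version: str, cuda_version: str) -> str:
    """Build a pip requirement such as 'paddlepaddle-gpu==2.2.1.post110'.

    Assumptions taken from this thread: the '.postXYZ' suffix is the CUDA
    major and minor digits, and CUDA 10.2 builds carry no suffix.
    """
    if cuda_version == "10.2":
        return f"paddlepaddle-gpu=={paddle_version}"
    major, minor = cuda_version.split(".")[:2]
    return f"paddlepaddle-gpu=={paddle_version}.post{major}{minor}"


print(paddle_gpu_requirement("2.2.1", "11.0"))  # paddlepaddle-gpu==2.2.1.post110
print(paddle_gpu_requirement("2.2.1", "10.2"))  # paddlepaddle-gpu==2.2.1
```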
paddle-bot-old[bot] commented 2 years ago

EchoYGemini commented 2 years ago

In my Docker environment (CUDA 11.3, cuDNN 8.2), installing paddlepaddle==2.3.0 also hit this problem. Reinstalling with the command below solved it: python -m pip install paddlepaddle-gpu==2.2.1.post112 -f
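Here a CUDA 11.3 machine worked with the CUDA 11.2 build (post112). One way to express that choice is to pick the newest available wheel CUDA version that does not exceed the local one within the same major version. This assumes an older-minor build runs on a newer toolkit of the same major release. Below is a sketch of that heuristic. The list of available builds is a hypothetical input, not the real package index.

```python
def pick_wheel_cuda(local_cuda: str, available: list) -> str:
    """Return the newest wheel CUDA version <= local_cuda with the same major version.

    Heuristic only; the official install page lists the real builds.
    """
    def as_tuple(v):
        return tuple(int(p) for p in v.split("."))

    local = as_tuple(local_cuda)
    candidates = [v for v in available
                  if as_tuple(v)[0] == local[0] and as_tuple(v) <= local]
    if not candidates:
        raise ValueError(f"no build for CUDA {local_cuda} in {available}")
    return max(candidates, key=as_tuple)


builds = ["10.1", "10.2", "11.0", "11.1", "11.2"]  # hypothetical list
print(pick_wheel_cuda("11.3", builds))  # 11.2
```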

Turnsole1 commented 2 years ago

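Be sure to follow the "GPU version of PaddlePaddle" section at this link and download the paddlepaddle build that matches your CUDA version. Otherwise, the pip command installs the CUDA 10.2 build by default!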
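Choosing the right build requires knowing the local CUDA version. One source is the "release X.Y" part of nvcc --version output. Below is a minimal sketch that parses it, with hypothetical helper names. It assumes nvcc is on PATH, which may not hold in runtime-only images; in that case it returns None.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text: str):
    """Return the 'X.Y' CUDA release from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+\.\d+)", text)
    return match.group(1) if match else None


def local_cuda_version():
    """Run nvcc if it is on PATH and return its CUDA release string, else None."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


sample = "Cuda compilation tools, release 10.2, V10.2.89"
print(parse_nvcc_release(sample))  # 10.2
```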