PaddlePaddle / Paddle

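PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (core framework of PaddlePaddle (『飞桨』): high-performance single-machine and distributed training and cross-platform deployment for deep learning & machine learning)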
http://www.paddlepaddle.org/
Apache License 2.0

Installing paddle==2.6.1 on CentOS fails with Illegal instruction (core dumped); dmesg -T shows the errors below. What is the cause? Thanks #64975

Closed AllenMeng2009 closed 2 months ago

AllenMeng2009 commented 3 months ago

[Thu Jun 6 19:16:34 2024] traps: python3[1948452] general protection fault ip:7ff3f13d4b90 sp:7ffcbc1a3540 error:0 in libc-2.32.so[7ff3f1358000+15f000]
[Thu Jun 6 19:20:29 2024] traps: python3[1949029] trap invalid opcode ip:7fab8e85acda sp:7ffd48c44720 error:0 in libpaddle.so[7fab86400000+d82d000]
[Thu Jun 6 19:48:23 2024] traps: python3[1953265] trap invalid opcode ip:7fc01d25acda sp:7ffcfe7ba360 error:0 in libpaddle.so[7fc014e00000+d82d000]
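A small sketch for reading the kernel trap lines above: it subtracts the mapping base from the faulting instruction pointer to get the offset inside the library. The regular expression is based only on the three lines quoted in this issue, and the helper name `trap_offset` is made up for illustration.

```python
import re

# Field layout inferred from the "traps:" lines quoted in this issue only.
TRAP_RE = re.compile(
    r"traps: (?P<proc>\S+)\[(?P<pid>\d+)\] (?P<kind>.+?) "
    r"ip:(?P<ip>[0-9a-f]+) sp:[0-9a-f]+ error:\d+ "
    r"in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def trap_offset(line):
    """Return (library, trap kind, offset of ip inside the library), or None."""
    m = TRAP_RE.search(line)
    if m is None:
        return None
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    return m["lib"], m["kind"], ip - base


lines = [
    "[Thu Jun 6 19:20:29 2024] traps: python3[1949029] trap invalid opcode "
    "ip:7fab8e85acda sp:7ffd48c44720 error:0 in libpaddle.so[7fab86400000+d82d000]",
    "[Thu Jun 6 19:48:23 2024] traps: python3[1953265] trap invalid opcode "
    "ip:7fc01d25acda sp:7ffcfe7ba360 error:0 in libpaddle.so[7fc014e00000+d82d000]",
]
for line in lines:
    lib, kind, offset = trap_offset(line)
    print(lib, kind, hex(offset))  # both lines give offset 0x845acda
```

Both invalid-opcode traps land at the same offset in libpaddle.so, which suggests the same instruction fails on every run. Disassembling the library at that offset would show which instruction it is; that step is not done in this thread.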

AllenMeng2009 commented 3 months ago

Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext invpcid_single vmmcall tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves clzero xsaveerptr rdpru wbnoinvd arat vaes vpclmulqdq rdpid fsrm

These are the instruction sets the CPU supports.

AllenMeng2009 commented 3 months ago

It should not be an AVX instruction-set problem.
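A minimal sketch of how the flag list can be checked programmatically instead of by eye. The `WANTED` tuple is only an illustrative guess; this thread does not state which instruction sets the paddlepaddle wheel actually requires. The helper names are hypothetical, and `/proc/cpuinfo` is assumed to be a Linux x86 layout with a `flags` line.

```python
# Illustrative set of flags to look for; NOT a confirmed requirement list
# for any paddlepaddle wheel.
WANTED = ("avx", "avx2", "fma", "avx512f")


def check_flags(flags_text, wanted=WANTED):
    """Map each wanted flag to True/False, matching whole tokens only."""
    present = set(flags_text.split())
    return {name: name in present for name in wanted}


def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the first 'flags' line of /proc/cpuinfo, or '' if unavailable."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return line.split(":", 1)[1]
    except OSError:
        pass
    return ""


if __name__ == "__main__":
    for name, ok in check_flags(read_cpu_flags()).items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

Splitting on whitespace avoids a substring match, so `avx` is not reported just because `avx2` is present. The flag list posted above contains avx, avx2 and fma but no avx512 entries.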

AllenMeng2009 commented 3 months ago

[root@iZbp18xzwld4sbol3iq0huZ recommend_food]# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
BIOS Vendor ID: Alibaba Cloud
CPU family: 25
Model: 1
Model name: AMD EPYC 7T83 64-Core Processor
BIOS Model name: pc-i440fx-2.1
Stepping: 1
CPU MHz: 3241.388
BogoMIPS: 5090.42
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 512K
L3 cache: 32768K
NUMA node0 CPU(s): 0-3

This is the CPU architecture.

will-jl944 commented 3 months ago

Do you want to install the CPU build of Paddle or the GPU build? For the CPU build, install it with pip install paddlepaddle==2.6.1 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html. For the GPU build, select your local CUDA version on the official site (https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html ) and install with the command shown there.

AllenMeng2009 commented 3 months ago

Hello, I am installing the CPU build. Following your installation method, I still get the same error. I then downgraded Paddle to 2.5.2 with pip install paddlepaddle==2.5.2 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html, and now get the following error instead:

File "/usr/local/lib/python3.11/site-packages/paddlenlp/taskflow/information_extraction.py", line 979, in _single_stage_predict
self.predictor.run()
ValueError: (InvalidArgument) The 0-th dimension of input[0] and input[1] is expected to be equal.But received input[0]'s shape = [1, 512], input[1]'s shape = [4, 49].
[operator < concat > error]

How can I fix this? Thank you for your help!
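To make the concat message easier to read, here is a pure-Python sketch of the shape rule it implies: when tensors are concatenated along one axis, every other dimension has to match. This is an illustration of the rule only, not Paddle's implementation, and the choice of axis 1 is an assumption inferred from the message requiring the 0-th dimensions to be equal.

```python
def check_concat_shapes(shapes, axis):
    """Return the concatenated shape, or raise ValueError on a mismatch.

    Illustrative only: mirrors the general concat rule that all dimensions
    except the concatenation axis must be equal across inputs.
    """
    first = tuple(shapes[0])
    for i, shape in enumerate(shapes[1:], start=1):
        if len(shape) != len(first):
            raise ValueError(f"input[{i}] has rank {len(shape)}, expected {len(first)}")
        for dim, (a, b) in enumerate(zip(first, shape)):
            if dim != axis and a != b:
                raise ValueError(
                    f"dimension {dim} of input[0] and input[{i}] must be equal: "
                    f"{list(first)} vs {list(shape)}"
                )
    out = list(first)
    out[axis] = sum(shape[axis] for shape in shapes)
    return tuple(out)


# Shapes from the error message: batch sizes 1 and 4 cannot be joined on axis 1.
try:
    check_concat_shapes([(1, 512), (4, 49)], axis=1)
except ValueError as exc:
    print("rejected:", exc)
```

Under that assumption, the two inputs disagree on batch size (1 versus 4), so the mismatch originates upstream of the concat operator itself.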

AllenMeng2009 commented 3 months ago

@will-jl944 Could you please take a look? Could this be a version problem with some of the libraries? Thanks.

will-jl944 commented 3 months ago

Judging from the error message, this may be a version compatibility problem between paddlepaddle v2.5.2 and paddlenlp. paddlepaddle v2.5.2 only supports paddlenlp v2.7 and earlier.
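When comparing environments for a compatibility question like this, it can help to print the installed versions of the packages involved in one place. The sketch below uses only the standard library; the package list is taken from the names mentioned in this thread, and the helper name is made up.

```python
from importlib import metadata

# Distribution names as they appear in the pip commands in this thread.
PACKAGES = ("paddlepaddle", "paddlenlp", "paddleocr")


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```

Running this on both the working machine and the failing server gives two lists that can be compared line by line.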

AllenMeng2009 commented 3 months ago

@will-jl944 Hello, the versions on this machine are shown in the screenshot below: paddlepaddle==2.5.2, paddlenlp==2.6.1, paddleocr==2.7.3, with Python 3.9.10. It still reports:

ValueError: (InvalidArgument) The 0-th dimension of input[0] and input[1] is expected to be equal.But received input[0]'s shape = [1, 512], input[1]'s shape = [4, 49].
[operator < concat > error]

Are there any version requirements for the other libraries? Could you please take a look? Many thanks!

[Screenshot 2024-06-12 152902]

wawltor commented 3 months ago

Could you share the line of code where the concat error occurs? Also, what exactly are you trying to do with PaddleNLP?

AllenMeng2009 commented 3 months ago

@wawltor Hello! The failing code is at line 83 of the file below. We are building a paddlenlp-uie pipeline on top of paddlepaddle, paddlenlp and paddleocr: first OCR on hospital test reports, then structuring the data with paddlenlp. On my own work computer, paddlepaddle==2.6.1 paddlenlp==2.6.1 paddleocr==2.7.3 runs normally. On the Alibaba Cloud CentOS server the errors above occur: with paddlepaddle==2.6.1 paddlenlp==2.6.1 paddleocr==2.7.3 it reports [Thu Jun 6 19:48:23 2024] traps: python3[1953265] trap invalid opcode ip:7fc01d25acda sp:7ffcfe7ba360 error:0 in libpaddle.so[7fc014e00000+d82d000], and after downgrading paddlepaddle to 2.5.2 it fails at concat_funcs.h:83 with [operator < concat > error]. Could you please help analyze this? Thanks. [Uploading concat_funcs.h…]()

AllenMeng2009 commented 3 months ago

@wawltor

[Screenshot 2024-06-13 100852]

The failing code is shown in the screenshot above.

AllenMeng2009 commented 2 months ago

It was a paddleocr version problem. After installing paddleocr==2.6.1.3, everything works normally.