OSGeo / gdal

GDAL is an open source MIT licensed translator library for raster and vector geospatial data formats.
https://gdal.org

GDAL 3.5.1 crashes with Signal 4 Illegal Instruction #6150

Closed erydit closed 2 years ago

erydit commented 2 years ago

After the latest update, GDAL was upgraded to version 3.5.1 and started to crash.

Some GDAL binaries (gdal-config, for example) still work, but most of them fail with the error "Illegal instruction (core dumped)".

My system:
OS: Manjaro 21.3.6 Ruah
Kernel: x86_64 Linux 5.15.57-2-MANJARO
Uptime: 26m
Packages: 1469
Shell: bash
Resolution: 1920x1080
DE: Xfce4
WM: Xfwm4
WM Theme: Matcha-sea
GTK Theme: Matcha-sea [GTK2]
Icon Theme: Papirus-Maia
Font: Noto Sans 10
Disk: 1,5T / 6,8T (23%)
CPU: AMD FX-8350 Eight-Core @ 8x 4GHz
GPU: AMD RS780 (DRM 2.50.0 / 5.15.57-2-MANJARO, LLVM 14.0.6)
RAM: 1829MiB / 19485MiB

coredumpctl info:
PID: 2333 (gdalinfo)
UID: 1000 (rstanislav)
GID: 1000 (rstanislav)
Signal: 4 (ILL)
Timestamp: Tue 2022-08-02 13:33:42 MSK (35min ago)
Command Line: gdalinfo
Executable: /usr/bin/gdalinfo
Control Group: /user.slice/user-1000.slice/session-2.scope
Unit: session-2.scope
Slice: user-1000.slice
Session: 2
Owner UID: 1000 (rstanislav)
Boot ID: 6b3e5ea1666440239afb43c4a246b953
Machine ID: 6313df66156547e292fedaf552861e30
Hostname: rstanislav-ipen
Storage: /var/lib/systemd/coredump/core.gdalinfo.1000.6b3e5ea1666440239afb43c4a246b953.2333.1659436422>
Disk Size: 743.6K
Message: Process 2333 (gdalinfo) of user 1000 dumped core.
Module linux-vdso.so.1 with build-id 125b0285aa529b1b1396ff79a856cc580f53371b
Module libgflags.so.2.2 with build-id 7f92dc764545b3d8e058f547553713232168c9db
Module libprotobuf.so.32 with build-id 86a9fa3c8369df69f01ddb2fb38c14307b91773d
Module libthrift-0.16.0.so with build-id c3b31b7dd733754897f96dc1b0919b8ea6446c4f
Module libbz2.so.1.0 with build-id 919597c477c9b2cb9cdbb7745ed6494ac0e6da60
Module libre2.so.9 with build-id 8eb359a590bc49054f8b88c11baeecdb93ef6de4
Module libutf8proc.so.2 with build-id d514e62118589d7cc2a0b6651846a918805bdf60
Module libglog.so.1 with build-id 51ac814e28ba46783fc9b23fa75e765832042c9b
Module liborc.so with build-id 9e406019264b1360a9a62d63c104f787661b1761
Module libbrotlienc.so.1 with build-id 74adbc62e4fbb5da9d37b5aa458471f4130862ff
Module libparquet.so.800 without build-id.
Module libarrow.so.800 without build-id.
Module ogr_Parquet.so with build-id 013b214726c92b361e3ac7f2b4f6aa42406f09f8
Module libtirpc.so.3 with build-id 5bef2adfdee3df283f593b3e2d37b6dac405256a
Module libbrotlicommon.so.1 with build-id acfd597a977c8087bb6184383daae2e828a9ce42
Module libresolv.so.2 with build-id 89a368a6ad1b392d126a2a5beb9c2f61ade00279
Module libkeyutils.so.1 with build-id ac405ddd17be10ce538da3211415ee50c8f8df79
Module libkrb5support.so.0 with build-id 15f223925ef59dee4379ebbc0fcd14eda9ba81a2
Module libcom_err.so.2 with build-id 3360a28740ffbbd5a5c0c21d09072445908707e5
Module libk5crypto.so.3 with build-id cc77a742cb62447a53d98285b41558b8acd92866
Module libkrb5.so.3 with build-id 371cc767dacb17cb42c9c44b88eebbed5ee9a756
Module libunistring.so.2 with build-id 617dbf3d3d6f85d6556a7a036e23845e95490158
Module libgeos.so.3.11.0 with build-id d887b72494bc4a6891c63023aac0affb11fe61bf
Module libfreexl.so.1 with build-id 47cfde32de9f4388d151493ccad97ce484fe9bc7
Module librttopo.so.1 with build-id 94fe58373c8c7e89593294f1e7739a64bb0ec255
Module libminizip.so.1 with build-id 0785202fbd0261af699c78770961bf72fb3fb817
Module libicudata.so.71 with build-id 4fef196388e678deb881978139e125e20ee2d94d
Module libicui18n.so.71 with build-id 6fd5c97fd2808ee29958bf809656d5885e7e8963
Module libnsl.so.3 with build-id 3063b4b800bdbadb6b136951de10ad004b40e22b
Module libdl.so.2 with build-id 94198b268228074fa9f405bbedbbae94112593ed
Module libpthread.so.0 with build-id 95ae4f30a6f12ccbff645d30f8e1a3ee23ec7d36
Module libsnappy.so.1 with build-id 36e3fb247a476fe2f755162644ebcd8ebd5d92cb
Module libicuuc.so.71 with build-id 633fdc0c5385d916571f6140e7a978ad0630ef55
Module libltdl.so.7 with build-id c4ee3f1ba09fe34163d71ff336756fbecb6f409f
Module libbrotlidec.so.1 with build-id 66c54e9301f7e102ecc1d88547e5f0e8a056fe22
Module libgssapi_krb5.so.2 with build-id 292f1ce32161c0ecc4a287bc8494d5d7c420a03f
Module libssl.so.1.1 with build-id e6b1f97a5b60b4248c49dfc5b11f53f281b507d0
Module libpsl.so.5 with build-id 0229a201aaf5652186c9fdc192ebe52baf19d7f1
Module libssh2.so.1 with build-id a4adfe44cc7ebd295b3b783361acc3dcfcea1d50
Module libidn2.so.0 with build-id b16e7570b102789b13ff77289762dbfe3f8f46bc
Module libnghttp2.so.14 with build-id 16f0981d5251b03b11a49236ac403562ee458887
Module ld-linux-x86-64.so.2 with build-id 0effd0e43efa4468d3c31871c93af0b7f3005673
Module libgcc_s.so.1 with build-id 0e3de903950e35ae59a5de8c00b1817a4a71ca01
Module libstdc++.so.6 with build-id a24b312bb5881ceae0ffbed599201690f2a1747b
Module libm.so.6 with build-id 1b7296ef9fd806e47060788389293c824b09ad72
Module libjson-c.so.5 with build-id fc75a469bc875da3c642c484a0f8e7bd1fc2e944
Module libproj.so.25 with build-id 55ca998e2819cf562cc79cb3fb403bc90ce50a65
Module libgeos_c.so.1 with build-id 02ca7d0a9a84e1b80e418ccaec3142c8069861d7
Module libexpat.so.1 with build-id 113bb5a3e9ad856801bfcfc029102c9bdc13d67e
Module libspatialite.so.7 with build-id 51798093f0c4f5f2df6762d80e18363f3fcbff3d
Module libpcre2-8.so.0 with build-id a0306c1eb7393936ed0fb7328c8bb117726c2adc
Module libsqlite3.so.0 with build-id 90fb9a043b4a51db25530e16cd543c4b2a9319a9
Module libgif.so.7 with build-id 6377b63d77aae3d04283a564860190f5a51a5a99
Module libzstd.so.1 with build-id ab54c2881f53ab314e134f3e08c76d504376dd5d
Module libpng16.so.16 with build-id 2dc0bce07f199bf983c07a05fb95a6f4af83a9b3
Module libgeotiff.so.5 with build-id 64dcd1ec09f58f195bb7f63d7f4fd75348c89610
Module libtiff.so.5 with build-id 31895d2bd133f34f0cdc2d4ac855ed838ec927b6
Module libjpeg.so.8 with build-id 8e6d3f3e8f438912b561c43b6e7f66e6e5e097d0
Module libxerces-c-3.2.so with build-id cf4b9cbff052cdffbec867871ff2e40ae0d88c5b
Module libqhull_r.so.8.0 with build-id 6476f3b5c4a9e38cc1f9b76444212be60836dcfd
Module libOpenCL.so.1 with build-id fd888c7280e1e95f15313b126285ee5fcb03508c
Module libblosc.so.1 with build-id 0e46db305d596cfb4284246b63d9fd8e86d8a523
Module liblz4.so.1 with build-id e63600ab23b2f6997f42fac2fa56e1f02ce159a1
Module libdeflate.so.0 with build-id 765ed9e5721c5f26143f85ff3bf116efcd7d51f0
Module liblzma.so.5 with build-id 28b40c7af8098a66af6ee093b6986b91cad7694d
Module libcrypto.so.1.1 with build-id 7981ea3d69f3c28e46ee312a815af96eab93775c
Module libcryptopp.so.8 with build-id 4451af8aca2ad19750ecd9cbf78be78bbbfd29f3
Module libxml2.so.2 with build-id 8cdf00fa954d9a27f2f184c4d354cb14677446ac
Module libodbcinst.so.2 with build-id e13a94e0e9019b44c7a0bb5bc432214b7e79c5be
Module libodbc.so.2 with build-id eefddbffcd83155bb48fd3b62c80f79a6fe25b5f
Module libcurl.so.4 with build-id 8e801dde5d7263a70bb78c67350f5762277ab9c1
Module libz.so.1 with build-id fefe3219a96d682ec98fcfb78866b8594298b5a2
Module libc.so.6 with build-id 60df1df31f02a7b23da83e8ef923359885b81492
Module libgdal.so.31 with build-id 094a8ed87be3e01f48bc4271a70bb622d1510f77
Module gdalinfo with build-id 0dc2bde2558986c76b49293cb49daa6bf8d1e9d9
Stack trace of thread 2333:
#0 0x00007f6f75f13bc3 n/a (libarrow.so.800 + 0x7c2bc3)
#1 0x00007f6f75f13b2b n/a (libarrow.so.800 + 0x7c2b2b)
#2 0x00007f6f75f13b58 n/a (libarrow.so.800 + 0x7c2b58)
#3 0x00007f6f75f1330b _ZN5arrow7compute16BloomFilterMasksC2Ev (libarrow.so.800 + 0x7c230b)
#4 0x00007f6f7d626f3e n/a (ld-linux-x86-64.so.2 + 0x5f3e)
#5 0x00007f6f7d62702c n/a (ld-linux-x86-64.so.2 + 0x602c)
#6 0x00007f6f7c3524d4 _dl_catch_exception (libc.so.6 + 0x1594d4)
#7 0x00007f6f7d62e097 n/a (ld-linux-x86-64.so.2 + 0xd097)
#8 0x00007f6f7c35247e _dl_catch_exception (libc.so.6 + 0x15947e)
#9 0x00007f6f7d62e42d n/a (ld-linux-x86-64.so.2 + 0xd42d)
#10 0x00007f6f7c28163c n/a (libc.so.6 + 0x8863c)
#11 0x00007f6f7c35247e _dl_catch_exception (libc.so.6 + 0x15947e)
#12 0x00007f6f7c352533 _dl_catch_error (libc.so.6 + 0x159533)
#13 0x00007f6f7c28110f n/a (libc.so.6 + 0x8810f)
#14 0x00007f6f7c2816f1 dlopen (libc.so.6 + 0x886f1)
#15 0x00007f6f7c70d4e3 CPLGetSymbol (libgdal.so.31 + 0x3044e3)
#16 0x00007f6f7cff0685 _ZN17GDALDriverManager15AutoLoadDriversEv (libgdal.so.31 + 0xbe7685)
#17 0x00007f6f7ccb1637 GDALAllRegister (libgdal.so.31 + 0x8a8637)
#18 0x0000563eaf81605b n/a (gdalinfo + 0x105b)
#19 0x00007f6f7c222290 n/a (libc.so.6 + 0x29290)
#20 0x00007f6f7c22234a __libc_start_main (libc.so.6 + 0x2934a)
#21 0x0000563eaf816525 n/a (gdalinfo + 0x1525)
ELF object binary architecture: AMD x86-64

Some Google results suggest that such an error may be caused by the CPU not supporting a required instruction, but I don't know how to confirm that. Anyway, here is my lscpu output:

lscpu:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Vendor ID: AuthenticAMD
Model name: AMD FX(tm)-8350 Eight-Core Processor
CPU family: 21
Model: 2
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
Stepping: 0
Frequency boost: enabled
CPU(s) scaling MHz: 38%
CPU max MHz: 4000,0000
CPU min MHz: 1400,0000
BogoMIPS: 8003.18
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb cpb hw_pstate ssbd ibpb vmmcall bmi1 arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
Virtualization features:
  Virtualization: AMD-V
Caches (sum of all):
  L1d: 128 KiB (8 instances)
  L1i: 256 KiB (4 instances)
  L2: 8 MiB (4 instances)
  L3: 8 MiB (1 instance)
NUMA:
  NUMA node(s): 1
  NUMA node0 CPU(s): 0-7
Vulnerabilities:
  Itlb multihit: Not affected
  L1tf: Not affected
  Mds: Not affected
  Meltdown: Not affected
  Mmio stale data: Not affected
  Retbleed: Mitigation; untrained return thunk; SMT vulnerable
  Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
  Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2: Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling
  Srbds: Not affected
  Tsx async abort: Not affected

rouault commented 2 years ago

Did you build GDAL yourself or use a pre-built GDAL? The crash seems to occur in libarrow.so, which is used by the OGR Parquet driver. If you built it yourself and don't need Parquet support, then disable it.
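
For readers who want to see how the trace points at libarrow.so, here is a minimal Python sketch that picks out the innermost frame's module. The frame lines are copied from the coredumpctl output in this thread; the helper names `parse_frames` and `faulting_module` are hypothetical, and the regular expression assumes the "#N address symbol (module + offset)" layout shown there.

```python
import re

# A few frames copied from the coredumpctl output in this thread.
FRAMES = """\
#0 0x00007f6f75f13bc3 n/a (libarrow.so.800 + 0x7c2bc3)
#3 0x00007f6f75f1330b _ZN5arrow7compute16BloomFilterMasksC2Ev (libarrow.so.800 + 0x7c230b)
#14 0x00007f6f7c2816f1 dlopen (libc.so.6 + 0x886f1)
#16 0x00007f6f7cff0685 _ZN17GDALDriverManager15AutoLoadDriversEv (libgdal.so.31 + 0xbe7685)
"""

# Assumed layout: "#<index> <address> <symbol> (<module> + <offset>)".
FRAME_RE = re.compile(r"^#(\d+)\s+0x[0-9a-f]+\s+(\S+)\s+\((\S+) \+ 0x[0-9a-f]+\)$")


def parse_frames(text):
    """Return a list of (index, symbol, module) tuples, one per matching line."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append((int(match.group(1)), match.group(2), match.group(3)))
    return frames


def faulting_module(text):
    """Return the module of the lowest-numbered (innermost) frame, or None."""
    frames = parse_frames(text)
    return min(frames)[2] if frames else None


print(faulting_module(FRAMES))  # libarrow.so.800
```

Frame #0 is where the illegal instruction was executed; the dlopen and AutoLoadDrivers frames further out show that it happened while GDAL was loading a plugin.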

erydit commented 2 years ago

I use the pre-built GDAL from the Manjaro repositories. And yes, uninstalling the arrow package almost fixed GDAL's behavior (except for warning messages such as "ERROR 1: libarrow.so.800: cannot open shared object file: No such file or directory"). Thank you, I will try to report this to the arrow maintainer.

rouault commented 2 years ago

> would try to report to arrow maintainer.

Probably first to the person in charge of the Manjaro repository, i.e. the one who built libarrow.

jef-n commented 2 years ago

Might also be related to snappy (a dependency of arrow): both arrow and snappy use instructions that are not available everywhere. I think arrow does check at runtime whether the instruction is available, but snappy doesn't.

erydit commented 2 years ago

> Might also be related to snappy (a dependency of arrow): both arrow and snappy use instructions that are not available everywhere. I think arrow does check at runtime whether the instruction is available, but snappy doesn't.

The problem is the arrow package. https://github.com/apache/arrow/issues/12681

The solution is to rebuild the arrow package manually with the flag -DARROW_SIMD_LEVEL=SSE4_2 or -DARROW_SIMD_LEVEL=NONE, depending on the instruction set supported by your CPU.
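
As a rough illustration of that choice, the following Python sketch picks between the two values named above from a set of CPU flag names. The function name `arrow_simd_cmake_flag` is hypothetical, the flag set is a subset of the lscpu output posted earlier, and the sketch only builds the option string; it does not run CMake or validate the build.

```python
def arrow_simd_cmake_flag(cpu_flags):
    """Pick one of the two ARROW_SIMD_LEVEL values mentioned in this thread.

    cpu_flags is a collection of lowercase flag names as printed by lscpu
    or /proc/cpuinfo (for example "sse4_2").
    """
    level = "SSE4_2" if "sse4_2" in cpu_flags else "NONE"
    return f"-DARROW_SIMD_LEVEL={level}"


# Subset of the flags reported for the AMD FX-8350 in this thread.
fx8350_flags = {"sse", "sse2", "ssse3", "sse4_1", "sse4_2", "avx"}

print(arrow_simd_cmake_flag(fx8350_flags))  # -DARROW_SIMD_LEVEL=SSE4_2
print(arrow_simd_cmake_flag({"sse", "sse2"}))  # -DARROW_SIMD_LEVEL=NONE
```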

ttencate commented 2 years ago

Also ran into this, on Arch Linux (very similar to Manjaro). I filed a bug report for the packagers of the arrow package.

Edit: And another report for the packagers of the gdal package because of the ERROR 1: libarrow.so.800: cannot open shared object file: No such file or directory errors... assuming that -DGDAL_USE_ARROW=ON means that arrow becomes a required dependency in that build.

Edit 2: Arch maintainers consider this an upstream issue, so filed #6281 for the error spam.

Firefishy commented 8 months ago

This "Illegal instruction (core dumped)" error also affects the official Docker image ghcr.io/osgeo/gdal:ubuntu-full-latest, due to libarrow requiring a much newer CPU generation.

rouault commented 8 months ago

> This "Illegal instruction (core dumped)" error also affects the official Docker image ghcr.io/osgeo/gdal:ubuntu-full-latest, due to libarrow requiring a much newer CPU generation.

Does it happen when opening any dataset or just a Feather/Parquet one?

rouault commented 8 months ago

According to https://arrow.apache.org/docs/cpp/env_vars.html#envvar-ARROW_USER_SIMD_LEVEL , default builds should only require SSE4.2, which should be available unless you run very old hardware.

Can you check if the following returns a non-empty string: cat /proc/cpuinfo|grep sse4_2|head -n 1 ?
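
The same check can be sketched in Python for scripts that need to test more than one flag. This assumes the x86 Linux /proc/cpuinfo layout, where each processor block has a "flags" line; the helper names `cpu_flags` and `read_cpu_flags` and the shortened sample text are made up for illustration.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of flag names from the first "flags" line, if any."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def read_cpu_flags(path="/proc/cpuinfo"):
    """Read the flags from the running system (Linux only)."""
    with open(path, encoding="utf-8") as handle:
        return cpu_flags(handle.read())


# Shortened, made-up sample in the /proc/cpuinfo layout.
sample = "processor\t: 0\nflags\t\t: fpu sse sse2 ssse3 sse4_1 sse4_2 popcnt aes\n"
flags = cpu_flags(sample)

print("sse4_2" in flags, "avx2" in flags)  # True False
```

On a real machine, `read_cpu_flags()` returns the full set, so `"avx2" in read_cpu_flags()` answers the AVX2 question in the same way.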

Firefishy commented 8 months ago

CPU flags: flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 movbe popcnt tsc_deadline_timer aes rdrand lahf_lm 3dnowprefetch cpuid_fault epb pti ibrs ibpb stibp tpr_shadow flexpriority ept vpid tsc_adjust smep erms dtherm ida arat vnmi md_clear

OK:

$ docker run -it --rm ghcr.io/osgeo/gdal:ubuntu-small-latest
root@371a5b0625b7:/# gdal_translate --version
GDAL 3.9.0dev-da8d8118a91b8f04d69ac3c1c6a6cfcfcc9969dd, released 2024/01/26

Fail:

$ docker run -it --rm ghcr.io/osgeo/gdal:ubuntu-full-latest
root@0f98a1ba34ac:/# gdal_translate --version
Illegal instruction (core dumped)

Firefishy commented 8 months ago

Another system: flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 popcnt aes lahf_lm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid dtherm ida arat flush_l1d - Intel(R) Xeon(R) CPU L5630 @ 2.13GHz

Fail:

$ docker run --rm -it ghcr.io/osgeo/gdal:ubuntu-full-3.8.3
root@6d116f237bc3:/# gdal_translate --version
Illegal instruction (core dumped)

But weirdly OK...

$ docker run --rm -it ghcr.io/osgeo/gdal:ubuntu-full-latest
root@47c04a9c9ce6:/# gdal_translate --version
GDAL 3.9.0dev-a58174fe9b41f71a22d2fb1f27cc7ce0dcefdefa, released 2024/02/04

rouault commented 8 months ago

OK, so sse4.2 is present, but avx2 is not. Are you sure the issue is with libarrow? We've also identified an issue with libtiledb, which required avx2. Can you try "docker pull ghcr.io/osgeo/gdal:ubuntu-full-latest" again? I refreshed it a few minutes ago with a tiledb build that no longer requires avx2. See the discussion at https://lists.osgeo.org/pipermail/gdal-dev/2024-February/058392.html

Firefishy commented 8 months ago

BINGO! Fixed! You are awesome!

$ docker pull ghcr.io/osgeo/gdal:ubuntu-full-latest
ubuntu-full-latest: Pulling from osgeo/gdal
...
Digest: sha256:f96e8fb499313c5d67e23c4780debc833e4a7ca62ca843e5d44b384038fad247
Status: Downloaded newer image for ghcr.io/osgeo/gdal:ubuntu-full-latest
ghcr.io/osgeo/gdal:ubuntu-full-latest

$ docker run -it --rm ghcr.io/osgeo/gdal:ubuntu-full-latest
root@a7f4e82d690c:/# gdal_translate --version
GDAL 3.9.0dev-24e151d1cb6281973714207afbbce3a59719fa6f, released 2024/02/05

Yes, likely: https://github.com/OSGeo/gdal/commit/c4505ed7a0ead2c79152356f8dbdd3eb609a2b24