conda-forge / pyarrow-feedstock

A conda-smithy repository for pyarrow.
BSD 3-Clause "New" or "Revised" License

Illegal instructions in x86_64 pyarrow linux libraries #84

Closed stuartarchibald closed 4 years ago

stuartarchibald commented 5 years ago

Issue:

On x86_64 Linux, some of the pyarrow extension libraries contain instructions from an instruction set newer than nocona (the default target for the Anaconda toolchain compilers).

Reproducer:

$ conda create -n tmp_pyarrow_bad -c conda-forge python=3 pyarrow -y -q
$ conda activate tmp_pyarrow_bad
$ for x in $(find $(dirname `which python`)/../lib/python3.7/site-packages/pyarrow/*.so); do echo $x; objdump -D $x|grep pinsrq|head -1; done

should yield output like:

<redacted>/envs/tmp_pyarrow_bad/bin/../lib/python3.7/site-packages/pyarrow/_csv.cpython-37m-x86_64-linux-gnu.so
    9e83:       66 49 0f 3a 22 c6 01    pinsrq $0x1,%r14,%xmm0
<redacted>/envs/tmp_pyarrow_bad/bin/../lib/python3.7/site-packages/pyarrow/_flight.cpython-37m-x86_64-linux-gnu.so
   14f64:       66 48 0f 3a 22 c0 01    pinsrq $0x1,%rax,%xmm0
<redacted>/envs/tmp_pyarrow_bad/bin/../lib/python3.7/site-packages/pyarrow/gandiva.cpython-37m-x86_64-linux-gnu.so
    ee8e:       66 48 0f 3a 22 c2 01    pinsrq $0x1,%rdx,%xmm0
<redacted>/envs/tmp_pyarrow_bad/bin/../lib/python3.7/site-packages/pyarrow/_json.cpython-37m-x86_64-linux-gnu.so
<redacted>/envs/tmp_pyarrow_bad/bin/../lib/python3.7/site-packages/pyarrow/lib.cpython-37m-x86_64-linux-gnu.so
   53c71:       66 48 0f 3a 22 c0 01    pinsrq $0x1,%rax,%xmm0
<redacted>/envs/tmp_pyarrow_bad/bin/../lib/python3.7/site-packages/pyarrow/_orc.cpython-37m-x86_64-linux-gnu.so
<redacted>/envs/tmp_pyarrow_bad/bin/../lib/python3.7/site-packages/pyarrow/_parquet.cpython-37m-x86_64-linux-gnu.so
    e447:       66 48 0f 3a 22 c0 01    pinsrq $0x1,%rax,%xmm0
<redacted>/envs/tmp_pyarrow_bad/bin/../lib/python3.7/site-packages/pyarrow/_plasma.cpython-37m-x86_64-linux-gnu.so
    a570:       66 48 0f 3a 22 c0 01    pinsrq $0x1,%rax,%xmm0

The instruction pinsrq requires SSE 4.1 or later, whereas nocona supports only MMX, SSE, SSE2 and SSE3. The effect is a SIGILL when attempting to load the library on a CPU without SSE 4.1.
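The same check can be sketched in Python, assuming the `objdump` disassembly has already been captured as text in the layout shown above. The function name `find_mnemonics` and the short mnemonic set are illustrative only; the set is a small subset of SSE4.1, not a complete list.

```python
import re

# Illustrative subset of SSE4.1 mnemonics (an assumption for this sketch, not exhaustive).
SSE41_MNEMONICS = {"pinsrq", "pinsrd", "pextrq", "pextrd", "pmulld", "ptest"}

# One objdump disassembly line: "<address>: <hex bytes> <mnemonic> <operands>"
_LINE = re.compile(r"^\s*([0-9a-f]+):\s+((?:[0-9a-f]{2}\s)+)\s*([a-z][a-z0-9]*)")


def find_mnemonics(disassembly, wanted=SSE41_MNEMONICS):
    """Return (address, mnemonic) pairs for every wanted instruction found."""
    hits = []
    for line in disassembly.splitlines():
        match = _LINE.match(line)
        if match and match.group(3) in wanted:
            hits.append((int(match.group(1), 16), match.group(3)))
    return hits
```

An empty result for a library only means none of the listed mnemonics appear; it does not prove the library is free of other post-nocona instructions.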

xref: https://github.com/AnacondaRecipes/pyarrow-feedstock/issues/1


Environment (conda list):

```
# Name                    Version         Build               Channel
_libgcc_mutex             0.1             main
arrow-cpp                 0.14.1          py37hb2cae1d_2      conda-forge
boost-cpp                 1.70.0          h8e57a91_2          conda-forge
brotli                    1.0.7           he1b5a44_1000       conda-forge
bzip2                     1.0.8           h516909a_1          conda-forge
c-ares                    1.15.0          h516909a_1001       conda-forge
ca-certificates           2019.9.11       hecc5488_0          conda-forge
certifi                   2019.9.11       py37_0              conda-forge
double-conversion         3.1.5           he1b5a44_1          conda-forge
gflags                    2.2.2           he1b5a44_1001       conda-forge
glog                      0.4.0           he1b5a44_1          conda-forge
grpc-cpp                  1.23.0          h18db393_0          conda-forge
icu                       64.2            he1b5a44_1          conda-forge
libblas                   3.8.0           12_openblas         conda-forge
libcblas                  3.8.0           12_openblas         conda-forge
libevent                  2.1.10          h72c5cf5_0          conda-forge
libffi                    3.2.1           he1b5a44_1006       conda-forge
libgcc-ng                 9.1.0           hdf63c60_0
libgfortran-ng            7.3.0           hdf63c60_0
liblapack                 3.8.0           12_openblas         conda-forge
libopenblas               0.3.7           h6e990d7_1          conda-forge
libprotobuf               3.8.0           h8b12597_0          conda-forge
libstdcxx-ng              9.1.0           hdf63c60_0
lz4-c                     1.8.3           he1b5a44_1001       conda-forge
ncurses                   6.1             hf484d3e_1002       conda-forge
numpy                     1.17.2          py37h95a1406_0      conda-forge
openssl                   1.1.1c          h516909a_0          conda-forge
pandas                    0.25.1          py37hb3f55d8_0      conda-forge
parquet-cpp               1.5.1           2                   conda-forge
pip                       19.2.3          py37_0              conda-forge
pyarrow                   0.14.1          py37h8b68381_0      conda-forge
python                    3.7.3           h33d41f4_1          conda-forge
python-dateutil           2.8.0           py_0                conda-forge
pytz                      2019.2          py_0                conda-forge
re2                       2019.09.01      he1b5a44_0          conda-forge
readline                  8.0             hf8c457e_0          conda-forge
setuptools                41.2.0          py37_0              conda-forge
six                       1.12.0          py37_1000           conda-forge
snappy                    1.1.7           he1b5a44_1002       conda-forge
sqlite                    3.29.0          hcee41ef_1          conda-forge
thrift-cpp                0.12.0          hf3afdfd_1004       conda-forge
tk                        8.6.9           hed695b0_1003       conda-forge
uriparser                 0.9.3           he1b5a44_1          conda-forge
wheel                     0.33.6          py37_0              conda-forge
xz                        5.2.4           h14c3975_1001       conda-forge
zlib                      1.2.11          h516909a_1006       conda-forge
zstd                      1.4.0           h3b9ef0a_0          conda-forge
```


Details about conda and system ( conda info ):

``` N/A ```

wesm commented 4 years ago

This should be fixed by

https://github.com/apache/arrow/commit/3129e3ed90219ecfffe2a25ce5820eec8cc947d0#diff-b048bf4c1679dce1028fd897a7c43b93

Is this problem still present in 0.15.0?

wesm commented 4 years ago

Looks like we may need to disable SSE4.2 (which also disables SSE4.1 I think) in the conda-forge builds

@kszucs @pitrou @xhochy

stuartarchibald commented 4 years ago

Thanks for looking at this. I've got 0.15.0 locally; it seems there's no longer an issue in site-packages/pyarrow/*.so, but there are still problems in lib:

$ conda list|grep arrow
# packages in environment at <redacted>/_tmp_pyarrow_bad:
arrow-cpp                 0.15.0           py37h090bef1_1    conda-forge
pyarrow                   0.15.0           py37h8b68381_1    conda-forge
(_tmp_pyarrow_bad) 

$ for x in $(find $(dirname `which python`)/../lib/*arrow*.so); do echo $x; objdump -D $x|grep pinsrq|head -1; done
<redacted>/_tmp_pyarrow_bad/bin/../lib/libarrow_dataset.so
   15c49:       66 48 0f 3a 22 c0 01    pinsrq $0x1,%rax,%xmm0
<redacted>/_tmp_pyarrow_bad/bin/../lib/libarrow_flight.so
   66703:       66 48 0f 3a 22 c2 01    pinsrq $0x1,%rdx,%xmm0
<redacted>/_tmp_pyarrow_bad/bin/../lib/libarrow_python.so
   42114:       66 48 0f 3a 22 05 e1    pinsrq $0x1,0xe6ce1(%rip),%xmm0        # 128e00 <_ZNSt17_Function_handlerIFSt10unique_ptrINSt13__future_base12_Result_baseENS2_8_DeleterEEvENS1_12_Task_setterIS0_INS1_7_ResultIN5arrow6StatusEEES3_EZNS1_11_Task_stateISt5_BindIFZNS8_2py21DataFrameBlockCreator18WriteTableToBlocksEvEUliE_iEESaIiEFS9_vEE6_M_runEvEUlvE_S9_EEE9_M_invokeERKSt9_Any_data@@Base+0xdd6a0>
<redacted>/_tmp_pyarrow_bad/bin/../lib/libarrow.so
  1ef465:       66 48 0f 3a 22 c0 01    pinsrq $0x1,%rax,%xmm0
(_tmp_pyarrow_bad) 

$ gdb -ex='r' -ex 'display/i $pc' --args python -c 'import pyarrow'
<snip>
Reading symbols from <redacted>/_tmp_pyarrow_bad/bin/python3.7...done.
Starting program: <redacted>/_tmp_pyarrow_bad/bin/python -c import\ pyarrow
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7fffee879700 (LWP 5155)]
[New Thread 0x7fffee078700 (LWP 5156)]
[New Thread 0x7fffeb877700 (LWP 5157)]

Program received signal SIGILL, Illegal instruction.
0x00007fffe56c7cf6 in arrow::internal::CreateGlobalRegistry() ()
   from <redacted>/_tmp_pyarrow_bad/lib/python3.7/site-packages/pyarrow/../../../libarrow.so.15
1: x/i $pc
=> 0x7fffe56c7cf6 <_ZN5arrow8internalL20CreateGlobalRegistryEv+166>:    pinsrq $0x1,%rbx,%xmm0

xhochy commented 4 years ago

Yes, we shouldn't build here with SSE4.2. Made a PR: https://github.com/conda-forge/arrow-cpp-feedstock/pull/106

stuartarchibald commented 4 years ago

Thanks @xhochy

pitrou commented 4 years ago

Have we run any benchmarks before doing this? I hope this doesn't disable the HW popcount optimization.

stuartarchibald commented 4 years ago

> Have we run any benchmarks before doing this? I hope this doesn't disable the HW popcount optimization.

Assuming you mean popcnt? It seems Nehalem or later is needed, i.e. SSE4+. I would guess the above would disable it?

pitrou commented 4 years ago

That depends on how exactly we do it in our source code. I don't recall right now and need to check (I can do so Monday).

stuartarchibald commented 4 years ago

Actually, I think I applied the wrong logic there: popcnt is not part of SSE, it just appeared around the time of SSE4. Given nocona should be the target instruction set, it won't have SSE4+ or popcnt.
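To see which of these features a particular machine reports, here is a minimal sketch, assuming Linux, where `/proc/cpuinfo` lists CPU features on a `flags` line (with names such as `sse4_1`, `sse4_2` and `popcnt`). The helper names `parse_cpu_flags` and `missing_features` are hypothetical.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no 'flags' line found (e.g. not an x86 Linux system)


def missing_features(flags, required=("sse4_1", "sse4_2", "popcnt")):
    """List the required features that the CPU does not advertise."""
    return [name for name in required if name not in flags]
```

Passing the contents of `/proc/cpuinfo` from an affected machine to `parse_cpu_flags` and then calling `missing_features` on the result would show whether SSE4.1 and popcnt are reported separately there.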

pitrou commented 4 years ago

> Given nocona should be the target instruction set

Is that a hard constraint? Having a fast popcnt is rather important for Arrow... (or we'll have to implement a runtime switch :-/)
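For illustration only, the shape of such a runtime switch can be sketched in Python: probe a capability once, then route all calls through whichever implementation was selected. This is not Arrow's code; a real switch would live in C++ and probe the CPU. Here `int.bit_count` (Python 3.10+) stands in for the hardware path, and `select_popcount` is a hypothetical name.

```python
def _popcount_portable(x):
    # Fallback that works on any interpreter: count '1' digits in the binary form.
    return bin(x).count("1")


def select_popcount(has_fast_path):
    """Pick an implementation once, based on a capability probed at runtime."""
    if has_fast_path:
        return int.bit_count  # stand-in for a hardware popcnt path (Python 3.10+)
    return _popcount_portable


# Probe once at import time, then call through the selected function.
popcount = select_popcount(hasattr(int, "bit_count"))
```

The cost of this design is one indirect call per use, which is why the dispatch is usually placed around whole loops rather than around single popcount operations.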

stuartarchibald commented 4 years ago

It's from the Anaconda toolchain:

$ conda create -n _tmp_gcc2 gcc_linux-64 -q -y
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: <path>/envs/_tmp_gcc2

  added / updated specs:
    - gcc_linux-64

The following NEW packages will be INSTALLED:

  _libgcc_mutex      pkgs/main/linux-64::_libgcc_mutex-0.1-main
  binutils_impl_lin~ pkgs/main/linux-64::binutils_impl_linux-64-2.31.1-h6176602_1
  binutils_linux-64  pkgs/main/linux-64::binutils_linux-64-2.31.1-h6176602_8
  gcc_impl_linux-64  pkgs/main/linux-64::gcc_impl_linux-64-7.3.0-habb00fd_1
  gcc_linux-64       pkgs/main/linux-64::gcc_linux-64-7.3.0-h553295d_8
  libgcc-ng          pkgs/main/linux-64::libgcc-ng-9.1.0-hdf63c60_0
  libstdcxx-ng       pkgs/main/linux-64::libstdcxx-ng-9.1.0-hdf63c60_0

Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done

$ conda activate _tmp_gcc2

$ echo $CFLAGS
-march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pip

pitrou commented 4 years ago

Ok, these are the technical settings, but what is the policy? Is it possible for a package to require a later ISA extension? @msarahan may know the answer.

stuartarchibald commented 4 years ago

Perhaps "should" implies too much; nocona is the default target instruction set.
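A quick way to confirm the default target of an activated toolchain environment is to pull `-march` and `-mtune` out of `$CFLAGS`. A minimal sketch, where `target_arch` is a hypothetical helper name:

```python
import shlex


def target_arch(cflags):
    """Return the -march and -mtune values found in a compiler flag string."""
    found = {"march": None, "mtune": None}
    for token in shlex.split(cflags):
        for key in found:
            prefix = f"-{key}="
            if token.startswith(prefix):
                found[key] = token[len(prefix):]
    return found
```

Called with the value of `os.environ.get("CFLAGS", "")` in the environment above, this would report `nocona` as the baseline and `haswell` as the tuning target.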

xhochy commented 4 years ago

@pitrou Currently conda-forge builds all packages for nocona. There is the option to require newer features, e.g. a newer glibc. The current approach for that in conda is the newly introduced "virtual packages" (so far only cuda and glibc are handled that way), which would also be a way to build conda packages by SSE flavour.