I suppose we could upgrade the Python version inside the Docker container's Conda environment... but won't this possibly break things and/or mess up reproducibility?
This looks like a linking error, or possibly a problem with the conda environment.
If operon was installed inside a Conda environment inside the docker image, then theoretically it will have detected the existing python and linked against it. Running `ldd` on the `pyoperon` library file (which you will need to locate yourself; it should look like `pyoperon.cpython-39-x86_64-linux-gnu.so`) will tell you exactly which dependency is missing.
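A minimal sketch of that check, assuming the conda envs live under /opt/conda/envs (adjust the path to your image):
```bash
# locate the pyoperon extension module inside the image
find /opt/conda/envs -name 'pyoperon.cpython-*.so'
# list its dynamic dependencies; "not found" entries are the missing ones
ldd "$(find /opt/conda/envs -name 'pyoperon.cpython-*.so' | head -n1)" | grep 'not found'
```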
You can try one of two things:
1. Install `pyoperon` with pip: `pip install pyoperon`
2. Build from source:
```
git clone https://github.com/heal-research/pyoperon.git
git switch cpp20
bash script/dependencies.sh
pip install .
```
I recommend option 2, which will install the current release candidate containing a significant number of improvements.
Thanks for that. However, shouldn't SRBench's docker image, installed from the provided Dockerfile, have all methods functioning correctly "out of the box"? Even to reproduce the paper's results: I want to start there before upgrading individual methods.
My goal is to perform a regression with all of SRBench's methods on my specific data.
> shouldn't SRBench's docker image, installed from the provided Dockerfile, have all methods functioning correctly "out of the box"?
It should, but it looks like you ran into a bug.
If you want to use the exact same version of operon, you could simply rerun the script here: https://github.com/cavalab/srbench/blob/master/experiment/methods/src/operon_install.sh
Hi @fc59283, yes, the Docker should work. Can you provide the output of `ldd` as @foolnotion suggested, to help us debug?
If your goal is to reproduce the paper exactly, though, I would try using the v2.0 release. Operon's install script has a pinned repo version: https://github.com/cavalab/srbench/blob/v2.0/experiment/methods/src/operon_install.sh
Note that the docker file and the master branch install script for operon are both newer than the paper.
Hi @lacava, I get:
```
ldd /opt/conda/envs/srbench/lib/python3.7/site-packages/operon/pyoperon.cpython-37m-x86_64-linux-gnu.so
linux-vdso.so.1 (0x00007ffd24787000)
libpython3.9.so.1.0 => not found
liboperon.so.0 => /opt/conda/envs/srbench/lib/liboperon.so.0 (0x00007fcbe100e000)
libfmt.so.7 => /opt/conda/envs/srbench/lib/libfmt.so.7 (0x00007fcbe0fdb000)
libstdc++.so.6 => /opt/conda/envs/srbench/lib/libstdc++.so.6 (0x00007fcbe0e27000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fcbe0cd8000)
libgcc_s.so.1 => /opt/conda/envs/srbench/lib/libgcc_s.so.1 (0x00007fcbe0cbf000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fcbe0acb000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fcbe0aa8000)
/lib64/ld-linux-x86-64.so.2 (0x00007fcbe11c9000)
```
It looks like the python version in the conda env is python3.7, but pyoperon was linked against python3.9. This is very unusual; somehow the install env was corrupted.
You may be able to use patchelf or chrpath to fix the linked library name (you would probably need to change `libpython3.9.so.1.0` to `libpython3.7.so.1.0`). But it's probably much easier to just reinstall operon.
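If you do go the patchelf/chrpath route, it's worth first confirming which libpython the env actually ships (path assumed from the ldd output above):
```bash
# the soname listed here is what the patched NEEDED entry should point to
ls /opt/conda/envs/srbench/lib/libpython3*
```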
@foolnotion could it be this line? https://github.com/cavalab/srbench/blob/47da695292938d5e696ddcd4252f4034330ef787/experiment/methods/src/operon_install.sh#L3
Actually, building the docker image fails for me, so I can't debug.
```
Step 12/16 : COPY --chown=$MAMBA_USER:$MAMBA_USER environment.yml /tmp/environment.yml
unable to convert uid/gid chown string to host mapping: can't find uid for user : no such user:
```
But as far as I remember, that line was pretty reliable in detecting the correct python version in the Conda env. Inside the docker image, the command `pkg-config --modversion python3` should return `3.7`. Another way to get the current python prefix is to run `python -c "import sysconfig; print(sysconfig.get_config_var('prefix'))"`, like here
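For reference, both checks run inside the activated srbench env; the values they should report if the env is healthy are the ones mentioned above:
```bash
# inside the srbench env in the docker image
pkg-config --modversion python3
# expected: 3.7
python -c "import sysconfig; print(sysconfig.get_config_var('prefix'))"
# expected: /opt/conda/envs/srbench
```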
> Actually, building the docker image fails for me, so I can't debug.
> Step 12/16 : COPY --chown=$MAMBA_USER:$MAMBA_USER environment.yml /tmp/environment.yml unable to convert uid/gid chown string to host mapping: can't find uid for user : no such user:
If I remember correctly, prefixing the documentation's docker build command with `DOCKER_BUILDKIT=1` solved that for me.
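In my case the full command looked roughly like this (image tag and build context are from memory, not the exact command from the docs):
```bash
# enable BuildKit so the --chown=$MAMBA_USER mapping resolves during COPY
DOCKER_BUILDKIT=1 docker build -t srbench .
```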
Speaking of common installation & setup hurdles, a humble suggestion to the Project:
Thanks.
If you are running srbench on google colab and have a set of working instructions, we'd be happy to add them to the docs if you submit a PR. I don't use colab that much, but I could see it being useful for others.
Thanks @lacava I was now able to build the docker image. The problem is with the srbench conda env.
Here is an excerpt of pyoperon's CMake generation phase:
```
-- The CXX compiler identification is GNU 11.3.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/conda/envs/srbench/bin/x86_64-conda-linux-gnu-c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Could NOT find Ceres (missing: Ceres_DIR)
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Found PythonInterp: /opt/conda/envs/srbench/bin/python (found version "3.7.12")
-- Found PythonLibs: /opt/conda/envs/srbench/lib/libpython3.7m.so
-- Performing Test HAS_FLTO
-- Performing Test HAS_FLTO - Success
-- Found pybind11: /opt/conda/envs/srbench/include (found version "2.6.1" )
-- Found Python3: /opt/conda/bin/python3.9 (found version "3.9.10") found components: Development Interpreter Development.Module Development.Embed
-- Performing Test COMPILER_HAS_HIDDEN_VISIBILITY
-- Performing Test COMPILER_HAS_HIDDEN_VISIBILITY - Success
-- Performing Test COMPILER_HAS_HIDDEN_INLINE_VISIBILITY
-- Performing Test COMPILER_HAS_HIDDEN_INLINE_VISIBILITY - Success
-- Performing Test COMPILER_HAS_DEPRECATED_ATTR
-- Performing Test COMPILER_HAS_DEPRECATED_ATTR - Success
-- Configuring done
-- Generating done
-- Build files have been written to: /experiment/pyoperon/build
```
The offending element is here:
```
-- Found Python3: /opt/conda/bin/python3.9 (found version "3.9.10") found components: Development Interpreter Development.Module Development.Embed
```
This should of course be `/opt/conda/envs/srbench/bin/python`. But why does CMake detect the wrong python exe? The problem seems to be that the system environment leaks into the srbench environment, for instance via this variable (and others):
```
CONDA_PYTHON_EXE=/opt/conda/bin/python
```
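A quick way to see what leaks in, run inside the container with the srbench env active:
```bash
# any variables pointing at /opt/conda/bin rather than the srbench env are suspects
env | grep -E 'CONDA|PYTHON'
```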
I haven't figured out how to fix the env itself, but a quick solution is to use patchelf:
```
conda install -c conda-forge patchelf
patchelf --replace-needed libpython3.9.so.1.0 libpython3.7m.so /opt/conda/envs/srbench/lib/python3.7/site-packages/operon/pyoperon.cpython-37m-x86_64-linux-gnu.so
```
Thanks for that @foolnotion. Using patchelf does get Operon to launch; however, on every dataset I tried I'm getting:
```
[...]
Fitting 5 folds for each of 6 candidates, totalling 30 fits
operon warning: array does not satisfy contiguity or storage-order requirements. data will be copied.
Illegal instruction (core dumped)
Illegal instruction (core dumped)
```
I've seen this before; it happens when the hardware running Operon does not support AVX2. In this case, I think the only option is to reinstall with `-march=native` or other appropriate compile flags.
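A quick way to confirm whether AVX2 is actually missing on the machine running the container (a generic check, not specific to Operon):
```bash
# an empty result means the CPU does not advertise AVX2/FMA support
grep -o -E 'avx2|fma' /proc/cpuinfo | sort -u
```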
I tried to reproduce inside my docker image, but after patchelf it runs fine on my machine:
```
python analyze.py --local ./test/192_vineyard_small.tsv.gz -ml OperonRegressor -seed 42
```
Full output here: https://gist.github.com/foolnotion/3548524435498cf3c8d3fdc999868e22
Is there anything we need to do in srbench to fix this?
@lacava For the original problem it seems the following change is needed:
```diff
diff --git a/experiment/methods/src/operon_install.sh b/experiment/methods/src/operon_install.sh
index 0a4f461..1fb822a 100755
--- a/experiment/methods/src/operon_install.sh
+++ b/experiment/methods/src/operon_install.sh
@@ -117,6 +117,7 @@ pushd pyoperon
git checkout 1c6eccd3e3fa212ebf611170ca2dfc45714c81de
mkdir build
cmake -S . -B build \
+ -DPython3_EXECUTABLE=$(which python) \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_INSTALL_PREFIX=${PYTHON_SITE}
cmake --build build -j -t pyoperon_pyoperon
```
It's a very minor change but if you prefer a PR I can do that too.
For the second problem (illegal instruction) I don't think we need to change anything. It can be addressed individually by providing a `CMAKE_CXX_FLAGS` definition to `pyoperon` within `operon_install.sh`. But hardware supporting AVX2 has existed since 2013, so I wouldn't lower the default compile flags.
Hi again @foolnotion,
In that case, I think it's best to pass "-march=native" globally so that all methods are natively compiled. However, I haven't yet made complete sense of the inner chain of scripts in SRBench's installation, so:
What and where exactly should I modify to enable this? (still on x64)
Also, I have access to a Power7+ server with plenty of memory and cores (they're cheap second-hand :)). Where could something like "-mcpu=power7" / "-mtune=power7" (or equivalent) be passed so SRBench works natively on that target?
Many thanks and regards.
SRBench itself is just a benchmarking framework that hosts a number of methods, which can be written in any programming language. Only the respective method authors can advise how to change compilation flags, or whether or not their method is able to run on other architectures like PowerPC.
It depends on the method itself. For Operon you can modify https://github.com/cavalab/srbench/blob/7377143f6545e3a906a25d9aa045c81b9d581ec6/experiment/methods/src/operon_install.sh#L102 and add something like -DCMAKE_CXX_FLAGS="-march=native"
Again, it depends on the specific method. In theory, building a conda env on that server would install all the platform-specific toolchains, which could then be used to compile the methods. Operon should work, but ultimately it depends on the underlying libraries like Eigen or EVE (https://github.com/jfalcou/eve#current-roster-of-supported-instructions-sets).
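For illustration, the equivalent tuning flags on a Power7 box would go into the same cmake calls, roughly like this (an untested sketch; whether the SIMD backends accept POWER targets is a separate question):
```bash
cmake -S . -B build \
    -DCMAKE_CXX_FLAGS="-mcpu=power7 -mtune=power7" \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=${CONDA_PREFIX}
```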
Hi again @foolnotion,
That's exactly what I had already done! But it doesn't seem to propagate to the binary that actually runs.
In fact, I added it to every cmake invocation in "operon_install.sh":
```bash
#!/bin/bash
PYTHON_SITE=${CONDA_PREFIX}/lib/python`pkg-config --modversion python3`/site-packages
## aria-csv
git clone https://github.com/AriaFallah/csv-parser csv-parser
mkdir -p ${CONDA_PREFIX}/include/aria-csv
pushd csv-parser
git checkout 544c764d0585c61d4c3bd3a023a825f3d7de1f31
cp parser.hpp ${CONDA_PREFIX}/include/aria-csv/parser.hpp
popd
rm -rf csv-parser
## vectorclass
git clone https://github.com/vectorclass/version2.git vectorclass
mkdir -p ${CONDA_PREFIX}/include/vectorclass
pushd vectorclass
git checkout fee0601edd3c99845f4b7eeb697cff0385c686cb
cp *.h ${CONDA_PREFIX}/include/vectorclass/
popd
rm -rf vectorclass
cat > ${CONDA_PREFIX}/lib/pkgconfig/vectorclass.pc << EOF
prefix=${CONDA_PREFIX}/include/vectorclass
includedir=${CONDA_PREFIX}/include/vectorclass
Name: Vectorclass
Description: C++ class library for using the Single Instruction Multiple Data (SIMD) instructions to improve performance on modern microprocessors with the x86 or x86/64 instruction set.
Version: 2.01.04
Cflags: -I${CONDA_PREFIX}/include/vectorclass
EOF
## vstat
git clone https://github.com/heal-research/vstat.git
pushd vstat
git checkout 9b48f0d021ec66df122be352ea928b6ceb4bca54
mkdir build
cmake -S . -B build \
-DCMAKE_CXX_FLAGS="-march=native" \
-DCMAKE_BUILD_TYPE=Release \
-DBUILD_TESTING=OFF \
-DCMAKE_INSTALL_PREFIX=${CONDA_PREFIX}
cmake --install build
popd
rm -rf vstat
## pratt-parser
git clone https://github.com/foolnotion/pratt-parser-calculator.git
pushd pratt-parser-calculator
git checkout a15528b1a9acfe6adefeb41334bce43bdb8d578c
mkdir build
cmake -S . -B build \
-DCMAKE_CXX_FLAGS="-march=native" \
-DCMAKE_BUILD_TYPE=Release \
-DBUILD_TESTING=OFF \
-DCMAKE_INSTALL_PREFIX=${CONDA_PREFIX}
cmake --install build
popd
rm -rf pratt-parser-calculator
## fast-float
git clone https://github.com/fastfloat/fast_float.git
pushd fast_float
git checkout 32d21dcecb404514f94fb58660b8029a4673c2c1
mkdir build
cmake -S . -B build \
-DCMAKE_CXX_FLAGS="-march=native" \
-DCMAKE_BUILD_TYPE=Release \
-DFASTLOAT_TEST=OFF \
-DCMAKE_INSTALL_PREFIX=${CONDA_PREFIX}
cmake --install build
popd
rm -rf fast_float
## span-lite
git clone https://github.com/martinmoene/span-lite.git
pushd span-lite
git checkout 8f7935ff4e502ee023990d356d6578b8293eda74
mkdir build
cmake -S . -B build \
-DCMAKE_CXX_FLAGS="-march=native" \
-DCMAKE_BUILD_TYPE=Release \
-DSPAN_LITE_OPT_BUILD_TESTS=OFF \
-DCMAKE_INSTALL_PREFIX=${CONDA_PREFIX}
cmake --install build
popd
rm -rf span-lite
## robin_hood
git clone https://github.com/martinus/robin-hood-hashing.git
pushd robin-hood-hashing
git checkout 9145f963d80d6a02f0f96a47758050a89184a3ed
mkdir build
cmake -S . -B build \
-DCMAKE_CXX_FLAGS="-march=native" \
-DCMAKE_BUILD_TYPE=Release \
-DRH_STANDALONE_PROJECT=OFF \
-DCMAKE_INSTALL_PREFIX=${CONDA_PREFIX}
cmake --install build
popd
rm -rf robin-hood-hashing
# operon
git clone https://github.com/heal-research/operon.git
pushd operon
git checkout d26dd0dcf16acb750da330b5112c63f2528af9a8
mkdir build
cmake -S . -B build \
-DCMAKE_CXX_FLAGS="-march=native" \
-DCMAKE_BUILD_TYPE=Release \
-DBUILD_TESTING=OFF \
-DBUILD_SHARED_LIBS=ON \
-DBUILD_CLI_PROGRAMS=OFF \
-DCMAKE_INSTALL_PREFIX=${CONDA_PREFIX} \
-DCMAKE_PREFIX_PATH=${CONDA_PREFIX}/lib64/cmake
cmake --build build -j -t operon_operon
cmake --install build
popd
rm -rf operon
## pyoperon
git clone https://github.com/heal-research/pyoperon.git
pushd pyoperon
git checkout 1c6eccd3e3fa212ebf611170ca2dfc45714c81de
mkdir build
cmake -S . -B build \
-DPython3_EXECUTABLE=$(which python) \
-DCMAKE_CXX_FLAGS="-march=native" \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_INSTALL_PREFIX=${PYTHON_SITE}
cmake --build build -j -t pyoperon_pyoperon
cmake --install build
popd
rm -rf pyoperon
```
I tried both running "operon_install.sh" directly and `bash install.sh` at the root of the repo. Both approaches do seem to compile something to the end, but when running, the "Illegal instruction (core dumped)" error is still returned.
Any ideas?
Also, after running "install.sh", where exactly are all the methods' executables stored?
Many thanks.
You are not compiling an executable but a shared library (python module). Dependencies are installed in `$CONDA_PREFIX/lib` (library files) and `$CONDA_PREFIX/include` (header files). The python module is installed in `$CONDA_PREFIX/lib/python3.11/site-packages` (replace `3.11` with the local python version).
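If you want to locate the installed module without guessing the python version (a generic check, not SRBench-specific):
```bash
# print the site-packages directory of the currently active env, then inspect it
python -c "import sysconfig; print(sysconfig.get_paths()['purelib'])"
# the operon/ subdirectory there should contain the pyoperon extension module
ls "$(python -c "import sysconfig; print(sysconfig.get_paths()['purelib'])")"/operon
```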
I am not sure why you get the illegal instruction; there can be many reasons (even faulty hardware). You could start with a generic config (e.g. `-march=x86-64`) and check if it still crashes, then go up from there.
You can check the revision and timestamp of the operon module by doing:
```
$ python -c "import operon; print(operon.Version())"
operon rev. d26dd0d Release Linux-6.2.8 x86_64, timestamp 2023-03-31T14:37:31Z
single-precision build using eigen 3.3.9, ceres n/a, taskflow 3.3.0
```
This will show you whether it's using the latest-compiled one.
Passing `-march=x86-64` seems to make no difference; when running `python -c "import operon; print(operon.Version())"` I also get `Illegal instruction (core dumped)`.
However, on my AVX2 laptop, I do get a fresh timestamp when I run ./operon_install.sh. So I compiled on my laptop for both x86-64 and Ivy Bridge (my server's arch) and copied the "pyoperon.cpython-37m-x86_64-linux-gnu.so" file to the Ivy Bridge server.
Now, on the server, `python -c "import operon; print(operon.Version())"` runs correctly. But when trying to execute my test with `python evaluate_model.py /pmlb/datasets/eq1/eq1.tsv -ml OperonRegressor -seed 42 -sym_data` I continue to get "Illegal instruction (core dumped)".
Many thanks.
Last-ditch effort: try erasing any mention of AVX2 from the CMakeLists files:
```diff
diff --git a/experiment/methods/src/operon_install.sh b/experiment/methods/src/operon_install.sh
index ff87c22..3dd3d05 100755
--- a/experiment/methods/src/operon_install.sh
+++ b/experiment/methods/src/operon_install.sh
@@ -98,6 +98,7 @@ rm -rf robin-hood-hashing
git clone https://github.com/heal-research/operon.git
pushd operon
git checkout d26dd0dcf16acb750da330b5112c63f2528af9a8
+sed -i 's/;-mavx2;-mfma//g' CMakeLists.txt
mkdir build
cmake -S . -B build \
-DCMAKE_BUILD_TYPE=Release \
@@ -115,6 +116,7 @@ rm -rf operon
git clone https://github.com/heal-research/pyoperon.git
pushd pyoperon
git checkout 1c6eccd3e3fa212ebf611170ca2dfc45714c81de
+sed -i 's/;-mavx2;-mfma//g' CMakeLists.txt
mkdir build
cmake -S . -B build \
-DPython3_EXECUTABLE=$(which python) \
```
At first just those two lines didn't work, but putting that `sed` command of yours after every occurrence of "git checkout" solves it.
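For the record, the pattern I repeated for each dependency looks like this (vstat shown; the sed is a no-op where the flags are absent):
```bash
git checkout 9b48f0d021ec66df122be352ea928b6ceb4bca54
sed -i 's/;-mavx2;-mfma//g' CMakeLists.txt   # strip hard-coded AVX2/FMA flags if present
mkdir build
```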
Big thanks @foolnotion! Really appreciated it.
It's a horrible solution, and normally I'd just recommend updating to a newer pyoperon (easier to install too: `pip install pyoperon`), but I'm happy it works.
Note that doing `pip install pyoperon` and then running `python -c "import pyoperon; print(pyoperon.Version())"` also returns `Illegal instruction (core dumped)`, whereas the "sed-ed" build's `python -c "import operon; print(operon.Version())"` appears to work fine.
Hi there,
Trying to use Operon with "evaluate_model.py" on the Docker image returns the error:
whereas, for example:
`python evaluate_model.py /data/eq1.tsv -ml sembackpropgp -seed 42 -skip_tuning`
does launch the regressor.
PS: Operon also fails to launch on the "192_vineyard" dataset - where other methods do not.
Many thanks!