Encountered errors in pip install. L4Casadi on the machine with linux_aarch64

LilHu7 commented 1 year ago

Hi, when I run "pip install ." to install L4CasADi, the build fails while linking the CXX shared library libl4casadi.so.

The reported output is as follows:

-- The C compiler identification is GNU 9.4.0
-- The CXX compiler identification is GNU 9.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
[1/3] Building CXX object CMakeFiles/l4casadi.dir/src/l4casadi.cpp.o
[2/3] Linking CXX shared library libl4casadi.so
FAILED: libl4casadi.so
/usr/bin/ld: /app/ddc/l4casadi/libl4casadi/libtorch/lib/libtorch.so: error adding symbols: file in wrong format
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.
ERROR: Failed building wheel for l4casadi
Failed to build l4casadi
ERROR: Could not build wheels for l4casadi, which is required to install pyproject.toml-based projects

Can you suggest a solution?

Thanks, sincerely!

LilHu7 commented 1 year ago

The problem may be due to the wrong format of libtorch.so. We checked the downloaded .so files and found that they are all built for the x86 architecture.
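For readers who want to repeat this check, here is a minimal sketch (not part of L4CasADi) that reads the target architecture from the ELF header of a shared library. The function names `elf_machine` and `library_machine` are made up for illustration, and only the two machine codes relevant to this thread are mapped.

```python
import struct

# e_machine values from the ELF specification (only the two relevant here).
ELF_MACHINES = {0x3E: "x86_64", 0xB7: "aarch64"}


def elf_machine(header: bytes) -> str:
    """Return the target architecture named in the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[EI_DATA] (byte 5): 1 = little-endian, 2 = big-endian.
    endian = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18, right after e_type.
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    return ELF_MACHINES.get(machine, hex(machine))


def library_machine(path: str) -> str:
    """Read the ELF header of a file on disk and report its architecture."""
    with open(path, "rb") as f:
        return elf_machine(f.read(20))
```

Calling `library_machine` on the libtorch.so named in the linker error and comparing the result with `platform.machine()` should show whether the downloaded library matches the build machine.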

Tim-Salzmann commented 1 year ago

Hi,

Yes, I think you already figured it out. Unfortunately, I do not think there is a pre-compiled version of libtorch for aarch64. Unless you find one, you will have to compile it from source. Once you have libtorch compiled on aarch64, copy the files to the libl4casadi/libtorch folder before running pip install. If you get this working, I would appreciate it if you could post your solution here. I am sure other people will be interested in this too.

Let me know if this helps

Best Tim

LilHu7 commented 1 year ago

Hi,

I will share my solution once I get it working. However, I compiled the PyTorch source on the aarch64 machine and assembled the 'libtorch' folder from the result. After that, I ran "pip install .", but it reported that many .h files were missing. Finally, I copied the files from the x86 build to the aarch64 machine, but the build still failed.

These problems have caused me a lot of trouble. Could you point me to the version of the PyTorch source code that is suitable for the L4CasADi build?

thanks, sincerely!

Tim-Salzmann commented 1 year ago

Hi,

Unfortunately, I do not have access to an aarch64 machine right now, so it is hard for me to test a configuration for aarch64. The important part is to compile the exact same version of libtorch as the one installed via pip (==2.0.0).
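As a rough illustration of the version requirement, the sketch below compares the installed torch version against 2.0.0 while ignoring a local build suffix such as `+cpu`. The helper names are hypothetical. A source build may report a different version string altogether, so a mismatch here is a hint to look closer rather than proof of incompatibility.

```python
from importlib import metadata


def base_version(version: str) -> str:
    """Drop a local build suffix such as '+cpu' from a version string."""
    return version.split("+", 1)[0]


def matches_required(installed: str, required: str = "2.0.0") -> bool:
    """Compare an installed version string with the required release."""
    return base_version(installed) == required


def installed_torch_matches(required: str = "2.0.0") -> bool:
    """Return False if torch is missing or reports a different release."""
    try:
        return matches_required(metadata.version("torch"), required)
    except metadata.PackageNotFoundError:
        return False
```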

Could you elaborate which header files are missing and what the exact error message is?

Thanks Tim

LilHu7 commented 1 year ago

Hi Tim, so far I have obtained a libtorch compiled from the PyTorch source code (git pytorch & git checkout tags/v2.0.0 & make install) on my aarch64 machine. Following the L4CasADi instructions, I edited the CMakeLists.txt in libl4casadi so that "CMAKE_PREFIX_PATH" points to the libtorch folder (compiled from source). The commands I then used are as follows:

python -m pip install --upgrade pip
pip install torch==2.0.0 --index-url https://download.pytorch.org/whl/cpu
pip install . -v

All the commands ran successfully. When I try to run simple_nlp.py from the examples with "python simple_nlp.py", it reports:

Traceback (most recent call last):
  File "/data/l4casadi/examples/simple_nlp.py", line 2, in <module>
    import l4casadi as l4c
  File "/data/miniconda3/lib/python3.11/site-packages/l4casadi/__init__.py", line 9, in <module>
    ctypes.CDLL(str(lib_path), mode=ctypes.RTLD_GLOBAL)
  File "/data/miniconda3/lib/python3.11/ctypes/__init__.py", line 376, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: /data/miniconda3/lib/python3.11/site-packages/l4casadi/lib/libl4casadi.so: undefined symbol: _ZN5torraitsIcESaIcEEEN3c108optionalINS9_6DeviceEEEb

By the way, when I run ldd on libl4casadi.so, the libtorch-related libraries all resolve correctly to the libtorch compiled from source. In the site-packages of the conda env, I found that torch (2.0.0) contains some .so files similar to those in libtorch. Which files should l4casadi load, those in torch or those in libtorch?

According to c++filt, the symbol _ZN5torraitsIcESaIcEEEN3c108optionalINS9_6DeviceEEEb corresponds to the function torch::jit::load().

I am confused that the OSError is reported for the libl4casadi.so in site-packages rather than the one in l4casadi/libl4casadi.

Can you provide some solutions?

Thanks~ sincerely!

LilHu7 commented 1 year ago

A small addition: I copied the libtorch .so files over the identically named .so files in envs/site-packages/torch/lib (the pip-installed torch package), and that problem was solved. But it produces a new issue: symbol lookup error: _l4c_generated/libl4casadi_f.so : undefined symbol _ZN8L4CasADiC1ESsSsbSsbb.

LilHu7 commented 1 year ago

@Tim-Salzmann
Hi, I may have found the problem. On a normal x86 machine, libl4casadi.so is linked against the libtorch.so/libtorch_cpu.so/libc10.so downloaded in libtorch.zip. On my aarch64 machine, libl4casadi.so is linked against the libtorch.so/libtorch_cpu.so/libc10.so that come with the torch package in env/site-packages/torch/lib. That is a linking error!

However, no matter how I rewrite CMakeLists.txt to link against libl4casadi/libtorch (compiled from source), it always links to the libraries in the torch package.

Can you provide a CMakeLists.txt for my situation? Or do I need to rewrite pyproject.toml? Or should I create a build directory and run cmake manually?

Tim-Salzmann commented 1 year ago

Hi,

I appreciate you are doing all this work! I think we are getting close!

A couple of thoughts:

* `I have obtained a libtorch compiled from the PyTorch source code (git pytorch & git checkout tags/v2.0.0 & make install)`
  This does sound different from the normal compile from source procedure I have heard: https://github.com/pytorch/pytorch#from-source
  https://github.com/pytorch/pytorch/blob/main/docs/libtorch.rst
  Is there a reason you do not follow the official compile from source instruction? I would try to use `python setup.py develop` or `python setup.py install` to build libtorch **and** install PyTorch from the same build in your env (make sure your virtual env is activated when `python setup.py ...` is run). This will ensure that the linked libtorch is the exact same version as the installed PyTorch version.
* `I copied the libtorch .so files over the identically named .so files in envs/site-packages/torch/lib (the pip-installed torch package), and that problem was solved.`
  This sounds like the compiled libtorch and the installed PyTorch version do indeed differ from one another for some reason.
* `On my aarch64 machine, libl4casadi.so is linked against the libtorch.so/libtorch_cpu.so/libc10.so that come with the torch package in env/site-packages/torch/lib. That is a linking error!`
  This is very weird. I do not have an explanation for this. Could you provide the output of ldd for libl4casadi.so?

Tim-Salzmann commented 1 year ago

Hi,

I just tried something, and it might work on aarch64 too. Please try the following CMakeLists.txt (replace the path with the path to your installed PyTorch folder). No compilation from source should be required: just use the pip-installed version of PyTorch, as you did with pip install torch==2.0.0 --index-url https://download.pytorch.org/whl/cpu.

cmake_minimum_required(VERSION 3.0 FATAL_ERROR)
project(L4CasADi)

set(CMAKE_PREFIX_PATH /Users/TimSalzmann/miniconda3/envs/l4casadi/lib/python3.10/site-packages/torch)
find_package(Torch REQUIRED)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS}")

add_library(l4casadi SHARED src/l4casadi.cpp include/l4casadi.hpp)

target_include_directories(l4casadi PRIVATE include)

target_include_directories(l4casadi PUBLIC ${TORCH_INCLUDE_DIRS})

target_link_libraries(l4casadi torch c10)
set_property(TARGET l4casadi PROPERTY CXX_STANDARD 17)
#set_property(TARGET l4casadi PROPERTY CUDA_STANDARD 17)

install(TARGETS l4casadi LIBRARY DESTINATION l4casadi)
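The path in CMAKE_PREFIX_PATH above is specific to one machine. As a sketch of how the equivalent folder could be located in another environment, the helper below (a hypothetical name, not part of L4CasADi) asks the active Python interpreter where a package is installed without importing it.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def package_dir(name: str) -> Optional[Path]:
    """Return the install directory of a top-level package, or None if absent."""
    spec = importlib.util.find_spec(name)
    # Plain modules (not packages) have no search locations, so return None.
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0])
```

Running `print(package_dir("torch"))` in the activated environment should print the site-packages/torch folder to use in the set(CMAKE_PREFIX_PATH ...) line, assuming torch is installed there.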

Let me know if this helps.

LilHu7 commented 1 year ago

Hi,

* `I have obtained a libtorch compiled from the PyTorch source code (git pytorch & git checkout tags/v2.0.0 & make install)`
  This does sound different from the normal compile from source procedure I have heard: https://github.com/pytorch/pytorch#from-source
  https://github.com/pytorch/pytorch/blob/main/docs/libtorch.rst
  Is there a reason you do not follow the official compile from source instruction? I would try to use `python setup.py develop` or `python setup.py install` to build libtorch **and** install PyTorch from the same build in your env (make sure your virtual env is activated when `python setup.py ...` is run). This will ensure that the linked libtorch is the exact same version as the installed PyTorch version.

That makes sense! I just used cmake .. & make & make install to compile PyTorch. Maybe that is the problem?

LilHu7 commented 1 year ago

set(CMAKE_PREFIX_PATH /Users/TimSalzmann/miniconda3/envs/l4casadi/lib/python3.10/site-packages/torch)

Hmm, does this mean that I don't need to compile PyTorch from source anymore, and can just use torch in place of libtorch? Will this have an impact on my further use of the generated C++ model?

Tim-Salzmann commented 1 year ago

Hi,

That makes sense! I just used cmake .. & make & make install to compile PyTorch. Maybe that is the problem?

It could be. I suggest using the official instructions.

set(CMAKE_PREFIX_PATH /Users/TimSalzmann/miniconda3/envs/l4casadi/lib/python3.10/site-packages/torch)

Hmm, does this mean that I don't need to compile PyTorch from source anymore, and can just use torch in place of libtorch?

This is correct. libtorch is part of PyTorch. The main reason why L4CasADi downloads libtorch during installation is that python's setup procedure creates a new python environment for the build process, which prevents me from automatically identifying and using the libtorch from pre-installed PyTorch in the env.

Will this have an impact on my further use of the generated C++ model?

The linker will have to be able to find the libtorch libraries in any case. Depending on the linker used, it might link them with a relative path, or you will have to tell the linker explicitly where to find the libraries: target_link_directories during cmake and LD_LIBRARY_PATH during dynamic linking.
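To make the LD_LIBRARY_PATH part concrete, here is a small sketch that builds an environment with a library folder prepended, for launching a program that needs the libtorch libraries at run time. The function name and the example paths are assumptions for illustration only.

```python
from typing import Dict, Mapping


def with_library_dir(env: Mapping[str, str], lib_dir: str) -> Dict[str, str]:
    """Return a copy of env with lib_dir placed first in LD_LIBRARY_PATH."""
    new_env = dict(env)
    # LD_LIBRARY_PATH is a colon-separated list; drop empty entries.
    parts = [p for p in new_env.get("LD_LIBRARY_PATH", "").split(":") if p]
    if lib_dir not in parts:
        parts.insert(0, lib_dir)
    new_env["LD_LIBRARY_PATH"] = ":".join(parts)
    return new_env
```

The result can be passed as the `env` argument of `subprocess.run`, for example `subprocess.run(["./my_program"], env=with_library_dir(os.environ, "/path/to/site-packages/torch/lib"))`, where both the program name and the path are placeholders. The variable is set for the child process because the dynamic loader typically reads it when a process starts.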

However, I suggest giving it a try before worrying about downstream consequences.

LilHu7 commented 1 year ago

Hi,

set(CMAKE_PREFIX_PATH /Users/TimSalzmann/miniconda3/envs/l4casadi/lib/python3.10/site-packages/torch)

Problem solved! Many thanks!!

It may have been a version conflict between libtorch and torch! I suspect the PyTorch version I compiled was not compatible.

I've been troubled by this for a few days, I'm ashamed to say... Linking l4casadi against torch is a feasible solution on aarch64!

I will continue implementing my algorithm and look forward to further discussion with you!

Tim-Salzmann commented 1 year ago

Great! Thanks for confirming. I will try and change this in the setup procedure to make this work automatically on aarch64. I would appreciate it if you could give this another try then to confirm this is working as expected.

Tim-Salzmann commented 1 year ago

I updated the package. Could you please try the new install procedure on aarch64: Just run ./install.sh

LilHu7 commented 1 year ago

I updated the package. Could you please try the new install procedure on aarch64: Just run ./install.sh

Awesome! I ran './install.sh' in a new environment on my aarch64 machine. That is a perfect automatic solution! By the way, I suggest that the installation of torch could also be placed in this shell script.

Is a virtual environment required to run the script? Otherwise, can the TORCH_ENV path not be found? In other words, if I run the script on a plain machine with python and torch, will 'TORCH_ENV_PATH' find the location of torch (installed by 'pip install torch')?

Tim-Salzmann commented 1 year ago

By the way, I suggest that the installation of torch could also be placed in this shell script.

I do not think this would work as I do not know if the user wants to install GPU or CPU PyTorch.

In other words, if I run the script on a plain machine with python and torch, will 'TORCH_ENV_PATH' find the location of torch (installed by 'pip install torch')?

I would expect this to work too. As long as python3 -c "import torch" works. Did you try this and it did not work?
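The `python3 -c "import torch"` check can also be scripted. The sketch below runs the import in a separate interpreter and reports whether it succeeded. `can_import` is a hypothetical helper, and the `python` argument defaults to the interpreter running the script.

```python
import subprocess
import sys


def can_import(module: str, python: str = sys.executable) -> bool:
    """Return True if `python -c "import <module>"` exits successfully."""
    result = subprocess.run(
        [python, "-c", f"import {module}"],
        capture_output=True,
    )
    return result.returncode == 0
```

Calling `can_import("torch", python="python3")` would test the system-wide interpreter rather than the one in an activated virtual environment.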

Best Tim

LilHu7 commented 1 year ago

I would expect this to work too. As long as python3 -c "import torch" works. Did you try this and it did not work?

It definitely works with torch 2.0.0 pre-installed. But I don't know if it can find ${torch_path} on a machine without a virtual environment.

Tim-Salzmann commented 1 year ago

I think it should as long as the system-wide python3 can find it.

If everything is working now feel free to close the issue :)

Tim-Salzmann commented 1 year ago

Hi,

I tried to simplify the install procedure yet again. Would you mind checking that it still works on aarch64?

New install instructions:

Install all build dependencies via pip install -r requirements_build.txt

With the python environment activated install L4CasADi via pip install . --no-build-isolation

Tim-Salzmann / l4casadi

Encountered errors in pip install. L4Casadi on the machine with linux_aarch64 #11