conda-forge / pytorch-cpu-feedstock

A conda-smithy repository for pytorch-cpu.
BSD 3-Clause "New" or "Revised" License
18 stars 50 forks source link

Preserving libtorch_python in package #246

Closed jeongseok-meta closed 2 months ago

jeongseok-meta commented 4 months ago

Our project requires linking to libtorch_python. Currently, the shared library is not included in the official TorchConfig.cmake, so we use find_library() to locate it. However, this method fails to find libtorch_python, although it successfully locates other symlinked libraries in the same directory (e.g., libtorch_cpu or libtorch_global_deps).

+ ls -l /home/conda/feedstock_root/build_artifacts/momentum_1720498881696/_build_env/lib/python3.12/site-packages/torch/lib
total 22012
lrwxrwxrwx  1 conda conda       21 Jul  9 04:23 libc10.so -> ../../../../libc10.so
lrwxrwxrwx  1 conda conda       21 Jul  9 04:23 libshm.so -> ../../../../libshm.so
lrwxrwxrwx  1 conda conda       27 Jul  9 04:23 libtorch_cpu.so -> ../../../../libtorch_cpu.so
lrwxrwxrwx  1 conda conda       35 Jul  9 04:23 libtorch_global_deps.so -> ../../../../libtorch_global_deps.so
-rwxr-xr-x 49 conda conda 22537640 Jun 17 01:29 libtorch_python.so
lrwxrwxrwx  1 conda conda       23 Jul  9 04:23 libtorch.so -> ../../../../libtorch.so

This PR aims to prevent the removal of libtorch_python by this line. Instead, it moves the library to PREFIX/lib and creates a symbolic link in the torch lib folder, similar to other libraries, hoping this change resolves the issue with find_library() not finding the shared library.

Checklist

conda-forge-webservices[bot] commented 4 months ago

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

jeongseok-meta commented 4 months ago

@conda-forge-admin, please rerender

github-actions[bot] commented 4 months ago

Hi! This is the friendly automated conda-forge-webservice.

I tried to rerender for you, but it looks like there was nothing to do.

This message was generated by GitHub actions workflow run https://github.com/conda-forge/pytorch-cpu-feedstock/actions/runs/9851406294.

hmaarrfk commented 4 months ago

you are asking for for the python specific library, you should be able to replace torch_python with torch in your setup script.

h-vetinari commented 4 months ago

you are asking for for the python specific library, you should be able to replace torch_python with torch in your setup script.

We genuinely don't seem to package libtorch_python.so, also not in pytorch itself (which is empty).

hmaarrfk commented 4 months ago
~/miniforge3/pkgs
07:25 $ find -name libtorch_python.so
./pytorch-2.3.1-cpu_mkl_py312h3b258cc_100/lib/python3.12/site-packages/torch/lib/libtorch_python.so
./pytorch-2.3.1-cpu_mkl_py310h75865b9_100/lib/python3.10/site-packages/torch/lib/libtorch_python.so
./pytorch-2.3.1-cpu_generic_py310ha4c588e_0/lib/python3.10/site-packages/torch/lib/libtorch_python.so

in a typical workflow, people often are not linking to it, so i guess we left it in lib so it could be python specific.

hmaarrfk commented 4 months ago

And a computer that has cuda120

~/miniforge3/pkgs
$ find -name libtorch_python.so
./pytorch-2.3.1-cuda120_py310h2c91c31_300/lib/python3.10/site-packages/torch/lib/libtorch_python.so
hmaarrfk commented 4 months ago

We genuinely don't seem to package libtorch_python.so, also not in pytorch itself (which is empty).

I guess that stremlit should be taken with a grain of salt, in this case a 30MB package can't be empty ;)

h-vetinari commented 4 months ago

I guess that streamlit should be taken with a grain of salt, in this case a 30MB package can't be empty ;)

mr-bean-realization

h-vetinari commented 4 months ago

OK, my view of the world is restored (thanks @jaimergp 🙏): the files for pytorch now show up in streamlit, and in particular, libtorch_python.so can be found in $SP_DIR/torch/lib.

It would be within the realm of possibility to move them to $PREFIX/lib and then play around with patchelf to correct the path in the binaries. Here's an example where ray does something similar. Not sure if that's worth it though. 🤷

hmaarrfk commented 4 months ago

@jeongseok-meta can you comment as to whether or not you are able to link directly to the non-python libraries?

it also seems that we do include the TorchConfig.cmake file

./libtorch-2.3.1-cuda120_h2b0da52_300/share/cmake/Torch/TorchConfig.cmake

are you using conda-forge cmake or an other?

jeongseok-meta commented 4 months ago

@hmaarrfk Yes, I can link to other libraries by using find_package(Torch CONFIG REQUIRED) and I am using cmake from conda-forge:

https://github.com/facebookincubator/momentum/blob/0d60cde3a5654c7b21599eefd1cace99da4edb45/pymomentum/CMakeLists.txt#L11

However, the problem is that I want to link to libtorch_python, which is not included in the TorchConfig.cmake.

By the way, thank you for the other comments. I will be AFK for the next few days, but I will catch up when I return.

hmaarrfk commented 4 months ago

@jeongseok-meta thanks for giving more context.

In my experience there is often little value in linking to the "Python library" when a "c" library exists. I can't speak to your exact usecase, but my hunch is that linking to python will give little benefit.

Its also unclear what those at pytorch meant to have as public vs private.

I would consider what you need from the python side, and maybe just avoid using it for maintainability.

One thing that would be helpful for us to understand is, does this work from packages from the pytorch channel? While we try to do it the "conda-forge way" we do try to follow upstream "intentions". So if it works with the pytorch + default channel, then it would give us one extra push to try to make it work here (but again, seems like a very niche usecase, so we would need your help to do it well -- some good suggestions came from h-vetinari)

isuruf commented 3 months ago

@jeongseok-meta, this is an issue with our split build. I think the best solution is to move $PREFIX/lib/python$PY_VER/site-packages/torch/lib/libtorch_python.so to $PREFIX/lib and make a symlink back at site-packages.

conda-forge-webservices[bot] commented 3 months ago

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

jeongseok-meta commented 3 months ago

@hmaarrfk Thank you for sharing your thoughts. Specifically, we need to use python_ivalue.h where the definitions are included in libtorch_python.so for Python binding with PyTorch interop. Unfortunately, we don't have a clear path to avoid depending on the API at the moment, and it's also unclear if PyTorch intended for it to be public or private as well.

One thing that would be helpful for us to understand is, does this work from packages from the pytorch channel?

It works with packages from the PyTorch channel, as we can build and import the Python binding linking to libtorch_python.so without errors locally (pixi.toml of our project)

jeongseok-meta commented 3 months ago

Hi @isuruf, your solution looks promising! However, I noticed some build failures in some jobs. Do you have any ideas on how to address them?

hmaarrfk commented 3 months ago

@conda-forge-admin please rerender

hmaarrfk commented 3 months ago

i thought i had access to the GPU runners... maybe they are down.

the package fails to build in docker locally for me.

hmaarrfk commented 3 months ago

i think perhaps a new flag was added in gcc 12.4 or something. https://github.com/conda-forge/ctng-compiler-activation-feedstock/pull/121

perhaps we can try to set the upper bound to 12.3???

h-vetinari commented 3 months ago

the package fails to build in docker locally for me.

Can you post an error log?

i think perhaps a new flag was added in gcc 12.4 or something.

The only thing that changed in our packaging was the way how meson sets a release flag, and we specifically didn't apply that change to 12.4, only 13.x and up.

hmaarrfk commented 3 months ago

Here is the log, had to pull it out of my tmux log.txt

hmaarrfk commented 3 months ago

I think that this would help users that have libtorch.so in their missing dso whitelist: https://github.com/search?q=org%3Aconda-forge+++%22libtorch_python.so%22&type=code

hmaarrfk commented 2 months ago

I think some compilation issues might stem from libmagma having been updated to 2.8 from 2.7.2: https://github.com/conda-forge/libmagma-feedstock/pull/18

That said, i'm not too sure.

2.4.0 seems to build fine with libmagma 2.8.0 https://github.com/conda-forge/pytorch-cpu-feedstock/actions/runs/10477101071/job/29017474679?pr=250#step:3:734

https://github.com/conda-forge/libmagma-feedstock/issues/21