conda-forge / openblas-feedstock

A conda-smithy repository for openblas.
BSD 3-Clause "New" or "Revised" License
9 stars 39 forks source link

Missing library alias on Mac #134

Closed peastman closed 2 years ago

peastman commented 2 years ago

Solution to issue cannot be found in the documentation.

Issue

The Mac packages distributed through the default channel include an alias with the name libopenblas.dylib pointing to the library. That allows it to be referenced with standard naming. The conda-forge packages are missing that alias. Instead you have to reference it with the name libopenblas.0.dylib. This causes problems for software that expects the standard name to work. See, for example, https://github.com/numpy/numpy/issues/14165 and https://gitlab.kitware.com/cmake/cmake/-/issues/23291.

Would it be possible to add the missing alias?

Installed packages

# packages in environment at /Users/peastman/miniconda3/envs/test:
#
# Name                    Version                   Build  Channel
bzip2                     1.0.8                h3422bc3_4    conda-forge
ca-certificates           2021.10.8            h4653dfc_0    conda-forge
libffi                    3.4.2                h3422bc3_5    conda-forge
libgfortran               5.0.0.dev0      11_0_1_hf114ba7_23    conda-forge
libgfortran5              11.0.1.dev0         hf114ba7_23    conda-forge
libopenblas               0.3.18          openmp_h5dd58f0_0    conda-forge
libzlib                   1.2.11            hee7b306_1013    conda-forge
llvm-openmp               13.0.1               hf3c4609_0    conda-forge
ncurses                   6.3                  hc470f4d_0    conda-forge
openssl                   3.0.0                h3422bc3_2    conda-forge
pip                       22.0.3             pyhd8ed1ab_0    conda-forge
python                    3.9.10          h38ef502_2_cpython    conda-forge
python_abi                3.9                      2_cp39    conda-forge
readline                  8.1                  hedafd6a_0    conda-forge
setuptools                60.9.3           py39h2804cbe_0    conda-forge
sqlite                    3.37.0               h72a2b83_0    conda-forge
tk                        8.6.12               he1e0b03_0    conda-forge
tzdata                    2021e                he74cb21_0    conda-forge
wheel                     0.37.1             pyhd8ed1ab_0    conda-forge
xz                        5.2.5                h642e427_1    conda-forge
zlib                      1.2.11            hee7b306_1013    conda-forge

Environment info

active environment : test
    active env location : /Users/peastman/miniconda3/envs/test
            shell level : 2
       user config file : /Users/peastman/.condarc
 populated config files : /Users/peastman/.condarc
          conda version : 4.11.0
    conda-build version : not installed
         python version : 3.8.11.final.0
       virtual packages : __osx=12.2=0
                          __unix=0=0
                          __archspec=1=arm64
       base environment : /Users/peastman/miniconda3  (writable)
      conda av data dir : /Users/peastman/miniconda3/etc/conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/omnia/osx-arm64
                          https://conda.anaconda.org/omnia/noarch
                          https://conda.anaconda.org/conda-forge/osx-arm64
                          https://conda.anaconda.org/conda-forge/noarch
                          https://repo.anaconda.com/pkgs/main/osx-arm64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/osx-arm64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /Users/peastman/miniconda3/pkgs
                          /Users/peastman/.conda/pkgs
       envs directories : /Users/peastman/miniconda3/envs
                          /Users/peastman/.conda/envs
               platform : osx-arm64
             user-agent : conda/4.11.0 requests/2.25.1 CPython/3.8.11 Darwin/21.3.0 OSX/12.2
                UID:GID : 503:20
             netrc file : None
           offline mode : False
isuruf commented 2 years ago

It's in openblas package along with the headers.

peastman commented 2 years ago

I'm trying to track down a build failure. Perhaps you have insight into what the real problem is? My library uses PyTorch and links to libtorch.dylib. It does not directly use OpenBLAS, but apparently PyTorch does at some level, and I get an error trying to configure with CMake, telling me that libopenblas.dylib doesn't exist. Installing the PyTorch package installs the libopenblas package but not the openblas package. So what is the real problem?

One way or another, the current behavior is that if you install the PyTorch conda package and then use CMake to build something that links to libtorch, you get an error. What is the correct place to fix this?

isuruf commented 2 years ago

What's the error exactly?

peastman commented 2 years ago
make[2]: *** No rule to make target '/Users/peastman/miniconda3/envs/openmm/lib/libopenblas.dylib', needed by 'libOpenMMTorch.dylib'.  Stop.
make[1]: *** [CMakeFiles/Makefile2:190: CMakeFiles/OpenMMTorch.dir/all] Error 2
make: *** [Makefile:146: all] Error 2
peastman commented 2 years ago

Does anyone have insight on what the correct fix is? I don't know whether the change should be in PyTorch, OpenBLAS, or CMake, but something needs to be changed somewhere.

isuruf commented 2 years ago

See https://github.com/conda-forge/openblas-feedstock/issues/134#issuecomment-1058489153

peastman commented 2 years ago

See https://github.com/conda-forge/openblas-feedstock/issues/134#issuecomment-1058528597 above. If a user simply installs PyTorch and tries to compile against it, they get a compilation error. The package includes libtorch, which contains the C++ API and is designed to be a library users link their own code to. Using the package in the intended way leads to errors. The fact that it uses OpenBLAS is an internal implementation detail of PyTorch, exactly the sort of thing conda is supposed to take care of for you.

Is there any good reason not to include the symlink in libopenblas? Omitting it creates problems for users. The version of the package on the defaults channel includes it.

peastman commented 2 years ago

Can anyone else comment on this? The problem is still not resolved. I believe the solution I originally proposed (including the alias in libopenblas) is the correct solution. I'm open to other possibilities, but no one has suggested a better one or pointed out any downside to including the alias. It's a tiny change that saves a lot of problems for users and makes the conda-forge version of the package consistent with the default channel version.

cc @jakirkham @h-vetinari

isuruf commented 2 years ago

Is there any good reason not to include the symlink in libopenblas?

Yes, to prevent downstream packages from accidentally linking to them. The symlink and headers are in the development package just like debian, fedora and countless other package managers. Are you suggesting that debian should package the symlink and the headers in the non dev package?

isuruf commented 2 years ago

For eg: see the difference between https://packages.debian.org/stretch/amd64/libopenblas-base/filelist and https://packages.debian.org/stretch/amd64/libopenblas-dev/filelist

peastman commented 2 years ago

Again, I'm happy to consider other options. But you still haven't suggested any solution at all to the problem. Let me repeat one more time exactly what the steps are.

  1. Install the pytorch package from CondaForge.
  2. Try to compile a program that links to libtorch.
  3. It fails with a linker error saying that libopenblas.dylib can't be found.

Notice that the user has not installed OpenBLAS. Until they hit this error message, they didn't even know that PyTorch used OpenBLAS. They certainly don't know there are two different packages related to OpenBLAS. Further, this error only happens with recent versions of PyTorch. With older versions the above steps work perfectly, because they require an older version of OpenBLAS than is available on CondaForge. Instead conda installs a package from the defaults channel which does include the alias.

Are you suggesting that debian should package the symlink and the headers in the non dev package?

What does this have to do with Debian? This is an error on Mac.

isuruf commented 2 years ago

But you still haven't suggested any solution at all to the problem.

Ask pytorch-cpu-feedstock maintainers to patch the cmake file. It should not be necessary to link to libopenblas.dylib.

What does this have to do with Debian? This is an error on Mac.

It's not unique to Mac. This doesn't manifest in Linux x86, because pytorch uses MKL in Linux x86.

Anyways, this is a pytorch-cpu-feedstock issue and not this feedstock.

peastman commented 2 years ago

Ask pytorch-cpu-feedstock maintainers to patch the cmake file.

This has nothing to do with a PyTorch CMake file. PyTorch simply calls find_package(BLAS). CMake has built in support for finding a variety of BLAS implementations. When it finds OpenBLAS on Mac it expects there to be a library called libopenblas.dylib. Every other way of installing OpenBLAS includes an alias with that name. The CondaForge package is unique in not including it. It is the one that is not following the convention. I already tried raising an issue with the CMake team and was explicitly told, "This is a bug in the conda package."

h-vetinari commented 2 years ago

I think the issue is that pytorch has a host-dependency on openblas (hence, with the development headers, library alias, etc.), but the openblas output doesn't have it's own run_export (and therefore just inherits the one from libopenblas).

I think this setup is actually correct from the openblas side, but it sounds like the pytorch feedstock should add a runtime-dependency on openblas; or alternatively, add a devel package (that's suitable for being compiled against) that includes it.

CC @conda-forge/pytorch-cpu

peastman commented 2 years ago

Can any of the PyTorch package maintainers comment on this?

hmaarrfk commented 2 years ago

I think @h-vetinari is correct. PRs welcome.

isuruf commented 2 years ago

No, @h-vetinari is not correct. The correct way is to patch pytorch's cmake files to remove the dependence on libopenblas.dylib.

isuruf commented 2 years ago

Or make pytorch link to the blas library as PRIVATE instead of PUBLIC.

peastman commented 2 years ago

The correct way is to patch pytorch's cmake files to remove the dependence on libopenblas.dylib.

Why do you keep saying that? PyTorch's CMake files do not refer to libopenblas. All it does is call find_package(BLAS), and CMake locates a BLAS implementation for it.

isuruf commented 2 years ago

All it does is call find_package(BLAS), and CMake locates a BLAS implementation for it.

Yes, but it links to BLAS as a public dependency instead of a private dependency. See https://github.com/pytorch/pytorch/blob/e5bf87963d65d6273a6ec3505a50c07229643606/cmake/Dependencies.cmake#L194. I suggest you start reading https://cmake.org/cmake/help/latest/command/target_link_libraries.html

Please read up on how CMake target files work and have a look at /lib/python3.10/site-packages/torch/share/cmake/Caffe2/Caffe2Targets.cmake file.

isuruf commented 2 years ago

PyTorch's CMake files do not refer to libopenblas.

This is clearly wrong. Have a look at lib/python3.10/site-packages/torch/share/cmake/Caffe2/Caffe2Targets.cmake and also read https://cmake.org/cmake/help/latest/command/export.html#exporting-targets to get an understanding.

peastman commented 2 years ago

@hmaarrfk what do you think is the right solution? I'm happy to make a PR. I just need to know what it should change.

hmaarrfk commented 2 years ago

I think isuruf is right (he has very good instincts on this) .

I think you have to hunt down a cmake file that has the word PUBLIC in it where it shouldn't.

Not very easy to find.

jakirkham commented 2 years ago

When it comes to this kind of hunting, git grep does a lot better that GitHub search IME

hmaarrfk commented 2 years ago

Don't we have the package you are looking for

https://github.com/conda-forge/openmm-feedstock

Can you follow that recipe?

jakirkham commented 2 years ago

Peter maintains that feedstock IIUC. The issue is PyTorch (upstream) has something wonky in its CMake files AIUI

peastman commented 2 years ago

I've been looking into implementing this. The more I investigate it, the less comfortable I am with the proposed solution.

Here is the relevant PyTorch CMake script. It builds two lists called Caffe2_DEPENDENCY_LIBS and Caffe2_PUBLIC_DEPENDENCY_LIBS. Later in the build process, a different script links anything added to the former as private and anything added to the latter as public. As part of building the list, it looks for many different BLAS implementations and whichever one it finds, it adds the relevant library to Caffe2_PUBLIC_DEPENDENCY_LIBS. This works for all implementations other than OpenBLAS. For example, the MKL conda package includes aliases for all its libraries with no need to install a separate dev package. And of course it also works for any OpenBLAS package other than the CondaForge one.

I could create a patch to the script so that OpenBLAS_LIB would get added to Caffe2_DEPENDENCY_LIBS instead of Caffe2_PUBLIC_DEPENDENCY_LIBS. That might work, though I haven't tested it. But it raises a number of issues.

Instead we could trivially and robustly fix the problem by just adding one alias to the OpenBLAS package which is already present in other distributions of OpenBLAS, including the corresponding package on the conda defaults channel.

peastman commented 2 years ago

Do you agree with my assessment?

peastman commented 2 years ago

@hmaarrfk do you agree?

hmaarrfk commented 2 years ago

I can't respond at this moment, but if you think there is something wrong with pytorch the build on OSX, can you open an issue there and copy your assessment?

Generally speaking, people don't test their build systems in something exactly like conda. The state of the parent machine might be making a bug in their build system.

It wouldn't be the first time we propose patches to upstream projects improving their build system.

peastman commented 2 years ago

if you think there is something wrong with pytorch the build on OSX, can you open an issue there and copy your assessment?

I don't think there's anything wrong with pytorch. I think the problem is with OpenBLAS. See my analysis above. Can we please reopen this issue?

hmaarrfk commented 2 years ago

I'll reopen, but isuruf's original answer seems correct: image

mamba install openblas

should address your challenges.

hmaarrfk commented 2 years ago

https://anaconda.org/conda-forge/openblas/files

peastman commented 2 years ago

Correct in what sense? Can someone please respond to the analysis I posted above? I carefully laid out why trying to patch the PyTorch build script was a bad solution, and why the trivial change I originally suggested is both more robust and more consistent with the rest of the ecosystem. That was over three weeks ago, and not a single person has responded to anything I said.

h-vetinari commented 2 years ago

This works for all implementations other than OpenBLAS. For example, the MKL conda package includes aliases for all its libraries with no need to install a separate dev package.

It could also be a cmake issue that it's not looking for a versioned libopenblas. MKL similarly has a devel package in conda-forge as well that's very similar to the openblas one.

I don't think the patch is as fragile as you say, and maybe it's even acceptable upstream.

All that said, I don't feel super strongly about all this, but with Isuru's strong opposition to changing this (and his experience & standing on these topics), it looks like you're fighting an uphill battle on this. I'd recommend you try the pytorch patch, and perhaps submit it upstream (or figure out why cmake is not picking up on the libopenblas that's present).

hmaarrfk commented 2 years ago

But the library alias should exist. In your analysis you assume it doesn't. However, osx+arm is currently depending at build time on both openblas and libopenblas

https://github.com/conda-forge/pytorch-cpu-feedstock/blob/93333de760d702aee276dcafeedc890458a00417/recipe/meta.yaml#L85

hmaarrfk commented 2 years ago

Your assumption that the alias doesn't exist is wrong.

Isn't it?

If not please add a quick test to our CI befofe the cmake command

ls -lah $PREFIX/lib
test -f $PREFIX/lib/libopenblas.dynlib 
hmaarrfk commented 2 years ago

From the osx+arm+python 3.9 log:

2022-04-27T01:48:33.5362930Z   -- Found OpenBLAS libraries: /Users/runner/miniforge3/conda-bld/pytorch-recipe_1651023559589/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/lib/libopenblas.dylib

or for ease of reading

2022-04-27T01:48:33.5362930Z   -- Found OpenBLAS libraries: $PREFIX/lib/libopenblas.dylib

https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=498044&view=logs&j=1b8be447-c2bd-5772-b66a-a1146441bf88&t=6d156886-9c65-5a39-5edc-896ebd7da557

hint 1. Click view raw log. 2. Then search The azure search always hurts my head.
peastman commented 2 years ago

Your assumption that the alias doesn't exist is wrong.

Isn't it?

Download the package and see for yourself. I'm not talking about what happens while building the PyTorch package. I'm talking about what happens when you install PyTorch to use its C++ API, and try to compile your own code that links to it:

You're somehow supposed to figure out on your own that there exists a package called openblas, that it's different from the libopenblas package that was automatically installed with PyTorch, somehow track down what the difference between the two packages is (the names give no clue), and finally realize that you need to manually install that second package if you want PyTorch to work correctly.

with Isuru's strong opposition to changing this (and his experience & standing on these topics), it looks like you're fighting an uphill battle on this.

Isuru has not offered a single reason why it would be harmful to add the alias, nor addressed the fact that the alias is present in other comparable packages (such as MKL, or the defaults channel version of libopenblas), nor pointed out any advantage of his suggested change which would be far more complicated, far more work, and far more fragile. In fact, he hasn't replied to anything I've said for nearly two months.

hmaarrfk commented 2 years ago

Unfortunately, i really don't have an Mac M1. I am unable to download and see for myself. A log file showing your error would go a long way.

Generally speaking: a one line patch to pytorch is not "hacking" and would likely get accepted upstream. Generally speaking: from an engineering standpoint, upstream isn't always interested in "integration in an ecosystem" as a first priority. So we often have to do small fixes like one line patches to build systems to get them to work with conda. Generally speaking: it is common to ship the bare .so files in a different package to ensure that software doesn't accendentallg link to them. This seems to be true even in other distributions like ubuntu or Fedora where devel or dev packages exist.

hmaarrfk commented 2 years ago

We have also worked alot with different upstream ~package providers~ software authors to help them improve their build systems.

They are, generally speaking, really interested in getting people to use their software. They just don't really test for "conda" at first.

Edit: package provider -> software authors

peastman commented 2 years ago

I just don't understand where you're coming from. Can you give one single disadvantage to including the alias? Is there any possible use case where it would cause problems? If not, then why is there any disagreement?

And let me emphasize again, this package is the outlier that's violating the conventions of the rest of the ecosystem. If you install the same package from the defaults channel, it includes the alias. If you install a different BLAS from conda-forge, such as MKL, it includes all necessary aliases. But this one doesn't, and that omission causes major problems for users.

Unfortunately, i really don't have an Mac M1. I am unable to download and see for myself.

Just download https://anaconda.org/conda-forge/libopenblas/0.3.20/download/osx-64/libopenblas-0.3.20-openmp_hb3cd9ec_0.tar.bz2. Expand it and see for yourself that it contains a library called libopenblas.0.dylib, but not an alias called libopenblas.dylib pointing to it.

Now download https://anaconda.org/anaconda/libopenblas/0.3.20/download/osx-64/libopenblas-0.3.20-h9a5756b_1.tar.bz2, which is the exact same version of the same package from the defaults channel, and see that it does include the alias.

A log file showing your error would go a long way.

See https://github.com/conda-forge/openblas-feedstock/issues/134#issuecomment-1058681071 above.

ocefpaf commented 2 years ago

I just don't understand where you're coming from. Can you give one single disadvantage to including the alias? Is there any possible use case where it would cause problems?

You probably missed https://github.com/conda-forge/openblas-feedstock/issues/134#issuecomment-1083482954

peastman commented 2 years ago

I saw it. It just doesn't make any sense. No downstream package will link to it unless it's specifically trying to link to it. Linking to a library is not something that happens "accidentally". Every library you link to has to be specified by name. And let me emphasize yet again, this is how every other comparable package works. Including the same package from the default channel. Including other BLAS implementations from conda-forge.

hmaarrfk commented 2 years ago

Let me try to recap in a constructive way.

You asked for help, from volunteers, without providing any logs claiming that something was broken. Instead you pointed to how "defaults" does it.

There are reasons why you are choosing conda-forge. Some of the design decisions behind how conda-forge is built is the reason why you find this attractive.

Now, you had 3 core members of conda-forge respond to you in a consistent manner.

In fact, two of those are actively involved in building large packages like pytorch.

In https://github.com/conda-forge/openblas-feedstock/issues/134#issuecomment-1093084389 and https://github.com/conda-forge/openblas-feedstock/issues/134#issuecomment-1093142065 isuru pointed you exactly to the line of code that you would have to suggest changing upstream, and for conda-forge that would address your issues.

I, a maintainer of the pytorch feedstock, would personally accept such a patch, and help you justify it upstream.

We aren't saying we are infallible, but in this case, you got some pretty direct feedback about how to solve this particular issue. Please try to make an attempt to resolve it.

peastman commented 2 years ago

@hmaarrfk I've been trying very hard to be constructive through this whole discussion, but I'm also finding it incredibly frustrating. It began with Isuru basically dismissing my suggestions with a vague reference to Debian, an argument that made no sense and displayed a lack of understanding of the deep differences between the dependency models in Linux and Python. I nonetheless made a sincere effort to implement the solution he suggested, but quickly concluded it just didn't make sense. I wrote up a technical discussion of the problems with it, which was met with complete silence. It took over three weeks and multiple further posts to get anyone to reply, and when they did, I was basically told, "Isuru doesn't like it so it isn't going to happen." Meanwhile, Isuru has not replied to a single thing I've posted for two months, which makes a productive discussion entirely impossible.

I am 100% willing to do the work to fix the problem. But no one seems interested in what is obviously the correct fix, instead referring back to the earlier suggestion which has serious problems that I already explained in detail. Every time I try to discuss the technical reasons for preferring it, I am met with silence. Looking back over the discussion, I count no less than four times when that happened.

You asked for help, from volunteers, without providing any logs claiming that something was broken.

I did post it, over two months ago. And I provided another link back to it just yesterday.

I am trying really, really hard to fix a serious problem that is impacting my work. I am willing to do the work to fix it. If you say you're willing to accept a PR adding the missing alias, I'll do it immediately and we can close this issue as resolved. But instead I feel I'm just getting ignored, and it's very upsetting.

hmaarrfk commented 2 years ago

I'm not really sure if your assessment of the timeline is correct. isuruf replied in a very timely manner at the beginning of the issue.

isuruf likely has the most complete understanding of compilers, conda, and conda-forge of those i've interacted with.

It began with Isuru basically https://github.com/conda-forge/openblas-feedstock/issues/134#issuecomment-1083482954 with a vague reference to Debian

The internet can be dry at times. The balance between "brief" answers and "basically dismissing" is thin. However, he promptly replied to questions.

It took over three weeks and multiple further posts to get anyone to reply

Let me justify why I didn't reply. When asked for additional information, you provided 3 lines of a log. Often it is really important how you setup your information. When asked for the "exact" problem, often it is really valuable to provide full CMake Logs. We could have been more explicit about this request, but the level of experience compiling things yourself often indicates that you are comfortable communicating programming with fully reproducible logs.

I was basically told, "Isuru doesn't like it so it isn't going to happen."

Again. many of us trust isuru's judgement on these things.

Isuru has not replied to a single thing I've posted for two months, which makes a productive discussion entirely impossible

Again, nobody has to reply. if somebody feels like they cannot add anything constructive to the discussion, they often refrain from typing.

obviously the correct fix

I don't think any solution obvious. We often work collaboratively asking others for input and feedback.

I did https://github.com/conda-forge/openblas-feedstock/issues/134#issuecomment-1058681071, over two months ago. And I provided https://github.com/conda-forge/openblas-feedstock/issues/134#issuecomment-1146066152 just yesterday.

These logs are seriously incomplete. For an example on how to ask for external help, please see https://github.com/google/highway/issues/726 where I included as an attachment, log information going all the way back to early cmake commands and the installation of dependencies. These are hard to make without CIs, but you can likely start at when you type

mkdir build
cd build
cmake ..
[...]

and provide us all that information.

I am trying really, really hard to fix a serious problem that is impacting my work.

As much as we want to help, we often won't be fast enough for a mission critical work. Its really up to you, as an engineer, to decide on the risk factors you want to choose and how you want to build your software dependencies.

The infrastructure at conda-forge is flexible enough to allow you to build and upload your own packages. This REALLY helps decouple problems.

If you say you're willing to accept a PR adding the missing alias,

I don't think I can accept a PR to openblas-feedstock since those involved in maintaining it expressed that they do not agree with the assessment that any files are missing from the released packages.

But instead I feel I'm just getting ignored

You've been pointed in the right direction in https://github.com/conda-forge/openblas-feedstock/issues/134#issuecomment-1093142065

For what its worth, isuru's comments lead me directly to a small patch: https://github.com/conda-forge/pytorch-cpu-feedstock/pull/116

which I've asked for feedback from upstream developers too https://github.com/pytorch/pytorch/pull/78883

In summary:

As a maintainer of the pytorch feedstack, and having studied the suggestions made by isuru, i agree that it is a bug in pytorch, and thus, in the pytorch feedstock. I'm working with other maintainers to address it. And thus I'm going to close this issue.

It may take a few weeks to finish the rebuild given how difficult it is to build pytorch on CIs.

peastman commented 2 years ago

When asked for additional information, you provided 3 lines of a log.

What I provided is exactly what I was asked for. Isuru's exact request was, "What's the error exactly?" I provided the exact error. At no point in the entire discussion did anyone ever say, "Your error message was insufficient, please post more details."

I'm not really sure if your assessment of the timeline is correct. isuruf replied in a very timely manner at the beginning of the issue.

I posted the above on May 3. More than two weeks later, on May 18, still no one had replied. I therefore posted again asking for any insight--keeping in mind I had provided all the information I had been asked for up to that point. He replied with a two word message, "See #134", linking back to an earlier one sentence message from before he asked me for the error and before I provided it.

Let me justify why I didn't reply. When asked for additional information, you provided 3 lines of a log.

Tell me precisely what information you want, and I'll be glad to provide it.

Again. many of us trust isuru's judgement on these things.

If he were participating in this discussion that might be a valid point. However, he has not said a single thing in two months, since before I posted a detailed analysis of why his suggested solution was not correct. That was nearly a month ago, and still not a single person has responded to a single one of the technical issues I raised. Not one. There is no discussion going on here. I keep trying to discuss technical issues, and all my attempts are completely ignored. If Isuru wants to join in the technical discussion, he's welcome to. If he doesn't want to that's his choice too. But his choice to ignore the discussion does not mean anything has been resolved.

You've been pointed in the right direction in https://github.com/conda-forge/openblas-feedstock/issues/134#issuecomment-1093142065

And I replied with https://github.com/conda-forge/openblas-feedstock/issues/134#issuecomment-1121489856 explaining why that was not, in fact, the right direction. Do you agree with the technical points I raised? Disagree? If you disagree, please say so and explain why. If you agree, please say that.

For what its worth, isuru's comments lead me directly to a small patch:

I strongly recommend against that patch. It is not the right solution. It is based on a misunderstanding of the problem. I've already explained the reasons why. I could keep repeating them, or expand on them further if you want to understand the actual issues here. What you're doing there is a band-aid to paper over the fact that the OpenBLAS package is broken and violates the conventions of the Python ecosystem. In the process, it changes the binary interface for libtorch which risks breaking downstream software.

ngam commented 2 years ago
  • This changes the binary interface for libtorch, which risks breaking other software that uses it.

I could keep repeating them, or expand on them further if you want to understand the actual issues here.

Could you explain this binary interface part to me? (As in, explaining how it would change it and thus risk breaking software using it)

--

And, if you want an even more trivial solution, you could follow the very first response in the issue:

It's in openblas package along with the headers.

peastman commented 2 years ago

And, if you want an even more trivial solution, you could follow the very first response in the issue:

It's in openblas package along with the headers.

That's the problem. it's not the solution!

Could you explain this binary interface part to me? (As in, explaining how it would change it and thus risk breaking software using it)

Consider a downstream C++ code that links to libtorch, and also invokes BLAS routines directly. PyTorch supports many different BLAS implementations, and ensures that an appropriate one will be installed. Its build scripts contain lots of logic to locate different implementations and link to them correctly. You could repeat all that work in your own build system, but why do that when PyTorch has already done it for you? When you link to libtorch, that's guaranteed to bring in an appropriate BLAS that you can invoke with no further effort.

The proposed change would break that. It changes BLAS from public to private. Any package that expects it to be public would no longer compile.