conda-forge / conda-forge-ci-setup-feedstock

A conda-smithy repository for conda-forge-ci-setup.
BSD 3-Clause "New" or "Revised" License
13 stars 53 forks source link

Install libcuda stub into QEMU sysroot for testing in emulation #252

Closed h-vetinari closed 1 year ago

h-vetinari commented 1 year ago

Follow-up to #250, where I originally wanted to copy but then we decided to just not fail. However, the tests in https://github.com/conda-forge/arrow-cpp-feedstock/pull/1120 show that the stubs are still necessary in QEMU.

superseded Seeing that the script is already copying `compat/*` for CUDA 11... https://github.com/conda-forge/conda-forge-ci-setup-feedstock/blob/17fd53570288308428591cd8c6b04c95a8a70367/recipe/cross_compile_support.sh#L114-L115 ... it looks like copying the [content](https://github.com/conda-forge/cuda-compat-feedstock/blob/fda8055cff9ec301681da0bcbf0b2f4efbcf7f57/recipe/meta.yaml#L55-L64) of `cuda-compat` should be the right equivalent for the new CUDA 12 setup. Last three commits will be removed again if the CI passes.
conda-forge-webservices[bot] commented 1 year ago

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

isuruf commented 1 year ago

Stubs by definition are not supposed to work.

isuruf commented 1 year ago

How is this done for native builds?

isuruf commented 1 year ago

Also, the qemu env is used only for test phase where compilers are not present. When compilers are present QEMU_LD_PREFIX is set to the env where the compilers are present. How do you handle that case?

h-vetinari commented 1 year ago

Also, the qemu env is used only for test phase where compilers are not present. When compilers are present QEMU_LD_PREFIX is set to the env where the compilers are present. How do you handle that case?

I only need it for the test phase when in emulation. The regular compilation ran fine with the current setup already.

h-vetinari commented 1 year ago

How is this done for native builds?

Presumably through the cuda-version run-export somehow? At least, the various feedstocks which already build natively for CUDA 12 are finding libcuda.so during testing. I don't know QEMU well enough but it seems to be looking elsewhere (you added the copy to QEMU_LD_PREFIX in https://github.com/conda-forge/conda-forge-ci-setup-feedstock/commit/ae8eb4eda3d5c9fee3b4711ebe59ee33e7cd0430, so I was trying to emulate that - no pun intended).

h-vetinari commented 1 year ago

@isuruf, I now realized that we very likely don't have to do the manual copying we're doing for CUDA 11 -- as the location for QEMU is another conda environment anyway, we can just install the compat-libs there? Could you confirm that approach is reasonable?

isuruf commented 1 year ago

Presumably through the cuda-version run-export somehow? At least, the various feedstocks which already build natively for CUDA 12 are finding libcuda.so during testing.

Can you find out how this is done natively? We should emulate that.

I don't know QEMU well enough but it seems to be looking elsewhere (you added the copy to QEMU_LD_PREFIX in https://github.com/conda-forge/conda-forge-ci-setup-feedstock/commit/ae8eb4eda3d5c9fee3b4711ebe59ee33e7cd0430, so I was trying to emulate that - no pun intended).

For CUDA 11, for non-QEMU, libcuda.so.1 is found in /usr/local/lib in the docker image. Since it is not there in CUDA 12, non-QEMU, I'm just wondering where it is in native builds.

h-vetinari commented 1 year ago

@isuruf: How is this done for native builds?

@h-vetinari: Presumably through the cuda-version run-export somehow? At least, the various feedstocks which already build natively for CUDA 12 are finding libcuda.so during testing.

@isuruf: Can you find out how this is done natively? We should emulate that.

Looking at the testing phase for the last CI run for cupy against CUDA 12, I see:

    cuda-cudart:              12.0.107-h59595ed_6        conda-forge
    cuda-cudart_linux-64:     12.0.107-h59595ed_6        conda-forge
    cuda-driver-dev:          12.0.107-h59595ed_6        conda-forge
    cuda-driver-dev_linux-64: 12.0.107-h59595ed_6        conda-forge
    cuda-nvrtc:               12.0.76-h59595ed_1         conda-forge
    cuda-nvtx:                12.0.76-hcb278e6_0         conda-forge
    cuda-version:             12.0-hffde075_2            conda-forge
    cupy:                     12.1.0-py310hfc31588_0     local

There's a lib/stubs/libcuda.so in cuda-driver-dev, which is also explicitly added as a test dependency in the recipe, with the comment # need the libcuda stub for import test

Perhaps @leofang @adibbley @jakirkham can comment on how finding libcuda.so at test time should be handled more generally...

h-vetinari commented 1 year ago

Will check if this works in https://github.com/conda-forge/arrow-cpp-feedstock/pull/1120. I'm also going to reactivate #236 whether we want to run CI for the CUDA cross-compilation setup (even though we don't need the builds...), we can continue there if someone wants to change how we deal with cuda-compat vs. cuda-driver-dev

h-vetinari commented 1 year ago

So it turns out that the stub is not enough (see https://github.com/conda-forge/arrow-cpp-feedstock/pull/1120). I've also tried adding cuda-driver-dev and cuda-compat to the test environment explicitly, but that didn't make a difference.

isuruf commented 1 year ago

Can we revert this then?

h-vetinari commented 1 year ago

I'll happily revert this once we know how we want to deal with this. In the meantime, my thinking was: it doesn't hurt anyone and I wanted to avoid further churn.

If you want I can open a dedicated issue here. Currently I have this comment which summarizes the situation from my POV - we should solve the libcuda-at-test-time divergence in a way that doesn't require every feedstock to add spurious test dependencies.

isuruf commented 1 year ago

You merged this PR despite my review to test things out and since it doesn't work, reverting is the courteous thing to do.

h-vetinari commented 1 year ago

No problem, if that's your preference (previously & elsewhere, you had been sensitive to churn): #254

You merged this PR despite my review to test things out and since it doesn't work

I did what you had asked (do it as elsewhere for CUDA 12), and to test it out I needed to merge it. Arrow is also the first feedstock (to my knowledge) trying to cross-compile with the CUDA 12 infrastructure, so there is nothing that can break from changes to this particular branch.

isuruf commented 1 year ago

No, you didn't check CUDA 12 with arrow-cpp. I'm pretty sure, linux-64 native builds with CUDA 12 will break as well.

h-vetinari commented 1 year ago

This PR didn't change anything about native compilation

isuruf commented 1 year ago

Sure, that's why I think this is the wrong place to fix an issue you are seeing because the problem you are seeing is not limited to cross compilation.

h-vetinari commented 1 year ago

because the problem you are seeing is not limited to cross compilation.

That could very well be. It did not immediately appear like that to me though, as the native cuda 12 migration had been running for a couple weeks now, and the arch one hasn't been started yet (there's also few feedstocks cross-compiling cuda at all, since that capability is quite recent).

My interest with this PR was purely in unblocking the cross-compiling case, without touching (and possibly disturbing) the native case.

leofang commented 1 year ago

Perhaps @leofang @adibbley @jakirkham can comment on how finding libcuda.so at test time should be handled more generally...

At least in CuPy/cuQuantum feedstocks, my finding is that we need LD_PRELOAD or dlopen to load the stub file prior to import:

Otherwise, the linker would complain. This is unique to the CUDA 12 builds, where we no longer have the stubs searchable in the path; CUDA 11 builds (which, as Isuru mentioned, uses the docker image) work fine.

My interest with this PR was purely in unblocking the cross-compiling case, without touching (and possibly disturbing) the native case.

@adibbley or @jakirkham can correct me, but last time I checked CUDA 12 for aarch64/ppc64le is not ready yet. I am not sure if the linux64 packages are enough for cross compiling? (I guess not, but I need to recollect the reason.)

leofang commented 1 year ago

xref: https://github.com/conda-forge/cuda-cudart-feedstock/issues/5