conda-forge / jaxlib-feedstock

A conda-smithy repository for jaxlib.
BSD 3-Clause "New" or "Revised" License

DO NOT MERGE, add cuda support #97

Closed ngam closed 2 years ago

ngam commented 2 years ago

fixes #34

closes #72

Checklist

conda-forge-linter commented 2 years ago

Hi! This is the friendly automated conda-forge-linting service.

I wanted to let you know that I linted all conda-recipes in your PR (recipe) and found some lint.

Here's what I've got...

For recipe:

ngam commented 2 years ago

@conda-forge-admin, please rerender

github-actions[bot] commented 2 years ago

Hi! This is the friendly automated conda-forge-webservice.

I tried to rerender for you but ran into some issues. Please check the output logs of the latest rerendering GitHub actions workflow run for errors. You can also ping conda-forge/core for further assistance or try re-rendering locally.

This message was generated by GitHub actions workflow run https://github.com/conda-forge/jaxlib-feedstock/actions/runs/2316028997.

conda-forge-linter commented 2 years ago

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

ngam commented 2 years ago

@conda-forge-admin, please rerender

ngam commented 2 years ago

@conda-forge-admin, please rerender

ngam commented 2 years ago

@xhochy, any insight? zlib is available

    GCC_HOST_COMPILER_PATH=/home/conda/feedstock_root/build_artifacts/jaxlib_1652393390396/_build_env/bin/x86_64-conda-linux-gnu-cc \
    PATH=/home/conda/feedstock_root/build_artifacts/jaxlib_1652393390396/_build_env/bin:/home/conda/feedstock_root/build_artifacts/jaxlib_1652393390396/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/bin:/opt/conda/condabin:/home/conda/feedstock_root/build_artifacts/jaxlib_1652393390396/_build_env:/home/conda/feedstock_root/build_artifacts/jaxlib_1652393390396/_build_env/bin:/home/conda/feedstock_root/build_artifacts/jaxlib_1652393390396/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac:/home/conda/feedstock_root/build_artifacts/jaxlib_1652393390396/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/bin:/opt/conda/bin:/opt/conda/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/home/conda/bin:/usr/local/cuda/bin \
    PWD=/proc/self/cwd \
    TF_CUDA_COMPUTE_CAPABILITIES=sm_35,sm_50,sm_60,sm_62,sm_70,sm_72,sm_75,compute_75 \
    TF_CUDA_PATHS=/usr/local/cuda,/home/conda/feedstock_root/build_artifacts/jaxlib_1652393390396/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac \
    TF_CUDA_VERSION=10.2 \
    TF_CUDNN_VERSION=7 \
  external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -MD -MF bazel-out/k8-opt/bin/external/com_google_protobuf/_objs/protobuf/gzip_stream.d '-frandom-seed=bazel-out/k8-opt/bin/external/com_google_protobuf/_objs/protobuf/gzip_stream.o' -iquote external/com_google_protobuf -iquote bazel-out/k8-opt/bin/external/com_google_protobuf -iquote external/zlib -iquote bazel-out/k8-opt/bin/external/zlib -isystem external/com_google_protobuf/src -isystem bazel-out/k8-opt/bin/external/com_google_protobuf/src -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -fPIE -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -Wall -fno-omit-frame-pointer -no-canonical-prefixes -fno-canonical-system-headers -DNDEBUG -g0 -O2 -ffunction-sections -fdata-sections '-fvisibility=hidden' -Wno-sign-compare -Wno-stringop-truncation -Wno-array-parameter '-DMLIR_PYTHON_PACKAGE_PREFIX=jaxlib.mlir.' '-std=c++17' -DHAVE_PTHREAD -DHAVE_ZLIB -Woverloaded-virtual -Wno-sign-compare -Wno-unused-function -Wno-write-strings -c external/com_google_protobuf/src/google/protobuf/io/gzip_stream.cc -o bazel-out/k8-opt/bin/external/com_google_protobuf/_objs/protobuf/gzip_stream.o)
# Configuration: 41636a5ea5a6dcdab0c1dc1d3a7463df20519d9901b8a3611f22ea526d627843
# Execution platform: @local_execution_config_platform//:platform
In file included from external/com_google_protobuf/src/google/protobuf/io/gzip_stream.cc:38:0:
external/com_google_protobuf/src/google/protobuf/io/gzip_stream.h:49:10: fatal error: zlib.h: No such file or directory
 #include <zlib.h>
          ^~~~~~~~
compilation terminated.
Target //build:build_wheel failed to build
INFO
ngam commented 2 years ago

zlib was only the tip of the iceberg: the cuda builds weren't seeing any of the tf_sys_libs. I'm trying to copy settings from tf to see if that resolves it...

I believe this is the problem:

WARNING: option '--config=cuda' (source command line options) was expanded and now overrides the explicit option --crosstool_top=//bazel_toolchain:toolchain with --crosstool_top=@local_config_cuda//crosstool:toolchain
ngam commented 2 years ago

@wolfv if you have a moment to look, I'm trying to finish what you started...

ngam commented 2 years ago

> I believe this is the problem:
>
> WARNING: option '--config=cuda' (source command line options) was expanded and now overrides the explicit option --crosstool_top=//bazel_toolchain:toolchain with --crosstool_top=@local_config_cuda//crosstool:toolchain

This is indeed the problem. Patch incoming, pending tests passing...
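As a rough illustration of how one might spot this override in a saved build log, here is a minimal Python sketch. The function name `find_crosstool_overrides` and the regular expression are hypothetical helpers, not part of Bazel or conda-forge tooling; the only thing taken from this thread is the wording of the warning itself, and the sketch assumes Bazel keeps emitting it in that form.

```python
import re

# Pattern modelled on the warning quoted above. Assumption: the wording
# "overrides the explicit option ... with ..." is stable across Bazel versions.
OVERRIDE_RE = re.compile(
    r"overrides the explicit option --crosstool_top=(?P<explicit>\S+)"
    r" with --crosstool_top=(?P<effective>\S+)"
)


def find_crosstool_overrides(log_text):
    """Return (explicit, effective) pairs for every crosstool_top override warning."""
    return [
        (m.group("explicit"), m.group("effective"))
        for m in OVERRIDE_RE.finditer(log_text)
    ]


# The warning line exactly as it appears in this thread.
SAMPLE_LOG = (
    "WARNING: option '--config=cuda' (source command line options) was expanded "
    "and now overrides the explicit option "
    "--crosstool_top=//bazel_toolchain:toolchain with "
    "--crosstool_top=@local_config_cuda//crosstool:toolchain"
)

if __name__ == "__main__":
    for explicit, effective in find_crosstool_overrides(SAMPLE_LOG):
        print(f"requested toolchain: {explicit}")
        print(f"toolchain actually used: {effective}")
```

A non-empty result would suggest that the toolchain passed on the command line is not the one the CUDA build ends up using, which matches the missing-header symptom seen earlier in the log.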

ngam commented 2 years ago

Alright, let's goooooo

@xhochy could you please review when you have a moment? It is really messy for now, but I will try to organize this as much as possible. I describe the main problem above as well as in the google/jax issue. Maybe you could find a more elegant solution for this? IDK, but this seems to work. Our GPU package is almost the same size as their cuda pypi wheel... but we need to test, I am really not sure this worked correctly.

jaxlib                    0.3.7           cuda112py38ha4793f1_0    ngam
>>> import jax
>>> import jax.numpy as jnp
>>> from jax import grad, jit, vmap
>>> from jax import random
>>> key = random.PRNGKey(0)

>>> x = random.normal(key, (10,))
>>> print(x)
[-0.3721109   0.26423115 -0.18252768 -0.7368197  -0.44030377 -0.1521442
 -0.67135346 -0.5908641   0.73168886  0.5673026 ]
>>> x
DeviceArray([-0.3721109 ,  0.26423115, -0.18252768, -0.7368197 ,
             -0.44030377, -0.1521442 , -0.67135346, -0.5908641 ,
              0.73168886,  0.5673026 ], dtype=float32)
>>> from jax.lib import xla_bridge
>>> print(xla_bridge.get_backend().platform)
gpu
>>> 
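The session above only shows that the backend reports `gpu`. A slightly stronger smoke test, sketched below, would also run a small computation and wait for the result. This is only a sketch: `gpu_smoke_test` is a hypothetical helper name, the matrix size is arbitrary, and it assumes a JAX version that provides `jax.default_backend()` and `jax.devices()`. It returns `None` when `jax` cannot be imported, so it can be run in any environment.

```python
def gpu_smoke_test():
    """Report the active JAX backend after running a small matmul.

    Returns None if jax is not importable in this environment.
    """
    try:
        import jax
        import jax.numpy as jnp
    except ImportError:
        return None

    # A matrix of ones makes the expected result easy to reason about:
    # every entry of x @ x equals the matrix dimension.
    x = jnp.ones((1000, 1000))
    y = (x @ x).block_until_ready()  # wait for the device to finish

    return {
        "default_backend": jax.default_backend(),
        "device_platforms": sorted({d.platform for d in jax.devices()}),
        "matmul_corner": float(y[0, 0]),
    }


if __name__ == "__main__":
    report = gpu_smoke_test()
    if report is None:
        print("jax is not installed in this environment")
    else:
        print(report)
```

On a working CUDA build one would expect the reported backend to match what the session above printed; a CPU-only result would suggest the CUDA-enabled jaxlib was not picked up.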
ngam commented 2 years ago

MAIN TODOS BEFORE MERGE:

ngam commented 2 years ago

@conda-forge-admin, please rerender

ngam commented 2 years ago

CI passes; the only remaining failures are timeouts or lost connections. The issue is that 6 hours is sometimes not enough, since these builds are really borderline, so I am not sure what we should do. We can always rerun the CI a few times, then see whether any builds are still not uploaded and upload them manually.

But anyway, that's almost done!

note: artifacts available here if people want them: https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=506049&view=artifacts&pathAsName=false&type=publishedArtifacts

ngam commented 2 years ago

@conda-forge-admin, please restart ci

ngam commented 2 years ago

@conda-forge/jaxlib (+ @ocefpaf) this is ready. I am fairly certain everything is fine. However, I would encourage you to review and merge the other cleaner PR. I am leaving this up with all the edits and commits so that the thought/debug process is clearer.

#100 (0.3.7)

I initially wanted to submit another PR for 0.3.10, but it is better to wait for now. Also, the 0.3.10 build didn't actually work on a local GPU, so I am not really sure what's going on. Anyway, in the cleaner PR, I drop cuda 11.0 and cuda 11.1. Note though, there seems to be a fundamental problem with 10.2 (I never compiled that locally, so idk). It seems to me building on 11.2 is enough, but I will leave this up to you to decide.

ngam commented 2 years ago

Alright, dropping everything but 11.2 for now. Build time brought down significantly with unvendoring nccl. All ready in the other PR.

xhochy commented 2 years ago

@ngam This looks good but quite messy. I think the first approach would be to clean things up a bit like you already described in some comments and then I would have a look again. I've also read that you don't need the custom toolchain, that would be great. Otherwise, we should move tensorflow over to bazel-toolchain. There are some slight differences currently but we can have both on the same. The main issue with moving tensorflow over is that it just takes ages to iterate on tensorflow.

ngam commented 2 years ago

> @ngam This looks good but quite messy. I think the first approach would be to clean things up a bit like you already described in some comments and then I would have a look again. I've also read that you don't need the custom toolchain, that would be great. Otherwise, we should move tensorflow over to bazel-toolchain. There are some slight differences currently but we can have both on the same. The main issue with moving tensorflow over is that it just takes ages to iterate on tensorflow.

Please refer to the other PR. I will close this soon

ngam commented 2 years ago

Closing this in favor of #100