Closed ngam closed 2 years ago
Hi! This is the friendly automated conda-forge-linting service.
I wanted to let you know that I linted all conda-recipes in your PR (recipe) and found some lint.
Here's what I've got...
For recipe meta.yaml, the lint check hit an error, though. To get a traceback to help figure out what's going on, install conda-smithy and run `conda smithy recipe-lint .` from the recipe directory.
@conda-forge-admin, please rerender
Hi! This is the friendly automated conda-forge-webservice.
I tried to rerender for you but ran into some issues. Please check the output logs of the latest rerendering GitHub Actions workflow run for errors. You can also ping conda-forge/core for further assistance or try re-rendering locally.
This message was generated by GitHub actions workflow run https://github.com/conda-forge/jaxlib-feedstock/actions/runs/2316028997.
Hi! This is the friendly automated conda-forge-linting service.
I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.
@conda-forge-admin, please rerender
@conda-forge-admin, please rerender
@xhochy, any insight? zlib is available
GCC_HOST_COMPILER_PATH=/home/conda/feedstock_root/build_artifacts/jaxlib_1652393390396/_build_env/bin/x86_64-conda-linux-gnu-cc \
PATH=/home/conda/feedstock_root/build_artifacts/jaxlib_1652393390396/_build_env/bin:/home/conda/feedstock_root/build_artifacts/jaxlib_1652393390396/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/bin:/opt/conda/condabin:/home/conda/feedstock_root/build_artifacts/jaxlib_1652393390396/_build_env:/home/conda/feedstock_root/build_artifacts/jaxlib_1652393390396/_build_env/bin:/home/conda/feedstock_root/build_artifacts/jaxlib_1652393390396/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac:/home/conda/feedstock_root/build_artifacts/jaxlib_1652393390396/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/bin:/opt/conda/bin:/opt/conda/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/home/conda/bin:/usr/local/cuda/bin \
PWD=/proc/self/cwd \
TF_CUDA_COMPUTE_CAPABILITIES=sm_35,sm_50,sm_60,sm_62,sm_70,sm_72,sm_75,compute_75 \
TF_CUDA_PATHS=/usr/local/cuda,/home/conda/feedstock_root/build_artifacts/jaxlib_1652393390396/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac \
TF_CUDA_VERSION=10.2 \
TF_CUDNN_VERSION=7 \
external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -MD -MF bazel-out/k8-opt/bin/external/com_google_protobuf/_objs/protobuf/gzip_stream.d '-frandom-seed=bazel-out/k8-opt/bin/external/com_google_protobuf/_objs/protobuf/gzip_stream.o' -iquote external/com_google_protobuf -iquote bazel-out/k8-opt/bin/external/com_google_protobuf -iquote external/zlib -iquote bazel-out/k8-opt/bin/external/zlib -isystem external/com_google_protobuf/src -isystem bazel-out/k8-opt/bin/external/com_google_protobuf/src -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -fPIE -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -Wall -fno-omit-frame-pointer -no-canonical-prefixes -fno-canonical-system-headers -DNDEBUG -g0 -O2 -ffunction-sections -fdata-sections '-fvisibility=hidden' -Wno-sign-compare -Wno-stringop-truncation -Wno-array-parameter '-DMLIR_PYTHON_PACKAGE_PREFIX=jaxlib.mlir.' '-std=c++17' -DHAVE_PTHREAD -DHAVE_ZLIB -Woverloaded-virtual -Wno-sign-compare -Wno-unused-function -Wno-write-strings -c external/com_google_protobuf/src/google/protobuf/io/gzip_stream.cc -o bazel-out/k8-opt/bin/external/com_google_protobuf/_objs/protobuf/gzip_stream.o)
# Configuration: 41636a5ea5a6dcdab0c1dc1d3a7463df20519d9901b8a3611f22ea526d627843
# Execution platform: @local_execution_config_platform//:platform
In file included from external/com_google_protobuf/src/google/protobuf/io/gzip_stream.cc:38:0:
external/com_google_protobuf/src/google/protobuf/io/gzip_stream.h:49:10: fatal error: zlib.h: No such file or directory
#include <zlib.h>
^~~~~~~~
compilation terminated.
Target //build:build_wheel failed to build
zlib was just the tip of the iceberg: the CUDA builds weren't seeing any of the tf_sys_libs. Trying to copy settings over from the tensorflow feedstock to see if we can resolve that...
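A minimal sketch of the idea, assuming TensorFlow's usual system-libs convention (the exact variable values here are illustrative, not the feedstock's actual settings):

```shell
# Hypothetical sketch: tell the TensorFlow/XLA build to use the
# conda-provided system libraries instead of the vendored copies.
# TF_SYSTEM_LIBS is a comma-separated list read by TensorFlow's
# third_party/systemlibs machinery; zlib is the one that failed above.
export TF_SYSTEM_LIBS="zlib"

# Point the compiler at the conda host environment so zlib.h and libz
# are found ($PREFIX is the conda-build host prefix).
export CPATH="${PREFIX}/include:${CPATH:-}"
export LIBRARY_PATH="${PREFIX}/lib:${LIBRARY_PATH:-}"
```

The failure above suggests these settings were being applied to the CPU builds but dropped somewhere on the CUDA path, so the vendored protobuf could not find zlib.h.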
I believe this is the problem:
WARNING: option '--config=cuda' (source command line options) was expanded and now overrides the explicit option --crosstool_top=//bazel_toolchain:toolchain with --crosstool_top=@local_config_cuda//crosstool:toolchain
@wolfv if you have a moment to look, I'm trying to finish what you started...
I believe this is the problem:
WARNING: option '--config=cuda' (source command line options) was expanded and now overrides the explicit option --crosstool_top=//bazel_toolchain:toolchain with --crosstool_top=@local_config_cuda//crosstool:toolchain
This is indeed the problem. Patch incoming, pending tests passing...
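For context on why this happens: when bazel expands `--config=cuda`, the options it pulls in land later on the effective command line, so they override an earlier explicit `--crosstool_top`. One hedged sketch of a fix (the config name and toolchain labels mirror the warning above, but the actual `.bazelrc` wiring in the patch may differ) is to re-assert the custom toolchain inside the cuda config itself:

```
# .bazelrc sketch (illustrative): make --config=cuda keep the custom
# conda toolchain instead of letting the expansion replace it with
# @local_config_cuda//crosstool:toolchain.
build:cuda --crosstool_top=//bazel_toolchain:toolchain
```

Because bazel applies the last occurrence of a flag, re-stating it inside the config ensures it survives the expansion.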
Alright, let's goooooo
@xhochy could you please review when you have a moment? It is really messy for now, but I will try to organize it as much as possible. I describe the main problem above, as well as in the google/jax issue. Maybe you can find a more elegant solution for this? IDK, but this seems to work. Our GPU package is almost the same size as their CUDA PyPI wheel... but we need to test; I am really not sure this worked correctly.
jaxlib 0.3.7 cuda112py38ha4793f1_0 ngam
>>> import jax
>>> import jax.numpy as jnp
>>> from jax import grad, jit, vmap
>>> from jax import random
>>> key = random.PRNGKey(0)
>>> x = random.normal(key, (10,))
>>> print(x)
[-0.3721109 0.26423115 -0.18252768 -0.7368197 -0.44030377 -0.1521442
-0.67135346 -0.5908641 0.73168886 0.5673026 ]
>>> x
DeviceArray([-0.3721109 , 0.26423115, -0.18252768, -0.7368197 ,
-0.44030377, -0.1521442 , -0.67135346, -0.5908641 ,
0.73168886, 0.5673026 ], dtype=float32)
>>> from jax.lib import xla_bridge
>>> print(xla_bridge.get_backend().platform)
gpu
>>>
MAIN TODOS BEFORE MERGE:
@conda-forge-admin, please rerender
CI passes; only timeouts or lost connections remain. The issue is that sometimes 6 hours is not enough since this is really very borderline, so I am not sure what we should do --- we can always rerun the CI a few times, see if any builds still haven't been uploaded, and upload those manually.
But anyway, that's almost done!
note: artifacts available here if people want them: https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=506049&view=artifacts&pathAsName=false&type=publishedArtifacts
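For the manual-upload fallback mentioned above, a hedged sketch (the file name is a placeholder; `anaconda upload` is the standard anaconda-client command, though the exact token/label flags used for conda-forge uploads may differ):

```
# Illustrative only: after downloading a built conda package from the
# Azure Pipelines artifacts linked above, push it to the channel.
# <package>.tar.bz2 is a placeholder for the actual artifact file name.
anaconda upload --user conda-forge <package>.tar.bz2
```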
@conda-forge-admin, please restart ci
@conda-forge/jaxlib (+ @ocefpaf) this is ready. I am fairly certain everything is fine. However, I would encourage you to review and merge the other cleaner PR. I am leaving this up with all the edits and commits so that the thought/debug process is clearer.
I initially wanted to submit another PR for 0.3.10, but it is better to wait for now. Also, the 0.3.10 build didn't actually work on a local GPU, so I am not really sure what's going on. Anyway, in the cleaner PR, I drop cuda 11.0 and cuda 11.1. Note, though, that there seems to be a fundamental problem with 10.2 (I never compiled that locally, so idk). It seems to me that building on 11.2 is enough, but I will leave this up to you to decide.
Alright, dropping everything but 11.2 for now. Build time is down significantly after unvendoring nccl. All ready in the other PR.
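The unvendoring amounts to depending on conda-forge's prebuilt nccl package instead of letting bazel compile the vendored copy. A hedged meta.yaml sketch (section placement and the selector are illustrative, not the feedstock's exact recipe):

```yaml
requirements:
  host:
    # Use conda-forge's prebuilt NCCL rather than building the vendored
    # copy from source, which cuts a large chunk off the CUDA build time.
    - nccl   # [cuda_compiler_version != "None"]
```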
@ngam This looks good but quite messy. I think the first approach would be to clean things up a bit like you already described in some comments and then I would have a look again. I've also read that you don't need the custom toolchain, that would be great. Otherwise, we should move tensorflow over to bazel-toolchain. There are some slight differences currently but we can have both on the same. The main issue with moving tensorflow over is that it just takes ages to iterate on tensorflow.
Please refer to the other PR. I will close this soon
Closing this in favor of #100
fixes #34
closes #72
Checklist
- [ ] Reset the build number to 0 (if the version changed)
- [ ] Re-rendered with the latest conda-smithy (use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)