conda-forge / wandb-feedstock

A conda-smithy repository for wandb.
BSD 3-Clause "New" or "Revised" License
2 stars 14 forks source link

Bump version to `0.18.1` + Add `cargo` build #129

Open flferretti opened 1 week ago

flferretti commented 1 week ago

Checklist

Fix #126 Fix #119 Fix #120 Fix #111 Fix #108 Fix #104

github-actions[bot] commented 1 week ago

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

github-actions[bot] commented 1 week ago

Hi! This is the friendly automated conda-forge-linting service.

I wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found some lint.

Here's what I've got...

For recipe/meta.yaml:

github-actions[bot] commented 1 week ago

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

timoffex commented 6 days ago

https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=1037156&view=logs&j=d0d954b5-f111-5dc4-4d76-03b6c9d0cf7e&t=6d4b912b-175d-51da-0fd9-4d30fe1eb4e7&l=1189

File "/home/conda/feedstock_root/build_artifacts/wandb_1727267483895/work/nvidia_gpu_stats/hatch.py", line 49, in build_nvidia_gpu_stats
    source_path.replace(output_path)
[...]
FileNotFoundError: [Errno 2] No such file or directory: 'nvidia_gpu_stats/target/release/nvidia_gpu_stats' -> 'wandb/bin/nvidia_gpu_stats'

Sounds like the nvidia_gpu_stats binary didn't get built, but I'm not sure why. The Rust compiler runs and outputs a warning, but the last message makes it sound like the binary gets generated anyway:

     Compiling nvidia_gpu_stats v0.1.2 (/home/conda/feedstock_root/build_artifacts/wandb_1727267483895/work/nvidia_gpu_stats)
  warning: unused return value of `must_use` that must be used
    --> src/gpu_nvidia.rs:23:9
     |
  23 | /         format!(
  24 | |             "{}.{}",
  25 | |             nvml_wrapper::cuda_driver_version_major(cuda_version),
  26 | |             nvml_wrapper::cuda_driver_version_minor(cuda_version)
  27 | |         );
     | |_________^
     |
     = note: `#[warn(unused_must_use)]` on by default
     = note: this warning originates in the macro `format` (in Nightly builds, run with -Z macro-backtrace for more info)

  warning: `nvidia_gpu_stats` (bin "nvidia_gpu_stats") generated 1 warning
      Finished `release` profile [optimized] target(s) in 3m 23s

CC @dmitryduev since you wrote this :)

flferretti commented 4 days ago

Thanks for reporting the error @timoffex! You're right, the compiler doesn't throw an error, and the verbose logs aren't very helpful in diagnosing the problem. I'm considering whether we should add cuda-nvml-dev to the host requirements, as it looks required here, and skip the CUDA build when the CUDA compiler isn't available. What do you think?

timoffex commented 3 days ago

I'm considering whether we should add cuda-nvml-dev to the host requirements, as it looks required here, and skip the CUDA build when the CUDA compiler isn't available. What do you think?

If I understand correctly, it shouldn't be necessary---that line of code checks whether a library exists at runtime (when someone is running wandb), and if it doesn't, then nvidia_gpu_stats acts like a no-op. The Rust build itself shouldn't depend on CUDA.

timoffex commented 3 days ago

Looks like the build almost works on OSX and Windows, but it tries to run cargo-bundle-licenses in core/ rather than nvidia_gpu_stats/. You might consider replacing cd core && ... in the build script by pushd core && ... && popd.

Still not sure what's happening on Linux!

conda-forge-admin commented 3 days ago
        Hi! This is the friendly automated conda-forge-linting service.

        I wanted to let you know that I linted all conda-recipes in your             PR (```recipe/meta.yaml```) and found some lint.

        Here's what I've got...

For recipe/meta.yaml:

conda-forge-admin commented 3 days ago

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

flferretti commented 2 days ago

Thanks for your help @timoffex! OSX and Windows builds are successful now. I'm still trying to find out what is happening on Linux

timoffex commented 1 day ago

I think OSX and Windows are successful because they are actually not building nvidia_gpu_stats. That's expected on OSX, but it should be built on Windows, so something is going wrong there.

I'm going to try to repro using build-locally.py.

timoffex commented 12 hours ago

I reproduced the error on Linux. It looks like the nvidia_gpu_stats binary is not landing in the target/release/ folder, but rather in target/x86_64-unknown-linux-gnu/release/ folder (at least on my machine; I'm guessing something similar is happening in CI). CC @dmitryduev

According to https://doc.rust-lang.org/cargo/guide/build-cache.html#build-cache, this happens if the --target flag is specified (it's not) or the build.target / CARGO_BUILD_TARGET setting is set (we don't do this, but maybe the setting comes from elsewhere). I'll have to see if there's a more reliable way to get the output path.

I wasn't able to test on Windows as apparently build-locally.py doesn't support that. I think better logging in wandb's build scripts will help diagnose the problem.