bazel-contrib / rules_cuda

Starlark implementation of bazel rules for CUDA.
https://bazel-contrib.github.io/rules_cuda/
MIT License

How to use cublas in a non-root bazel module? #238

Closed. appthumb closed this issue 3 months ago.

appthumb commented 7 months ago

I'm using rules_cuda in a bazel module A, and some of my cuda_library targets need to link with -lcublas and -lcublasLt.

Naturally, I'm defining local_cuda as in examples/cublas/BUILD.bazel. In the MODULE.bazel of module A:

bazel_dep(
    name = "rules_cuda",
    version = "0.2.1",
)

cuda = use_extension("@rules_cuda//cuda:extensions.bzl", "toolchain")
cuda.local_toolchain(
    name = "local_cuda",
    toolkit_path = "",
)
use_repo(cuda, "local_cuda")

And in my BUILD.bazel, I have the cuda_library target:

cuda_library(
    name = ...,
    srcs = ...,
    deps = [
       "@local_cuda//:cublas",
    ],
    ...
)

This works fine when I build my module A. However, when I have another bazel module B that depends on module A, I cannot build module B, because the local_cuda can only be declared in a root module. I got this error:

ERROR: Traceback (most recent call last):
    File "/private/var/tmp/_bazel_dev/67de6cda420db4eb86e6ad3f1fd2b6e4/external/rules_cuda~/cuda/extensions.bzl", line 15, column 21, in _init
        fail("Only the root module may override the path for the local cuda toolchain")
Error in fail: Only the root module may override the path for the local cuda toolchain

This is from this line: https://github.com/bazel-contrib/rules_cuda/blob/8f2f2e6d64d38e46d09538c921304c7c902a2564/cuda/extensions.bzl#L15.

Is it possible to use rules_cuda in a bazel module that other modules can depend on? I don't really need to customize my cuda path, as the default path works fine for me. Is there a way to avoid the above error?
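To make the error above easier to read, here is a minimal sketch of how a module extension can restrict a tag to the root module. It is not a copy of rules_cuda's extensions.bzl: the stand-in repository rule _toolkit_repo and the overall layout are assumptions made for illustration, and only the error message is taken from the traceback.

```starlark
# Hypothetical .bzl file sketching a root-only tag check in a module extension.
# _toolkit_repo is a stand-in; a real rule would locate the CUDA toolkit.

def _toolkit_repo_impl(repository_ctx):
    # Generate an empty repository so the sketch stays self-contained.
    repository_ctx.file("BUILD.bazel", "")

_toolkit_repo = repository_rule(
    implementation = _toolkit_repo_impl,
    attrs = {"toolkit_path": attr.string()},
)

_local_toolchain = tag_class(attrs = {
    "name": attr.string(),
    "toolkit_path": attr.string(),
})

def _toolchain_impl(module_ctx):
    # module_ctx.modules lists every module that uses this extension.
    for mod in module_ctx.modules:
        for tag in mod.tags.local_toolchain:
            # A tag coming from any module other than the root aborts the build.
            if not mod.is_root:
                fail("Only the root module may override the path for the local cuda toolchain")

            # A repository is generated only when a tag is present.
            _toolkit_repo(name = tag.name, toolkit_path = tag.toolkit_path)

toolchain = module_extension(
    implementation = _toolchain_impl,
    tag_classes = {"local_toolchain": _local_toolchain},
)
```

In an extension shaped like this, a non-root module that calls cuda.local_toolchain(...) hits the fail() branch, and if no module declares the tag at all, no local_cuda repository is generated.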

cloudhan commented 7 months ago

With current impl, it is not possible.

But it is very reasonable to let upstream projects expose targets that depend on rules_cuda, say, kernels wrapped in a c library with c public interfaces. Current project and downstream projects should be able to use it without pain.

I'll see how we should improve the situation.
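The layering described above (kernels wrapped in a C library with a C public interface) could look roughly like the following BUILD.bazel in the upstream module. All target and file names are hypothetical, and the sketch assumes the module also declares a bazel_dep on rules_cc.

```starlark
# BUILD.bazel in the upstream module (hypothetical target and file names).
load("@rules_cc//cc:defs.bzl", "cc_library")
load("@rules_cuda//cuda:defs.bzl", "cuda_library")

# CUDA kernels stay private to this package.
cuda_library(
    name = "kernels",
    srcs = ["kernels.cu"],
    hdrs = ["kernels.h"],
)

# Plain C/C++ facade; this is the only target downstream modules see.
cc_library(
    name = "kernels_api",
    srcs = ["kernels_api.cc"],
    hdrs = ["kernels_api.h"],
    visibility = ["//visibility:public"],
    deps = [":kernels"],
)
```

This only illustrates the target layout. As the rest of the thread shows, with rules_cuda 0.2.1 the local_cuda repository still has to be resolvable for the cuda_library target to build.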

appthumb commented 7 months ago

Thanks for looking into this! Really appreciated.

Do you mean that my upstream module A uses cuda_library internally, and exposes it with cc_library, and my downstream module B depends on it?

This doesn't seem to work -- it looks like as long as I use the cuda_library rule in module A, I need to add local_cuda to the MODULE.bazel of module A, and this forces A to be a top-level module.

I tried to remove everything that's referring to local_cuda in module A, so in A's MODULE.bazel file I only have:

bazel_dep(
    name = "rules_cuda",
    version = "0.2.1",
)

and I use cuda_library in A's BUILD.bazel file:

load("@rules_cuda//cuda:defs.bzl", "cuda_library")

cuda_library(
    name = "kernel",
    srcs = ["kernel.cu"],
    hdrs = ["kernel.h"],
)

then A fails to build. Running bazel build kernel gives this error:

Analysis of target '//my_project:kernel' failed; build aborted: module extension "toolchain" from "@@rules_cuda~//cuda:extensions.bzl" does not generate repository "local_cuda", yet it is imported as "local_cuda" in the usage at https://bcr.bazel.build/modules/rules_cuda/0.2.1/MODULE.bazel:10:26

This is referring to https://github.com/bazel-contrib/rules_cuda/blob/8f2f2e6d64d38e46d09538c921304c7c902a2564/MODULE.bazel#L10

Adding local_cuda to A's MODULE.bazel would make bazel compile A, but then module B cannot depend on it.

appthumb commented 7 months ago

Oh, didn't see a fix is in the making! Looking forward to it 👍

jsharpe commented 6 months ago

I think it's possible to just do:

cuda = use_extension("@rules_cuda//cuda:extensions.bzl", "toolchain")
use_repo(cuda, "local_cuda")

in B's MODULE.bazel, which will make local_cuda available to B. It's not ideal, but I think this works. I have something similar in one of my projects.

appthumb commented 6 months ago

Yes, this would work to compile B. The problem is that this won't compile A if A has some cuda_library target. This can be annoying, e.g., all the compilation and testing of the cuda code in A now has to be done through module B.

jsharpe commented 6 months ago

You would leave your A module with the use_repo that you had above. The above snippet makes local_cuda visible in B, and you can use cuda_library in A or B.
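As a rough illustration of that suggestion, a BUILD.bazel in B could then reference both the toolkit repository and a target from A. The names below are assumptions: repo_a stands for module A's name, //my_project:kernel is borrowed from the error message earlier in the thread, and b_kernels is made up. The target from A would also need to be visible to B.

```starlark
# BUILD.bazel in module B (hypothetical names).
load("@rules_cuda//cuda:defs.bzl", "cuda_library")

cuda_library(
    name = "b_kernels",
    srcs = ["b_kernels.cu"],
    deps = [
        # Resolves because B's MODULE.bazel has use_repo(cuda, "local_cuda").
        "@local_cuda//:cublas",
        # Assumes module A is named repo_a and exposes this target publicly.
        "@repo_a//my_project:kernel",
    ],
)
```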

appthumb commented 6 months ago

Thanks for your response! I more or less got it to work for my purpose by using different MODULE.bazel files in my local bazel registry and in repo_a. Here's a summary of what I have found so far.

My setup:

Now this is the complete file content of bazel_registry/modules/repo_a/1.0.0/MODULE.bazel:

"""repo A."""

module(
    name = "repo_a",
    version = "1.0.0",
)

bazel_dep(
    name = "bazel_skylib",
    version = "1.5.0",
)

bazel_dep(
    name = "rules_cc",
    version = "0.0.9",
)

bazel_dep(
    name = "rules_cuda",
    version = "1.0.0",
)

cuda = use_extension("@rules_cuda//cuda:extensions.bzl", "toolchain")
use_repo(cuda, "local_cuda")

This is the complete content of repo_b/MODULE.bazel:

"""repo_b"""

module(
    name = "repo_b",
    version = "1.0.0",
)

bazel_dep(
    name = "bazel_skylib",
    version = "1.5.0",
)

bazel_dep(
    name = "rules_cc",
    version = "0.0.9",
)

bazel_dep(
    name = "repo_a",
    version = "1.0.0",
)

bazel_dep(
    name = "rules_cuda",
    version = "1.0.0",
)

cuda = use_extension("@rules_cuda//cuda:extensions.bzl", "toolchain")

cuda.local_toolchain(
    name = "local_cuda",
    toolkit_path = "",
)
use_repo(cuda, "local_cuda")

This works, and bazel build under repo_b succeeds. Note:

Now repo_b works fine, but I also want repo_a to work, i.e., running bazel build under repo_a should succeed. If I use the exact content of bazel_registry/modules/repo_a/1.0.0/MODULE.bazel as repo_a/MODULE.bazel, I get the following error when I run bazel build under repo_a:

failed; build aborted: module extension "toolchain" from "@@rules_cuda~//cuda:extensions.bzl" does not generate repository "local_cuda", yet it is imported as "local_cuda" in the usage at /home/dev/temp/cuda_test/repo_a/MODULE.bazel:23:21

To work around this, I need to use slightly different MODULE.bazel content under repo_a. This is the complete content of repo_a/MODULE.bazel:

"""repo A."""

module(
    name = "repo_a",
    version = "1.0.0",
)

bazel_dep(
    name = "bazel_skylib",
    version = "1.5.0",
)

bazel_dep(
    name = "rules_cc",
    version = "0.0.9",
)

bazel_dep(
    name = "rules_cuda",
    version = "1.0.0",
)

cuda = use_extension("@rules_cuda//cuda:extensions.bzl", "toolchain")

cuda.local_toolchain(
    name = "local_cuda",
    toolkit_path = "",
)
use_repo(cuda, "local_cuda")

Note that I added cuda.local_toolchain to it. This makes bazel build under repo_a work without any issue. This won't break the build under repo_b, since the latter uses a different MODULE.bazel file for repo_a.

So far I have gotten both repo_a and repo_b to work by using different MODULE.bazel files for repo_a, one in my local bazel registry and one in the actual repo, to bypass the requirement that toolchains must be defined in the top-level module.

I'm not sure if this is the canonical way of setting up local dependencies, and it feels like a hack. It would be nice if the limitation on toolchain declarations could be removed. That would avoid all these tricky situations, and the MODULE.bazel files would not have to diverge between the repo and the bazel registry.
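One possible way to keep a single MODULE.bazel for repo_a, offered as an untested sketch and not a confirmed fix: Bazel's use_extension accepts dev_dependency = True, and such a usage is ignored when the module is not the root module. Assuming the rules_cuda extension only rejects local_toolchain tags it actually receives from non-root modules, the tag could live in a dev-only usage while use_repo stays in the regular one. The proxy name cuda_dev is made up for illustration.

```starlark
# repo_a/MODULE.bazel fragment (sketch, not verified against rules_cuda).
bazel_dep(
    name = "rules_cuda",
    version = "1.0.0",
)

# Regular usage: applies whether repo_a is the root module or a dependency.
cuda = use_extension("@rules_cuda//cuda:extensions.bzl", "toolchain")
use_repo(cuda, "local_cuda")

# Dev-only usage: Bazel ignores it when repo_a is not the root module,
# so repo_a contributes no local_toolchain tag when used as a dependency.
cuda_dev = use_extension(
    "@rules_cuda//cuda:extensions.bzl",
    "toolchain",
    dev_dependency = True,
)
cuda_dev.local_toolchain(
    name = "local_cuda",
    toolkit_path = "",
)
```

With this layout the root module (repo_b here) would still have to declare cuda.local_toolchain itself, as in the repo_b/MODULE.bazel shown earlier.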

(PS: I'm using the head version of rules_cuda in this GitHub repository. The rules_cuda in Bazel Central Registry https://registry.bazel.build/modules/rules_cuda is 5 months behind the head version here. To avoid confusion I point to the head version of rules_cuda in my local bazel registry as well, and this is why you can see the version of rules_cuda is 1.0.0 above. I also tried the published version 0.2.1, and the result is the same).