Rust-GPU / Rust-CUDA

Ecosystem of libraries and tools for writing and executing fast GPU code fully in Rust.
Apache License 2.0
3.02k stars 115 forks source link

Single source for both CPU and GPU code possible? #49

Open gzz2000 opened 2 years ago

gzz2000 commented 2 years ago

Hello. Thanks for this awesome project. I can now compile CUDA kernels in rust into ptx in one cargo package, and use them in another package. Now I wonder whether it is possible to write both the kernels and the CPU code within one package, or even one rust source file. For example, it might look like this:

// shared code between cpu and gpu. struct definitions may also be shared
// similar to CUDA's __device__ __host__
#[devicehost like thing...]
pub fn adder_both_cpu_gpu(a: f32, b: f32) -> f32 {
    a + b
}

#[kernel]
pub unsafe fn add(a: &[f32], b: &[f32], c: *mut f32) {
    let idx = thread::index_1d() as usize;
    if idx < a.len() {
        let elem = &mut *c.add(idx);
        *elem = adder_both_cpu_gpu(a[idx], b[idx]);
    }
}

fn main() {
    // use kernel fn add
}

Unfortunately, this seems not possible at this moment. However, I think it is purely a matter of some convenient macros. For example, if we define the kernel macro to, instead of directly mark the function as kernel, launch a separate cargo build with cuda_builder, and replace the CPU code with a lazy_static ptx module import. This way it would be easier to manage dependency and reuse between CPU code and GPU code and save some boilerplates. I don't know if you are interested.. Though it might be harder than I think:(

rcarson3 commented 2 years ago

I'd actually asked about this on the science-and-ai rust discord channel in regards to a no-std crate. I'm just rehashing some of this here but @RDambrosio016 seemed to suggest it might be possible if you made use of #[cfg(target...)] for certain things that were specific to cuda kernel calling and made use of per-target dependencies. Although, it seemed like there might be issues in-regards to the build.rs file deadlocking itself due to a cyclic-dependency. Although, one of the oxide-enzyme team members seemed to suggest that if you checked whether or not you were on your first or second invocation of the build.rs you could get around that issue. They did this by creating a file during the first invocation and using that as a check.

Although, I haven't had the time to check to see if what was proposed there would work as I've still been working on the designs of the library.

beepster4096 commented 2 years ago

Couldn't you also check the CARGO_CFG_TARGET_ARCH var in the build script to avoid the deadlock?

RDambrosio016 commented 2 years ago

This is a hard problem, i think its kind of impossible to solve without either a rustc fork, or some weird hacks. Primarily because cfg is a thing, which means we would need to recompile things twice or use the CPU target for GPU codegen which causes its own problems. Its just a hard issue overall

gzz2000 commented 2 years ago

This is a hard problem, i think its kind of impossible to solve without either a rustc fork, or some weird hacks. Primarily because cfg is a thing, which means we would need to recompile things twice or use the CPU target for GPU codegen which causes its own problems. Its just a hard issue overall

Thanks for your reply. I just came across an abandoned project accel. We can see that their code have been much similar than what we discussed as a single source code file, using procedural macros. I have no idea how they implemented that and if that means compiling the code twice.

jac-cbi commented 2 years ago

StupidQ: would declaring the function as inline cause it to get built twice, once for each target?

RDambrosio016 commented 2 years ago

Technically yes, but that will still abide by the cpu target's cfg stuff which causes problems. You would also need another custom codegen that derefs to cg_llvm for some things and nvvm for others. And that would moreover probably require forking cg_llvm to make some things public... its a mess

jac-cbi commented 2 years ago

Seems like a good place for a function macro then