Rust-GPU / Rust-CUDA

Ecosystem of libraries and tools for writing and executing fast GPU code fully in Rust.
Apache License 2.0
3.16k stars 120 forks source link

Idea: parse argument list of kernels for safer launching #65

Open kjetilkjeka opened 2 years ago

kjetilkjeka commented 2 years ago

The combination of cuLaunch requiring getting the argument list exactly right and major version changes of PTX ISA can change the argument list makes it close to impossible of being certain that a kernel is launched correctly using the driver API. One way to improve upon this would be to parse the ptx/fatbin for kernel argument list and verify the rust source against it. This would not be possible for cubin, but ptx/fatbin should cover most use cases.

There are basically two approaches to this. The runtime approach would be to add function argument list info to Cust::module::Module which is generated when ptx/fatbin is added to the module. Then it would be passed to the cust::function::Function when it is created.

The other alternative is a static/build.rs alternative. We could create a dev-dependency that parses a .ptx and create rust types containing information about the argument list layout. A generic method on Module would then return the specific type implementing some Kernel trait and the Stream would have a generic launch function that would accept some <Function as Kernel>::ArgList argument.

I'm not promising to implement this, but would love to get some feedback on the idea.

RDambrosio016 commented 2 years ago

Parsing ptx is really hard because it has a lot of instructions, so its not really viable rn, however, i've thought of #[kernel] exposing a hidden function to check the function launch safety. This would mean we would have a launch_rust_kernel! function or something since it would require that the kernels crate be used as a crate.

kjetilkjeka commented 2 years ago

Parsing ptx is really hard because it has a lot of instructions, so its not really viable rn

We would only need to parse entries and parameter lists. Starting with .visible .entry and stopping at the ) after the parameter list.

however, i've thought of #[kernel] exposing a hidden function to check the function launch safety.

I don't think that would be possible. With every new PTX ISA major version it is possible for Nvidia to break the kernel abi. It will not be easy for the #[kernel] macro to know the ISA version rustc/llvm is compiling for, and just as hard to get every possible version consistent between compilers and the #[kernel] macro.

RDambrosio016 commented 2 years ago

I don't think that would be possible. With every new PTX ISA major version it is possible for Nvidia to break the kernel abi. It will not be easy for the #[kernel] macro to know the ISA version rustc/llvm is compiling for, and just as hard to get every possible version consistent between compilers and the #[kernel] macro.

no it doesnt generate a ptx function it generates a rust function which can be run from the cpu

kjetilkjeka commented 2 years ago

no it doesnt generate a ptx function it generates a rust function which can be run from the cpu

I assume this Rust function would have to call cuLaunch. But the parameter array of ptrs being passed to cuLaunch depends on the ptx kernel abi. And the ptx kernel abi depends on the PTX version. And it is not easy to figure out the PTX version without parsing the .ptx/fatbin file, and if you are already parsing the .ptx you could just as well look at the kernel param list directly.

I think this would be a good idea if PTX had a stable ABI across major versions. Unfortunately it seems like it doesn't have that.

RDambrosio016 commented 2 years ago

no it doesnt generate a ptx function it generates a rust function which can be run from the cpu

I assume this Rust function would have to call cuLaunch. But the parameter array of ptrs being passed to cuLaunch depends on the ptx kernel abi. And the ptx kernel abi depends on the PTX version. And it is not easy to figure out the PTX version without parsing the .ptx/fatbin file, and if you are already parsing the .ptx you could just as well look at the kernel param list directly.

No it would just check if the sizes of the arguments is correct, if its not it panics, otherwise it launches. This would generate something which checks a list of say, type sizes against the list of type sizes in the function, it has nothing to do with ptx

kjetilkjeka commented 2 years ago

No it would just check if the sizes of the arguments is correct, if its not it panics, otherwise it launches. This would generate something which checks a list of say, type sizes against the list of type sizes in the function, it has nothing to do with ptx

I believe old PTX required to do zero/sign extension up to 32bits for uint16_t and uint8_t before calling from host. More importantly, there are no guarantees that these things won't change between PTX versions. This means that checking the parameter list in the kernel have arguments with identical size (on host) as the arguments as the host "launch-site" is not enough. As the actual PTX abi can start require zero/sign extensions or generally start treating some types differently whenever Nvidia increases the major version for the PTX version.