Open kjetilkjeka opened 2 years ago
Parsing ptx is really hard because it has a lot of instructions, so its not really viable rn, however, i've thought of #[kernel]
exposing a hidden function to check the function launch safety. This would mean we would have a launch_rust_kernel!
function or something since it would require that the kernels crate be used as a crate.
Parsing ptx is really hard because it has a lot of instructions, so its not really viable rn
We would only need to parse entries and parameter lists. Starting with .visible .entry
and stopping at the )
after the parameter list.
however, i've thought of #[kernel] exposing a hidden function to check the function launch safety.
I don't think that would be possible. With every new PTX ISA major version it is possible for Nvidia to break the kernel abi. It will not be easy for the #[kernel] macro to know the ISA version rustc/llvm is compiling for, and just as hard to get every possible version consistent between compilers and the #[kernel] macro.
I don't think that would be possible. With every new PTX ISA major version it is possible for Nvidia to break the kernel abi. It will not be easy for the #[kernel] macro to know the ISA version rustc/llvm is compiling for, and just as hard to get every possible version consistent between compilers and the #[kernel] macro.
no it doesnt generate a ptx function it generates a rust function which can be run from the cpu
no it doesnt generate a ptx function it generates a rust function which can be run from the cpu
I assume this Rust function would have to call cuLaunch
. But the parameter array of ptrs being passed to cuLaunch
depends on the ptx kernel abi. And the ptx kernel abi depends on the PTX version. And it is not easy to figure out the PTX version without parsing the .ptx/fatbin file, and if you are already parsing the .ptx you could just as well look at the kernel param list directly.
I think this would be a good idea if PTX had a stable ABI across major versions. Unfortunately it seems like it doesn't have that.
no it doesnt generate a ptx function it generates a rust function which can be run from the cpu
I assume this Rust function would have to call
cuLaunch
. But the parameter array of ptrs being passed tocuLaunch
depends on the ptx kernel abi. And the ptx kernel abi depends on the PTX version. And it is not easy to figure out the PTX version without parsing the .ptx/fatbin file, and if you are already parsing the .ptx you could just as well look at the kernel param list directly.
No it would just check if the sizes of the arguments is correct, if its not it panics, otherwise it launches. This would generate something which checks a list of say, type sizes against the list of type sizes in the function, it has nothing to do with ptx
No it would just check if the sizes of the arguments is correct, if its not it panics, otherwise it launches. This would generate something which checks a list of say, type sizes against the list of type sizes in the function, it has nothing to do with ptx
I believe old PTX required to do zero/sign extension up to 32bits for uint16_t
and uint8_t
before calling from host. More importantly, there are no guarantees that these things won't change between PTX versions. This means that checking the parameter list in the kernel have arguments with identical size (on host) as the arguments as the host "launch-site" is not enough. As the actual PTX abi can start require zero/sign extensions or generally start treating some types differently whenever Nvidia increases the major version for the PTX version.
The combination of
cuLaunch
requiring getting the argument list exactly right and major version changes of PTX ISA can change the argument list makes it close to impossible of being certain that a kernel is launched correctly using the driver API. One way to improve upon this would be to parse the ptx/fatbin for kernel argument list and verify the rust source against it. This would not be possible for cubin, but ptx/fatbin should cover most use cases.There are basically two approaches to this. The runtime approach would be to add function argument list info to
Cust::module::Module
which is generated when ptx/fatbin is added to the module. Then it would be passed to thecust::function::Function
when it is created.The other alternative is a static/
build.rs
alternative. We could create a dev-dependency that parses a .ptx and create rust types containing information about the argument list layout. A generic method onModule
would then return the specific type implementing someKernel
trait and theStream
would have a genericlaunch
function that would accept some<Function as Kernel>::ArgList
argument.I'm not promising to implement this, but would love to get some feedback on the idea.