KomputeProject / kompute

General-purpose GPU compute framework built on Vulkan to support thousands of cross-vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). Blazing fast, mobile-enabled, asynchronous, and optimized for advanced GPU data processing use cases. Backed by the Linux Foundation.
http://kompute.cc/
Apache License 2.0

[Discussion/Suggestion] Enable writing kernel/shader code directly in Python. #394

Open LouChiSoft opened 2 months ago

LouChiSoft commented 2 months ago

Hi, first off, I should say that I don't know enough about Python to know whether this is actually possible, but I'd like to suggest a potential feature: the ability to write kernels/shaders directly in Python and have them compile down to the shader source string you would normally write by hand. I think this would be a decent improvement.

Perhaps there could be a Kernel class that the user inherits from, providing a more structured way to declare things like inputs, either as the arguments of a process function or as member values of the class itself.
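To make the class idea a bit more concrete, here is a minimal sketch of how a base class could read the signature of process and generate the buffer declarations itself, numbering the bindings by parameter order so they never have to be checked by hand. KomputeKernel, FloatBuffer, UintBuffer and buffer_declarations are hypothetical names invented for illustration; none of them are part of Kompute's API.

```python
import inspect


class FloatBuffer:
    """Hypothetical annotation marking a float[] storage buffer parameter."""
    glsl_type = "float"


class UintBuffer:
    """Hypothetical annotation marking a uint[] storage buffer parameter."""
    glsl_type = "uint"


class KomputeKernel:
    """Hypothetical base class that derives GLSL buffer bindings from process()."""

    def buffer_declarations(self) -> str:
        lines = []
        binding = 0
        # The signature of the bound method already excludes `self`.
        for param in inspect.signature(self.process).parameters.values():
            glsl_type = getattr(param.annotation, "glsl_type", None)
            if glsl_type is None:
                continue  # not a buffer (e.g. a push or spec constant)
            # Binding indices follow parameter order, so they never need
            # to be written or checked by hand.
            lines.append(
                f"layout(set = 0, binding = {binding}) buffer buf_{param.name} "
                f"{{ {glsl_type} {param.name}[]; }};"
            )
            binding += 1
        return "\n".join(lines)


class GettingStartedKernel(KomputeKernel):
    def process(self, in_a: FloatBuffer, in_b: FloatBuffer,
                out_a: UintBuffer, out_b: UintBuffer):
        pass  # the body would be translated to GLSL separately


print(GettingStartedKernel().buffer_declarations())
```

For the Getting Started kernel this prints the same four layout(set = 0, binding = N) lines as the raw shader string. Push constants and spec constants are left out here; they could presumably be declared the same way with further marker types.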

Example based on the Getting Started kernel:

from .utils import compile_source # using util function from python/test/utils

def kompute(shader):
    # Definition left out to save space
    ...

if __name__ == "__main__":

    # Define a raw string shader (or use the Kompute tools to compile to SPIRV / C++ header
    # files). This shader shows some of the main components including constants, buffers, etc
    shader = """
        #version 450

        layout (local_size_x = 1) in;

        // The input tensors bind index is relative to index in parameter passed
        layout(set = 0, binding = 0) buffer buf_in_a { float in_a[]; };
        layout(set = 0, binding = 1) buffer buf_in_b { float in_b[]; };
        layout(set = 0, binding = 2) buffer buf_out_a { uint out_a[]; };
        layout(set = 0, binding = 3) buffer buf_out_b { uint out_b[]; };

        // Kompute supports push constants updated on dispatch
        layout(push_constant) uniform PushConstants {
            float val;
        } push_const;

        // Kompute also supports spec constants on initialization
        layout(constant_id = 0) const float const_one = 0;

        void main() {
            uint index = gl_GlobalInvocationID.x;
            out_a[index] += uint( in_a[index] * in_b[index] );
            out_b[index] += uint( const_one * push_const.val );
        }
    """

    kompute(shader)

Would become:

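import numpy as np
import kp
from .utils import compile_source

# KomputeKernel, get_global_index, PushConstants and mgr.execute are
# hypothetical: they illustrate the proposed API and do not exist in Kompute.
class GettingStartedKernel(KomputeKernel):

    def process(self, in_a, in_b, out_a, out_b, push_const, const_one):
        index: int = get_global_index().x
        out_a[index] += in_a[index] * in_b[index]
        out_b[index] += const_one * push_const.val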

if __name__ == "__main__":
    mgr = kp.Manager()

    tensor_in_a = mgr.tensor([2, 2, 2])
    tensor_in_b = mgr.tensor([1, 2, 3])

    tensor_out_a = mgr.tensor_t(np.array([0, 0, 0], dtype=np.uint32))
    tensor_out_b = mgr.tensor_t(np.array([0, 0, 0], dtype=np.uint32))

    push_constants = PushConstants(2)
    spec_constants = 2

    my_kernel = GettingStartedKernel()
    mgr.execute(my_kernel, [3, 1, 1], tensor_in_a, tensor_in_b, tensor_out_a, tensor_out_b, push_constants, spec_constants)
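On the "compile down to the compute string" part: Python's standard ast module can already parse a function like process, so a translator could walk the syntax tree and emit GLSL. Below is a minimal sketch (Python 3.9+) covering only the constructs used in GettingStartedKernel. The mappings of get_global_index() to gl_GlobalInvocationID and of int to uint are assumptions, and the kernel source is passed as a string to keep the example self-contained (a real version might obtain it with inspect.getsource).

```python
import ast

# Assumed mapping from Python-side helper calls to GLSL built-ins.
BUILTINS = {"get_global_index": "gl_GlobalInvocationID"}
# Assumed mapping from Python annotations to GLSL types
# (gl_GlobalInvocationID.x is a uint, hence int -> uint).
TYPES = {"int": "uint", "float": "float"}
OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def expr(node: ast.expr) -> str:
    """Translate a Python expression node into GLSL source text."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant):
        return repr(node.value)
    if isinstance(node, ast.Attribute):
        return f"{expr(node.value)}.{node.attr}"
    if isinstance(node, ast.Subscript):
        return f"{expr(node.value)}[{expr(node.slice)}]"
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        return f"({expr(node.left)} {OPS[type(node.op)]} {expr(node.right)})"
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id in BUILTINS and not node.args):
        return BUILTINS[node.func.id]
    raise NotImplementedError(f"unsupported expression: {ast.dump(node)}")


def stmt(node: ast.stmt) -> str:
    """Translate one Python statement into one GLSL statement."""
    if isinstance(node, ast.AnnAssign) and node.value is not None:
        glsl_type = TYPES[expr(node.annotation)]
        return f"{glsl_type} {expr(node.target)} = {expr(node.value)};"
    if isinstance(node, ast.Assign) and len(node.targets) == 1:
        return f"{expr(node.targets[0])} = {expr(node.value)};"
    if isinstance(node, ast.AugAssign) and type(node.op) in OPS:
        return f"{expr(node.target)} {OPS[type(node.op)]}= {expr(node.value)};"
    raise NotImplementedError(f"unsupported statement: {ast.dump(node)}")


def translate(source: str) -> str:
    """Turn the body of the first function in `source` into a GLSL main()."""
    tree = ast.parse(source)
    func = next(n for n in tree.body if isinstance(n, ast.FunctionDef))
    body = "\n".join("    " + stmt(s) for s in func.body)
    return f"void main() {{\n{body}\n}}"


KERNEL_SRC = '''
def process(self, in_a, in_b, out_a, out_b, push_const, const_one):
    index: int = get_global_index().x
    out_a[index] += in_a[index] * in_b[index]
    out_b[index] += const_one * push_const.val
'''

print(translate(KERNEL_SRC))
```

This prints a main() whose body mirrors the hand-written shader, minus the uint(...) casts. The original GLSL needs those casts because out_a and out_b are uint buffers, and inferring them would require the translator to track types, which this sketch does not do.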

This is by no means meant to be a "correct" solution, just a way to express the idea I'm trying to describe. It's obviously not a trivial feature to implement, and certain things would need to be addressed first. But I think writing kernels as something more than just a string would be more productive. Ideally it would also remove the hassle of manually checking things like binding indices and set indices.

I would love to hear feedback on the idea, and on whether it's even possible in Python.

axsaucedo commented 2 months ago

We actually had this in a previous version of Kompute using pyshader; here's an example from the tests:

https://github.com/KomputeProject/kompute/blob/df5477a2d76b920232811e8513579240467ad673/python/test/test_array_multiplication.py#L19-L35

Unfortunately, the library is no longer maintained, and nothing else has emerged that provides a similar interface. If an initiative develops this further, it would be great to adopt it once again.

LouChiSoft commented 1 month ago

Thanks for the link. It's a shame pyshader is no longer actively maintained. In a perfect world, I would be able to write entire pipelines once in Python and AOT-compile them with something like PyPy into an executable containing both the CPU and GPU pipelines.