ekondis / mixbench

A GPU benchmark tool for evaluating GPUs and CPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL, OpenMP)
GNU General Public License v2.0

Use of mix_kernel.cl under LGPLv3 #39

Closed MKfri closed 1 year ago

MKfri commented 1 year ago

Hello, I am adding OpenCL support to an open-source, LGPLv3-licensed molecular docking simulation program. I have adapted the OpenCL kernel (mix_kernels.cl) that measures GFLOPS performance to serve as a microbenchmark for selecting the most performant OpenCL-capable device on systems where both an iGPU and discrete GPUs are available.

Our repository is under LGPLv3 because it supports linking a proprietary library, so the licence cannot be changed to GPL. Since GPLv2 is not compatible with LGPLv3, and combining LGPLv3 with GPLv2+ code makes the whole repository GPLv3, I am asking whether it is possible either to dual-license the entire codebase under LGPLv3 and GPLv2 (current), or to grant permission to use the derived mix_kernels.cl kernel under LGPLv3 within our codebase.

Best regards, MK

ekondis commented 1 year ago

Hello. This is an interesting use case.

Are you using it to evaluate GPU execution in a mixed case with a balanced operational intensity? Or just the pure compute throughput?

MKfri commented 1 year ago

Hello, the plan is to use it as a pure compute throughput benchmark if there are multiple available GPUs. The most performant device is then selected. In addition, the results will be cached so this gets executed as few times as possible, ideally only once or perhaps every time the drivers get updated or a new device is installed.
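The selection-plus-caching idea described above could be sketched roughly as follows. This is only a minimal illustration, not code from our program or from mixbench: `DeviceInfo`, `cache_key`, `GflopsCache`, and `select_fastest` are hypothetical names, the benchmark is passed in as a callback, and the cache is assumed to be keyed on device name plus driver version so that a driver update or a new device triggers a fresh measurement.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical device description. In a real program these fields could be
// filled from clGetDeviceInfo (CL_DEVICE_NAME, CL_DRIVER_VERSION).
struct DeviceInfo {
    std::string name;
    std::string driver_version;
};

// Assumed cache key: a driver update or a newly installed device yields a
// key that is not in the cache yet, which forces a new measurement.
inline std::string cache_key(const DeviceInfo& d) {
    return d.name + "|" + d.driver_version;
}

using GflopsCache = std::map<std::string, double>;
using Benchmark = std::function<double(const DeviceInfo&)>;

// Returns the index of the device with the highest measured GFLOPS.
// Only devices missing from the cache are benchmarked.
inline std::size_t select_fastest(const std::vector<DeviceInfo>& devices,
                                  GflopsCache& cache,
                                  const Benchmark& measure_gflops) {
    assert(!devices.empty());
    std::size_t best = 0;
    double best_gflops = -1.0;
    for (std::size_t i = 0; i < devices.size(); ++i) {
        const std::string key = cache_key(devices[i]);
        auto it = cache.find(key);
        if (it == cache.end()) {
            // Cache miss: run the (expensive) benchmark once and store it.
            it = cache.emplace(key, measure_gflops(devices[i])).first;
        }
        if (it->second > best_gflops) {
            best_gflops = it->second;
            best = i;
        }
    }
    return best;
}
```

In practice the cache would also have to be written to disk and read back between runs; that part is left out here.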

We are looking into this to relieve users of having to select the device themselves, and to avoid resorting to hacks such as estimating the number of cores from the device name to compute the theoretical FLOPS (see: https://github.com/ProjectPhysX/OpenCL-Wrapper#for-comparison-the-very-same-opencl-vector-addition-example-looks-like-this-when-directly-using-the-opencl-c-bindings).

With a proof of concept, we have concluded that we can afford the overhead, since the total program runtime is long and the benchmark does not increase it by a large margin. Caching is not implemented yet. Before adding the proof of concept to the codebase, we would like to resolve the licensing question.

Best regards, MK

ekondis commented 1 year ago

Okay, I understand. My question is why don't you leverage a few iterations of the simulation itself for benchmarking? This would better represent the type of workload that will be executed eventually and it won't require any additional kernels.
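As a rough sketch of what timing a few iterations of the real workload might look like: `seconds_per_iteration` and its `step` callback are hypothetical names, and `step` is assumed to block until the device has finished the work (in OpenCL, for example, by calling `clFinish` on the command queue before returning).

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Returns the average wall-clock seconds per call of `step`.
// One untimed warm-up call is made first so that one-time costs
// (for example kernel compilation) do not distort the measurement.
inline double seconds_per_iteration(const std::function<void()>& step,
                                    int iterations) {
    assert(iterations > 0);
    step();  // warm-up, not timed
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        step();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count() / iterations;
}
```

Running this once per candidate device and picking the smallest result would rank devices by the actual workload rather than by a synthetic kernel.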

MKfri commented 1 year ago

Hello, that is certainly an option, and I agree that it would be a better indicator of performance than a synthetic benchmark. On the other hand, we are mainly trying to automatically select the GPU that is not the integrated GPU, so an estimate of performance should be sufficient. The main reason for the stated approach is that selecting the device in advance is a simpler and more maintainable design. I believe that is necessary in our program, since it already suffers from overcomplicated design decisions. Also, data needs to be transferred between devices, which is not a negligible overhead in terms of time and implementation complexity.

Finally, I would like to say that I appreciate the suggestions and would like to point out that you are welcome to say no to the question about allowing us to use the kernel under LGPLv3.

Best regards, MK

ekondis commented 1 year ago

I understand, but at the moment I would prefer not to change the license model.

As an alternative you may consider https://github.com/krrishnarraj/clpeak, which applies a more flexible license.