Closed MKfri closed 1 year ago
Hello, I am adding OpenCL support to an open-source, LGPLv3-licensed molecular docking simulation program. I have adapted the OpenCL kernel (mix_kernels.cl) that measures GFLOPS performance to serve as a microbenchmark for selecting the most performant OpenCL-capable device on systems where both an iGPU and discrete GPUs are available.
Our repository is under LGPLv3 because it supports linking against a proprietary library, so the license cannot be changed to GPL. Since GPLv2 is not compatible with LGPLv3, and combining LGPLv3 with GPLv2+ makes the whole repository GPLv3, I am asking whether it would be possible to either dual-license the entire codebase under LGPLv3 and GPLv2 (the current license) or grant permission to use the derived mix_kernels.cl kernel under LGPLv3 within our codebase.
Best regards, MK
Hello. This is an interesting use case.
Are you using it to evaluate GPU execution on a mixed workload with balanced operational intensity, or just pure compute throughput?
Hello, the plan is to use it as a pure compute-throughput benchmark when multiple GPUs are available; the most performant device is then selected. In addition, the results will be cached so the benchmark runs as rarely as possible: ideally only once, or again whenever the drivers are updated or a new device is installed.
We are looking into this to relieve the user of selecting the device manually, and to avoid resorting to hacks such as estimating the number of cores from the device name in order to compute the theoretical FLOPS (see: https://github.com/ProjectPhysX/OpenCL-Wrapper#for-comparison-the-very-same-opencl-vector-addition-example-looks-like-this-when-directly-using-the-opencl-c-bindings).
With a proof of concept we have concluded that we can afford the overhead, since total program runtime is significant and is not affected by a large margin. The caching aspect is not implemented yet. Before adding the proof of concept to the codebase, we would like to resolve the question of licensing.
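To make the idea concrete, below is a minimal sketch of the kind of selection routine we have in mind. This is not the actual proof of concept: the timed kernel is a simplified placeholder rather than the adapted mix_kernels.cl, all names are illustrative, and it assumes the OpenCL C++ bindings.

```cpp
// Sketch only: rank all OpenCL GPUs with a small compute-bound kernel
// and select the fastest. The kernel below is a placeholder, not the
// adapted mix_kernels.cl.
#define CL_HPP_MINIMUM_OPENCL_VERSION 120
#define CL_HPP_TARGET_OPENCL_VERSION 120
#include <CL/opencl.hpp>
#include <chrono>
#include <iostream>
#include <vector>

// Compute-bound placeholder: a dependent chain of fused multiply-adds,
// so the compiler cannot collapse the loop.
static const char* kSource = R"CLC(
__kernel void fma_chain(__global float* out, float seed) {
    float v = seed + get_global_id(0);
    for (int i = 0; i < 4096; ++i)
        v = fma(v, 0.999f, 0.001f);
    out[get_global_id(0)] = v;
}
)CLC";

// Wall-clock seconds for one timed run on the given device.
static double benchmark(const cl::Device& dev) {
    cl::Context ctx(dev);
    cl::CommandQueue queue(ctx, dev);
    cl::Program prog(ctx, std::string(kSource));
    prog.build();
    cl::Kernel kernel(prog, "fma_chain");

    const size_t n = 1 << 20;
    cl::Buffer out(ctx, CL_MEM_WRITE_ONLY, n * sizeof(float));
    kernel.setArg(0, out);
    kernel.setArg(1, 1.0f);

    // Warm-up so JIT compilation and first-touch costs are not timed.
    queue.enqueueNDRangeKernel(kernel, cl::NullRange, cl::NDRange(n));
    queue.finish();

    auto t0 = std::chrono::steady_clock::now();
    queue.enqueueNDRangeKernel(kernel, cl::NullRange, cl::NDRange(n));
    queue.finish();
    return std::chrono::duration<double>(
               std::chrono::steady_clock::now() - t0).count();
}

int main() {
    std::vector<cl::Platform> platforms;
    cl::Platform::get(&platforms);

    cl::Device best;
    double bestTime = 1e30;
    for (auto& p : platforms) {
        std::vector<cl::Device> devices;
        p.getDevices(CL_DEVICE_TYPE_GPU, &devices);
        for (auto& d : devices) {
            double t = benchmark(d);   // result would be cached on disk
            if (t < bestTime) { bestTime = t; best = d; }
        }
    }
    if (bestTime < 1e30)
        std::cout << "Selected: " << best.getInfo<CL_DEVICE_NAME>() << "\n";
    return 0;
}
```

In the actual change, the timed kernel would be the adapted mix_kernels.cl, and each device's score would be cached (keyed by something like the driver version and the installed device list) so the benchmark rarely reruns.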
Best regards, MK
Okay, I understand. My question is: why not leverage a few iterations of the simulation itself for benchmarking? That would better represent the type of workload that will eventually be executed, and it would not require any additional kernels.
Hello, that is certainly an option, and I agree it would be a better indicator of performance than a synthetic benchmark. On the other hand, we are mainly trying to automatically select a GPU that is not the integrated one, so an estimate of performance should be sufficient. The main reason for the stated approach is that selecting the device to use in advance is a simpler and more maintainable design, which I believe is necessary since our program already suffers from overcomplicated design decisions. Also, benchmarking with the simulation itself would mean data has to be transferred between devices mid-run, which is not a negligible overhead in terms of time and implementation complexity.
Finally, I would like to say that I appreciate the suggestions, and to point out that you are welcome to say no to the request to let us use the kernel under LGPLv3.
Best regards, MK
I understand, but at the moment I would not like to change the license model.
As an alternative, you may consider https://github.com/krrishnarraj/clpeak, which carries a more flexible license.