NVIDIA / gdrcopy

A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology
MIT License

add autotuning support #25

Open drossetti opened 6 years ago

drossetti commented 6 years ago

The optimized memcpy implementation should be chosen at run-time during a tuning phase, possibly in gdr_open().
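For illustration, a minimal sketch of what such a tuning phase could look like: time each candidate copy routine on a scratch buffer and keep the fastest. Everything here is an assumption made for the example, not gdrcopy code: the `copy_fn` signature, the two candidates (`copy_libc`, `copy_words`), and `autotune_copy()` are hypothetical names, and plain host memory stands in for the mapped GPU memory that a real tuning run would have to target.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical candidate signature; not part of the gdrcopy API. */
typedef void (*copy_fn)(void *dst, const void *src, size_t n);

/* Candidate 1: whatever the C library provides. */
static void copy_libc(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}

/* Candidate 2: copy in 8-byte words, then the remaining tail bytes. */
static void copy_words(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n >= sizeof(uint64_t)) {
        uint64_t w;
        memcpy(&w, s, sizeof w);
        memcpy(d, &w, sizeof w);
        d += sizeof w; s += sizeof w; n -= sizeof w;
    }
    while (n--) *d++ = *s++;
}

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Times each candidate on n-byte copies and returns the index of the
 * fastest one. Falls back to index 0 if the scratch buffers cannot be
 * allocated. */
static size_t autotune_copy(const copy_fn *cands, size_t ncands,
                            size_t n, int iters)
{
    assert(cands != NULL && ncands > 0 && iters > 0);
    unsigned char *src = malloc(n ? n : 1);
    unsigned char *dst = malloc(n ? n : 1);
    size_t best = 0;
    double best_t = 0.0;

    if (!src || !dst) { free(src); free(dst); return 0; }
    memset(src, 0xA5, n);

    for (size_t i = 0; i < ncands; i++) {
        cands[i](dst, src, n);                 /* warm-up, not timed */
        double t0 = now_sec();
        for (int k = 0; k < iters; k++)
            cands[i](dst, src, n);
        double t = now_sec() - t0;
        if (i == 0 || t < best_t) { best = i; best_t = t; }
    }
    free(src);
    free(dst);
    return best;
}
```

A caller would build an array such as `copy_fn cands[] = { copy_libc, copy_words };`, call `autotune_copy(cands, 2, 4096, 1000)` once at initialization, and cache the returned index for later copies. The winner can differ between runs and machines, which is the point of measuring at run-time.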

maddyscientist commented 6 years ago

Curious: what dimensions are you going to be tuning over here in the autotuner?

drossetti commented 6 years ago

@maddyscientist that is a good question. I am not expecting a dependency on the buffer size, but I might be wrong.
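If buffer size did turn out to matter, one way to handle it would be to store the tuning result per size bucket and check afterwards whether the buckets disagree. The sketch below is only an assumption about how that could be organized: `NBUCKETS`, `NIMPLS`, `size_bucket()`, `record_winner()`, `lookup_impl()` and `size_independent()` are hypothetical names, and the 64-byte starting bucket is an arbitrary choice.

```c
#include <assert.h>
#include <stddef.h>

#define NBUCKETS 8  /* hypothetical: <=64 B, <=128 B, ..., <=4096 B, larger */
#define NIMPLS   2  /* hypothetical number of candidate implementations */

/* Maps a copy size to a bucket: 0 for n <= 64, then one bucket per
 * doubling; the last bucket collects every larger size. */
static unsigned size_bucket(size_t n)
{
    unsigned b = 0;
    size_t limit = 64;
    while (n > limit && b < NBUCKETS - 1) {
        limit <<= 1;
        b++;
    }
    return b;
}

/* Index of the winning implementation per bucket, filled in by the
 * tuning phase. Zero-initialized, so candidate 0 is the default. */
static unsigned char tuned_impl[NBUCKETS];

static void record_winner(size_t n, unsigned char impl)
{
    assert(impl < NIMPLS);
    tuned_impl[size_bucket(n)] = impl;
}

static unsigned char lookup_impl(size_t n)
{
    return tuned_impl[size_bucket(n)];
}

/* Returns 1 if every bucket selected the same implementation, i.e. the
 * tuning result did not depend on buffer size. */
static int size_independent(void)
{
    for (unsigned b = 1; b < NBUCKETS; b++)
        if (tuned_impl[b] != tuned_impl[0])
            return 0;
    return 1;
}
```

If `size_independent()` returns 1 after tuning on a given platform, the table collapses to a single choice and buffer size can be dropped as a tuning dimension there.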

hongbilu commented 1 year ago

@drossetti BTW, is there a formula for calculating this? Otherwise the choice would depend on experimental values measured on each kind of HW configuration.

drossetti commented 1 year ago

@hongbilu any performance model would be inherently HW dependent, so it would involve maintaining a database of FOMs (figures of merit) for each platform. That is why I was proposing a run-time autotuning phase instead.

hongbilu commented 1 year ago

@drossetti that would be a lot of work, and in theory the CPU's operating frequency and workload would also need to be considered. Experiments show that the CPU's operating frequency is a key factor.