Open LaurentCabaret opened 1 year ago
To keep in mind: even if we manage to solve the LP on the GPU, there are a few other operations currently done on the CPU.
The first that comes to mind is partition_models_by_accuracy, which relies on the standard C++ function std::nth_element (typically implemented with the introselect algorithm). We'll have to investigate whether this function can operate on a subset of the data, to reduce transfers between host and device memory.
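For reference, here is a minimal sketch of the kind of CPU-side selection this step performs. The `Model` struct and the `partition_top_k` helper are hypothetical names made up for illustration; they are not the project's actual partition_models_by_accuracy code, whose signature is not shown here. Only the call to std::nth_element is the real standard-library API.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical model record (assumption: the real type is not shown in this issue).
struct Model {
    int id;
    double accuracy;
};

// Hypothetical helper: rearranges `models` so that the `keep` most accurate
// entries occupy the range [0, keep). The order inside each side of the
// partition is unspecified. Returns the size of the top partition.
std::size_t partition_top_k(std::vector<Model>& models, std::size_t keep) {
    if (keep >= models.size()) {
        return models.size();  // nothing to partition, everything is kept
    }
    // std::nth_element places the element that would sit at index `keep` in
    // sorted (descending accuracy) order at that index, with every element
    // before it at least as accurate. Average cost is linear.
    std::nth_element(models.begin(), models.begin() + keep, models.end(),
                     [](const Model& a, const Model& b) {
                         return a.accuracy > b.accuracy;
                     });
    return keep;
}
```

Note that std::nth_element needs random access to the whole range it partitions, so in this form all the accuracies have to be in host memory. Whether a reduced set (for example only the accuracy values, or a pre-filtered subset) is enough is exactly what remains to be checked.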
We'll also have to investigate all other operations that are currently done on the CPU.
https://hal.science/hal-01149739/file/4538a179.pdf