Please, don't drop GPU support

vanadiuz commented 4 years ago

Dear developers,

I couldn't find a dedicated email address for this, so I am writing here.

GPU support is one of Espresso's main superpowers. Here are my answers to your survey:

To help us make an informed decision, please let us know, if you are currently using any of these methods and roughly what kind of systems you are looking at:

number of particles

ca. 60000, 10% of which are dipolar

volume fraction

10%

active methods (electrostatics, magnetostatics, lattice Boltzmann, electrokinetics, virtual sites, ...)

Dip. P3M (with DLC) or Dip. Direct Sum on GPU + Lattice Boltzmann (GPU) + WCA + FENE + harmonic bonds. Electrokinetics, virtual sites, ...: potentially needed.

how many time steps in a simulation

~8e5

how many simulations

~1k

what is the relative importance of time to solution for a single simulation compared to the entire bunch of simulations in a project to you? (Note: Dropping GPU support will likely increase the time to solution of a single simulation. On the other hand, compute time on GPUs is often not as readily available as compute time on pure CPU systems. It may therefore be possible to run more simulations in parallel if GPUs are not required.)

The ability to use a GPU changes the whole game. Simulations can sometimes run at least 10 times faster (not 2-3 times, as stated). Also, Dip. Direct Sum on the CPU does not support MPI.

what GPUs do you have access to, and how many?

Mostly GTX 1080 Ti (~70 GPUs) and several RTX 2080 Ti.

Also

The ratio of CPU core to GPU is typically 10:1 to 20:1. For systems with less than 100k particles, Espresso will neither use the GPU nor the CPU cores efficiently.

Lattice Boltzmann for a 100x100x100 box uses about 300 MB of GPU memory (a GPU usually has about 8 GB). So, with 20 cores and 1 GPU, I can run 20 simulations (one simulation per core) and almost fully load my single GPU.
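
The memory arithmetic behind this can be checked with a rough sketch. The 300 MB per simulation, 8 GB per GPU, and 20 cores are the figures quoted in this comment. The function name is hypothetical, and the sketch assumes memory use simply adds up across simulations, ignoring per-process CUDA context overhead and any buffers used by the dipolar solver.

```python
def gpu_memory_budget(n_sims, mb_per_sim, gpu_total_mb):
    """Return (used_mb, fraction_used) for n_sims simulations sharing one GPU.

    Assumption: memory use is purely additive across simulations.
    """
    used_mb = n_sims * mb_per_sim
    return used_mb, used_mb / gpu_total_mb


# Figures from the comment: 20 simulations, ~300 MB each, ~8 GB card.
used_mb, fraction = gpu_memory_budget(n_sims=20, mb_per_sim=300, gpu_total_mb=8 * 1024)
print(f"{used_mb} MB used, {fraction:.0%} of GPU memory")
```

Under these assumptions, 20 simulations need about 6000 MB, roughly three quarters of an 8 GB card, so memory alone does not prevent sharing one GPU among all 20 cores.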

Best regards, Vania

RudolfWeeber commented 4 years ago

Thanks for the response to the survey.

What exactly is the system in which you have a factor of 10 between running on a GPU or a CPU?

Against how many MPI processes are you comparing?

The mentioned factor of 2-3 was for an LB simulation on 8 MPI processes.

vanadiuz commented 4 years ago

Dear Rudolf,

In my case, the low CPU performance was due to the fact that I used Dip. Direct Sum on the CPU, which does not work with MPI. The comparison was therefore against a single process. I apologize for the unclear presentation of my experience.
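
To give a sense of why a single-process direct sum can dominate the run time: a direct sum visits every pair of dipoles, so the work grows quadratically with their number. The sketch below only counts pairs for the system described earlier in this thread (ca. 60000 particles, 10% dipolar). It assumes open boundaries with no periodic replicas, and the function name is hypothetical.

```python
def dipole_pairs(n_particles, dipolar_fraction):
    """Number of distinct dipole-dipole pairs a direct sum visits per force calculation.

    Assumption: open boundaries, no periodic image replicas.
    """
    n_dip = round(n_particles * dipolar_fraction)
    return n_dip * (n_dip - 1) // 2


# System from this thread: ca. 60000 particles, 10% of them dipolar.
pairs = dipole_pairs(60_000, 0.10)
print(f"{pairs:,} pairs per force evaluation")
```

With 6000 dipoles this is about 18 million pair interactions per force evaluation, all handled by one CPU process when MPI is not available.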

Without Dip. Direct Sum, on eight MPI processes, the CPU is indeed only about two times slower.

I carefully reread your letter "Future of GPU support in Espresso". If the double-precision Dip. Direct Sum is MPI-parallelized on the CPU, I will have no argument left in favor of the GPU.

Thank you, Vania

espressomd / espresso

Please, don't drop GPU support #3724