3dem / relion

Image-processing software for cryo-electron microscopy
https://relion.readthedocs.io/en/latest/
GNU General Public License v2.0

Relion 3 - Beta - CPU Only? #378

Closed ifelsefi closed 5 years ago

ifelsefi commented 6 years ago

Hi

Our users would like us to try out Relion 3.

I am seeing the following statement:

Parts of the cryo-EM processing pipeline can be very computationally demanding, and in some cases special hardware can be used to make these faster. There are two such cases at the moment:

Since RELION-2: Use one or more GPUs, or graphics cards. RELION only supports CUDA-capable GPUs of compute capability 3.5 or higher.

Since RELION-3: Use the vectorized version. RELION only supports GCC and ICC 2018.3 or later. There are more benefits than speed; the accelerated versions also have a decreased memory footprint. Details about how to enable either of these options are listed below.

Does this mean that Relion 3 only supports CPU but not GPU?

If so, does this mean we can see better performance on an AVX-512 CPU than on a V100? Can we see benchmarks?

eriklindahl commented 6 years ago

Hi Douglas,

"Since" is inclusive of later versions, so RELION-3 will certainly use NVIDIA GPUs too (in fact, it should be faster than RELION-2, although we're still tuning a couple of things during the beta).

Cheers,

Erik


ifelsefi commented 6 years ago

Thank you!

We are buying more GPUs, so we want to make sure that's not a waste.

Will try Beta3 now...

ifelsefi commented 5 years ago

Hi

After reading this article, it seems that the GPU and CPU code paths of RELION 3 will branch entirely, while AVX-512 instructions give the CPU a compelling advantage considering box-size limitations and the price of professional-grade GPUs.

However, can you tell me whether RELION uses CUDA for FFT acceleration? I have heard on the RELION listserv that it does not. I am asking since V100 Tensor Cores are great at matrix multiplication and thus at FFTs. Moreover, does RELION leverage CUDA Unified Memory, which would allow oversubscribing GPU memory using system memory?

bforsbe commented 5 years ago

With regards to CUDA FFTs (cuFFT):

We use them when possible. Because producing a full-size output each iteration requires a full-size FFT, the maximization-associated FFTs do not use CUDA; otherwise any large-box classification/refinement would fail immediately (at iteration 1). We could try a CUDA FFT and revert to a non-CUDA one when the transform is too big, but this feature has not made it into RELION yet. For all other FFTs (such as those low-passed (cropped) to the current resolution in expectation ops), RELION uses CUDA FFTs.
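The try-GPU-then-fall-back idea described above can be sketched roughly as follows. This is a hypothetical illustration in Python/NumPy, not RELION code; the memory-limit check, the byte estimate, and the function name are assumptions made for the sketch, and NumPy's CPU FFT stands in for both the device and host transforms:

```python
import numpy as np

def fft_with_fallback(img, gpu_mem_limit_bytes):
    """Attempt a 'GPU' FFT; revert to the CPU when the transform is too big.

    Hypothetical sketch of the fallback strategy discussed above, not RELION
    code. gpu_mem_limit_bytes stands in for the card's free memory.
    """
    # A complex64 transform needs roughly 8 bytes per element (plus workspace).
    needed = img.size * 8
    if needed <= gpu_mem_limit_bytes:
        # In real code this branch would call cuFFT; NumPy stands in here.
        return np.fft.fft2(img), "gpu"
    # Too big for the device: run the transform on the CPU instead of failing.
    return np.fft.fft2(img), "cpu"

small = np.random.rand(64, 64).astype(np.float32)      # fits on the "GPU"
large = np.random.rand(1024, 1024).astype(np.float32)  # would not fit

_, where_small = fft_with_fallback(small, gpu_mem_limit_bytes=1 << 20)  # 1 MiB
_, where_large = fft_with_fallback(large, gpu_mem_limit_bytes=1 << 20)
```

The point is only the control flow: the size check happens before the allocation, so a large-box job degrades to the slower path instead of crashing at iteration 1.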

With regards to unified memory:

Unifying the memory of multiple GPUs may benefit the final iterations of large-box refinements, but the gain has not merited the level of effort necessary to make it happen. We assess that, with the new CPU acceleration, reverting to non-GPU execution whenever necessary is a more reasonable solution. Unfortunately this still follows the "run on GPU, then crash, then continue on CPUs manually" model. I shouldn't call it a model, though - we simply haven't had time to prioritize this.

I hope to make this automatic by 3.1: RELION won't die with an out-of-memory error, it will just run the accelerated CPU code instead. This might under-utilize the hardware, though - nobody likes a user that needlessly hogs the GPU resources. Unified memory seems like a nice solution, but it's a massive amount of work that might end up slowing down the overall application, and it will mess with the current memory flow. It's also a big, blind bet on what NVIDIA does next; we cannot justify large re-designs unless we are convinced that it will continue to be supported and efficient. Hence our reservations against it.

Note on Tensor Cores:

They are great for neural nets. I haven't heard that they would be good specifically for FFTs, although it makes sense that they could do it. Source?