eth-cscs / COSMA

Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm
BSD 3-Clause "New" or "Revised" License

COSMA cublas crash after job finished #100

Closed: yaoyi92 closed this issue 7 months ago

yaoyi92 commented 3 years ago

I realized I cannot reopen issue #87, so I am copying the question here. Sorry for the confusion.

=====

Hi @kabicm, I was able to redo the test and the problem still exists with v2.5.0 and the master branch. Does COSMA use a wrapper over MPI_Finalize()? I noticed a similar issue on Summit with another code that wraps MPI_Finalize: LLNL/Caliper#392.

If that's the case, my question becomes: Is it possible to manually finalize COSMA?

A related question: after I call COSMA, does it keep holding GPU memory once the gemm calls are done? Is it possible to control that GPU memory? I am looking for something like initializing and finalizing a COSMA environment around a certain code region, so that the GPU memory is freed outside that region.

Best wishes, Yi

kabicm commented 2 years ago

Hi @yaoyi92!

I am not sure whether we already discussed this by email.

COSMA does not have any wrapper around MPI_Finalize, and in general COSMA does not have an "initialize/finalize" logic.

In COSMA, the user has the following two options:

  1. Create a context explicitly and pass it to the multiply function. In this case, it is destroyed (something like a finalize) when it goes out of scope.
  2. Let the global context be created implicitly during the first multiplication. In this case, it is destroyed when main exits.

The context is then reused in all multiplications and it mostly contains the memory pool (both CPU and GPU).
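For illustration, here is a minimal sketch of option 1. The names (in particular `cosma::make_context` and the include paths) are assumptions based on COSMA's headers, so please check them against your version:

```cpp
#include <cosma/context.hpp>   // assumed include path
#include <cosma/multiply.hpp>  // assumed include path

void multiply_region() {
    // Option 1: create the context explicitly; it owns the CPU and GPU
    // memory pools used by all multiplications performed with it.
    auto ctx = cosma::make_context<double>();

    // ... call cosma::multiply(...) with this context as often as
    // needed; the memory pools are reused across the calls ...

}   // ctx goes out of scope here: the pools are released ("finalize")
```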

It is also possible to control how much CPU and GPU memory you allow COSMA to use.

I assume you used the implicitly created global context (as in option 2). In that case, you should be able to call the destructor explicitly: `get_context_instance<T>()->~cosma_context();`

This will release all CPU and GPU memory that was allocated.
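As a sketch, calling that destructor from C++ could look like the following (the call itself is the one suggested above; the include path and the `double` instantiation are assumptions):

```cpp
#include <cosma/context.hpp>   // assumed include path

void release_cosma_memory() {
    // Explicitly destroy the implicitly created global context, which
    // frees the CPU and GPU memory pools. The context instance is per
    // scalar type, so repeat this for every type you multiplied with.
    cosma::get_context_instance<double>()->~cosma_context();
}
```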

Let me know if you have any other questions!

kabicm commented 2 years ago

Is the problem you mentioned still present with the latest version, COSMA v2.6.1?

kabicm commented 2 years ago

Btw, the GPU devices should be set outside of COSMA (e.g. in CP2K), and COSMA will just inherit the devices that were previously set. This could be the cause of the issue you are referring to.
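For example, in a typical MPI + CUDA setup the devices would be bound per rank before the first COSMA call, roughly like this (a sketch using the plain CUDA runtime; real codes usually bind by node-local rank):

```cpp
#include <mpi.h>
#include <cuda_runtime.h>

void bind_gpu_before_cosma() {
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int n_devices = 0;
    cudaGetDeviceCount(&n_devices);

    // Round-robin rank-to-device binding; COSMA inherits whatever
    // device is current on the calling rank.
    cudaSetDevice(rank % n_devices);
}
```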

yaoyi92 commented 2 years ago

Sorry for the late reply. Yes, the problem is still there. However, the message only seems to show up after the job has finished, so it doesn't bother us too much for now. We used the pxgemm_cosma wrapper you provided, and we initialized our own GPU devices in our code (FHI-aims). It feels like the same situation as with CP2K. Is there a solution for that?

Do you think it is worth trying the destructor? Do you have a Fortran wrapper for the destructor?

alazzaro commented 2 years ago

> Sorry for the late reply. Yes, the problem is still there. However, the message only seems to show up after the job has finished, so it doesn't bother us too much for now. We used the pxgemm_cosma wrapper you provided, and we initialized our own GPU devices in our code (FHI-aims). It feels like the same situation as with CP2K. Is there a solution for that?

The CP2K problem was really not due to COSMA. The MPI implementation registers some of the GPU buffers used in MPI communications, and the error came from a double finalization (once in MPI and once in COSMA). The problem is now fixed: we leave the finalization to COSMA.