Hi @yaoyi92!
I am not sure whether we have already discussed this by email.
COSMA does not have any wrapper around MPI_Finalize, and in general it does not have an "initialize/finalize" logic.
What happens in COSMA is that the user has the following options:
1. Create a context explicitly and pass it to the multiplication calls.
2. Do nothing, in which case a global context is created implicitly on the first multiplication.
In both cases the context is then reused in all subsequent multiplications, and it mostly contains the memory pool (both CPU and GPU).
It is possible to control how much CPU and GPU memory you allow COSMA to use:
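Something along these lines, set before the first COSMA call (the variable names below are the ones from the COSMA README; please double-check them against the version you are using):

```cpp
#include <cstdlib>

// Sketch: limit COSMA's memory pools through environment variables,
// set before the first pxgemm/multiply call.
void limit_cosma_memory() {
    // maximum CPU memory for COSMA's memory pool (see the README for the units)
    setenv("COSMA_CPU_MAX_MEMORY", "4096", 1);
    // GPU memory is controlled indirectly, through the maximum tile sizes
    // and the number of streams used by the GPU back end
    setenv("COSMA_GPU_MAX_TILE_M", "4000", 1);
    setenv("COSMA_GPU_MAX_TILE_N", "4000", 1);
    setenv("COSMA_GPU_MAX_TILE_K", "4000", 1);
    setenv("COSMA_GPU_STREAMS", "2", 1);
}
```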
I assume you used the implicitly created global context (as in 2.). In that case, you should be able to call the destructor explicitly:
get_context_instance<T>()->~cosma_context();
This will release all CPU and GPU memory that was allocated.
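In the calling code this would look roughly like the sketch below (the header name is an assumption, and I use the `double` instantiation as an example):

```cpp
#include <mpi.h>
#include <cosma/context.hpp>  // header name may differ in your build

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    // ... all pxgemm/multiply calls happen here ...

    // Release COSMA's CPU and GPU memory pools explicitly, so nothing
    // is still allocated (or registered by MPI) when MPI_Finalize runs.
    cosma::get_context_instance<double>()->~cosma_context();

    MPI_Finalize();
    return 0;
}
```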
Let me know if you have any other questions!
Is the problem you mentioned still present with the latest version: COSMA-v2.6.1?
Btw, the GPU devices should be set outside of COSMA (e.g. in CP2K), and COSMA will just inherit the devices that were previously set. This could be causing the issue you are referring to.
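For completeness, "setting the devices outside of COSMA" typically means something like the following in the calling code, before the first COSMA call (a generic MPI + CUDA sketch, nothing COSMA-specific):

```cpp
#include <mpi.h>
#include <cuda_runtime.h>

// Pick one GPU per rank on the node *before* the first COSMA call,
// so that COSMA simply inherits this device.
void select_gpu_for_this_rank(MPI_Comm comm) {
    MPI_Comm node_comm;
    MPI_Comm_split_type(comm, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, &node_comm);

    int local_rank = 0;
    MPI_Comm_rank(node_comm, &local_rank);

    int n_devices = 0;
    cudaGetDeviceCount(&n_devices);
    if (n_devices > 0) {
        cudaSetDevice(local_rank % n_devices);
    }

    MPI_Comm_free(&node_comm);
}
```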
Sorry for the late reply. Yes, the problem is still there. However, the message seems to show up after the job has finished, so it doesn't bother us too much for now. We used the pxgemm_cosma wrapper you provided and we initialized our own GPU devices in our code (FHI-aims). It feels like the same situation as with CP2K. Is there a solution for that?
Do you think it is worth trying the destructor? Do you have a Fortran wrapper for the destructor?
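In case it helps, here is a sketch of what I imagine such a wrapper would look like on the C++ side (`cosma_release_context_double` is a hypothetical name, not an existing COSMA symbol):

```cpp
#include <cosma/context.hpp>  // header name may differ

// Hypothetical shim: expose the context cleanup through a C-callable
// function that Fortran can bind to with ISO_C_BINDING.
extern "C" void cosma_release_context_double() {
    // releases the CPU and GPU memory pools of the global context
    cosma::get_context_instance<double>()->~cosma_context();
}
```

On the Fortran side this could then be declared in an interface block with `bind(C, name="cosma_release_context_double")` and called right before `mpi_finalize`.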
The CP2K problem was really not due to COSMA. The MPI implementation registers some of the GPU buffers used in MPI communication, so the error came from a double finalization (in MPI and in COSMA). The problem is now fixed, and we leave the finalization to COSMA.
I realized I cannot reopen issue #87, so I am copying the question here. Sorry for the confusion.
=====
Hi @kabicm, I was able to redo the test and the problem still exists with v2.5.0 and the master branch. Does COSMA use a wrapper over MPI_Finalize()? I noticed a similar issue on Summit with another code that wraps MPI_Finalize: LLNL/Caliper#392.
If that's the case, my question becomes: Is it possible to manually finalize COSMA?
A related question: after I call COSMA, does it keep holding GPU memory once the gemm calls are done? Is it possible to control that GPU memory? I guess I am looking for something like initializing and finalizing a COSMA environment over a certain code region, and freeing the GPU memory once outside that region.
Best wishes, Yi