NVIDIA / AMGX

Distributed multigrid linear solver library on GPU
487 stars 142 forks source link

Copy MPI_Comm passed to AMGX_resources_create() #123

Open pkarnakov opened 3 years ago

pkarnakov commented 3 years ago

AMGX_resources_create() takes a void* comm and calls the constructor of Resources, which copies the pointer to a member variable

MPI_Comm *m_mpi_comm;

https://github.com/NVIDIA/AMGX/blob/919fb92a2c02ea3d8d3d0f0660e3b507d84705bc/base/src/resources.cu#L153

This prevents the caller from passing a temporary MPI_Comm to AMGX_resources_create(). An alternative is to store a copy of the communicator in:

MPI_Comm m_mpi_comm;

Is there a reason to store the pointer instead?

mattmartineau commented 3 years ago

I took a look to see if I could find why this was designed this way in the first place, and it seems to be a purposeful API design decision.

Throughout the code we allow MPI to be disabled if AMGX_WITH_MPI is not defined. If you were to pass the MPI communicator into the API by value then it would not be possible to accept the typeless void* and then cast to MPI_Comm inside the guarded region if MPI is enabled.

This same functionality could be alternatively achieved by having distinct APIs for AMGX_resources_create when MPI is enabled and disabled. However, this conditionality would need to be followed through all the way to the resource constructor and makes the API more confusing. I think requiring users to pass a pointer to the MPI comm handle is not onerous enough to motivate an API change.

pkarnakov commented 3 years ago

AMGX_resources_create() would still take a pointer, but the implementation would copy the communicator

diff --git a/base/include/resources.h b/base/include/resources.h
index 1fe13c6..2ac7d6c 100644
--- a/base/include/resources.h
+++ b/base/include/resources.h
@@ -61,7 +61,7 @@ class Resources
         int m_high_priority_stream;
         int m_serialize_threads;
 #ifdef AMGX_WITH_MPI
-        MPI_Comm *m_mpi_comm;
+        MPI_Comm m_mpi_comm;
 #endif
     public:
         Resources(AMG_Configuration *cfg, void *comm, int device_num, const int *devices);
@@ -71,7 +71,7 @@ class Resources
         ThreadManager *get_tmng() { return m_tmng; }
         AMG_Config *getResourcesConfig() { return m_cfg; }
 #ifdef AMGX_WITH_MPI
-        MPI_Comm *getMpiComm() { return m_mpi_comm; }
+        MPI_Comm *getMpiComm() { return &m_mpi_comm; }
 #endif
         int getDevice(int device_num) const { return m_devices[device_num]; }
         bool getHandleErrors() const { return m_handle_errors; }
diff --git a/base/src/resources.cu b/base/src/resources.cu
index 52a5a5f..a91af8f 100644
--- a/base/src/resources.cu
+++ b/base/src/resources.cu
@@ -150,7 +150,7 @@ Resources::Resources(AMG_Configuration *cfg, void *comm, int device_num, const i
     memory::setDeviceMemoryPoolFlag(true);
     // create communicator
 #ifdef AMGX_WITH_MPI
-    m_mpi_comm = (MPI_Comm *) comm;
+    m_mpi_comm = *(MPI_Comm *) comm;
 #endif
 }
mattmartineau commented 3 years ago

If we are happy to store a temporary copy of the handle when MPI is enabled then this would be an ideal solution. In my mind though changing whether we store a pointer to or value of the handle constitutes an API change (internal rather then external).

The resource creation (AMGX_resources_create) is only expected to be called once in an application but it is possible that different solvers within an application (sharing that resource handle) require different communicators (I believe #109 is an example where this might be relevant). My concern is that if we store the temporary we will lose the ability to change the communicator without re-creating the resources, which is prohibitively expensive.

Now I cannot argue that this is actually working as I haven't tried it out personally and I have no reason to believe this was the original design intention (rather than the reason I speculated before). However, we are trying to make sure we can support applications with multiple distinct solves and even multi-physics applications, where you could expect this sort of requirement to be more common and so abandoning that flexibility seems like heading in the wrong direction. We could maintain different versions of AMGX_resources_create to handle this or some extension of the API but I don't think there is enough benefit.

@marsaev was involved in the earlier stages of development so might have some different thoughts on this.

marsaev commented 3 years ago

Good design (and i think this is what typically done in distributed codes) would be duplicate input communicator with MPI_Comm_dup() specifically for the case of comm being released by user's host code. Some minor additional resources (time spent on dup, storage) would be needed for that. Way of storage doesn't matter that much in that case, but if we revisit this piece of code - i would replace it with shared_ptr in case we would like to create copy of a resource.

marsaev commented 3 years ago

Tracking internally: AMGX-47