Adds distributed row gatherer

ginkgo-project / ginkgo

Numerical linear algebra software package

https://ginkgo-project.github.io/

BSD 3-Clause "New" or "Revised" License

384 stars 86 forks source link

Adds distributed row gatherer #1589

Open MarcelKoch opened 2 months ago

MarcelKoch commented 2 months ago

This PR adds a distributed row gatherer. This operator essentially provides the communication required in our matrix apply.

Besides the normal apply (which is blocking), it also provides two asynchronous calls. One version has an additional workspace parameter which is used as send buffer. This version can be called multiple times without restrictions, if different workspaces are used for each call. The other version doesn't have a workspace parameter, and instead uses an internal buffer. As a consequence, this function can only be called a second time, if the request of the previous call has been waited on. Otherwise, this function will throw.

This is the second part of splitting up #1546.

It also introduces some intermediate changes, which could be extracted out beforehand:

~~a type-erased DenseCache~~
~~makingdetail::run easier to use~~ now part of #1544

PR Stack:

[ ] #1545
[ ] #1543
[ ] #1579
[ ] #1544
[ ] #1588
[ ] #1589

MarcelKoch commented 2 months ago

One issue that I have is the constructor. It takes a collective_communicator and an index_map. The index_map already defines the communication pattern, so the collective_communicator has to match that. One option might be to have a virtual function like

std::unique_ptr<collective_communicator> create_with_same_type(communicator, index_map);

If I can't come up with anything better, I guess I will use that.

pratikvn commented 2 months ago

Do we need to have the std::future setup for the release ? Can we remove that for now and just use a normal synchronous approach ? I think that is a significant change that maybe needs more thought and probably a separate PR.

MarcelKoch commented 2 months ago

@pratikvn without it, we can't have any non-blocking communication. That would be quite a downgrade from what is currently there.

pratikvn commented 2 months ago

You should still be able to return a mpi::request to accomplish that ?

MarcelKoch commented 2 months ago

With mpi::request, it's only possible to have exactly one request at any given time. So the following will lead to a deadlock:

auto req1 = row_gatherer->apply_async(...);
auot req2 = row_gatherer->apply_async(...); // <- dead locks

The reason is that row gatherer stores the recv buffer internally, so a second call needs to wait until the first one is finished.

MarcelKoch commented 2 months ago

But I could use mpi::request and then just throw is a second apply is called. But tbh I think an std::future is more flexible. The mpi::request will not suffice for any more complex system, so we might as well already use std::future, since it has very few restrictions.

pratikvn commented 2 months ago

Atleast currently, I dont think there are use-cases of one row_gatherer object calling apply multiple times ? Alternatively, we could also pass in a workspace from the caller (matrix class here), which would allow for as many row_gatherer->apply() calls, which should also not have the restriction of the serialized async apply calls ?

MarcelKoch commented 2 months ago

You're right. I will go with the workspace version, since that is similar to what we already have e.g. for the scalar products.

MarcelKoch commented 2 months ago

@pratikvn I'm using now mpi requests.

MarcelKoch commented 1 month ago

Since this PR has a lot of conflicts with #1403, I will postpone this PR until after the release.