RomainFranceschini / mpi.cr

MPI bindings for the Crystal language

calling world.process_at(0).receive(...) in a fiber blocks #1

Open xa72 opened 11 months ago

xa72 commented 11 months ago

First: thank you for this shard.

I'm rewriting an application from C++ to Crystal. It uses MPI and does a calculation while at the same time listening for an MPI message from another executable doing the same. So I thought I would wrap the receiving and the calculation in fibers (spawn do ... end), but as soon as I do that, the program halts when executing receive() and all other fibers are blocked too. The program hangs. This also happens when I compile my source with thread support for fibers (crystal build -Dpreview_mt ...).
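To make it concrete, here is a stripped-down version of what I'm doing. The MPI.init / universe.world setup is only my shorthand and may not match the shard's exact API; the receive call is the one from the title:

```crystal
require "mpi"

# Simplified sketch; MPI.init / universe.world are my shorthand for the
# shard's setup and may not match its exact API.
universe = MPI.init
world = universe.world

done = Channel(Nil).new

spawn do
  # Blocking MPI receive: it never yields back to Crystal's scheduler,
  # so once it starts waiting, all other fibers are stuck too.
  msg = world.process_at(0).receive(Int32) # argument form guessed from the title's receive(...)
  puts "received #{msg}"
  done.send(nil)
end

spawn do
  sum = (1_i64..10_000_000_i64).sum # stand-in for my actual calculation
  puts "calculation done: #{sum}"
  done.send(nil)
end

2.times { done.receive }
```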

Is there a way to do this right?

I could use receive_immediate with polling, but I haven't found a good example of how to use it in your examples directory (send_immediate has examples, but receive_immediate does not). It seems to me that receive_immediate is not part of the publicly intended API of mpi.cr. (It would be a suboptimal solution for me anyway, because the polling adds a small delay that sums up over time; a calculation can be finished after some 100 ms, and the application sometimes has to run for hours doing a lot of them.)
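For reference, this is the kind of polling loop I had in mind; the receive_immediate and request.test names are guesses on my part:

```crystal
# Polling sketch only; receive_immediate and request.test are my guesses at
# the names, since I haven't found a receive example in the shard.
spawn do
  request = world.process_at(0).receive_immediate(Int32)
  loop do
    if message = request.test # assumed non-blocking completion check
      puts "received #{message}"
      break
    end
    sleep 1.millisecond # the polling delay that sums up over hours
  end
end
```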

As I understand it, the receive call uses threads in the OpenMPI / MPICH implementation and is therefore not usable with fibers in Crystal, but I'm not sure. In that case, I would have to use Crystal's internal Thread class, which I have no experience with and which, from everything I've read in the forum, should not be used.

Do you have any hints? If not, I will ask in the Crystal forums, but I thought I'd ask you first.

Alex

RomainFranceschini commented 11 months ago

Hi, thanks for your interest in this shard!

mpi.cr is a thin wrapper around the MPI C library and exposes a low-level API. In its current form, it does not play nicely with Crystal's concurrency model.

Since Crystal fibers are cooperative, they are expected to yield to give other fibers a chance to run, either explicitly or through IO calls, which blocking MPI calls such as receive do not do. So yes, you should use their non-blocking equivalents (the immediate calls) to avoid blocking all fibers while waiting for MPI.

The immediate.cr example does use the immediate_receive call. But since I haven't updated this shard in a while, it might be broken; let me know. I can try to come up with a small example combining immediate calls and fibers.

I guess the ideal for this shard would be to avoid exposing blocking calls and to abstract away the non-blocking calls so that they appear blocking, which is basically what Crystal's IO does.
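Roughly, I imagine something along these lines (untested; the request's test completion check is an assumption about the API), where the loop yields to the scheduler instead of blocking the whole thread:

```crystal
# Untested sketch: assumes immediate_receive returns a request object with a
# non-blocking completion check (called `test` here; the name is an assumption).
def receive_yielding(process)
  request = process.immediate_receive(Int32) # element type hard-coded for the sketch
  loop do
    if message = request.test
      return message
    end
    Fiber.yield # hand control back to the scheduler so other fibers can run
  end
end

# From the caller's point of view this looks like an ordinary blocking call:
# spawn { msg = receive_yielding(world.process_at(0)) }
```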