ECP-VeloC / VELOC

Very-Low Overhead Checkpointing System
http://veloc.rtfd.io
MIT License

MPI_Comm_split with uninitialized key value? #46

Closed adammoody closed 8 months ago

adammoody commented 8 months ago

I'm working with @kosinovsky to debug an XOR rebuild problem. While looking through the code, this line caught my eye:

https://github.com/ECP-VeloC/VELOC/blob/a5a9b8ae64f0a099001de8af79e08f3d50f5a83b/src/lib/client.cpp#L62

I don't think rank has been initialized at this point, so it could hold an arbitrary value when it is passed as the key to MPI_Comm_split. Since the key determines how ranks are ordered in the new communicator, the ranks in the backends communicator could end up in a different, effectively random order compared to the parent communicator.

A potential fix would be to replace rank with 0, or to move the MPI_Comm_rank(comm, &rank) call higher up in the function so that rank is set before it is used as the key.
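To illustrate why either fix works, here is a minimal sketch that models the ordering rule MPI_Comm_split applies within one color: processes are sorted by key, and ties are broken by rank in the parent communicator. It does not call MPI and is not the VELOC code; split_order is a hypothetical helper written only for this illustration, and the key values in the usage note are made up to stand in for uninitialized memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of MPI or VELOC): models how MPI_Comm_split
// orders the processes that share one color. colors[r] and keys[r] are the
// values that parent rank r would pass to MPI_Comm_split.
// Returns the parent ranks listed in order of their rank in the new
// communicator, i.e. result[new_rank] == parent_rank.
std::vector<int> split_order(const std::vector<int>& colors,
                             const std::vector<int>& keys,
                             int color) {
    assert(colors.size() == keys.size());

    // Collect the members of this color in ascending parent-rank order.
    std::vector<int> members;
    for (std::size_t r = 0; r < colors.size(); ++r) {
        if (colors[r] == color) {
            members.push_back(static_cast<int>(r));
        }
    }

    // Sort by key. stable_sort keeps equal keys in parent-rank order,
    // which matches the tie-breaking rule of MPI_Comm_split.
    std::stable_sort(members.begin(), members.end(),
                     [&keys](int a, int b) { return keys[a] < keys[b]; });
    return members;
}
```

With four processes of the same color, keys of {0, 0, 0, 0} (replace rank with 0) and keys of {0, 1, 2, 3} (call MPI_Comm_rank first) both yield the parent order 0, 1, 2, 3. Arbitrary keys such as {7, -3, 42, 0} yield 1, 3, 0, 2, which is the kind of reordering described above.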

bnicolae commented 8 months ago

Correct, I've fixed it.