CliMA / ClimaAtmos.jl

ClimaAtmos.jl is a library for building atmospheric circulation models that is designed from the outset to leverage data assimilation and machine learning tools. We welcome contributions!
Apache License 2.0

MPI restart tests occasionally hang #3381

Open Sbozzolo opened 1 month ago

Sbozzolo commented 1 month ago

The tests for restarts are a little different from most other tests: they run multiple simulations in the same MPI context. For reasons that I don't understand, this sometimes leads to the test hanging. I worry that this might be a symptom of a deeper issue with how we handle our MPI communicator, but I could not reliably reproduce the issue or identify the problem.

charleskawczynski commented 6 days ago

I'm curious whether this is related to https://github.com/CliMA/ClimaCore.jl/issues/1589. Unfortunately, I don't think we have a reproducer in the ClimaCore test suite, but one would be very helpful.
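
Since the hang is intermittent, one generic way to work toward a reproducer is to run the test command many times under a timeout and record which attempts never finish. The sketch below is only an illustration under assumptions: `count_hangs` is a hypothetical helper name, it is not part of ClimaAtmos or ClimaCore, and it uses only the Python standard library. It treats any run that exceeds the timeout as a hang, which can also catch runs that are merely slow.

```python
import subprocess
import sys


def count_hangs(cmd, attempts=20, timeout_s=600.0):
    """Run `cmd` repeatedly and return the indices of attempts that timed out.

    Hypothetical helper for illustration; `cmd` is an argument list such as
    the MPI launch command for the restart tests.
    """
    hung = []
    for i in range(attempts):
        try:
            # subprocess.run kills the child and raises TimeoutExpired
            # if it has not exited within `timeout_s` seconds.
            subprocess.run(
                cmd,
                timeout=timeout_s,
                check=False,  # a failing exit code is not counted as a hang
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except subprocess.TimeoutExpired:
            hung.append(i)
    return hung


# Self-contained demo: a child that sleeps longer than the timeout counts as a hang.
demo = count_hangs(
    [sys.executable, "-c", "import time; time.sleep(5)"],
    attempts=2,
    timeout_s=0.5,
)
print(demo)  # prints [0, 1]
```

In practice `cmd` would be replaced by the actual MPI invocation of the restart tests. Note that the timeout only kills the launcher process that `subprocess.run` started, so ranks spawned by an MPI launcher may need separate cleanup.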