For what it's worth, this PR allows me to run a field case with `mpirun -np 14` which, without this PR, crashes very early in the initialisation process. I'll nevertheless defer to those more familiar with this part of the code to review the PR, as there may be aspects of the structure that I don't fully grasp.
Is there some more information regarding the symptom? Where does it crash exactly?
> Where does it crash exactly?

In current master we crash in `WellState<>::initWellStateMSWell()` when indexing into an empty `perf_rates` object when `n_activeperf > 0` (`n_activeperf = 22` for `w=0` in one of my test runs).

That said, if we want to use the proposed guard, then we should at least amend it to `perf_rates.size() != n_activeperf * np`, since there are supposed to be `np` entries for each active connection/perforation.
> That said, if we want to use the proposed guard, then we should at least amend it to `perf_rates.size() != n_activeperf * np`, since there are supposed to be `np` entries for each active connection/perforation.

Or just `perf_data.size() != n_active_perf`, as it is currently in the PR?
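For readers skimming the thread, here is a minimal, self-contained sketch of the two guard variants being compared. The function and parameter names are illustrative stand-ins, not the actual members of `WellState`:

```cpp
#include <cstddef>
#include <vector>

// Variant suggested above: require np rate entries per active perforation.
bool guardOnPerfRates(const std::vector<double>& perf_rates,
                      const std::size_t n_activeperf,
                      const std::size_t np)
{
    return perf_rates.size() != n_activeperf * np;
}

// Variant currently in the PR: compare the perforation container size directly.
bool guardOnPerfData(const std::size_t perf_data_size,
                     const std::size_t n_activeperf)
{
    return perf_data_size != n_activeperf;
}
```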
My main concern is that an if condition based on the inequality of these two variables is too broad for the targeted situation and might cover up other scenarios/bugs in the future (we are not running distributed parallel MS wells yet; that should be addressed by the development for running MS wells in parallel).

If we know it was because the well is SHUT, why don't we use that type of if condition to make it clearer that it was due to the well being SHUT (at least something like `ws.perf_data.size() == 0`)?

And also, let us output some DEBUG information or throw if `ws.perf_data.size() > 0` and `ws.perf_data.size()` and `n_activeperf` are not equal. If it crashes in the future because they are not equal, then we can check that specific scenario and do a more proper investigation and fix.
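A rough sketch of that suggestion, assuming hypothetical names and plain standard-library error reporting rather than whatever logging facility the simulator actually uses:

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>

// Illustrative guard: an empty perforation container means there is nothing to
// initialise for this well, while a non-empty container that disagrees with the
// expected number of active perforations is reported loudly.
void checkPerfDataConsistency(const std::size_t perf_data_size,
                              const std::size_t n_activeperf)
{
    if (perf_data_size == 0) {
        return; // e.g. the well is SHUT and carries no perforation data here
    }
    if (perf_data_size != n_activeperf) {
        std::ostringstream msg;
        msg << "perf_data.size() = " << perf_data_size
            << " does not match n_activeperf = " << n_activeperf;
        throw std::logic_error(msg.str());
    }
}
```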
> My main concern is that an if condition based on the inequality of these two variables is too broad for the targeted situation and might cover up other scenarios/bugs in the future (we are not running distributed parallel MS wells yet; that should be addressed by the development for running MS wells in parallel).

I agree that the case with distributed active wells needs to be handled by that development, hence the \todo message.

> If we know it was because the well is SHUT, why don't we use that type of if condition to make it clearer that it was due to the well being SHUT (at least something like `ws.perf_data.size() == 0`)?

When allowing inactive wells (that are never open at any time during the simulation) to be split across processes, `ws.perf_data.size()` is not equal to zero here. There are perforations, since these wells may need to output RFT data, but each process may not have all of them. Checking for SHUT sounds dangerous, since I guess wells may open during a time step..?

> And also, let us output some DEBUG information or throw if `ws.perf_data.size() > 0` and `ws.perf_data.size()` and `n_activeperf` are not equal. If it crashes in the future because they are not equal, then we can check that specific scenario and do a more proper investigation and fix.

Since this is not an error situation I think we should avoid DEBUG messages and definitely throws. I can add a more explicit check for inactive wells, then (for now) throw for distributed wells. Does that sound ok?
> I can add a more explicit check for inactive wells, then (for now) throw for distributed wells. Does that sound ok?

Yes, that is sensible.
And we discussed this a little bit. Since we have decided that some inactive wells can be distributed across processes, there should be a way/criterion to detect/decide which wells can be split. For those wells, since we cannot do much with them (like opening them), let us do minimal things with them. For example, if possible, do not initialize unneeded well-state information (you are the one who knows this issue best).

For the function `initWellStateMSWell()`, you can safely `continue` at the beginning of the for loop for those wells, and for `init()` and `base_init()` we can possibly also do less, but I am not familiar with the RFT usage. Please let us know what you think of it.
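A minimal sketch of that early-continue shape, using placeholder types since the real loop in `initWellStateMSWell()` works on the simulator's own well objects:

```cpp
#include <set>
#include <string>
#include <vector>

// Placeholder well description; the real type in opm-simulators differs.
struct WellStub {
    std::string name;
    bool isMultiSegment = false;
};

// Shape of the suggested early exit: wells that are never opened during the
// simulation are skipped at the top of the per-well loop.
void initMSWellsSketch(const std::vector<WellStub>& wells,
                       const std::set<std::string>& inactive_wells)
{
    for (const auto& well : wells) {
        if (!well.isMultiSegment) {
            continue; // only multi-segment wells need this initialisation
        }
        if (inactive_wells.count(well.name) > 0) {
            continue; // never open at any report step: nothing to set up
        }
        // ... regular multi-segment well-state initialisation would go here ...
    }
}
```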
jenkins build this please
@bska, can you test whether the current version fixes the running of your case? I am happy with the current approach, which has a more specific design to tackle the problem. You can review/merge as you will.
> can you test whether the current version fixes the running of your case?

I've just completed a test of the field case I mentioned before. I can confirm that the case continues to run in parallel (`mpirun -np 14`) with this edition of the PR. In the current master sources the case does not run in parallel, but it does run in sequential mode.

> I am happy with the current approach, which has a more specific design to tackle the problem.

It looks good to me too. At some point we may consider moving the `Schedule::getInactiveWellNamesAtEnd()` call to the `WellState` constructor, however. We call `WellState<>::init()` at least once for each report step and I don't really expect `getInactiveWellNamesAtEnd()` to change, although I may be missing something.

In any case, this fixes a real problem on a real case so I'll merge into master.
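A small sketch of that caching idea, with a hypothetical class standing in for `WellState`; the real constructor and `init()` signatures differ:

```cpp
#include <set>
#include <string>
#include <utility>

// Hypothetical illustration of the caching idea: compute the set of wells that
// are inactive for the whole schedule once, at construction time, instead of
// recomputing it in every call to init().
class WellStateSketch {
public:
    explicit WellStateSketch(std::set<std::string> inactive_well_names)
        : inactive_well_names_(std::move(inactive_well_names))
    {}

    // Used once per report step; reuses the cached set.
    bool isInactive(const std::string& well_name) const
    {
        return inactive_well_names_.count(well_name) > 0;
    }

private:
    std::set<std::string> inactive_well_names_;
};
```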
@bska can you rerun the test field case you were running with `mpirun -np 14` once more with the current master and/or send me the file so I can also check this on my side?

I've been working on running MS wells in parallel; I've split my work into two PRs (assembly #5680 and solving #5746) and I would like to test with that file as well. Thanks!
> can you rerun the test field case you were running with `mpirun -np 14` once more with the current master

Sure. Is there anything in particular you'd like me to look out for?
> can you rerun the test field case you were running with `mpirun -np 14` once more with the current master

> Sure. Is there anything in particular you'd like me to look out for?

Nothing in particular, just check if the case runs through as expected. Thanks!
> can you rerun the test field case you were running with `mpirun -np 14` once more with the current master

> Sure. Is there anything in particular you'd like me to look out for?

> Nothing in particular, just check if the case runs through as expected

Cool. I'll just rebuild everything first to make sure I have a consistent set of binaries given the CMake changes that were just merged.
> Nothing in particular, just check if the case runs through as expected

@lisajulia: The model does indeed still run with `mpirun -np 14`.
I think the concern only applies when we actually distribute the MS wells across processes.
Yes :) @bska : can you also try with this PR? https://github.com/OPM/opm-simulators/pull/5746
> can you also try with PR #5746?

I got slightly different timestepping behaviour between master and that PR, but not different enough that it's possible to say that one run is "better" than the other. Final TCPU is currently slightly higher with #5746 than in master as of #5756.

On a side note, if `AllowDistributedWells` is supposed to work as of #5746, then there's still something missing, as I get the diagnostic below when setting the value to `true`:

Error: Option --allow-distributed-wells=true is only allowed if model only has only standard wells. You need to provide option with --enable-multisegement-wells=false to treat existing multisegment wells as standard wells.
Error: [${ROOT}/opm-simulators/opm/simulators/flow/FlowGenericVanguard.cpp:332] All wells need to be standard wells!
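For context, a hedged sketch of what such an option-consistency check might look like; the actual logic in `FlowGenericVanguard.cpp` and its parameter handling differ, and the flag names below are only meant to mirror the command-line options:

```cpp
#include <stdexcept>

// Rough shape of the option-consistency check behind the diagnostic above.
void validateDistributedWellsOptions(const bool allow_distributed_wells,
                                     const bool enable_multisegment_wells,
                                     const bool model_has_multisegment_wells)
{
    if (allow_distributed_wells && enable_multisegment_wells && model_has_multisegment_wells) {
        throw std::invalid_argument(
            "--allow-distributed-wells=true requires all wells to be standard wells; "
            "use --enable-multisegment-wells=false to treat multisegment wells as standard wells");
    }
}
```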
Ok thanks, I will take this setting into account for my PR #5746!
do address the typo in the message as well (`--enable-multisegment-wells=false`)

> do address the typo in the message as well (`--enable-multisegment-wells=false`)
jenkins build this please