For what it's worth, this PR allows me to run a field case with `mpirun -np 14` which, without this PR, crashes very early in the initialisation process. I'll nevertheless defer to those more familiar with this part of the code to review the PR, as there may be aspects of the structure that I don't fully grasp.
Is there some more information regarding the symptom? Where does it crash exactly?
> Where does it crash exactly?

In current master we crash in `WellState<>::initWellStateMSWell()` when indexing into an empty `perf_rates` object when `n_activeperf > 0` (`n_activeperf = 22` for `w=0` in one of my test runs).

That said, if we want to use the proposed guard, then we should at least amend it to `perf_rates.size() != n_activeperf * np`, since there are supposed to be `np` entries for each active connection/perforation.
> That said, if we want to use the proposed guard, then we should at least amend it to `perf_rates.size() != n_activeperf * np`, since there are supposed to be `np` entries for each active connection/perforation.

Or just `perf_data.size() != n_active_perf`, as it is currently in the PR?
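For readers skimming the thread, here is a minimal, self-contained sketch of the two guard variants being compared. The function and parameter names are illustrative stand-ins, not the actual members of `WellState`:

```cpp
#include <cstddef>
#include <vector>

// Variant suggested above: require np rate entries per active perforation.
bool guardOnPerfRates(const std::vector<double>& perf_rates,
                      const std::size_t n_activeperf,
                      const std::size_t np)
{
    return perf_rates.size() != n_activeperf * np;
}

// Variant currently in the PR: compare the perforation container size directly.
bool guardOnPerfData(const std::size_t perf_data_size,
                     const std::size_t n_activeperf)
{
    return perf_data_size != n_activeperf;
}
```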
My main concern is that an if condition based on the inequality of these two variables is too broad for the targeted situation and might cover up other scenarios/bugs in the future (we are not running distributed parallel MS wells yet; that should be addressed by the development for running MS wells in parallel).

If we know it was because the well is SHUT, why don't we use that type of if condition to make it clearer that it was due to the well being SHUT (at least something like `ws.perf_data.size() == 0`)?

And also, let us output some DEBUG information or throw if `ws.perf_data.size() > 0` and `ws.perf_data.size()` and `n_activeperf` are not equal. If it crashes in the future because they are not equal, then we can check that specific scenario and do a more proper investigation and fix.
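A rough sketch of that suggestion, assuming hypothetical names and plain standard-library error reporting rather than whatever logging facility the simulator actually uses:

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>

// Illustrative guard: an empty perforation container means there is nothing to
// initialise for this well, while a non-empty container that disagrees with the
// expected number of active perforations is reported loudly.
void checkPerfDataConsistency(const std::size_t perf_data_size,
                              const std::size_t n_activeperf)
{
    if (perf_data_size == 0) {
        return; // e.g. the well is SHUT and carries no perforation data here
    }
    if (perf_data_size != n_activeperf) {
        std::ostringstream msg;
        msg << "perf_data.size() = " << perf_data_size
            << " does not match n_activeperf = " << n_activeperf;
        throw std::logic_error(msg.str());
    }
}
```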
> My main concern is that an if condition based on the inequality of these two variables is too broad for the targeted situation and might cover up other scenarios/bugs in the future (we are not running distributed parallel MS wells yet; that should be addressed by the development for running MS wells in parallel).

I agree that the case with distributed active wells needs to be handled by that development, hence the \todo message.

> If we know it was because the well is SHUT, why don't we use that type of if condition to make it clearer that it was due to the well being SHUT (at least something like `ws.perf_data.size() == 0`)?

When allowing inactive wells (that are never open at any time during the simulation) to be split across processes, `ws.perf_data.size()` is not equal to zero here. There are perforations, since these wells may need to output RFT data, but each process may not have all of them. Checking for SHUT sounds dangerous, since I guess wells may open during a time step..?

> And also, let us output some DEBUG information or throw if `ws.perf_data.size() > 0` and `ws.perf_data.size()` and `n_activeperf` are not equal. If it crashes in the future because they are not equal, then we can check that specific scenario and do a more proper investigation and fix.

Since this is not an error situation I think we should avoid DEBUG messages and definitely throws. I can add a more explicit check for inactive wells, then (for now) throw for distributed wells. Does that sound ok?
> I can add a more explicit check for inactive wells, then (for now) throw for distributed wells. Does that sound ok?

Yes, that is sensible.
And we discussed this a little bit. Since we have decided that some inactive wells can be distributed across processes, there should be a way/criterion to detect/decide which wells can be split. For those wells, since we cannot do much with them (like opening them), let us do minimal things with them. For example, if possible, do not initialize unneeded well-state information (you are the one who knows this issue best).

For the function `initWellStateMSWell()`, you can safely `continue` at the beginning of the for loop for those wells, and for `init()` and `base_init()` we can possibly also do less, but I am not familiar with the RFT usage. Please let us know what you think of it.
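A minimal sketch of that early-continue shape, using placeholder types since the real loop in `initWellStateMSWell()` works on the simulator's own well objects:

```cpp
#include <set>
#include <string>
#include <vector>

// Placeholder well description; the real type in opm-simulators differs.
struct WellStub {
    std::string name;
    bool isMultiSegment = false;
};

// Shape of the suggested early exit: wells that are never opened during the
// simulation are skipped at the top of the per-well loop.
void initMSWellsSketch(const std::vector<WellStub>& wells,
                       const std::set<std::string>& inactive_wells)
{
    for (const auto& well : wells) {
        if (!well.isMultiSegment) {
            continue; // only multi-segment wells need this initialisation
        }
        if (inactive_wells.count(well.name) > 0) {
            continue; // never open at any report step: nothing to set up
        }
        // ... regular multi-segment well-state initialisation would go here ...
    }
}
```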
jenkins build this please
@bska, can you test whether the current version fixes the running of your case? I am happy with the current approach, which has a more specific design to tackle the problem. You can review/merge as you will.
> can you test whether the current version fixes the running of your case?

I've just completed a test of the field case I mentioned before. I can confirm that the case continues to run in parallel (`mpirun -np 14`) with this edition of the PR. In the current master sources the case does not run in parallel, but it does run in sequential mode.

> I am happy with the current approach, which has a more specific design to tackle the problem.

It looks good to me too. At some point we may consider moving the `Schedule::getInactiveWellNamesAtEnd()` call to the `WellState` constructor, however. We call `WellState<>::init()` at least once for each report step and I don't really expect `getInactiveWellNamesAtEnd()` to change, although I may be missing something.

In any case, this fixes a real problem on a real case so I'll merge into master.
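A small sketch of that caching idea, with a hypothetical class standing in for `WellState`; the real constructor and `init()` signatures differ:

```cpp
#include <set>
#include <string>
#include <utility>

// Hypothetical illustration of the caching idea: compute the set of wells that
// are inactive for the whole schedule once, at construction time, instead of
// recomputing it in every call to init().
class WellStateSketch {
public:
    explicit WellStateSketch(std::set<std::string> inactive_well_names)
        : inactive_well_names_(std::move(inactive_well_names))
    {}

    // Used once per report step; reuses the cached set.
    bool isInactive(const std::string& well_name) const
    {
        return inactive_well_names_.count(well_name) > 0;
    }

private:
    std::set<std::string> inactive_well_names_;
};
```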
@bska can you rerun the test field case you were running with `mpirun -np 14` once more with the current master and/or send me the file so I can also check this on my side?

I've been working on running MS wells in parallel; I've split my work into two PRs (assembly #5680 and solving #5746) and I would like to test with that file as well. Thanks!
> can you rerun the test field case you were running with `mpirun -np 14` once more with the current master

Sure. Is there anything in particular you'd like me to look out for?
> can you rerun the test field case you were running with `mpirun -np 14` once more with the current master

> Sure. Is there anything in particular you'd like me to look out for?

Nothing in particular, just check if the case runs through as expected. Thanks!
> can you rerun the test field case you were running with `mpirun -np 14` once more with the current master

> Sure. Is there anything in particular you'd like me to look out for?

> Nothing in particular, just check if the case runs through as expected

Cool. I'll just rebuild everything first to make sure I have a consistent set of binaries given the CMake changes that were just merged.
> Nothing in particular, just check if the case runs through as expected

@lisajulia: The model does indeed still run with `mpirun -np 14`.
I think the concern only applies when we actually distribute the MS wells across processes.
Yes :) @bska : can you also try with this PR? https://github.com/OPM/opm-simulators/pull/5746
> can you also try with PR #5746?

I got slightly different timestepping behaviour between master and that PR, but not different enough that it's possible to say that one run is "better" than the other. Final TCPU is currently slightly higher with #5746 than in master as of #5756.

On a side note, if `AllowDistributedWells` is supposed to work as of #5746, then there's still something missing, as I get the diagnostic below when setting the value to `true`:

Error: Option --allow-distributed-wells=true is only allowed if model only has only standard wells. You need to provide option with --enable-multisegement-wells=false to treat existing multisegment wells as standard wells.
Error: [${ROOT}/opm-simulators/opm/simulators/flow/FlowGenericVanguard.cpp:332] All wells need to be standard wells!
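For context, a hedged sketch of what such an option-consistency check might look like; the actual logic in `FlowGenericVanguard.cpp` and its parameter handling differ, and the flag names below are only meant to mirror the command-line options:

```cpp
#include <stdexcept>

// Rough shape of the option-consistency check behind the diagnostic above.
void validateDistributedWellsOptions(const bool allow_distributed_wells,
                                     const bool enable_multisegment_wells,
                                     const bool model_has_multisegment_wells)
{
    if (allow_distributed_wells && enable_multisegment_wells && model_has_multisegment_wells) {
        throw std::invalid_argument(
            "--allow-distributed-wells=true requires all wells to be standard wells; "
            "use --enable-multisegment-wells=false to treat multisegment wells as standard wells");
    }
}
```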
Ok thanks, I will take this setting into account for my PR #5746!
do address the typo in the message as well (`--enable-multisegment-wells=false`)

> do address the typo in the message as well (`--enable-multisegment-wells=false`)
jenkins build this please