Closed GitPaean closed 5 years ago
I am not sure why it does not always cause a problem. If you rewrite the related code in the following way,
const Well2 w(this->getWell2( well_name, timeStep ));
it also fails, and the failure points to UnitSystem. We suspect it is related to the UnitSystem member
const char* const* unit_name_table;
but we have not found a fix for it.
With the 2019.04 release, the same error was reproduced in the same function, so the cause may be different from what we guessed.
1710 std::vector< const Well* > Schedule::getChildWells(const std::string& group_name, size_t timeStep) const {
1711 if (!hasGroup(group_name))
1712 throw std::invalid_argument("No such group: " + group_name);
1713 {
1714 const auto& group = getGroup( group_name );
1715 std::vector<const Well*> wells;
1716
1717 if (group.hasBeenDefined( timeStep )) {
1718 const GroupTree& group_tree = getGroupTree( timeStep );
1719 const auto& child_groups = group_tree.children( group_name );
1720
1721 if (!child_groups.size()) {
1722 //for (const auto& well_name : group.getWells( timeStep )) {
1723 const auto& ch_wells = group.getWells( timeStep );
1724 for (auto it= ch_wells.begin(); it != ch_wells.end(); it++) {
1725 wells.push_back( getWell( *it ));
1726 }
1727 }
1728 }
1729 return wells;
1730 }
1731 }
Line 1725, wells.push_back( getWell( *it )); crashed with the same symptom.
backtrace
#0 0x00007ffff5b71428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1 0x00007ffff5b7302a in __GI_abort () at abort.c:89
#2 0x00007ffff5bb37ea in __libc_message (do_abort=2, fmt=fmt@entry=0x7ffff5ccced8 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:175
#3 0x00007ffff5bbe13e in malloc_printerr (ar_ptr=0x7fff4c000020, ptr=0x7fff30866580, str=0x7ffff5cc9d3f "malloc(): memory corruption", action=<optimized out>) at malloc.c:5006
#4 _int_malloc (av=av@entry=0x7fff4c000020, bytes=bytes@entry=64) at malloc.c:3474
#5 0x00007ffff5bc0184 in __GI___libc_malloc (bytes=64) at malloc.c:2913
#6 0x00007ffff63cbe78 in operator new(unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7 0x0000000001587ff7 in __gnu_cxx::new_allocator<Opm::Group const*>::allocate (this=<optimized out>, __n=<optimized out>) at /usr/include/c++/5/ext/new_allocator.h:104
#8 std::allocator_traits<std::allocator<Opm::Group const*> >::allocate (__a=..., __n=<optimized out>) at /usr/include/c++/5/bits/alloc_traits.h:491
#9 std::_Vector_base<Opm::Group const*, std::allocator<Opm::Group const*> >::_M_allocate (this=<optimized out>, __n=<optimized out>) at /usr/include/c++/5/bits/stl_vector.h:170
#10 std::vector<Opm::Well const*, std::allocator<Opm::Well const*> >::_M_emplace_back_aux<Opm::Well const*>(Opm::Well const*&&) (this=0x7fffcd584b20) at /usr/include/c++/5/bits/vector.tcc:412
#11 std::vector<Opm::Well const*, std::allocator<Opm::Well const*> >::emplace_back<Opm::Well const*>(Opm::Well const*&&) (this=this@entry=0x7fffcd584b20) at /usr/include/c++/5/bits/vector.tcc:101
#12 0x00000000015759cc in std::vector<Opm::Well const*, std::allocator<Opm::Well const*> >::push_back(Opm::Well const*&&) (__x=<unknown type in /home/kaib/OPM-PR-test2/debug/opm-simulators-build/bin/flow, CU 0x8a39dbc, DIE 0x8b74278>, this=0x7fffcd584b20) at /usr/include/c++/5/bits/stl_vector.h:932
#13 Opm::Schedule::getChildWells (this=this@entry=0x126cd910, group_name=..., timeStep=timeStep@entry=106) at /home/kaib/OPM-PR-test2/debug/opm-common/src/opm/parser/eclipse/EclipseState/Schedule/Schedule.cpp:1725
#14 0x0000000001736de2 in (anonymous namespace)::IGrp::staticContrib<boost::iterator_range<__gnu_cxx::__normal_iterator<int*, std::vector<int> > > > (inteHead=std::vector of length 411, capacity 411 = {...}, iGrp=<synthetic pointer>, simStep=106, ngmaxz=11, nwgmax=40, group=..., sched=...) at /home/kaib/OPM-PR-test2/debug/opm-common/src/opm/output/eclipse/AggregateGroupData.cpp:181
#15 Opm::RestartIO::Helpers::AggregateGroupData::<lambda(const Opm::Group&, std::size_t)>::operator() (groupID=0, group=..., __closure=<optimized out>) at /home/kaib/OPM-PR-test2/debug/opm-common/src/opm/output/eclipse/AggregateGroupData.cpp:543
#16 (anonymous namespace)::groupLoop<Opm::RestartIO::Helpers::AggregateGroupData::captureDeclaredGroupData(const Opm::Schedule&, const std::vector<std::__cxx11::basic_string<char> >&, const std::vector<std::__cxx11::basic_string<char> >&, const std::map<std::__cxx11::basic_string<char>, long unsigned int>&, const std::map<std::__cxx11::basic_string<char>, long unsigned int>&, bool, std::size_t, const Opm::SummaryState&, const std::vector<int>&)::<lambda(const Opm::Group&, std::size_t)> > (groupOp=<optimized out>, groups=<synthetic pointer>) at /home/kaib/OPM-PR-test2/debug/opm-common/src/opm/output/eclipse/AggregateGroupData.cpp:89
#17 Opm::RestartIO::Helpers::AggregateGroupData::captureDeclaredGroupData (this=this@entry=0x7fffcd585390, sched=..., restart_group_keys=std::vector of length 21, capacity 21 = {...}, restart_field_keys=std::vector of length 21, capacity 21 = {...}, groupKeyToIndex=std::map with 21 elements = {...}, fieldKeyToIndex=std::map with 21 elements = {...}, ecl_compatible_rst=true, simStep=106, sumState=...,
inteHead=std::vector of length 411, capacity 411 = {...}) at /home/kaib/OPM-PR-test2/debug/opm-common/src/opm/output/eclipse/AggregateGroupData.cpp:544
#18 0x000000000163109b in Opm::RestartIO::(anonymous namespace)::writeGroup (ih=std::vector of length 411, capacity 411 = {...}, sumState=..., schedule=..., ecl_compatible_rst=true, sim_step=<optimized out>, rst_file=0x7fff3085cd60) at /home/kaib/OPM-PR-test2/debug/opm-common/src/opm/output/eclipse/RestartIO.cpp:324
#19 Opm::RestartIO::save (filename=..., report_step=<optimized out>, seconds_elapsed=281318400, value=..., es=..., grid=..., schedule=..., sumState=..., write_double=false) at /home/kaib/OPM-PR-test2/debug/opm-common/src/opm/output/eclipse/RestartIO.cpp:558
#20 0x0000000001626ef0 in Opm::EclipseIO::writeTimeStep (this=0x2045670, report_step=107, isSubstep=<optimized out>, secs_elapsed=281318400, value=..., single_summary_values=std::map with 5 elements = {...}, region_summary_values=std::map with 0 elements, block_summary_values=std::map with 0 elements, write_double=false) at /home/kaib/OPM-PR-test2/debug/opm-common/src/opm/output/eclipse/EclipseIO.cpp:462
#21 0x0000000000b60a40 in Ewoms::EclWriter<Ewoms::Properties::TTag::EclFlowProblem>::EclWriteTasklet::run (this=0x7665dec0) at /home/kaib/OPM-PR-test2/debug/opm-simulators/ebos/eclwriter.hh:472
#22 0x0000000000a87b91 in Ewoms::TaskletRunner::run_ (this=0x50ba9550) at /home/kaib/OPM-PR-test2/debug/ewoms/ewoms/parallel/tasklets.hh:330
#23 Ewoms::TaskletRunner::startWorkerThread_ (taskletRunner=0x50ba9550, workerThreadIndex=<optimized out>) at /home/kaib/OPM-PR-test2/debug/ewoms/ewoms/parallel/tasklets.hh:291
#24 0x00007ffff63f6c80 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#25 0x00007ffff7bc16ba in start_thread (arg=0x7fffcd586700) at pthread_create.c:333
The following statement is false.
It looks like the copy constructor is disabled by compiler.
That happened because I declared the move constructor to be =delete.
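The retraction above follows from a C++ language rule: when a class declares a move constructor, even as deleted, its implicitly declared copy constructor is defined as deleted too. A minimal sketch of that rule is below. `MoveDeleted` and `CopyRestored` are hypothetical types made up for illustration; they are not the OPM classes discussed in this thread.

```cpp
#include <type_traits>

// Hypothetical stand-in for a class under test; NOT the real Well2.
struct MoveDeleted {
    MoveDeleted() = default;
    MoveDeleted(MoveDeleted&&) = delete;  // user-declared (deleted) move constructor
};

// Same experiment, but the copy constructor is declared explicitly,
// so it is no longer implicitly deleted.
struct CopyRestored {
    CopyRestored() = default;
    CopyRestored(const CopyRestored&) = default;
    CopyRestored(CopyRestored&&) = delete;
};

// Declaring the move constructor suppresses the implicit copy constructor.
static_assert(!std::is_copy_constructible<MoveDeleted>::value,
              "copy constructor is implicitly deleted");
static_assert(!std::is_move_constructible<MoveDeleted>::value,
              "move constructor is explicitly deleted");

// Copying from an lvalue works again once the copy constructor is declared;
// constructing from an rvalue still selects the deleted move constructor.
static_assert(std::is_copy_constructible<CopyRestored>::value,
              "copy constructor is available");
static_assert(!std::is_move_constructible<CopyRestored>::value,
              "rvalues still pick the deleted move constructor");
```

So a "copy constructor is disabled" diagnostic seen after adding `= delete` to a move constructor is a consequence of that declaration, not evidence of a problem in the class itself.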
Valgrind with both the master branch and the 2019.04 release shows an invalid write in the following code, at the line
curGroups[static_cast<int>(it->first)] = it->second;
void
Opm::RestartIO::Helpers::AggregateGroupData::
captureDeclaredGroupData(const Opm::Schedule& sched,
const std::vector<std::string>& restart_group_keys,
const std::vector<std::string>& restart_field_keys,
const std::map<std::string, size_t>& groupKeyToIndex,
const std::map<std::string, size_t>& fieldKeyToIndex,
const std::size_t simStep,
const Opm::SummaryState& sumState,
const std::vector<int>& inteHead)
{
const auto indexGroupMap = currentGroupMapIndexGroup(sched, simStep, inteHead);
const auto nameIndexMap = currentGroupMapNameIndex(sched, simStep, inteHead);
std::vector<const Opm::Group*> curGroups(ngmaxz(inteHead), nullptr);
auto it = indexGroupMap.begin();
while (it != indexGroupMap.end())
{
curGroups[static_cast<int>(it->first)] = it->second;
it++;
}
groupLoop(curGroups, [&sched, simStep, &inteHead, this]
(const Group& group, const std::size_t groupID) -> void
{
auto ig = this->iGroup_[groupID];
IGrp::staticContrib(sched, group, this->nWGMax_, this->nGMaxz_,
simStep, ig, inteHead);
});
// Define Static Contributions to SGrp Array.
groupLoop(curGroups,
[this](const Group& /* group */, const std::size_t groupID) -> void
{
auto sw = this->sGroup_[groupID];
SGrp::staticContrib(sw);
});
With the master branch
AggregateGroupData.cpp
Invalid write of size 8
Opm::RestartIO::Helpers::AggregateGroupData::captureDeclaredGroupData(Opm::Schedule const&, std::vector<std::__cxx11::basic_string, std::allocator> const&, std::vector<std::__cxx11::basic_string, std::allocator> const&, std::map<std::__cxx11::basic_string, unsigned long, std::less, std::allocator> const&, std::map<std::__cxx11::basic_string, unsigned long, std::less, std::allocator> const&, unsigned long, Opm::SummaryState const&, std::vector<int, std::allocator> const&)
writeGroup
Opm::RestartIO::save(Opm::EclIO::OutputStream::Restart&, int, double, Opm::RestartValue, Opm::EclipseState const&, Opm::EclipseGrid const&, Opm::Schedule const&, Opm::SummaryState const&, bool)
Opm::EclipseIO::writeTimeStep(Opm::SummaryState const&, int, bool, double, Opm::RestartValue, bool)
Ewoms::EclWriter<Ewoms::Properties::TTag::EclFlowProblem>::EclWriteTasklet::run()
run_
Ewoms::TaskletRunner::startWorkerThread_(Ewoms::TaskletRunner*, int)
/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21
start_thread
clone
Address 0x107eeb858 is 0 bytes after a block of size 88 alloc'd
operator new(unsigned long)
allocate
allocate
_M_allocate
_M_create_storage
_Vector_base
vector
Opm::RestartIO::Helpers::AggregateGroupData::captureDeclaredGroupData(Opm::Schedule const&, std::vector<std::__cxx11::basic_string, std::allocator> const&, std::vector<std::__cxx11::basic_string, std::allocator> const&, std::map<std::__cxx11::basic_string, unsigned long, std::less, std::allocator> const&, std::map<std::__cxx11::basic_string, unsigned long, std::less, std::allocator> const&, unsigned long, Opm::SummaryState const&, std::vector<int, std::allocator> const&)
writeGroup
Opm::RestartIO::save(Opm::EclIO::OutputStream::Restart&, int, double, Opm::RestartValue, Opm::EclipseState const&, Opm::EclipseGrid const&, Opm::Schedule const&, Opm::SummaryState const&, bool)
Opm::EclipseIO::writeTimeStep(Opm::SummaryState const&, int, bool, double, Opm::RestartValue, bool)
Ewoms::EclWriter<Ewoms::Properties::TTag::EclFlowProblem>::EclWriteTasklet::run()
run_
Ewoms::TaskletRunner::startWorkerThread_(Ewoms::TaskletRunner*, int)
/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21
start_thread
clone
I'll look into it.
Testing output shows
Report step 107/350 at day 3256/10653, date = 01-Oct-2019
*** Error in `/home/kaib/OPM-test/debug/opm-simulators-build/bin/flow': malloc(): memory corruption: 0x00007f99888f4f70 ***
ngmaxz(inteHead) 11
it->first 0
it->first 1
it->first 2
it->first 3
it->first 4
it->first 5
it->first 6
it->first 7
it->first 8
it->first 9
it->first 10
it->first 11
it->first 12
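The debug output above shows the loop visiting indices 0 through 12 while ngmaxz(inteHead) is 11, so the writes for indices 11 and 12 land past the end of curGroups. That agrees with the Valgrind message "0 bytes after a block of size 88" (11 pointers of 8 bytes each). Because such a write corrupts the heap silently, the abort only surfaces at a later, unrelated allocation such as the push_back in the backtrace. The sketch below shows one way to make the overflow fail loudly at the point of the write. `Group` and `denseGroupTable` are hypothetical stand-ins, not OPM code, and this is not necessarily how the problem was eventually fixed.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for Opm::Group.
struct Group {
    std::string name;
};

// Copy an (index -> group) map into a dense table with `capacity` slots.
// An index that does not fit raises an exception instead of writing
// past the end of the vector.
std::vector<const Group*>
denseGroupTable(const std::map<std::size_t, const Group*>& indexGroupMap,
                const std::size_t capacity)
{
    std::vector<const Group*> table(capacity, nullptr);

    for (const auto& entry : indexGroupMap) {
        if (entry.first >= capacity) {
            throw std::out_of_range(
                "group index " + std::to_string(entry.first) +
                " does not fit in a table of " +
                std::to_string(capacity) + " slots");
        }
        table[entry.first] = entry.second;
    }

    return table;
}
```

With 13 entries and a capacity of 11, as in the output above, this throws on index 11 rather than corrupting memory. Using `table.at(entry.first)` would give a similar check with a less specific message.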
Maybe it is something that should be fixed on the deck side?
No, the deck is (probably) fine. This is a local problem in the output code.
Sure, you are familiar with the problem. I am just reporting another symptom, just hoping it is helpful. I will leave the problem to you.
I changed the deck WELLDIMS from
WELLDIMS
150 73 10 40
to
WELLDIMS
150 73 100 40
The run got past the report step where it used to crash, and it no longer prints the output from the above comment.
Instead it prints the following output and continues running. The message is repeated at every subsequent report step.
Report step 107/350 at day 3256/10653, date = 01-Oct-2019
ERROR: Uncaught std::exception when running tasklet: Unable to Determine Report Step Sequence Number From Restart Filename "flow-test-0617/DECK.UNRST". Trying to continue.
ERROR: Uncaught std::exception when running tasklet: Unable to Determine Report Step Sequence Number From Restart Filename "flow-test-0617/DECK.UNRST". Trying to continue.
I take it you're (trying) to restart a previous simulation. Does your deck have UNIFOUT? If so, does it also have UNIFIN?
They are both there.
Okay, then I don't understand what's happening. I'll need to look at the structure of your output files. I'll come by your office.
I now think I understand the underlying problem. Would you be able to test your model, with the original WELLDIMS specification, using PR #829?
I will do that and report back tomorrow.
Sorry for any confusion this might have caused @joakim-hove. The original symptom was a little wild: gdb always stopped at
wells.push_back( this->getWell2( well_name, timeStep ));
and even pointed to UnitSystem after some small reformulation of the code.
I confirm that PR #829 fixes the run with the original setup. Closing the issue now.
It looks like the copy constructor is disabled by the compiler, so the move constructor was called for the failure under the following circumstance.
Running directly produces the following error.
The function involved is in Schedule.cpp (lines 1881-1904). It crashed at line 1899:
wells.push_back( this->getWell2( well_name, timeStep ));
Call stack