OPM / opm-common

Common components for OPM, in particular build system (cmake).
http://www.opm-project.org
GNU General Public License v3.0

The UnitSystem might cause Well2 to be neither copyable nor movable. #815

Closed GitPaean closed 5 years ago

GitPaean commented 5 years ago

It looks like the copy constructor is disabled by the compiler, so the move constructor is called instead, which leads to the failure under the following circumstances.

Running flow directly produces the following error:

*** Error in `OPM-master-test/opm/cmake-build-debug/opm-simulators/bin/flow': malloc(): memory corruption: 0x00007f90388f4d50 ***

The function involved is as follows (Schedule.cpp, lines 1881-1904):

    std::vector< Well2 > Schedule::getChildWells2(const std::string& group_name, size_t timeStep, GroupWellQueryMode query_mode) const {
        if (!hasGroup(group_name))
            throw std::invalid_argument("No such group: " + group_name);
        {
            const auto& group = getGroup( group_name );
            std::vector<Well2> wells;

            if (group.hasBeenDefined( timeStep )) {
                const GroupTree& group_tree = getGroupTree( timeStep );
                const auto& child_groups = group_tree.children( group_name );

                if (child_groups.size() && query_mode == GroupWellQueryMode::Recursive) {
                    for (const auto& child : child_groups) {
                        const auto& child_wells = getChildWells2( child, timeStep, query_mode );
                        wells.insert( wells.end() , child_wells.begin() , child_wells.end());
                    }
                } else {
                    for (const auto& well_name : group.getWells( timeStep ))
                        wells.push_back( this->getWell2( well_name, timeStep ));
                }
            }
            return wells;
        }
    }

It crashed at line 1899, wells.push_back( this->getWell2( well_name, timeStep ));

Call stack:


__GI_raise 0x00007f229c877428
__GI_abort 0x00007f229c87902a
__libc_message 0x00007f229c8b97ea
malloc_printerr 0x00007f229c8c413e
_int_malloc 0x00007f229c8c413e
__GI___libc_malloc 0x00007f229c8c6184
operator new(unsigned long) 0x00007f229d2eee78
__gnu_cxx::new_allocator<Opm::Well2>::allocate new_allocator.h:104
std::allocator_traits<std::allocator<Opm::Well2> >::allocate alloc_traits.h:491
std::_Vector_base<Opm::Well2, std::allocator<Opm::Well2> >::_M_allocate stl_vector.h:170
std::vector<Opm::Well2, std::allocator<Opm::Well2> >::_M_emplace_back_aux<Opm::Well2 const&> vector.tcc:412
std::vector<Opm::Well2, std::allocator<Opm::Well2> >::push_back stl_vector.h:923
Opm::Schedule::getChildWells2 Schedule.cpp:1899
(anonymous namespace)::IGrp::staticContrib<boost::iterator_range<__gnu_cxx::__normal_iterator<int*, std::vector<int> > > > AggregateGroupData.cpp:180
Opm::RestartIO::Helpers::AggregateGroupData::<lambda(const Opm::Group&, std::size_t)>::operator() AggregateGroupData.cpp:515
(anonymous namespace)::groupLoop<Opm::RestartIO::Helpers::AggregateGroupData::captureDeclaredGroupData(const Opm::Schedule&, const std::vector<std::__cxx11::basic_string<char> >&, const std::vector<std::__cxx11::basic_string<char> >&, const std::map<std::__cxx11::basic_string<char>, long unsigned int>&, const std::map<std::__cxx11::basic_string<char>, long unsigned int>&, std::size_t, const Opm::SummaryState&, const std::vector<int>&)::<lambda(const Opm::Group&, std::size_t)> > AggregateGroupData.cpp:88
Opm::RestartIO::Helpers::AggregateGroupData::captureDeclaredGroupData AggregateGroupData.cpp:516
Opm::RestartIO::(anonymous namespace)::writeGroup RestartIO.cpp:240
Opm::RestartIO::save RestartIO.cpp:476
Opm::EclipseIO::writeTimeStep EclipseIO.cpp:541
Ewoms::EclWriter<Ewoms::Properties::TTag::EclFlowProblem>::EclWriteTasklet::run eclwriter.hh:650
Ewoms::TaskletRunner::run_ tasklets.hh:330
Ewoms::TaskletRunner::startWorkerThread_ tasklets.hh:291
<unknown> 0x00007f229d319c80
start_thread 0x00007f229cc136ba
clone 0x00007f229c94941d

GitPaean commented 5 years ago

I am not sure why it does not always cause a problem.

If you rewrite the related code in the following way,

const Well2 w(this->getWell2( well_name, timeStep ));

it also fails, and then it points to the UnitSystem.

We suspect it is related to the UnitSystem member const char* const* unit_name_table;, but we have not found a fix for it.

GitPaean commented 5 years ago

With the 2019.04 release, the same error was reproduced in the corresponding function, getChildWells, so the cause may be different from what we guessed above.

1710         std::vector< const Well* > Schedule::getChildWells(const std::string& group_name, size_t timeStep) const {
1711         if (!hasGroup(group_name))
1712             throw std::invalid_argument("No such group: " + group_name);
1713         {
1714             const auto& group = getGroup( group_name );
1715             std::vector<const Well*> wells;
1716 
1717             if (group.hasBeenDefined( timeStep )) {
1718                 const GroupTree& group_tree = getGroupTree( timeStep );
1719                 const auto& child_groups = group_tree.children( group_name );
1720 
1721                 if (!child_groups.size()) {
1722                     //for (const auto& well_name : group.getWells( timeStep )) {
1723                     const auto& ch_wells = group.getWells( timeStep );
1724                     for (auto it= ch_wells.begin(); it != ch_wells.end(); it++) {
1725                         wells.push_back( getWell( *it ));
1726                     }
1727                 }
1728             }
1729             return wells;
1730         }
1731     }

Line 1725, wells.push_back( getWell( *it ));, crashed with the same symptom.

Backtrace:

#0  0x00007ffff5b71428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007ffff5b7302a in __GI_abort () at abort.c:89
#2  0x00007ffff5bb37ea in __libc_message (do_abort=2, fmt=fmt@entry=0x7ffff5ccced8 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:175
#3  0x00007ffff5bbe13e in malloc_printerr (ar_ptr=0x7fff4c000020, ptr=0x7fff30866580, str=0x7ffff5cc9d3f "malloc(): memory corruption", action=<optimized out>) at malloc.c:5006
#4  _int_malloc (av=av@entry=0x7fff4c000020, bytes=bytes@entry=64) at malloc.c:3474
#5  0x00007ffff5bc0184 in __GI___libc_malloc (bytes=64) at malloc.c:2913
#6  0x00007ffff63cbe78 in operator new(unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x0000000001587ff7 in __gnu_cxx::new_allocator<Opm::Group const*>::allocate (this=<optimized out>, __n=<optimized out>) at /usr/include/c++/5/ext/new_allocator.h:104
#8  std::allocator_traits<std::allocator<Opm::Group const*> >::allocate (__a=..., __n=<optimized out>) at /usr/include/c++/5/bits/alloc_traits.h:491
#9  std::_Vector_base<Opm::Group const*, std::allocator<Opm::Group const*> >::_M_allocate (this=<optimized out>, __n=<optimized out>) at /usr/include/c++/5/bits/stl_vector.h:170
#10 std::vector<Opm::Well const*, std::allocator<Opm::Well const*> >::_M_emplace_back_aux<Opm::Well const*>(Opm::Well const*&&) (this=0x7fffcd584b20) at /usr/include/c++/5/bits/vector.tcc:412
#11 std::vector<Opm::Well const*, std::allocator<Opm::Well const*> >::emplace_back<Opm::Well const*>(Opm::Well const*&&) (this=this@entry=0x7fffcd584b20) at /usr/include/c++/5/bits/vector.tcc:101
#12 0x00000000015759cc in std::vector<Opm::Well const*, std::allocator<Opm::Well const*> >::push_back(Opm::Well const*&&) (__x=<unknown type in /home/kaib/OPM-PR-test2/debug/opm-simulators-build/bin/flow, CU 0x8a39dbc, DIE 0x8b74278>, this=0x7fffcd584b20) at /usr/include/c++/5/bits/stl_vector.h:932
#13 Opm::Schedule::getChildWells (this=this@entry=0x126cd910, group_name=..., timeStep=timeStep@entry=106) at /home/kaib/OPM-PR-test2/debug/opm-common/src/opm/parser/eclipse/EclipseState/Schedule/Schedule.cpp:1725
#14 0x0000000001736de2 in (anonymous namespace)::IGrp::staticContrib<boost::iterator_range<__gnu_cxx::__normal_iterator<int*, std::vector<int> > > > (inteHead=std::vector of length 411, capacity 411 = {...}, iGrp=<synthetic pointer>, simStep=106, ngmaxz=11, nwgmax=40, group=..., sched=...) at /home/kaib/OPM-PR-test2/debug/opm-common/src/opm/output/eclipse/AggregateGroupData.cpp:181
#15 Opm::RestartIO::Helpers::AggregateGroupData::<lambda(const Opm::Group&, std::size_t)>::operator() (groupID=0, group=..., __closure=<optimized out>) at /home/kaib/OPM-PR-test2/debug/opm-common/src/opm/output/eclipse/AggregateGroupData.cpp:543
#16 (anonymous namespace)::groupLoop<Opm::RestartIO::Helpers::AggregateGroupData::captureDeclaredGroupData(const Opm::Schedule&, const std::vector<std::__cxx11::basic_string<char> >&, const std::vector<std::__cxx11::basic_string<char> >&, const std::map<std::__cxx11::basic_string<char>, long unsigned int>&, const std::map<std::__cxx11::basic_string<char>, long unsigned int>&, bool, std::size_t, const Opm::SummaryState&, const std::vector<int>&)::<lambda(const Opm::Group&, std::size_t)> > (groupOp=<optimized out>, groups=<synthetic pointer>) at /home/kaib/OPM-PR-test2/debug/opm-common/src/opm/output/eclipse/AggregateGroupData.cpp:89
#17 Opm::RestartIO::Helpers::AggregateGroupData::captureDeclaredGroupData (this=this@entry=0x7fffcd585390, sched=..., restart_group_keys=std::vector of length 21, capacity 21 = {...}, restart_field_keys=std::vector of length 21, capacity 21 = {...}, groupKeyToIndex=std::map with 21 elements = {...}, fieldKeyToIndex=std::map with 21 elements = {...}, ecl_compatible_rst=true, simStep=106, sumState=..., 
    inteHead=std::vector of length 411, capacity 411 = {...}) at /home/kaib/OPM-PR-test2/debug/opm-common/src/opm/output/eclipse/AggregateGroupData.cpp:544
#18 0x000000000163109b in Opm::RestartIO::(anonymous namespace)::writeGroup (ih=std::vector of length 411, capacity 411 = {...}, sumState=..., schedule=..., ecl_compatible_rst=true, sim_step=<optimized out>, rst_file=0x7fff3085cd60) at /home/kaib/OPM-PR-test2/debug/opm-common/src/opm/output/eclipse/RestartIO.cpp:324
#19 Opm::RestartIO::save (filename=..., report_step=<optimized out>, seconds_elapsed=281318400, value=..., es=..., grid=..., schedule=..., sumState=..., write_double=false) at /home/kaib/OPM-PR-test2/debug/opm-common/src/opm/output/eclipse/RestartIO.cpp:558
#20 0x0000000001626ef0 in Opm::EclipseIO::writeTimeStep (this=0x2045670, report_step=107, isSubstep=<optimized out>, secs_elapsed=281318400, value=..., single_summary_values=std::map with 5 elements = {...}, region_summary_values=std::map with 0 elements, block_summary_values=std::map with 0 elements, write_double=false) at /home/kaib/OPM-PR-test2/debug/opm-common/src/opm/output/eclipse/EclipseIO.cpp:462
#21 0x0000000000b60a40 in Ewoms::EclWriter<Ewoms::Properties::TTag::EclFlowProblem>::EclWriteTasklet::run (this=0x7665dec0) at /home/kaib/OPM-PR-test2/debug/opm-simulators/ebos/eclwriter.hh:472
#22 0x0000000000a87b91 in Ewoms::TaskletRunner::run_ (this=0x50ba9550) at /home/kaib/OPM-PR-test2/debug/ewoms/ewoms/parallel/tasklets.hh:330
#23 Ewoms::TaskletRunner::startWorkerThread_ (taskletRunner=0x50ba9550, workerThreadIndex=<optimized out>) at /home/kaib/OPM-PR-test2/debug/ewoms/ewoms/parallel/tasklets.hh:291
#24 0x00007ffff63f6c80 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#25 0x00007ffff7bc16ba in start_thread (arg=0x7fffcd586700) at pthread_create.c:333

GitPaean commented 5 years ago

The following statement of mine is false:

It looks like the copy constructor is disabled by compiler.

It happened because I had declared the move constructor as =delete.
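
For reference, a minimal sketch of the language rule behind this: when a class declares a move constructor (even as =delete), the compiler defines the implicit copy constructor as deleted, so the type ends up neither copyable nor movable. WellLike and WellPlain below are hypothetical stand-ins, not the actual Well2 class.

```cpp
#include <string>
#include <type_traits>

// Hypothetical stand-in for a class whose move constructor was deleted by hand.
struct WellLike {
    std::string name;
    WellLike() = default;
    WellLike(WellLike&&) = delete;   // user-declared move constructor
    // No copy constructor is declared, so the implicit one is defined as deleted.
};

// Same data member, but no special member functions declared at all.
struct WellPlain {
    std::string name;
};

static_assert(!std::is_copy_constructible<WellLike>::value, "copy is implicitly deleted");
static_assert(!std::is_move_constructible<WellLike>::value, "move is explicitly deleted");
static_assert(std::is_copy_constructible<WellPlain>::value, "implicit copy is available");
static_assert(std::is_move_constructible<WellPlain>::value, "implicit move is available");
```

With a type like WellLike, std::vector<WellLike>::push_back does not compile. Removing the =delete line restores the implicit copy and move operations.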

GitPaean commented 5 years ago

Valgrind, run against both the master branch and the 2019.04 release, reports an invalid write in the following code, at the line curGroups[static_cast<int>(it->first)] = it->second;.

void
Opm::RestartIO::Helpers::AggregateGroupData::
captureDeclaredGroupData(const Opm::Schedule&                 sched,
             const std::vector<std::string>&      restart_group_keys,
             const std::vector<std::string>&      restart_field_keys,
             const std::map<std::string, size_t>& groupKeyToIndex,
             const std::map<std::string, size_t>& fieldKeyToIndex,
             const std::size_t                    simStep,
             const Opm::SummaryState&             sumState,
             const std::vector<int>&              inteHead)
{
    const auto indexGroupMap = currentGroupMapIndexGroup(sched, simStep, inteHead);
    const auto nameIndexMap = currentGroupMapNameIndex(sched, simStep, inteHead);

    std::vector<const Opm::Group*> curGroups(ngmaxz(inteHead), nullptr);

    auto it = indexGroupMap.begin();
    while (it != indexGroupMap.end())
    {
        curGroups[static_cast<int>(it->first)] = it->second;
        it++;
    }

    groupLoop(curGroups, [&sched, simStep, &inteHead, this]
        (const Group& group, const std::size_t groupID) -> void
    {
        auto ig = this->iGroup_[groupID];

        IGrp::staticContrib(sched, group, this->nWGMax_, this->nGMaxz_,
                            simStep, ig, inteHead);
    });

    // Define Static Contributions to SGrp Array.
    groupLoop(curGroups,
              [this](const Group& /* group */, const std::size_t groupID) -> void
    {
        auto sw = this->sGroup_[groupID];
        SGrp::staticContrib(sw);
    });

With the master branch

AggregateGroupData.cpp
Invalid write of size 8
Opm::RestartIO::Helpers::AggregateGroupData::captureDeclaredGroupData(Opm::Schedule const&, std::vector<std::__cxx11::basic_string, std::allocator> const&, std::vector<std::__cxx11::basic_string, std::allocator> const&, std::map<std::__cxx11::basic_string, unsigned long, std::less, std::allocator> const&, std::map<std::__cxx11::basic_string, unsigned long, std::less, std::allocator> const&, unsigned long, Opm::SummaryState const&, std::vector<int, std::allocator> const&)
writeGroup
Opm::RestartIO::save(Opm::EclIO::OutputStream::Restart&, int, double, Opm::RestartValue, Opm::EclipseState const&, Opm::EclipseGrid const&, Opm::Schedule const&, Opm::SummaryState const&, bool)
Opm::EclipseIO::writeTimeStep(Opm::SummaryState const&, int, bool, double, Opm::RestartValue, bool)
Ewoms::EclWriter<Ewoms::Properties::TTag::EclFlowProblem>::EclWriteTasklet::run()
run_
Ewoms::TaskletRunner::startWorkerThread_(Ewoms::TaskletRunner*, int)
/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21
start_thread
clone
Address 0x107eeb858 is 0 bytes after a block of size 88 alloc'd
operator new(unsigned long)
allocate
allocate
_M_allocate
_M_create_storage
_Vector_base
vector
Opm::RestartIO::Helpers::AggregateGroupData::captureDeclaredGroupData(Opm::Schedule const&, std::vector<std::__cxx11::basic_string, std::allocator> const&, std::vector<std::__cxx11::basic_string, std::allocator> const&, std::map<std::__cxx11::basic_string, unsigned long, std::less, std::allocator> const&, std::map<std::__cxx11::basic_string, unsigned long, std::less, std::allocator> const&, unsigned long, Opm::SummaryState const&, std::vector<int, std::allocator> const&)
writeGroup
Opm::RestartIO::save(Opm::EclIO::OutputStream::Restart&, int, double, Opm::RestartValue, Opm::EclipseState const&, Opm::EclipseGrid const&, Opm::Schedule const&, Opm::SummaryState const&, bool)
Opm::EclipseIO::writeTimeStep(Opm::SummaryState const&, int, bool, double, Opm::RestartValue, bool)
Ewoms::EclWriter<Ewoms::Properties::TTag::EclFlowProblem>::EclWriteTasklet::run()
run_
Ewoms::TaskletRunner::startWorkerThread_(Ewoms::TaskletRunner*, int)
/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21
start_thread
clone

bska commented 5 years ago

an invalid write

I'll look into it.

GitPaean commented 5 years ago

Testing output shows

Report step 107/350 at day 3256/10653, date = 01-Oct-2019
*** Error in `/home/kaib/OPM-test/debug/opm-simulators-build/bin/flow': malloc(): memory corruption: 0x00007f99888f4f70 ***
 ngmaxz(inteHead) 11
 it->first 0
 it->first 1
 it->first 2
 it->first 3
 it->first 4
 it->first 5
 it->first 6
 it->first 7
 it->first 8
 it->first 9
 it->first 10
 it->first 11
 it->first 12
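
The output above shows 13 indices (0 through 12) being written into a vector with ngmaxz(inteHead) = 11 slots. That matches the Valgrind report: 11 pointers of 8 bytes each make up the 88-byte block, and the write at index 11 lands 0 bytes after it. A minimal sketch of a bounds-checked version of that loop follows. Group, fillChecked and the map type are simplified stand-ins for illustration, not the actual opm-common code or the eventual fix.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Simplified stand-in for Opm::Group (assumption, for illustration only).
struct Group {
    std::string name;
};

// Copy the (index -> group) map into a fixed-size table, refusing any index
// that does not fit instead of silently writing past the end of the vector.
std::vector<const Group*>
fillChecked(const std::map<std::size_t, const Group*>& indexGroupMap,
            const std::size_t                          maxGroups)
{
    std::vector<const Group*> curGroups(maxGroups, nullptr);

    for (const auto& entry : indexGroupMap) {
        if (entry.first >= curGroups.size()) {
            throw std::out_of_range("Group index " + std::to_string(entry.first) +
                                    " exceeds table size " +
                                    std::to_string(curGroups.size()));
        }
        curGroups[entry.first] = entry.second;
    }

    return curGroups;
}
```

A check of this kind turns the silent heap corruption into an immediate, well-located error, instead of a crash in an unrelated later allocation such as the push_back in getChildWells2.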

Maybe this is something that should be fixed on the deck side?

bska commented 5 years ago

Maybe this is something that should be fixed on the deck side?

No, the deck is (probably) fine. This is a local problem in the output code.

GitPaean commented 5 years ago

No, the deck is (probably) fine. This is a local problem in the output code.

Sure, you are familiar with the problem. I am just reporting another symptom in the hope that it is helpful. I will leave the problem to you.

I changed WELLDIMS in the deck from

 WELLDIMS                               
    150 73 10 40 

to

 WELLDIMS                               
    150 73 100 40 

The run now gets past the report step where it used to crash, and it no longer prints the output from the comment above.
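
One plausible reading of why the larger value avoids the crash, offered as an assumption rather than a confirmed diagnosis: the third WELLDIMS item is the maximum number of groups, and the reported ngmaxz(inteHead) of 11 is consistent with that value plus one slot for the FIELD group. Under that assumption, the 13 group indices printed earlier (0 through 12) overflow a table sized from 10 but fit in one sized from 100. The helper names below are hypothetical.

```cpp
#include <cstddef>

// Assumed relation (not confirmed by the thread): the table size is the
// WELLDIMS maximum group count plus one extra slot for the FIELD group.
constexpr std::size_t groupTableSize(const std::size_t maxGroupsFromWelldims)
{
    return maxGroupsFromWelldims + 1;
}

// True when every index in [0, numGroups) fits in the table.
constexpr bool groupsFit(const std::size_t numGroups,
                         const std::size_t maxGroupsFromWelldims)
{
    return numGroups <= groupTableSize(maxGroupsFromWelldims);
}

static_assert(groupTableSize(10) == 11, "matches the reported ngmaxz(inteHead)");
static_assert(!groupsFit(13, 10), "indices 0..12 overflow 11 slots");
static_assert(groupsFit(13, 100), "indices 0..12 fit in 101 slots");
```

This only explains why the symptom disappears. Raising WELLDIMS works around the overflow, while the unchecked write in the output code remains.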

With the larger WELLDIMS value, the run instead prints the following output and continues. The message is printed for every subsequent report step.

Report step 107/350 at day 3256/10653, date = 01-Oct-2019
ERROR: Uncaught std::exception when running tasklet: Unable to Determine Report Step Sequence Number From Restart Filename "flow-test-0617/DECK.UNRST". Trying to continue.

bska commented 5 years ago

ERROR: Uncaught std::exception when running tasklet: Unable to Determine Report Step Sequence Number From Restart Filename "flow-test-0617/DECK.UNRST". Trying to continue.

I take it you're (trying) to restart a previous simulation. Does your deck have UNIFOUT? If so, does it also have UNIFIN?

GitPaean commented 5 years ago

I take it you're (trying) to restart a previous simulation. Does your deck have UNIFOUT? If so, does it also have UNIFIN?

They are both there.

bska commented 5 years ago

I take it you're (trying) to restart a previous simulation. Does your deck have UNIFOUT? If so, does it also have UNIFIN?

They are both there.

Okay, then I don't understand what's happening. I'll need to look at the structure of your output files. I'll come by your office.

bska commented 5 years ago

I don't understand what's happening.

I now think I understand the underlying problem. Would you be able to test your model—with the original WELLDIMS specification—using PR #829?

GitPaean commented 5 years ago

I now think I understand the underlying problem. Would you be able to test your model—with the original WELLDIMS specification—using PR #829?

I will do that and report back tomorrow.

GitPaean commented 5 years ago

Sorry for any confusion this might have caused @joakim-hove. The original symptom was misleading: gdb always stopped at wells.push_back( this->getWell2( well_name, timeStep ));, and with a small reformulation of the code it even pointed to UnitSystem.

GitPaean commented 5 years ago

I now think I understand the underlying problem. Would you be able to test your model—with the original WELLDIMS specification—using PR #829?

I confirm that PR #829 fixes the run with the original setup. Closing the issue now.