OPM / opm-simulators

OPM Flow and experimental simulators, including components such as well models etc.
http://www.opm-project.org
GNU General Public License v3.0

Error flow on Rocky 8 #5520

Open bludvigsen opened 2 months ago

bludvigsen commented 2 months ago

Hi, I am getting the following error when running flow on Rocky Linux 8. Any ideas?

(base) [bjolud@hpcopm01 IVAR_AASEN]$ uname -a
Linux hpcopm01 4.18.0-477.10.1.el8_8.x86_64 #1 SMP Tue May 16 11:38:37 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
================ Starting main simulation loop ===============

Report step  0/819 at day 0/6947, date = 24-Dec-2016
Using Newton nonlinear solver.
Restart file written for report step   0/819, date = 24-Dec-2016 00:00:00

Starting time step 0, stepsize 0.0416667 days, at day 0/0.0416667, date = 24-Dec-2016
/opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/shared_ptr_base.h:1349: std::__shared_ptr_access<_Tp, _Lp, <anonymous>, <anonymous> >::element_type& std::__shared_ptr_access<_Tp, _Lp, <anonymous>, <anonymous> >::operator*() const [with _Tp = Dune::Communication<int>; __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic; bool <anonymous> = false; bool <anonymous> = false; element_type = Dune::Communication<int>]: Assertion '_M_get() != nullptr' failed.
Aborted (core dumped)

blattms commented 2 months ago

This looks like a serious bug (dereferencing a shared_ptr<Dune::Communication> that holds a nullptr). Can you give a little more detail?

To find the problem we will need to be able to replicate this somehow.

Note to other developers and myself: There seems to be no reason to put Dune::Communication into a shared_ptr. It is a light-weight object that can easily be copied. Since the type here is Dune::Communication<int>, this even appears to be a serial run. Places where this might happen:

$ grep -r -n "shared_ptr" opm | grep -i Comm
opm/simulators/linalg/ExtractParallelGridInformationToISTL.cpp:24:#include <dune/common/shared_ptr.hh>
opm/simulators/linalg/ExtractParallelGridInformationToISTL.cpp:40:        anyComm=std::any(Opm::ParallelISTLInformation(Dune::stackobject_to_shared_ptr(idx),
opm/simulators/linalg/ISTLSolver.hpp:627:        std::shared_ptr< CommunicationType > comm_;
opm/simulators/linalg/OwningBlockPreconditioner.hpp:84:std::shared_ptr<OwningBlockPreconditioner<OriginalPreconditioner, Comm>>
opm/simulators/linalg/PressureBhpTransferPolicy.hpp:38:                                     std::shared_ptr<Communication>& commRW,
opm/simulators/linalg/PressureBhpTransferPolicy.hpp:270:    std::shared_ptr<Communication> coarseLevelCommunication_;
opm/simulators/linalg/PressureTransferPolicy.hpp:183:    std::shared_ptr<Communication> coarseLevelCommunication_;
opm/simulators/linalg/WellOperators.hpp:32:#include <dune/common/shared_ptr.hh>
opm/simulators/linalg/WellOperators.hpp:165:                            const std::shared_ptr<communication_type>& comm = {})
opm/simulators/linalg/WellOperators.hpp:224:    std::shared_ptr<communication_type> comm_;
opm/simulators/linalg/WellOperators.hpp:358:        : A_( Dune::stackobject_to_shared_ptr(A) ), comm_(comm)
opm/simulators/linalg/bda/opencl/openclBISAI.cpp:55:          std::shared_ptr<cl::CommandQueue>& queue_)
opm/simulators/linalg/bda/opencl/openclCPR.cpp:26:#include <dune/common/shared_ptr.hh>
opm/simulators/linalg/bda/opencl/openclCPR.cpp:53:setOpencl(std::shared_ptr<cl::Context>& context_, std::shared_ptr<cl::CommandQueue>& queue_) {
opm/simulators/linalg/bda/opencl/openclPreconditioner.cpp:58:          std::shared_ptr<cl::CommandQueue>& queue_)
opm/simulators/linalg/bda/opencl/openclPreconditioner.hpp:36:    std::shared_ptr<cl::CommandQueue> queue;
opm/simulators/linalg/bda/opencl/openclPreconditioner.hpp:50:    virtual void setOpencl(std::shared_ptr<cl::Context>& context, std::shared_ptr<cl::CommandQueue>& queue);
opm/simulators/linalg/bda/opencl/openclSolverBackend.hpp:111:    std::shared_ptr<cl::CommandQueue> queue{};
opm/simulators/linalg/bda/opencl/openclSolverBackend.hpp:156:                   std::shared_ptr<cl::CommandQueue>& queue);
opm/simulators/linalg/bda/opencl/openclBISAI.hpp:114:                   std::shared_ptr<cl::CommandQueue>& queue) override;
opm/simulators/linalg/bda/opencl/openclCPR.hpp:93:                   std::shared_ptr<cl::CommandQueue>& queue) override;
opm/simulators/linalg/bda/opencl/openclSolverBackend.cpp:242:          std::shared_ptr<cl::CommandQueue>& queue_)
opm/simulators/linalg/bda/CprCreation.cpp:26:#include <dune/common/shared_ptr.hh>
opm/simulators/linalg/bda/rocm/rocsparseCPR.cpp:26:#include <dune/common/shared_ptr.hh>
opm/simulators/linalg/cuistl/CuBlockPreconditioner.hpp:22:#include <dune/common/shared_ptr.hh>
opm/simulators/linalg/cuistl/CuBlockPreconditioner.hpp:50:    CuBlockPreconditioner(const std::shared_ptr<P>& p, const std::shared_ptr<const communication_type>& c)
opm/simulators/linalg/cuistl/CuBlockPreconditioner.hpp:56:    CuBlockPreconditioner(const std::shared_ptr<P>& p, const communication_type& c)
opm/simulators/linalg/cuistl/CuBlockPreconditioner.hpp:58:        , m_communication(Dune::stackobject_to_shared_ptr(c))
opm/simulators/linalg/cuistl/CuBlockPreconditioner.hpp:122:    std::shared_ptr<const communication_type> m_communication;
opm/simulators/linalg/cuistl/CuOwnerOverlapCopy.hpp:388:    CuOwnerOverlapCopy(std::shared_ptr<GPUSender<field_type, OwnerOverlapCopyCommunicationType>> sender) : m_sender(sender){}
opm/simulators/linalg/cuistl/CuOwnerOverlapCopy.hpp:410:    std::shared_ptr<GPUSender<field_type, OwnerOverlapCopyCommunicationType>> m_sender;
opm/simulators/linalg/cuistl/SolverAdapter.hpp:186:            std::shared_ptr<Opm::cuistl::GPUSender<real_type, typename Operator::communication_type>> gpuComm;
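
To illustrate the failure mode discussed above, here is a minimal sketch of how a shared_ptr member that defaults to empty (as in the `comm = {}` default argument in the grep output) can end up null, together with two ways to avoid an unchecked dereference. `SerialComm`, `OperatorWithPtr`, and `OperatorWithValue` are hypothetical stand-ins invented for this example; they are not Dune or OPM types, and this is not a proposed patch. The assertion text in the log appears to come from libstdc++'s checked `operator*`, which is active when the library's assertions are enabled.

```cpp
#include <memory>
#include <stdexcept>

// Hypothetical stand-in for a light-weight communication object.
// This is NOT Dune::Communication; it only mimics a cheap-to-copy type.
struct SerialComm {
    int size() const { return 1; }
    int rank() const { return 0; }
};

// Pattern resembling the grep hits: the pointer defaults to empty,
// so the member is null unless a caller passes a real object.
class OperatorWithPtr {
public:
    explicit OperatorWithPtr(const std::shared_ptr<SerialComm>& comm = {})
        : comm_(comm) {}

    bool hasComm() const { return static_cast<bool>(comm_); }

    // Checked access: throws instead of dereferencing a null pointer.
    const SerialComm& comm() const {
        if (!comm_) {
            throw std::logic_error("communication object was never set");
        }
        return *comm_;
    }

private:
    std::shared_ptr<SerialComm> comm_;
};

// Alternative: store the light-weight object by value,
// so a null state cannot exist in the first place.
class OperatorWithValue {
public:
    explicit OperatorWithValue(const SerialComm& comm = {}) : comm_(comm) {}

    const SerialComm& comm() const { return comm_; }

private:
    SerialComm comm_;
};
```

Constructing `OperatorWithPtr` without arguments and calling `comm()` throws `std::logic_error` rather than aborting, while `OperatorWithValue` always returns a valid object. Whether either approach fits the actual code paths listed above is an open question.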

bludvigsen commented 2 months ago

Hi, below some more info.

(base) [bjolud@hpcopm01 ~]$ flow --version
flow 2024.04

No mpi, no GPU.

These are the files generated. Note that the EGRID file was generated by OPM; I had to supply GRDECL files as input because GDFILE did not work. I tried to attach the PRT and DBG files to this message, but it did not allow me to.

-rw-r--r-- 1 bjolud ecl 45208808 Aug 9 15:02 IVAR_AASEN_2021_REF_3DEC2021.INIT
-rw-r--r-- 1 bjolud ecl 50836016 Aug 9 15:02 IVAR_AASEN_2021_REF_3DEC2021.EGRID
-rw-r--r-- 1 bjolud ecl 9808 Aug 9 15:02 IVAR_AASEN_2021_REF_3DEC2021.RFT
-rw-r--r-- 1 bjolud ecl 906883 Aug 9 15:02 IVAR_AASEN_2021_REF_3DEC2021.PRT
-rw-r--r-- 1 bjolud ecl 882759 Aug 9 15:02 IVAR_AASEN_2021_REF_3DEC2021.DBG
(base) [bjolud@hpcopm01 IVAR_AASEN]$

What would be the best course of action for me to proceed with the model? At the moment I do not need all the options in the model. Could there be a workaround to avoid the error, such as removing wells, building a coarser grid, or using fewer faults?

It is quite a complicated model that uses a lot of Eclipse options. The grid has more than 100 faults and the number of NNCs is very large. There are many long horizontal wells using COMPLUMP and other well options. I got around all the error messages by editing the input file, but there are still warnings; I will remove those as well and see if it helps.

Regards, Bjørn Egil

bludvigsen commented 2 months ago

Some additional info: I have looked at the grid and properties generated by OPM in ResInsight, and they look fine.

bska commented 2 months ago

Some additional info: I have looked at the grid and properties generated by OPM in ResInsight, and they look fine.

Thanks a lot for the additional information, this is good to know. I do believe you've come across a programming error within the simulator and I would really like to understand the underlying problem. That said, we may have to take the discussion off-line, especially if the model is not fully public. Please feel free to reach out to me by e-mail (Bard.Skaflestad@sintef.no) if you would like to discuss further.

bludvigsen commented 2 months ago

OK, I have sent an email to your SINTEF address.