epam / Indigo

Universal cheminformatics toolkit, utilities and database search tools
http://lifescience.opensource.epam.com
Apache License 2.0

Segfault when calling grossFormula() function #1964

Open ihuang8 opened 1 month ago

Summary

For some molecules, calling grossFormula() segfaults roughly 30-50% of the time. This seems to occur for structures with multiple repeating units, and it is intermittent: the same structure sometimes succeeds and sometimes crashes.

Steps to Reproduce

The following code reproduces the error:

from indigo import Indigo

structure = "SCCCOP(OCCOP(OCOCO)(=O)OCC)(=O) |Sg:n:3,4:k:ht,Sg:n:7,8,9:m:ht,Sg:n:8,13,14:n:ht,Sg:n:17,18:k:ht|"
indigo = Indigo()
molecule = indigo.loadStructure(structure)
grossFormula = molecule.grossFormula()
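
Because the crash is intermittent, it can help to run the reproducer many times in fresh interpreters and count how often it dies. The following is a minimal sketch, not part of Indigo: the helper name count_crashes and the constant REPRO_SNIPPET are hypothetical, and it assumes a POSIX system, where a negative return code means the child process was killed by a signal (for example -11 for SIGSEGV).

```python
import subprocess
import sys

# The reproducer from this report as a string, so it can run in a child
# process: a segfault then kills the child, not the loop that counts crashes.
REPRO_SNIPPET = '''
from indigo import Indigo

structure = "SCCCOP(OCCOP(OCOCO)(=O)OCC)(=O) |Sg:n:3,4:k:ht,Sg:n:7,8,9:m:ht,Sg:n:8,13,14:n:ht,Sg:n:17,18:k:ht|"
indigo = Indigo()
molecule = indigo.loadStructure(structure)
print(molecule.grossFormula())
'''


def count_crashes(snippet, runs=20):
    """Run `snippet` in `runs` fresh interpreters and count signal-killed runs.

    Hypothetical helper for illustration only.
    """
    crashes = 0
    for _ in range(runs):
        result = subprocess.run(
            [sys.executable, "-c", snippet],
            capture_output=True,
        )
        # On POSIX, subprocess reports death by signal N as return code -N.
        if result.returncode < 0:
            crashes += 1
    return crashes
```

Calling count_crashes(REPRO_SNIPPET, runs=20) in an environment with Indigo installed should give a rough crash rate for this structure. A run that fails for another reason, such as an import error, exits with a positive code and is not counted as a crash.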

Expected behavior

grossFormula() returns the gross formula for the molecule without a segfault.

Actual behavior

Segmentation fault. It looks like it may come from line 211 of core/indigo-core/molecule/src/molecule_gross_formula.cpp (the Valgrind trace below reports molecule_gross_formula.cpp:212):

unit.isotopes.emplace(key, 1);

Valgrind output:

==6173== Invalid read of size 8
==6173==    at 0x5B519A3E: std::_Rb_tree_decrement(std::_Rb_tree_node_base*) (in /home/proj/main/env/lib/python3.11/site-packages/indigo/lib/linux-x86_64/libindigo.so)
==6173==    by 0x5AE8167A: std::_Rb_tree_iterator<std::pair<int const, int> >::operator--() (stl_tree.h:302)
==6173==    by 0x5AE80389: std::_Rb_tree<int, std::pair<int const, int>, std::_Select1st<std::pair<int const, int> >, std::less<int>, std::allocator<std::pair<int const, int> > >::_M_get_insert_unique_pos(int const&) (stl_tree.h:2126)
==6173==    by 0x5B130EAF: std::pair<std::_Rb_tree_iterator<std::pair<int const, int> >, bool> std::_Rb_tree<int, std::pair<int const, int>, std::_Select1st<std::pair<int const, int> >, std::less<int>, std::allocator<std::pair<int const, int> > >::_M_emplace_unique<int&, int>(int&, int&&) (stl_tree.h:2434)
==6173==    by 0x5B1309CB: std::pair<std::_Rb_tree_iterator<std::pair<int const, int> >, bool> std::map<int, int, std::less<int>, std::allocator<std::pair<int const, int> > >::emplace<int&, int>(int&, int&&) (stl_map.h:606)
==6173==    by 0x5B12F54E: indigo::MoleculeGrossFormula::collect(indigo::BaseMolecule&, bool) (molecule_gross_formula.cpp:212)
==6173==    by 0x5AE6E80A: indigoGrossFormula (indigo_calc.cpp:49)
==6173==    by 0xBFC8F79: ??? (in /usr/lib/x86_64-linux-gnu/libffi.so.8.1.2)
==6173==    by 0xBFC840D: ??? (in /usr/lib/x86_64-linux-gnu/libffi.so.8.1.2)
==6173==    by 0xBFC8B0C: ffi_call (in /usr/lib/x86_64-linux-gnu/libffi.so.8.1.2)
==6173==    by 0xBF9042A: ??? (in /usr/lib/python3.11/lib-dynload/_ctypes.cpython-311-x86_64-linux-gnu.so)
==6173==    by 0xBF8F612: ??? (in /usr/lib/python3.11/lib-dynload/_ctypes.cpython-311-x86_64-linux-gnu.so)
==6173==  Address 0x8 is not stack'd, malloc'd or (recently) free'd
==6173==
==6173==
==6173== Process terminating with default action of signal 11 (SIGSEGV)
==6173==  Access not within mapped region at address 0x8
==6173==    at 0x5B519A3E: std::_Rb_tree_decrement(std::_Rb_tree_node_base*) (in /home/proj/main/env/lib/python3.11/site-packages/indigo/lib/linux-x86_64/libindigo.so)
==6173==    by 0x5AE8167A: std::_Rb_tree_iterator<std::pair<int const, int> >::operator--() (stl_tree.h:302)
==6173==    by 0x5AE80389: std::_Rb_tree<int, std::pair<int const, int>, std::_Select1st<std::pair<int const, int> >, std::less<int>, std::allocator<std::pair<int const, int> > >::_M_get_insert_unique_pos(int const&) (stl_tree.h:2126)
==6173==    by 0x5B130EAF: std::pair<std::_Rb_tree_iterator<std::pair<int const, int> >, bool> std::_Rb_tree<int, std::pair<int const, int>, std::_Select1st<std::pair<int const, int> >, std::less<int>, std::allocator<std::pair<int const, int> > >::_M_emplace_unique<int&, int>(int&, int&&) (stl_tree.h:2434)
==6173==    by 0x5B1309CB: std::pair<std::_Rb_tree_iterator<std::pair<int const, int> >, bool> std::map<int, int, std::less<int>, std::allocator<std::pair<int const, int> > >::emplace<int&, int>(int&, int&&) (stl_map.h:606)
==6173==    by 0x5B12F54E: indigo::MoleculeGrossFormula::collect(indigo::BaseMolecule&, bool) (molecule_gross_formula.cpp:212)
==6173==    by 0x5AE6E80A: indigoGrossFormula (indigo_calc.cpp:49)
==6173==    by 0xBFC8F79: ??? (in /usr/lib/x86_64-linux-gnu/libffi.so.8.1.2)
==6173==    by 0xBFC840D: ??? (in /usr/lib/x86_64-linux-gnu/libffi.so.8.1.2)
==6173==    by 0xBFC8B0C: ffi_call (in /usr/lib/x86_64-linux-gnu/libffi.so.8.1.2)
==6173==    by 0xBF9042A: ??? (in /usr/lib/python3.11/lib-dynload/_ctypes.cpython-311-x86_64-linux-gnu.so)
==6173==    by 0xBF8F612: ??? (in /usr/lib/python3.11/lib-dynload/_ctypes.cpython-311-x86_64-linux-gnu.so)
==6173==  If you believe this happened as a result of a stack
==6173==  overflow in your program's main thread (unlikely but
==6173==  possible), you can try to increase the size of the
==6173==  main thread stack using the --main-stacksize= flag.
==6173==  The main thread stack size used in this run was 83886080.
==6173==
==6173== HEAP SUMMARY:
==6173==     in use at exit: 95,564,910 bytes in 521,341 blocks
==6173==   total heap usage: 2,086,024 allocs, 1,564,683 frees, 628,662,008 bytes allocated
==6173==
==6173== LEAK SUMMARY:
==6173==    definitely lost: 192 bytes in 2 blocks
==6173==    indirectly lost: 0 bytes in 0 blocks
==6173==      possibly lost: 572,047 bytes in 258 blocks
==6173==    still reachable: 94,992,671 bytes in 521,081 blocks
==6173==                       of which reachable via heuristic:
==6173==                         stdstring          : 8,507,488 bytes in 163,222 blocks
==6173==                         newarray           : 226 bytes in 2 blocks
==6173==         suppressed: 0 bytes in 0 blocks
==6173== Rerun with --leak-check=full to see details of leaked memory
==6173==
==6173== For lists of detected and suppressed errors, rerun with: -s
==6173== ERROR SUMMARY: 7 errors from 5 contexts (suppressed: 0 from 0)

Environment details: