epam / Indigo

Universal cheminformatics toolkit, utilities and database search tools
http://lifescience.opensource.epam.com
Apache License 2.0

Segfault while rebuilding index on molfile using bingo.molecule type #1884

Open amhuhn2 opened 3 months ago

amhuhn2 commented 3 months ago

Steps to Reproduce

  1. Use the Indigo library (Bingo cartridge). The environment is described below. Note: this issue is not reliably reproducible in our environment with a given set of data. It seems to happen in roughly one run out of every five to eight.

OS: (output from uname -a): Linux bpeqabirdmlapvm02 4.18.0-240.15.1.el8_3.x86_64 #1 SMP Mon Mar 1 17:16:16 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

CentOS Linux release 8.3.2011

32 GB RAM, 8 CPUs

Output from "select version();":

PostgreSQL 12.9 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-4), 64-bit

276 GB available on the filesystem on which PostgreSQL keeps its data and logs.

  2. Add script or SQL to reproduce the issue: see the attached file.

script.txt

Actual behavior

During or immediately after the rebuild of the bingo index, the OS records a segfault. The PostgreSQL log records a corrupted double-linked list, then PostgreSQL terminates and restarts.

Attached is an excerpt from /var/log/messages. messages.txt

Attached is an excerpt of the PostgreSQL log. (Note that the stored procedure source.uspupsertmolecule executes the CREATE INDEX statement that seems to be failing. The next stored procedure in the pipeline is source.uspupsertlot.)

Note the error "corrupted double-linked list".

postgresql-log.txt
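For anyone triaging similar logs, here is a minimal sketch of a scan for the failure markers described above. The marker list is an assumption: it contains the "corrupted double-linked list" message quoted in this report, the wording PostgreSQL typically uses when a backend is killed by a signal, and the word "segfault" as it usually appears in kernel messages. The helper name `find_crash_lines` is hypothetical.

```python
import re

# Assumed markers: the glibc abort message quoted in this report, the usual
# PostgreSQL wording for a backend killed by a signal, and kernel "segfault" lines.
MARKERS = [
    re.compile(r"corrupted double-linked list"),
    re.compile(r"terminated by signal \d+"),
    re.compile(r"segfault", re.IGNORECASE),
]


def find_crash_lines(lines):
    """Return (line_number, text) pairs for lines matching any marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(marker.search(line) for marker in MARKERS):
            hits.append((number, line.rstrip("\n")))
    return hits
```

It accepts any iterable of lines, so an open file handle for the PostgreSQL log or for /var/log/messages can be passed in directly.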

Expected behavior

The bingo index should be rebuilt with no segfaults thrown, and the pipeline should continue on afterwards as normal. No "corrupted double-linked list" should be mentioned in the PostgreSQL log.

Environment details:

Attachments

Three attachments included.

Additional context

We are in the process of upgrading to the latest version of bingo, but reading through the release notes, I did not see any fixed bugs that looked like the issue we're experiencing. The closest ones had to do with CDX files, and we are parsing molfiles instead.

Fixed in 1.10:

#1068 CDX-loader crash

Fixed in 1.12:

#1126 Segfault when iterating CDX file from USPTO downloads

Fixed in 1.13:

#1139 core dumped when reading CDX file downloaded from USPTO
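Because the crash is intermittent, a single clean rebuild after the upgrade would not show that it is gone. Below is a rough sketch for estimating how many consecutive clean rebuilds are needed before a fix looks credible. It assumes each rebuild fails independently with the observed rate (about 1 in 5 to 1 in 8), which may not hold if the crash depends on the data or on timing. The function name `clean_runs_needed` and the 95% threshold are illustrative choices.

```python
import math


def clean_runs_needed(failure_rate, confidence=0.95):
    """Smallest n such that (1 - failure_rate) ** n <= 1 - confidence.

    That is, the number of consecutive clean runs after which an unfixed bug
    with this failure rate would have shown up with the given probability.
    """
    if not 0 < failure_rate < 1:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - failure_rate))


print(clean_runs_needed(1 / 5))  # 14
print(clean_runs_needed(1 / 8))  # 23
```

Under these assumptions, roughly 14 to 23 clean index rebuilds in a row would be needed before concluding that the upgrade removed the crash.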