Closed raphaelhoffmann closed 8 years ago
@netj: we are still living with this heisenbug... Does it look like it could be mkmimo?
@alldefector Nope. Back then @zifeishan and I confirmed this was happening on a data path that involves just gpdb's `psql`.
This no longer seems to be the issue.
We are running DD8 with Greenplum on Ubuntu. Over the last few weeks, @zifeishan and I observed that in a small number of cases (maybe 1 out of 50) Greenplum gets stuck when running a DD8 application.
When that happens, `pg_stat_activity` shows a running query which is not in a waiting state. However, there is no I/O and no CPU activity. It is not possible to kill the query using `pg_cancel_backend` or `pg_terminate_backend`. One needs to `kill -9` the query process; sometimes, another exclusive lock is held by another process as shown by

We need to run `kill -9` for that second query as well. Usually that returns the database to a working condition; in a few rare cases, the database was corrupt.

We believe that this situation usually happens with `COPY FROM STDIN` queries; it looks like Greenplum continues to wait for the sender of the data, but the sender is idle.

A recent commit on the Greenplum master branch appears to make it easier to kill queries in such a state (https://github.com/greenplum-db/gpdb/commit/63dd5a6c7202d3458773d200074d1edeaf1b15b7). However, the actual bug must be on the sender's side. One hypothesis (by @netj) is that mkmimo is not handling some error conditions correctly.
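
To make the suspected condition concrete, here is a minimal sketch of a receiver waiting on a sender that is idle but has not closed its end of the pipe. It does not involve Greenplum, `psql`, or mkmimo at all; it only uses a plain Unix pipe, and the names `wait_for_data` and `demo` are hypothetical. The point is that the receiver sees neither data nor end-of-file while the writer keeps its descriptor open, which would be consistent with a `COPY FROM STDIN` that shows no I/O and no CPU activity.

```python
import os
import select


def wait_for_data(read_fd, timeout_s):
    """Wait once on read_fd and report 'data', 'eof', or 'stalled'."""
    ready, _, _ = select.select([read_fd], [], [], timeout_s)
    if not ready:
        # Nothing arrived and no EOF: the writer still holds its end open.
        return "stalled"
    chunk = os.read(read_fd, 4096)
    # An empty read means every writer has closed the pipe.
    return "data" if chunk else "eof"


def demo():
    """Illustrative only: one pipe standing in for the sender -> COPY data path."""
    read_fd, write_fd = os.pipe()
    states = []

    os.write(write_fd, b"1\tfoo\n")              # sender delivers one row
    states.append(wait_for_data(read_fd, 0.2))

    # Sender goes idle without closing: the receiver can only keep waiting.
    states.append(wait_for_data(read_fd, 0.2))

    os.close(write_fd)                           # sender finally closes its end
    states.append(wait_for_data(read_fd, 0.2))

    os.close(read_fd)
    return states


if __name__ == "__main__":
    print(demo())
```

If the sender side never closes its write end after an error, the middle state would persist indefinitely, so a timeout like the one in `wait_for_data` is one way a receiver could at least detect the stall.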