Closed: julian1 closed this issue 4 years ago
Might be an issue on the db side. A tx is in rollback and stalled, although killing it doesn't fix it.
jfca@2-aws-syd:~$ sudo -u postgres psql -d harvest -c "select now()-query_start as time,* from pg_stat_activity where usename = 'anmn_nrs_dar_yon'" | cat
time | datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | query_start | state_change | waiting | state | backend_xid | backend_xmin | query
-----------------+-------+---------+------+----------+------------------+------------------+---------------+-----------------+-------------+-------------------------------+------------+------------------------------+-------------------------------+---------+-------+-------------+--------------+----------
00:00:04.942303 | 16415 | harvest | 1395 | 16472 | anmn_nrs_dar_yon | | 52.64.146.138 | | 57181 | 2016-05-06 01:59:12.236771+00 | | 2016-05-06 02:03:53.04598+00 | 2016-05-06 02:03:53.046038+00 | f | idle | | | ROLLBACK
(1 row)
jfca@2-aws-syd:~$
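For reference, the stalled backend (pid 1395 in the output above) can be terminated with pg_terminate_backend, although as noted above killing it did not clear things up in this case; a minimal sketch:

-- sketch: terminate the stalled backend; the pid comes from the pg_stat_activity output above
select pg_terminate_backend(1395);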
Appears that the Talend liquibase update must not have completed, and there was still a lock on the table:
harvest=# select * from anmn_nrs_dar_yon.databasechangeloglock ;
id | locked | lockgranted | lockedby
----+--------+----------------------------+-------------------------------------------------------------------
1 | t | 2016-05-05 05:32:38.322+00 | fe80:0:0:0:4b5:54ff:fe7c:a779%2 (fe80:0:0:0:4b5:54ff:fe7c:a779%2)
(1 row)
Fixed with
harvest=# update anmn_nrs_dar_yon.databasechangeloglock set locked = false where id = 1 ;
UPDATE 1
It's a worry that pg_locks didn't show the lock on rollback.
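For the record, a join of pg_locks onto pg_stat_activity along these lines is the usual way to look for locks held by a session; in this case it apparently turned up nothing for the session sitting in rollback (the usename filter below is just the one from the output above):

-- sketch: list locks held by that user's sessions
select l.locktype, l.relation::regclass, l.mode, l.granted, a.pid, a.state
from pg_locks l
join pg_stat_activity a on a.pid = l.pid
where a.usename = 'anmn_nrs_dar_yon';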
@julian1 The same thing happened again today, and it's my fault. I did a po_s3_del to delete a file and typed Ctrl-C by mistake. This kills the process, but the database hangs on to the databasechangeloglock.
Do you think you could run the same query again, please?
@julian1 The same thing happened by itself with the FAIMMS harvester (probably due to the db update?). None of the files get processed. Could you run the same query on the faimms schema (as sketched below)? Cheers
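For reference, the equivalent check on the faimms schema would be something along these lines (a sketch):

-- same check as above, against the faimms schema
select * from faimms.databasechangeloglock;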
@smancini FYI, this is why there are a few errors on nagios
@lwgordonimos FYI, so you can be aware of issues related to Talend and liquibase locks on the db
@lbesnard done.
harvest=# update faimms.databasechangeloglock set locked = false where locked = true;
UPDATE 1
harvest=#
@ggalibert @bhasin85 FYI
This happened again with ACORN
What a bunch of crap. I wonder what happens to files when a failed liquibase update kills the pipeline?
They go in the error directory.
@julian1 We're having a few pipeline issues at the moment. It looks like my FAIMMS harvester is blocked again
Caused by: liquibase.exception.LockException: Could not acquire change log lock. Currently locked by fe80:0:0:0:4b5:54ff:fe7c:a779%2 (fe80:0:0:0:4b5:54ff:fe7c:a779%2) since 11/22/16 2:15 AM
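Before clearing it, the lock age can be checked to confirm it really is stale; a sketch using the columns shown earlier:

-- sketch: confirm the liquibase lock is old enough to be considered stale
select id, locked, lockgranted, now() - lockgranted as age, lockedby
from faimms.databasechangeloglock;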
Do you think you could run this again?
harvest=# update faimms.databasechangeloglock set locked = false where locked = true;
UPDATE 1
harvest=#
@bpasquer are you also affected by this ?
@aodn/emii-ops could anyone do this please?
I'll have a look at this
Locked since 21/11/16.
harvest=> select * from databasechangeloglock;
id | locked | lockgranted | lockedby
----+--------+----------------------------+-------------------------------------------------------------------
1 | t | 2016-11-21 15:15:53.576+00 | fe80:0:0:0:4b5:54ff:fe7c:a779%2 (fe80:0:0:0:4b5:54ff:fe7c:a779%2)
harvest=> update faimms.databasechangeloglock set locked = false where locked = true;
UPDATE 1
10-aws-syd
Attempted to kill the client connection on db-prod, without any result.
Had to kill it manually on 10-aws-syd.
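For reference, something like the following identifies the client_addr/client_port of the hung connection, so the matching process can be tracked down and killed on the client host; the idle-state filter here is just an assumption:

-- sketch: find non-active connections and their client endpoints
select pid, usename, client_addr, client_port, state, query_start
from pg_stat_activity
where state <> 'active';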