dmwm / PHEDEX

CMS data-placement suite

FilePump: alert from replica deletion #1067

Open nikmagini opened 7 years ago

nikmagini commented 7 years ago

On Nov 14th the central FilePump agent started raising this alert:

2016-11-14 16:37:55: FilePump[14613]: alert: database error: DBD::Oracle::st execute failed: ORA-02292: integrity constraint (CMS_TRANSFERMGMT.FK_XFER_TASK_REPLICA) violated - child record found (DBD ERROR: OCIStmtExecute) [for Statement "delete from t_xfer_replica where (node, fileid) in (select xr.node, xr.fileid from t_xfer_delete xd join t_xfer_replica xr on xr.node = xd.node and xr.fileid = xd.fileid where xd.time_complete is not null and xd.time_complete >= xr.time_create)"] at /data/ProdNodes/PHEDEX/perl_lib/PHEDEX/Core/DB.pm line 322

It doesn't stop transfers from progressing, but it blocks the registration of completed deletions and the statistics updates.

It looks like a deletion was executed while another task (transfer? verify?) on the same replica was still in progress, so a child row still references the replica that FilePump tries to delete.
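
To illustrate the suspected failure mode, here is a minimal sketch using SQLite from Python as a stand-in for Oracle. The schema is heavily simplified and partly hypothetical: the `id` columns and the `replica` column on `t_xfer_task` are assumptions made for the example and are not taken from the real PhEDEx schema. It only shows how a completed deletion can hit a foreign-key violation while a task row still points at the replica, and how a delete that skips such replicas would avoid the error. It is not a proposed patch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
create table t_xfer_replica (
  id integer primary key,          -- hypothetical surrogate key
  node integer not null,
  fileid integer not null,
  time_create real not null,
  unique (node, fileid)
);
create table t_xfer_task (
  id integer primary key,
  -- hypothetical column standing in for FK_XFER_TASK_REPLICA
  replica integer not null references t_xfer_replica (id)
);
create table t_xfer_delete (
  node integer not null,
  fileid integer not null,
  time_complete real
);
insert into t_xfer_replica values (1, 10, 100, 1.0);
insert into t_xfer_replica values (2, 10, 101, 1.0);
insert into t_xfer_task values (1, 1);          -- a task still uses replica 1
insert into t_xfer_delete values (10, 100, 2.0);
insert into t_xfer_delete values (10, 101, 2.0);
""")

# Replicas whose deletion has completed (same conditions as the failing statement).
COMPLETED = """
select xr.id
  from t_xfer_delete xd
  join t_xfer_replica xr on xr.node = xd.node and xr.fileid = xd.fileid
 where xd.time_complete is not null
   and xd.time_complete >= xr.time_create
"""

# Unguarded delete: fails as a whole because one replica has a child task.
try:
    conn.execute(f"delete from t_xfer_replica where id in ({COMPLETED})")
    failed = False
except sqlite3.IntegrityError:
    failed = True
    conn.rollback()

# Guarded delete: leave replicas that are still referenced by a task.
cur = conn.execute(f"""
delete from t_xfer_replica
 where id in ({COMPLETED})
   and not exists (select 1 from t_xfer_task xt
                    where xt.replica = t_xfer_replica.id)
""")
deleted = cur.rowcount
conn.commit()

remaining = [row[0] for row in
             conn.execute("select fileid from t_xfer_replica order by fileid")]
print(failed, deleted, remaining)
```

In this sketch the unguarded statement deletes nothing, because a single referenced replica makes the whole statement fail. The guarded variant removes the unreferenced replica and leaves the other one to be picked up on a later cycle, once its task is gone.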

This needs further investigation.

N.

nikmagini commented 7 years ago

This alert resolved itself automatically after 16 hours; probably the transfer task that was blocking the deletion expired.

This was already seen in the testbed in 2011, BTW:

https://github.com/dmwm/PHEDEX/issues/651#issuecomment-26934517