dmwm / PHEDEX

CMS data-placement suite

t_agent_version unique constraint alert on agent startup #917

Open ericvaandering opened 10 years ago

Original Savannah ticket 98928 reported by None on Mon Nov 19 05:51:11 2012.

Hello,

A site admin reported that one of their agents produced the alert below on startup. The agent later recovered automatically:

    2012-11-16 11:34:05: FileDownload[1539]: alert: DBD::Oracle::st execute failed: ORA-00001: unique constraint (CMS_TRANSFERMGMT_TEST.PK_AGENT_VERSION) violated (DBD ERROR: OCIStmtExecute) [for Statement "insert into t_agent_version (node, agent, time_update, filename, filesize, checksum, release, revision, tag) values (:node, :agent, :now, :filename, :filesize, :checksum, :release, :revision, :tag)" with ParamValues: :agent='276', :checksum='MD5:fab400b96ee82475add46f2fa1be9e26', :filename='FTS.pm', :filesize=19907, :node='69', :now=1353065644.71028, :release='PHEDEX_4_1_1_pre3', :revision=undef, :tag=undef] at /home/cmsprd/phedex/swtest/slc5_amd64_gcc461/cms/PHEDEX/PHEDEX_4_1_1_pre3/perl_lib/PHEDEX/Core/DB.pm line 322.

Looking at the code of PHEDEX/Core/Agent/DB.pm, this alert could happen if two instances of the same agent (e.g. download-t1 and download-t2) are started at the same time and both try to connect and identify themselves to the DB simultaneously, so that both attempt to insert the same t_agent_version row.

Possible solution: rewrite the "insert into t_agent_version" statement using "merge"?
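
To illustrate the idea, here is a minimal sketch of an "insert, or update if the row already exists" write. It is not the PhEDEx code and it does not use Oracle's "merge": it uses Python's sqlite3 module as a stand-in so the behaviour can be tried locally. The reduced column set, the primary key on (node, agent, filename) and the helper name record_version are all assumptions made for the example; the real PK_AGENT_VERSION definition is not shown in this ticket.

```python
import sqlite3

# Simplified stand-in for t_agent_version. The primary-key columns are an
# assumption; the real PK_AGENT_VERSION definition is not given in the ticket.
SCHEMA = """
create table t_agent_version (
    node        integer not null,
    agent       integer not null,
    time_update real    not null,
    filename    text    not null,
    checksum    text,
    primary key (node, agent, filename)
)
"""

INSERT_SQL = """
insert into t_agent_version (node, agent, time_update, filename, checksum)
values (:node, :agent, :now, :filename, :checksum)
"""

UPDATE_SQL = """
update t_agent_version
   set time_update = :now, checksum = :checksum
 where node = :node and agent = :agent and filename = :filename
"""


def record_version(conn, node, agent, now, filename, checksum):
    """Hypothetical helper: insert the row, or update it if it already exists."""
    params = {"node": node, "agent": agent, "now": now,
              "filename": filename, "checksum": checksum}
    try:
        conn.execute(INSERT_SQL, params)
        return "inserted"
    except sqlite3.IntegrityError:
        # Another instance already registered this file: refresh the row
        # instead of raising an alert.
        conn.execute(UPDATE_SQL, params)
        return "updated"


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

checksum = "MD5:fab400b96ee82475add46f2fa1be9e26"
# Two instances of the same agent registering the same file in turn.
first = record_version(conn, 69, 276, 1353065644.71028, "FTS.pm", checksum)
second = record_version(conn, 69, 276, 1353065700.0, "FTS.pm", checksum)
conn.commit()

rows = conn.execute(
    "select node, agent, filename, time_update from t_agent_version"
).fetchall()
print(first, second, rows)
```

The second call hits the unique constraint, exactly as in the alert, but handles it by updating the existing row. Whether a single Oracle "merge" statement would close the race completely when two sessions run it concurrently would still need to be checked against the real schema.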

Setting to low priority, since the agents recover automatically the next time they try to reconnect to the DB.

Cheers,
Nicolo'