DIRACGrid / DIRAC

DIRAC Grid
http://diracgrid.org
GNU General Public License v3.0

RMS flagging files as 'Done' when they are not #3211

Closed by vbpnl 7 years ago

vbpnl commented 7 years ago

Hello,

We in Belle II operations have noticed that RMS often flags a file as 'Done' in its ReqDB.File table when in fact the file is either not replicated or not registered in the FC/LFC.

Here is a concrete example.

A file was supposed to be transferred to SIGNET-TMP-SE. ReqDB.File says Done

LFN='/belle/MC/fab/release-00-07-02/DBxxxxxxxx/MC7/prod00000823/s00/e0000/4S/r00000/ddbar/sub15/mdst_016883_prod00000823_task00016883.root'
query='SELECT * FROM File WHERE LFN="%s"'%LFN
runQuery(query)

Attempt: 2
Checksum: 433716b1
Error: (empty)
GUID: 9CEF8371-03B0-9655-66CF-F6D7D1884D32
FileID: 118692162
Status: Done
ChecksumType: ADLER32
LFN: /belle/MC/fab/release-00-07-02/DBxxxxxxxx/MC7/prod00000823/s00/e0000/4S/r00000/ddbar/sub15/mdst_016883_prod00000823_task00016883.root
PFN: None
OperationID: 4041877
Size: 23089979

The REA (RequestExecutingAgent) log reports an error for this file:

$ grep /belle/MC/fab/release-00-07-02/DBxxxxxxxx/MC7/prod00000823/s00/e0000/4S/r00000/ddbar/sub15/mdst_016883_prod00000823_task00016883.root /opt/dirac/runit/RequestManagement/RequestExecutingAgent/log/current

2016-12-10 17:31:20 UTC RequestManagement/RequestExecutingAgent/pid_32596/DDM_fromNONE_toSIGNET-TMP-SE_20161210_173005/0/ReplicateAndRegister WARN: unable to schedule /belle/MC/fab/release-00-07-02/DBxxxxxxxx/MC7/prod00000823/s00/e0000/4S/r00000/ddbar/sub15/mdst_016883_prod00000823_task00016883.root for FTS: _getSurlForLFN: Failed to create SRM2 storage for SIGNET-TMP-SE: StorageFactory._getStorageOptions: Failed to get storage status
2016-12-10 17:31:23 UTC RequestManagement/RequestExecutingAgent/SRM2Storage INFO: __putFile: Executing transfer of srm://se.hep.pnnl.gov:8443/srm/v2/server?SFN=/se/belle/TMP/belle/MC/fab/release-00-07-02/DBxxxxxxxx/MC7/prod00000823/s00/e0000/4S/r00000/ddbar/sub15/mdst_016883_prod00000823_task00016883.root to srm://dcache.ijs.si:8443/srm/managerv2?SFN=/pnfs/ijs.si/belle/TMP/belle/MC/fab/release-00-07-02/DBxxxxxxxx/MC7/prod00000823/s00/e0000/4S/r00000/ddbar/sub15/mdst_016883_prod00000823_task00016883.root using 4 streams
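For anyone wanting to repeat this check over many files, the grep above can be approximated in Python. This is only a sketch: the line layout (`<date> <time> UTC <component> <LEVEL>: <message>`) is inferred from the two log lines quoted here, and `problems_for_lfn` is a hypothetical helper name, not part of DIRAC.

```python
import re

# Assumed layout, inferred from the excerpt above (not an official format spec):
#   "<YYYY-MM-DD> <HH:MM:SS> UTC <component path> <LEVEL>: <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) UTC "
    r"(?P<component>\S+) (?P<level>[A-Z]+): (?P<message>.*)$"
)


def problems_for_lfn(lines, lfn, levels=("WARN", "ERROR")):
    """Return (timestamp, level, message) for log lines that mention `lfn`
    and whose level is in `levels`. Lines that do not match the assumed
    layout are skipped."""
    hits = []
    for line in lines:
        if lfn not in line:
            continue
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") in levels:
            hits.append(
                (match.group("ts"), match.group("level"), match.group("message"))
            )
    return hits
```

Typical use would be `problems_for_lfn(open(path), LFN)` with `path` pointing at the agent's `log/current` file. Note that the default `levels` tuple is a guess at which levels matter; only WARN and INFO appear in the excerpt.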

Also, the file was never transferred to the SE:

$ srmls srm://dcache.ijs.si:8443/srm/managerv2?SFN=/pnfs/ijs.si/belle/TMP/belle/MC/fab/release-00-07-02/DBxxxxxxxx/MC7/prod00000823/s00/e0000/4S/r00000/ddbar/sub15/mdst_016883_prod00000823_task00016883.root
Sat Dec 10 09:45:07 PST 2016: Return status:
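The manual cross-check above (status in ReqDB.File versus what is actually on the SE) could be automated along these lines. This is a minimal sketch under stated assumptions: `find_false_done` and the `replica_exists` callback are hypothetical names, the rows are assumed to be dicts with the `Status` and `LFN` columns shown in the query result, and how the existence check is done (srmls, a catalog lookup, or something else) is deliberately left to the caller.

```python
from typing import Callable, Iterable, List, Mapping


def find_false_done(
    rows: Iterable[Mapping[str, object]],
    target_se: str,
    replica_exists: Callable[[str, str], bool],
) -> List[str]:
    """Return LFNs whose row says 'Done' but for which
    replica_exists(lfn, target_se) reports no replica.

    `replica_exists` is supplied by the caller; this sketch does not
    assume any particular storage or catalog API.
    """
    suspicious = []
    for row in rows:
        if row["Status"] != "Done":
            continue  # only rows claiming success are of interest
        lfn = str(row["LFN"])
        if not replica_exists(lfn, target_se):
            suspicious.append(lfn)
    return suspicious
```

Passing the existence check in as a callback keeps the comparison logic testable with an in-memory stand-in before it is pointed at a real storage element.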

--

There has been a hint that, in order to preserve the retry mechanism, the file status is not changed just yet in the ReplicateAndRegister plugin code. A related post on the forum is:

https://groups.google.com/forum/#!topic/diracgrid-forum/JclTwweKp2s

This issue awaits your attention.

Thanks, Vikas

vbpnl commented 7 years ago

This issue seems to have been addressed. I was using v6r14p28, and it has been resolved in v6r14p39.

See my post here https://groups.google.com/forum/#!topic/diracgrid-forum/JclTwweKp2s
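To check whether an installation is at or beyond the version reported as working, the version tags can be compared numerically. This is a sketch only: it assumes tags of the form `v<major>r<release>p<patch>` as in the two versions mentioned above, treats a missing patch part as 0, and does not handle any other suffixes. `parse_version` and `at_least` are hypothetical helper names. Note also that v6r14p39 is simply the version where the problem was no longer seen; the fix may have landed in an earlier patch.

```python
import re

# Assumed tag shape, based on the two versions quoted in this thread.
_VERSION = re.compile(r"^v(\d+)r(\d+)(?:p(\d+))?$")


def parse_version(tag):
    """Turn a tag such as 'v6r14p39' into a comparable tuple (6, 14, 39)."""
    match = _VERSION.match(tag)
    if match is None:
        raise ValueError("unrecognised version tag: %r" % tag)
    major, release, patch = match.groups()
    return (int(major), int(release), int(patch or 0))


def at_least(installed, reference="v6r14p39"):
    """True if `installed` is the same as or newer than `reference`."""
    return parse_version(installed) >= parse_version(reference)
```

For example, `at_least("v6r14p28")` is false while `at_least("v6r14p39")` is true. Comparing across release lines assumes that later releases carry the fix forward, which this thread does not confirm.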