DIRACGrid / DIRAC

DIRAC Grid
http://diracgrid.org
GNU General Public License v3.0

checksum files from WNs to SEs in LHCB #760

Closed hamar closed 12 years ago

hamar commented 12 years ago

Hi,

At CC in Lyon, we have some tickets about corrupted files copied from WNs to SEs, like:

/lhcb/LHCb/Collision11/CHARMCOMPLETEEVENT.DST/00016764/0001/00016764_00012227_1.CharmCompleteEvent.dst
/lhcb/LHCb/Collision11/CHARMCOMPLETEEVENT.DST/00016764/0001/00016764_00010674_1.CharmCompleteEvent.dst
/lhcb/LHCb/Collision11/CHARMCOMPLETEEVENT.DST/00016764/0001/00016764_00010644_1.CharmCompleteEvent.dst
/lhcb/LHCb/Collision11/DIMUON.DST/00016764/0001/00016764_00013474_1.Dimuon.dst
/lhcb/LHCb/Collision11/RADIATIVE.DST/00016764/0001/00016764_00016701_1.Radiative.dst
/lhcb/LHCb/Collision11/RADIATIVE.DST/00016764/0001/00016764_00018376_1.Radiative.dst
/lhcb/LHCb/Collision11/RADIATIVE.DST/00016764/0001/00016764_00011409_1.Radiative.dst
/lhcb/LHCb/Collision11/RADIATIVE.DST/00016764/0001/00016764_00014025_1.Radiative.dst
/lhcb/LHCb/Collision11/RADIATIVE.DST/00016764/0002/00016764_00025515_1.Radiative.dst
/lhcb/LHCb/Collision11/SEMILEPTONIC.DST/00016764/0001/00016764_00010704_1.Semileptonic.dst
/lhcb/LHCb/Collision11/SEMILEPTONIC.DST/00016764/0001/00016764_00012248_1.Semileptonic.dst
/lhcb/LHCb/Collision11/SEMILEPTONIC.DST/00016764/0001/00016764_00010622_1.Semileptonic.dst
/lhcb/LHCb/Collision11/SEMILEPTONIC.DST/00016764/0001/00016764_00010602_1.Semileptonic.dst
/lhcb/LHCb/Collision11/SEMILEPTONIC.DST/00016764/0001/00016764_00010653_1.Semileptonic.dst
/lhcb/LHCb/Collision11/SEMILEPTONIC.DST/00016764/0001/00016764_00010593_1.Semileptonic.dst
/lhcb/LHCb/Collision11/SEMILEPTONIC.DST/00016764/0001/00016764_00014037_1.Semileptonic.dst
/lhcb/LHCb/Collision11/SEMILEPTONIC.DST/00016764/0002/00016764_00029737_1.Semileptonic.dst
/lhcb/LHCb/Collision11/BHADRON.DST/00016773/0001/00016773_00016759_1.Bhadron.dst
/lhcb/LHCb/Collision11/BHADRON.DST/00016773/0002/00016773_00020476_1.Bhadron.dst
/lhcb/LHCb/Collision11/CHARM.MDST/00016773/0004/00016773_00040644_1.Charm.mdst
/lhcb/LHCb/Collision11/CHARM.MDST/00016773/0003/00016773_00035207_1.Charm.mdst
/lhcb/LHCb/Collision11/CHARMCOMPLETEEVENT.DST/00016773/0001/00016773_00012899_1.CharmCompleteEvent.dst
/lhcb/LHCb/Collision11/CHARMCOMPLETEEVENT.DST/00016773/0001/00016773_00018334_1.CharmCompleteEvent.dst
/lhcb/LHCb/Collision11/CHARMCOMPLETEEVENT.DST/00016773/0001/00016773_00016786_1.CharmCompleteEvent.dst
/lhcb/LHCb/Collision11/CHARMCOMPLETEEVENT.DST/00016773/0004/00016773_00041015_1.CharmCompleteEvent.dst
/lhcb/LHCb/Collision11/CHARMCOMPLETEEVENT.DST/00016773/0003/00016773_00033543_1.CharmCompleteEvent.dst
/lhcb/LHCb/Collision11/DIMUON.DST/00016773/0001/00016773_00016844_1.Dimuon.dst
/lhcb/LHCb/Collision11/DIMUON.DST/00016773/0001/00016773_00016823_1.Dimuon.dst
/lhcb/LHCb/Collision11/DIMUON.DST/00016773/0001/00016773_00016718_1.Dimuon.dst
/lhcb/LHCb/Collision11/DIMUON.DST/00016773/0003/00016773_00035584_1.Dimuon.dst
/lhcb/LHCb/Collision11/DIMUON.DST/00016773/0003/00016773_00033633_1.Dimuon.dst
/lhcb/LHCb/Collision11/DIMUON.DST/00016773/0003/00016773_00035094_1.Dimuon.dst
/lhcb/LHCb/Collision11/DIMUON.DST/00016773/0004/00016773_00041993_1.Dimuon.dst
/lhcb/LHCb/Collision11/DIMUON.DST/00016773/0003/00016773_00039610_1.Dimuon.dst
/lhcb/LHCb/Collision11/DIMUON.DST/00016773/0003/00016773_00032521_1.Dimuon.dst
/lhcb/LHCb/Collision11/RADIATIVE.DST/00016773/0001/00016773_00016864_1.Radiative.dst
/lhcb/LHCb/Collision11/RADIATIVE.DST/00016773/0001/00016773_00016836_1.Radiative.dst
/lhcb/LHCb/Collision11/RADIATIVE.DST/00016773/0003/00016773_00035891_1.Radiative.dst
/lhcb/LHCb/Collision11/RADIATIVE.DST/00016773/0003/00016773_00035158_1.Radiative.dst
/lhcb/LHCb/Collision11/RADIATIVE.DST/00016773/0003/00016773_00035101_1.Radiative.dst
/lhcb/LHCb/Collision11/RADIATIVE.DST/00016773/0003/00016773_00035749_1.Radiative.dst
/lhcb/LHCb/Collision11/RADIATIVE.DST/00016773/0004/00016773_00042171_1.Radiative.dst
/lhcb/LHCb/Collision11/RADIATIVE.DST/00016773/0003/00016773_00036531_1.Radiative.dst
/lhcb/LHCb/Collision11/SEMILEPTONIC.DST/00016773/0001/00016773_00011716_1.Semileptonic.dst
/lhcb/LHCb/Collision11/SEMILEPTONIC.DST/00016773/0001/00016773_00016777_1.Semileptonic.dst
/lhcb/LHCb/Collision11/SEMILEPTONIC.DST/00016773/0002/00016773_00020826_1.Semileptonic.dst
/lhcb/LHCb/Collision11/SEMILEPTONIC.DST/00016773/0003/00016773_00036594_1.Semileptonic.dst
/lhcb/LHCb/Collision11/BHADRON.DST/00016992/0000/00016992_00002944_1.Bhadron.dst
/lhcb/LHCb/Collision11/BHADRON.DST/00016992/0000/00016992_00008252_1.Bhadron.dst
/lhcb/LHCb/Collision11/BHADRON.DST/00016992/0000/00016992_00006653_1.Bhadron.dst
/lhcb/LHCb/Collision11/DIMUON.DST/00016992/0000/00016992_00002953_1.Dimuon.dst
/lhcb/LHCb/Collision11/DIMUON.DST/00016992/0000/00016992_00003455_1.Dimuon.dst
/lhcb/LHCb/Collision11/DIMUON.DST/00016992/0000/00016992_00004044_1.Dimuon.dst
/lhcb/LHCb/Collision11/DIMUON.DST/00016992/0000/00016992_00005205_1.Dimuon.dst
/lhcb/LHCb/Collision11/DIMUON.DST/00016992/0000/00016992_00001059_1.Dimuon.dst
/lhcb/LHCb/Collision11/PID.MDST/00016992/0000/00016992_00008153_1.PID.mdst
/lhcb/LHCb/Collision11/RADIATIVE.DST/00016992/0000/00016992_00001222_1.Radiative.dst
/lhcb/LHCb/Collision11/RADIATIVE.DST/00016992/0000/00016992_00001838_1.Radiative.dst
/lhcb/LHCb/Collision11/SEMILEPTONIC.DST/00016992/0000/00016992_00000817_1.Semileptonic.dst
/lhcb/LHCb/Collision11/SEMILEPTONIC.DST/00016992/0000/00016992_00000992_1.Semileptonic.dst
/lhcb/LHCb/Collision11/SEMILEPTONIC.DST/00016992/0000/00016992_00001214_1.Semileptonic.dst

I was looking into the SRM2Storage file and I found:

    if localSize == remoteSize:
        gLogger.debug( "SRM2Storage.getFile: Post transfer check successful." )
    errorMessage = "SRM2Storage.getFile: Source and destination file sizes do not match."

no checksum :(
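To make the request concrete, here is a minimal sketch of what a size-plus-checksum post-transfer check could look like. It is not the SRM2Storage code: the function names `adler32_of_file` and `post_transfer_check` are hypothetical, Adler-32 is only an assumed checksum type, and how the remote size and checksum are obtained from the SE is left out entirely.

```python
import os
import zlib


def adler32_of_file(path, chunk_size=1024 * 1024):
    """Return the Adler-32 checksum of a file as an 8-digit lowercase hex string."""
    checksum = 1  # Adler-32 starting value
    with open(path, "rb") as handle:
        # Read in chunks so large files are never held fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            checksum = zlib.adler32(chunk, checksum)
    return format(checksum & 0xFFFFFFFF, "08x")


def post_transfer_check(local_path, remote_size, remote_checksum):
    """Hypothetical check: compare sizes first, then checksums.

    remote_size and remote_checksum stand for whatever the SE reports;
    fetching them is outside the scope of this sketch.
    """
    local_size = os.path.getsize(local_path)
    if local_size != remote_size:
        return False, "Source and destination file sizes do not match."
    # Normalise case and zero-padding before comparing hex strings.
    expected = remote_checksum.lower().zfill(8)
    if adler32_of_file(local_path) != expected:
        return False, "Source and destination checksums do not match."
    return True, "Post transfer check successful."
```

The size comparison stays first because it is cheap; the checksum is only computed when the sizes already agree, which is exactly the case a size-only check cannot catch.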

The portal server that they are using is:

https://lhcb-web-dirac.cern.ch/DIRAC/LHCb-Production/lhcb_prod/jobs/PilotMonitor/display

Thanks in advance,

Vanessa

errorMessage = "SRM2Storage.__getFile: Source and destination file sizes do not match."

graciani commented 12 years ago

What do you want? We are not asking SEs to do checksum checking on every transfer. Could you evaluate the overhead that this would add and guarantee that it will not overload any of our SEs?

Did you check what all these files have in common? Since they are all concentrated at a single site, this likely points to a problem related to that site.

KrzysztofCiba commented 12 years ago

Please assign me to this. All development for this is in my branch, https://github.com/KrzysztofCiba/DIRAC/tree/DEV-DMS-checksums-check, and it will be ready for testing soon.

KrzysztofCiba commented 12 years ago

This has been in production for a while, so it can be closed.

atsareg commented 12 years ago

In production already