dCache / dcache

dCache - a system for storing and retrieving huge amounts of data, distributed among a large number of heterogenous server nodes, under a single virtual filesystem tree with a variety of standard access methods
https://dcache.org

Active transfers report lacks third-party-copy records #6540

Closed XMol closed 2 years ago

XMol commented 2 years ago

Hello dCache.org,

I had already reported this in RT (#10303), and Albert informed me that @paulmillar had already submitted a patch that addresses at least the HTTP TPC transfers. Admittedly, xrootd TPCs are not covered yet.

This issue serves as a feature request to continue working on the problem. Our monitoring of current transfer activity simply lacks that class of transfers and is therefore quite misleading. The situation is getting worse now that the WLCG experiments dispatch TPCs more liberally.

Thank you for your attention, Xavier.

/cc @alrossi

alrossi commented 2 years ago

I will be addressing some other xroot issues soon so I will also take a look at this.

In the meantime, we need to get Paul's patch committed and backported.

Thanks.

paulmillar commented 2 years ago

Hi @XMol,

Thanks for the motivating issue!

To be honest, I didn't really like the approach taken in the patch (even though I wrote it!). Although it was technically sound (it would work), the approach was rather inelegant, which is why I was reluctant to commit the patch.

I think I've come up with a better approach, so I'll put that together and hopefully have this issue fixed soon.

Cheers, Paul.

alrossi commented 2 years ago

@paulmillar I assume that the xrootd-specific solution could then be built on your revised patch? Should I wait until that is committed to implement?

alrossi commented 2 years ago

As a sanity check, I just did two third-party copies using xrootd.

INCOMING (from vanilla server to dCache):

xrdcp5x --tpc only xroots://fndcatemp1.fnal.gov:1094//fermilab/users/arossi/largedata?authz=Bearer%20`cat $XDG_RUNTIME_DIR/bt_u8773` xroots://fndcatemp2.fnal.gov:1095//pnfs/fs/usr/fermilab/users/arossi/volatile/largedata-2022032911391648571986030393349?authz=Bearer%20`cat $XDG_RUNTIME_DIR/bt_u8773`

[Screenshot, 2022-03-29 at 11:45:43]

OUTGOING (from dCache to vanilla server):

xrdcp5x --tpc only xroots://fndcatemp2.fnal.gov:1095//pnfs/fs/usr/fermilab/users/arossi/volatile/largedata-2022032911391648571986030393349 xroots://fndcatemp1.fnal.gov:1094//fermilab/users/arossi/largedata?authz=Bearer%20`cat $XDG_RUNTIME_DIR/bt_u8773`

[Screenshot, 2022-03-29 at 11:52:43]

As you can see, both of them register in the active transfer lists on the dCacheView page.

This must be a special problem with webdav?

Al

paulmillar commented 2 years ago

Yes, HTTP-TPC is different from xroot-TPC.

xroot-TPC uses the normal copy process, but adds metadata so the pool knows it's actually a third-party transfer.

HTTP-TPC uses the RemoteTransferManager service to do (almost) all the coordination. In this way, RemoteTransferManager is acting like a door, but doesn't advertise itself as such, so the transfers are unknown to httpd and frontend.
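To make the consequence concrete, here is a toy model of that visibility gap. It is only a sketch and not dCache code: the `Service` class, the `advertises_as_door` flag, the `collect_active_transfers` function and the service names are all hypothetical, and the real httpd and frontend services gather their data differently in detail.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    """Hypothetical stand-in for a cell that coordinates transfers."""
    name: str
    advertises_as_door: bool
    transfers: list = field(default_factory=list)


def collect_active_transfers(services):
    """Return transfers only from services that advertise themselves as doors.

    This mimics a monitor that asks known doors for their transfers and
    therefore never sees anything coordinated elsewhere.
    """
    active = []
    for service in services:
        if service.advertises_as_door:
            active.extend(service.transfers)
    return active


# Hypothetical example data: one xroot door, one RemoteTransferManager.
xroot_door = Service("xroot-door", True, ["xroot-TPC largedata"])
rtm = Service("RemoteTransferManager", False, ["HTTP-TPC some-file"])

visible = collect_active_transfers([xroot_door, rtm])
print(visible)  # only the xroot-TPC transfer is listed
```

In this model the HTTP-TPC transfer exists but is never listed, simply because the service coordinating it is not among the doors being queried.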

XMol commented 2 years ago

> In this way, RemoteTransferManager is acting like a door, but doesn't advertise itself as such, so the transfers are unknown to httpd and frontend.

Billing even declares a new protocol for these HTTP-TPCs, too.

03.30 00:03:12 [pool:f01-152-140-e_D_lhcb@f01-152-140-e_D_lhcbDomain:transfer] [0000DAC4044EF7934434961E1B319225702B,125261724] [/pnfs/gridka.de/lhcb/LHCb-Disk/lhcb/MC/2017/DSTARD02KSKK.HLTFILTER.MDST/00158493/0000/00158493_00000297_1.dstard02kskk.hltfilter.mdst] dc_lhcb:LHCB@osm 125261724 1497 true {RemoteHttpsDataTransfer-1.1:https://xfer-lhcb.cr.cnaf.infn.it:8443/disk/lhcb/MC/2017/DSTARD02KSKK.HLTFILTER.MDST/00158493/0000/00158493_00000297_1.dstard02kskk.hltfilter.mdst} [door:RemoteTransferManager@transferManager-f01-080-113Domain:1648591391028-333472] [p2=false] {0:""}
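As a stopgap, records like the one above could be tallied per protocol to estimate HTTP-TPC volume after the fact. The following is a minimal sketch, not a dCache tool. It assumes the field order of the sample line: after the storage class come the bytes transferred, the connection time in milliseconds and a created flag, and the protocol name is the text before the first colon inside the braces. The function name `http_tpc_bytes` and the regular expression are illustrative and would need adjusting if the billing format is configured differently.

```python
import re
from collections import defaultdict

# Pattern for pool "transfer" records, modelled on the sample line only.
TRANSFER_RECORD = re.compile(
    r"\[pool:(?P<pool>[^:\]]+):transfer\]"
    r"\s+\[(?P<pnfsid>[0-9A-Fa-f]+),(?P<size>\d+)\]"
    r"\s+\[(?P<path>[^\]]*)\]"
    r"\s+(?P<storage>\S+)"
    r"\s+(?P<transferred>\d+)\s+(?P<millis>\d+)\s+(?P<created>true|false)"
    r"\s+\{(?P<protocol>[^:}]+)"
)


def http_tpc_bytes(lines):
    """Sum transferred bytes per protocol for RemoteHttp* transfer records."""
    totals = defaultdict(int)
    for line in lines:
        match = TRANSFER_RECORD.search(line)
        if match is None:
            continue  # not a pool transfer record, or an unexpected layout
        protocol = match.group("protocol")
        if protocol.startswith("RemoteHttp"):
            totals[protocol] += int(match.group("transferred"))
    return dict(totals)


# The billing record quoted above, split only for readability.
SAMPLE = (
    "03.30 00:03:12 "
    "[pool:f01-152-140-e_D_lhcb@f01-152-140-e_D_lhcbDomain:transfer] "
    "[0000DAC4044EF7934434961E1B319225702B,125261724] "
    "[/pnfs/gridka.de/lhcb/LHCb-Disk/lhcb/MC/2017/DSTARD02KSKK.HLTFILTER.MDST/"
    "00158493/0000/00158493_00000297_1.dstard02kskk.hltfilter.mdst] "
    "dc_lhcb:LHCB@osm 125261724 1497 true "
    "{RemoteHttpsDataTransfer-1.1:https://xfer-lhcb.cr.cnaf.infn.it:8443/disk/"
    "lhcb/MC/2017/DSTARD02KSKK.HLTFILTER.MDST/00158493/0000/"
    "00158493_00000297_1.dstard02kskk.hltfilter.mdst} "
    "[door:RemoteTransferManager@transferManager-f01-080-113Domain:"
    "1648591391028-333472] [p2=false] {0:\"\"}"
)

print(http_tpc_bytes([SAMPLE]))  # {'RemoteHttpsDataTransfer-1.1': 125261724}
```

This only counts completed transfers that have already been written to billing, so it does not replace a live view of active transfers.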

XMol commented 2 years ago

> Thanks for the motivating issue!

And here's a motivating reminder. :wink:

We still cannot quantify how much data is sent to or read from our dCache SEs via HTTP-TPC, since no active transfers are reported for it.

alrossi commented 2 years ago

As far as xroot is concerned, are we good? Do you use native xroot TPC? Are those not showing up? My own test above seems to suggest that there is nothing further to do there and that this is specifically an HTTPS problem (which I think Paul has promised to take care of ...)

Al

XMol commented 2 years ago

Yes, chances are, this is exclusively about HTTPS-TPC.

XMol commented 2 years ago

This issue is solved by ed926ca25e388e7d6c07511dcbaf92c2088cdbf2, which went into 7.2.20 (4f0328d84c786288c049d74682b2ec4cbb997ba2), 8.0.10 (1fca124200792cd153521af3b5d02ab56201b252) and 8.1.4 (61e98372efd7352e0630a3dd1b1824d1f6fcf12b).