dCache / dcache

dCache - a system for storing and retrieving huge amounts of data, distributed among a large number of heterogeneous server nodes, under a single virtual filesystem tree with a variety of standard access methods
https://dcache.org

Integrate federated lookup into xrootd door #5829

Open alrossi opened 3 years ago

alrossi commented 3 years ago

CMS contacts have asked whether there is a more performant solution for allowing CMS dCache sites to join the AAA xrootd federation.

As of now the following is done:

As time has moved on and the AAA xrootd federation is utilized more extensively, CMS has noticed a significant performance cost, arguably stemming from the NFS mount and lookups.

Thus they ask us to consider enabling the dCache doors to act as an xrootd manager, too.

paulmillar commented 3 years ago

Do you know if the cmsd software does a directory listing, or does it just stat files?

It might be nice to know why NFS is being slow.

XMol commented 3 years ago

Hi Paul,

today we had a formal discussion about these issues among us dCache admins and the CMS contacts. Now I've learned that CMS actually is doing some kind of inventory comparison via xrootd. That is, they've got some script that does directory listing via xrootd and compares it to their expectations.

The consensus for us at GridKa is that this is quite a bad idea. xrootd explicitly is not meant to be used like this, and NFS is also known to struggle when it is supposed to deliver large quantities of metadata at high rates. So even if the dCache door were able to act as an xrootd redirector, that would not address the foolishness of the original use-case.

During this discussion, it was also mentioned that the door can already do the same lookups without going through the redirector, and much faster, too. With that piece of information, one of our xrootd experts suggested simply making the xrootd redirector delegate the namespace lookups to the door, instead of stating/listing NFS paths. The theory now is to have a minimalistic xrootd redirector next to the door, which will connect dCache doors with higher-level xrootd redirectors - no data clustering under xrootd is necessary anymore (which was the part that does the NFS inspection). We will test this soon and, if successful, this offers a much simplified setup for any CMS site that wants to join AAA. Again, it does not address the original use-case for CMS, it only optimizes it.

After all that, enabling the door to act as a redirector itself surely would still be very appreciated, but would become much less urgent at the same time. Instead, the plugin for the doors is given much more importance.

paulmillar commented 3 years ago

Thanks for the feedback @XMol.

I believe this kind of checking was originally part of the contribution from MIT, as something they provided on top of PhEDEx's file/data-set movement service. It was something highlighted as useful (and ideally not lost) when CMS decided to adopt Rucio.

Perhaps @alrossi can identify what pieces are missing, so that sites may deploy only dCache (i.e., without running a SLAC xrootd redirector + NFS-mounted dCache). I believe that AAA currently doesn't work with dCache "as is": hence there is this open issue.

alrossi commented 3 years ago

I'm afraid that at the current level of knowledge I possess I would not be able to answer that question.

It would be very helpful to me to be able to see exactly what we are talking about here: configuration, code that is being used, explicit details of what is going on. This all is still very vague to me.

I can say, however, the following. I am somewhat averse to accumulating stray functionality in the dCache door, especially if it means having to make it more and more stateful. On the other hand, creating a dCache service equivalent to an xrootd redirector would be a lot of work and I'm not sure what the payoff would be.

But until I get a better understanding of what is really going on here (i.e., the pieces I'm supposed to tell you are missing [smiley emoji]) ... I can't actually say.

What do I begin to look at here to get a better understanding?

Thanks, Al

paulmillar commented 3 years ago

Al, I believe the concept is relatively simple.

From the client's perspective ...

The client wishes to read a file, but doesn't know whether the file is located at a particular endpoint. It makes a request to read the file (kXR_open). The endpoint either "has" the file or redirects the client to where it believes the file is located (typically at another site).

Using these semantics, it is possible to build a federation by having redirection endpoints that somehow "know" where files are located within their sphere-of-influence and redirect clients to the correct endpoint. This way, the client opens the file at the central redirector and is redirected to the corresponding endpoint that has the file. It issues another open request, and receives the file's contents.
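The flow described above can be sketched as a toy simulation. This is only an illustration of the redirect-following idea, not the xrootd wire protocol: the `Endpoint` and `Redirector` classes, the `client_read` helper, the site names and the file paths are all hypothetical, and the `open` method merely stands in for a kXR_open request.

```python
from dataclasses import dataclass, field


@dataclass
class Endpoint:
    """Hypothetical storage endpoint that either has a file or does not."""
    name: str
    files: dict = field(default_factory=dict)  # path -> file contents

    def open(self, path):
        if path in self.files:
            return ("ok", self.files[path])
        return ("error", "file not found")


@dataclass
class Redirector:
    """Hypothetical redirector that 'knows' where a path lives."""
    name: str
    locations: dict = field(default_factory=dict)  # path -> next node

    def open(self, path):
        target = self.locations.get(path)
        if target is None:
            return ("error", "file not found")
        return ("redirect", target)


def client_read(entry, path, max_hops=8):
    """Open `path` at `entry`, following redirects until data arrives."""
    node = entry
    hops = [node.name]
    for _ in range(max_hops):
        kind, payload = node.open(path)
        if kind == "ok":
            return payload, hops
        if kind == "error":
            raise FileNotFoundError(f"{path}: {payload} (at {node.name})")
        node = payload  # redirect: re-issue the open at the new node
        hops.append(node.name)
    raise RuntimeError("too many redirects")


# Assumed topology: a global redirector in front of a regional one.
site_a = Endpoint("site-a", {"/store/f1.root": b"contents of f1"})
site_b = Endpoint("site-b", {"/store/f2.root": b"contents of f2"})
regional = Redirector("regional", {"/store/f1.root": site_a,
                                   "/store/f2.root": site_b})
top = Redirector("global", {"/store/f1.root": regional,
                            "/store/f2.root": regional})

data, hops = client_read(top, "/store/f2.root")
print(hops, data)
```

The client never needs to know the topology in advance; it only needs to re-issue the open wherever it is sent.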

(In real life, I believe they have multiple federation redirectors: a global one that redirects to per-continent redirectors, for prosaic reasons. This doesn't change the basic idea.)

The central federation redirector knows the contents by stat-ing the file. The precise semantics of this stat might be slightly different from the regular stat (you never know), so supporting this might require some changes.

This federation redirector is not omniscient and could redirect the client to the "wrong" endpoint, so the endpoint needs to be able to redirect the client back to the federation endpoint if something went wrong. There must also be some state stored, so the federation redirector knows not to redirect the client back to the same endpoint. I don't know the details, but this might involve a token that gets passed around.

I believe this is the part we are missing in dCache: being able to redirect the client back to the federation redirector if there's an error.
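To make the "redirect instead of an error" idea concrete, here is a minimal sketch. It assumes the state is simply a set of already-tried endpoint names carried by the client; that is a guess for illustration, not a description of what xrootd or dCache actually do. The `Federation` and `Endpoint` classes, `client_read`, and all names and paths are hypothetical.

```python
class Federation:
    """Hypothetical federation redirector whose catalogue may be stale."""

    def __init__(self, name):
        self.name = name
        self.candidates = {}  # path -> endpoints believed to hold the file

    def open(self, path, tried):
        # Skip endpoints the client has already been bounced back from.
        for endpoint in self.candidates.get(path, []):
            if endpoint.name not in tried:
                return ("redirect", endpoint)
        return ("error", "no untried endpoint has the file")


class Endpoint:
    """Hypothetical endpoint that redirects back instead of failing."""

    def __init__(self, name, files, federation=None):
        self.name = name
        self.files = files
        self.federation = federation

    def open(self, path, tried):
        if path in self.files:
            return ("ok", self.files[path])
        if self.federation is not None:
            # The behaviour discussed above: redirect, do not error out.
            return ("redirect", self.federation)
        return ("error", "file not found")


def client_read(entry, path, max_hops=8):
    node, tried, trace = entry, set(), []
    for _ in range(max_hops):
        trace.append(node.name)
        kind, payload = node.open(path, tried)
        if kind == "ok":
            return payload, trace
        if kind == "error":
            raise FileNotFoundError(f"{path}: {payload}")
        if isinstance(node, Endpoint):
            tried.add(node.name)  # remember the endpoint that lacked the file
        node = payload
    raise RuntimeError("too many redirects")


fed = Federation("federation")
stale = Endpoint("site-stale", {}, federation=fed)
good = Endpoint("site-good", {"/store/f.root": b"payload"}, federation=fed)
fed.candidates["/store/f.root"] = [stale, good]  # first guess is wrong

data, trace = client_read(fed, "/store/f.root")
print(trace)
```

Without the tried set, the federation would keep sending the client to the first (wrong) candidate, which is why some state has to travel with the request.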

This is likely a typical xrootd setup: the implementation is the specification, and nothing is documented (I would be pleasantly surprised if I'm wrong!). So I imagine this work would likely involve the usual reverse-engineering and/or hounding people to describe their protocol. This (along with testing) would be the hard part. I suspect that the actual implementation is relatively trivial: issue a redirect instead of an error.

alrossi commented 3 years ago

While I think now the general consensus of the team is that we do not currently wish to pursue this integration into the dCache door, we will leave this issue open because it is desirable to try to get to the bottom of the performance issues that have been reported.

alrossi commented 3 years ago

As per Tigran, we can continue to use https://rt.dcache.org/Ticket/Display.html?id=10113 to discuss the site/installation details.

alrossi commented 9 months ago

Shall we drop this issue?

XMol commented 9 months ago

I personally certainly don't insist on it. 🙂