Scope of storage capabilities is ambiguous for SEs with tape storage

slithy commented 1 year ago

Scope of `storage.read` and `storage.stage`

As discussed at WLCG DOMA BDT Meeting (April 2023), CHEP (May 2023) and ATLAS S&C week (June 2023):

In the common-jwt-profile document, the scope of the storage.read and storage.stage capabilities should be amended:

The document says that storage.read only applies to data on disk, implying that storage.stage provides both stage and read capabilities for data on tape. However, discussion at the DOMA meeting above concluded that stage and read are separate capabilities which can be authorised independently.
I also understand that dCache has implemented stage and read as separate capabilities (correct me if I am wrong).
At CHEP, I spoke to the StoRM team (see their poster about tape REST API + token authorisation). They also told me that they had implemented stage and read as separate capabilities.
Usually stage and read requests are separated in time: staging a file can take hours or days, and reading it comes afterwards.
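
To make the two readings concrete, here is a minimal Python sketch. It assumes the scopes arrive as a space-separated `scope` claim of `capability:path` entries (e.g. `storage.read:/atlas/disk`), and that a scope on a path also covers everything below it. The `stage_implies_read` flag is hypothetical: `True` models the current wording (where `storage.stage` also acts as `storage.read`), and `False` models the amendment proposed here (where the two capabilities are authorised independently).

```python
from typing import List, Tuple


def parse_scopes(scope_claim: str) -> List[Tuple[str, str]]:
    """Split a space-separated scope claim into (capability, path) pairs."""
    pairs = []
    for item in scope_claim.split():
        cap, sep, path = item.partition(":")
        # Assumption for this sketch: a scope without a path covers "/".
        pairs.append((cap, path if sep else "/"))
    return pairs


def path_covered(granted: str, requested: str) -> bool:
    """True if `requested` equals `granted` or lies beneath it."""
    granted = granted.rstrip("/") or "/"
    if granted == "/":
        return True
    return requested == granted or requested.startswith(granted + "/")


def is_authorised(scope_claim: str, capability: str, path: str,
                  stage_implies_read: bool = False) -> bool:
    """Check whether the token's scopes allow `capability` on `path`."""
    allowed = {capability}
    if stage_implies_read and capability == "storage.read":
        # Current wording: a storage.stage scope also permits reading.
        allowed.add("storage.stage")
    return any(cap in allowed and path_covered(granted, path)
               for cap, granted in parse_scopes(scope_claim))
```

With independent capabilities, a token carrying only `storage.stage:/atlas/tape` can queue a recall but cannot read the recalled file; a separate token (or a later one) with `storage.read` is needed for that.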

Related questions

Does storage.stage grant permission to abort a stage request?
Does storage.stage grant permission to evict a staged file from the buffer?
Similarly, what about capabilities for pinning/unpinning files in the buffer? (EOSCTA does not support this, but dCache does).
At least for EOS, the scope of permissions that can be set in the namespace is more fine-grained than the proposed capabilities. As well as rwx and p ("prepare" or stage permission), there are also the "forbid change", "forbid update" and "forbid deletion" ACL permissions. The current set of token capabilities does not allow the same fine-grained control.
At the DOMA meeting there was some discussion about whether claims in a token can override ACLs in the namespace. In particular, if a directory is set to prohibit deletion, should a token be able to override this? SE managers seem uneasy with this idea.
What happens if there is a bulk request where some files are authorised and others not? Does the entire request fail?

slithy commented 1 year ago

Pull request to address the first point above: https://github.com/WLCG-AuthZ-WG/common-jwt-profile/pull/27

abh3 commented 1 year ago

Some additional ambiguity: say you have a system that may trigger a stage when a client attempts a read, but the client does not have "stage" as a claim. What happens next is ambiguous. If you want a transparent system, then read implies stage. However, if you want to prevent clients from staging files simply because they want to read them, then you really want to require a stage claim.
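
The choice can be expressed as a single site switch. This is a minimal Python sketch, assuming a hypothetical `read_implies_stage` setting and a flag saying whether the file currently has a disk copy; it is not how any particular SE implements it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Set


class ReadOutcome(Enum):
    SERVE = "serve from disk"
    STAGE_THEN_SERVE = "trigger a stage, serve once online"
    DENY = "deny"


@dataclass
class SitePolicy:
    # Hypothetical switch: True means a transparent system (read implies stage).
    read_implies_stage: bool


def handle_read(policy: SitePolicy, caps: Set[str], online: bool) -> ReadOutcome:
    """Decide what a read request does, given the token's capabilities."""
    if "storage.read" not in caps:
        return ReadOutcome.DENY
    if online:
        return ReadOutcome.SERVE
    # File is tape-only: staging needs either the site switch or a stage claim.
    if policy.read_implies_stage or "storage.stage" in caps:
        return ReadOutcome.STAGE_THEN_SERVE
    return ReadOutcome.DENY
```

With `read_implies_stage=False`, a read-only token can still read files that happen to be on disk, but it can never cause a tape recall.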

As for stage followed by abort and evict: it would seem reasonable that if the client staged the file, the client should also be able to abort the stage and evict the file. However, that is not clear when you consider the transparency point raised above.

Pin and unpin should certainly be separate from "stage", as they represent additional resource usage. However, as above, if a client has pin privileges, does unpin apply only to the files the client pinned, or to all files?
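
Both options for unpin can be sketched with a small registry that remembers who pinned each file. This is a minimal Python sketch, assuming the requester is identified by the token's (`iss`, `sub`) pair and a hypothetical `owner_only` policy switch; the same bookkeeping would work for deciding who may abort a stage request or evict a staged file.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

# Assumption: identify the requester by the token's (iss, sub) pair.
Identity = Tuple[str, str]


@dataclass
class PinRegistry:
    owner_only: bool = True  # hypothetical policy: unpin only your own pins
    pins: Dict[str, Set[Identity]] = field(default_factory=dict)

    def pin(self, path: str, who: Identity) -> None:
        """Record that `who` holds a pin on `path`."""
        self.pins.setdefault(path, set()).add(who)

    def unpin(self, path: str, who: Identity) -> bool:
        """Release pins on `path`; return False if the request is refused."""
        holders = self.pins.get(path, set())
        if not holders:
            return False
        if self.owner_only:
            if who not in holders:
                return False  # caller never pinned this file
            holders.discard(who)
        else:
            holders.clear()  # any unpin-capable caller releases every pin
        if not holders:
            del self.pins[path]
        return True

    def is_pinned(self, path: str) -> bool:
        return bool(self.pins.get(path))
```

In the `owner_only` mode, a file pinned by two clients stays pinned until both have released it, so one client cannot pull the file out from under another.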

I am in favor of allowing a site to implement restrictions that are more severe than those in a token, with the site's policy overriding the token's claims. Not doing so essentially says a site has surrendered complete control to the token issuer. I doubt many sites would accept that.
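
The two positions in this thread differ only in whether site restrictions are checked after the token. A minimal Python sketch, assuming site restrictions are kept as a hypothetical map from path prefix to forbidden operations, and a hypothetical `site_has_final_say` switch:

```python
from typing import Dict, Set


def site_prohibits(prohibitions: Dict[str, Set[str]], path: str, op: str) -> bool:
    """True if any restricted prefix covering `path` forbids `op`."""
    for prefix, ops in prohibitions.items():
        base = prefix.rstrip("/")
        if (path == base or path.startswith(base + "/")) and op in ops:
            return True
    return False


def decide(token_allows: bool, prohibitions: Dict[str, Set[str]], path: str,
           op: str, site_has_final_say: bool = True) -> bool:
    """Combine the token's AuthZ statement with local site restrictions."""
    if not token_allows:
        return False
    if site_has_final_say and site_prohibits(prohibitions, path, op):
        # Site policy overrides the token's claim.
        return False
    return True
```

With `site_has_final_say=True` the token can only narrow what the site already permits; with `False` the token's statement is honoured as-is.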

As for bulk requests, I've seen them implemented in three ways: the two that you mention, and a third where the request fails on the first failure encountered, even when subsequent requests would succeed (i.e. partial failure). The reasoning is that recovery is much easier in the third scenario.
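
The three behaviours can be compared side by side. This is a minimal Python sketch, assuming a hypothetical per-file `authorised` callback and reporting each file as `done`, `denied` or `skipped`; real bulk APIs report status in their own formats.

```python
from enum import Enum
from typing import Callable, Dict, List


class BulkMode(Enum):
    ALL_OR_NOTHING = "reject the whole request if any file is unauthorised"
    PER_FILE = "process authorised files, report the rest as denied"
    FAIL_FAST = "process in order, stop at the first failure"


def run_bulk(paths: List[str], authorised: Callable[[str], bool],
             mode: BulkMode) -> Dict[str, str]:
    """Return a per-file status: 'done', 'denied' or 'skipped'."""
    if mode is BulkMode.ALL_OR_NOTHING:
        if all(authorised(p) for p in paths):
            return {p: "done" for p in paths}
        # Nothing is processed; authorised files are merely skipped.
        return {p: ("skipped" if authorised(p) else "denied") for p in paths}
    result: Dict[str, str] = {}
    for i, p in enumerate(paths):
        if authorised(p):
            result[p] = "done"
            continue
        result[p] = "denied"
        if mode is BulkMode.FAIL_FAST:
            # Partial failure: earlier files are done, later ones untouched.
            for rest in paths[i + 1:]:
                result[rest] = "skipped"
            break
    return result
```

In fail-fast mode, the client knows that everything before the denied file was processed and nothing after it was, so it can fix the problem and resubmit from that point.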

paulmillar commented 1 year ago

I also understand that dCache has implemented stage and read as separate capabilities (correct me if I am wrong).

dCache currently has only partial support for storage.stage. It treats storage.stage as a synonym for storage.read (as per the spec), but storage.stage does not authorise staging the file. Instead, the existing stage authorisation processes are enforced.

paulmillar commented 1 year ago

At the DOMA meeting there was some discussion about whether claims in a token can override ACLs in the namespace. In particular, if a directory is set to prohibit deletion, should a token be able to override this? SE managers seem uneasy with this idea.

I find this comment rather strange.

As I understand it, the point of explicit AuthZ is to delegate AuthZ decisions (for some subtree within the namespace) to the VO. If the token says the bearer is authorised to delete a particular file then the storage system should honour that statement and delete the file when so requested.

Having the storage system overriding the VO's AuthZ decision dilutes the benefits from adopting explicit AuthZ.

beer4duke commented 9 months ago

Having the storage system overriding the VO's AuthZ decision dilutes the benefits from adopting explicit AuthZ.

Experiments define some SLAs directly with each storage endpoint, for example: never allow deletion of RAW data at T0.

As storage endpoints are ultimately responsible for the integrity of the data they host, having the VO's AuthZ decision override storage endpoint SLAs removes the storage endpoint's responsibility for all its data.

But in case of a deletion incident, we all know that the storage endpoint will be blamed as usual and will have to spend expensive operations time restoring as much as it can.

I would not call this decision dilution but a mutually beneficial safety net.

paulmillar commented 9 months ago

I think this is an important point, and something that (I think) should be clarified and stated very clearly and explicitly.

A specific example scenario would be:

If a site is under some kind of commitment (SLA/MoU) to never delete certain data and a request comes in to delete said data, with a token (from the VO) that authorises that operation, what is the correct behaviour of the storage?

I think this generalises naturally to a broader question: are sites under any kind of MoU or SLA that could be in conflict with that site supporting tokens with explicit AuthZ statements?

WLCG-AuthZ-WG / common-jwt-profile