dCache / dcache

dCache - a system for storing and retrieving huge amounts of data, distributed among a large number of heterogeneous server nodes, under a single virtual filesystem tree with a variety of standard access methods
https://dcache.org

API: SSE event for file being brought online #5315

Open onnozweers opened 4 years ago

onnozweers commented 4 years ago

Dear dCache devs,

We're looking into server-sent events. When we stage a file (dCache 6.1 snapshot), we can't see an event that indicates that the file has come online. We only see some attrib events:

[jnb-adanezi@wn-ui-01 ~]$ curl -H "Authorization: Bearer $token" -H 'accept: text/event-stream' --fail --silent --show-error -X GET "https://dolphin12.grid.surfsara.nl:20443/api/v1/events/channels/Wt4LepmeRpeSXPgP5dUmIA"
event: inotify
id: 0
data: {"event":{"name":"IMG_2107.jpg","mask":["IN_ATTRIB"]},"subscription":"https://dolphin12.grid.surfsara.nl:20443/api/v1/events/channels/Wt4LepmeRpeSXPgP5dUmIA/subscriptions/inotify/AACkxyqL91RFMZeCV2wp9RW1"}
event: inotify
id: 1
data: {"event":{"name":"IMG_2107.jpg","mask":["IN_ATTRIB"]},"subscription":"https://dolphin12.grid.surfsara.nl:20443/api/v1/events/channels/Wt4LepmeRpeSXPgP5dUmIA/subscriptions/inotify/AACkxyqL91RFMZeCV2wp9RW1"}
event: inotify
id: 2
data: {"event":{"name":"IMG_2107.jpg","mask":["IN_ATTRIB"]},"subscription":"https://dolphin12.grid.surfsara.nl:20443/api/v1/events/channels/Wt4LepmeRpeSXPgP5dUmIA/subscriptions/inotify/AACkxyqL91RFMZeCV2wp9RW1"}
event: inotify
id: 3
data: {"event":{"name":"IMG_2107.jpg","mask":["IN_ATTRIB"]},"subscription":"https://dolphin12.grid.surfsara.nl:20443/api/v1/events/channels/Wt4LepmeRpeSXPgP5dUmIA/subscriptions/inotify/AACkxyqL91RFMZeCV2wp9RW1"}

Have events for staging/QoS operations been implemented? If not, are there any plans to implement them? We think such event types would be very useful to our users.

Cheers, Natalie & Onno

paulmillar commented 4 years ago

Hi Natalie & Onno,

You raise a couple of interesting points.

First, please let me explain a bit about inotify and how it relates to QoS changes.

The inotify implementation in dCache is as faithful an implementation of the Linux inotify API as I could manage. The two are not identical, since they have different management and event-delivery mechanisms, but the semantics are as close as I could manage with dCache.

One feature of the Linux inotify interface is the extremely limited information provided in the event: often just the kind of event and the target. Rather than enrich the information, I decided to follow suit, since applications in the Linux world have had to deal with those limitations :-)

On Linux, the IN_ATTRIB event is triggered when the file's metadata changes; for example, by changing its ownership, timestamps, ACLs, ... The IN_ATTRIB event itself (the event metadata) doesn't say what about the file has changed. Therefore, a client that cares about these metadata changes must somehow maintain the file's state. When it receives the IN_ATTRIB event, the client queries the file's metadata and compares this against its cached information to discover what has changed. Alternatively, a client that displays information can fetch fresh information and update its display.
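As a rough illustration of that cache-and-compare pattern, here is a minimal Python sketch. The `MetadataCache` class, the `changed_fields` helper and the `fetch` callable are hypothetical names invented for this example; `fetch` stands in for whatever call the client uses to read current metadata (a stat on Linux, a REST query for dCache).

```python
def changed_fields(cached, fresh):
    """Return {field: (old, new)} for every field whose value differs."""
    keys = cached.keys() | fresh.keys()
    return {
        key: (cached.get(key), fresh.get(key))
        for key in sorted(keys)
        if cached.get(key) != fresh.get(key)
    }


class MetadataCache:
    """Remembers the last-seen metadata per file (hypothetical helper)."""

    def __init__(self, fetch):
        # fetch: callable taking a file name and returning a metadata dict.
        # It is an assumption of this sketch, not part of any real API.
        self._fetch = fetch
        self._state = {}

    def on_attrib(self, name):
        """Call on each IN_ATTRIB event; reports what actually changed."""
        fresh = self._fetch(name)
        diff = changed_fields(self._state.get(name, {}), fresh)
        self._state[name] = fresh
        return diff
```

An empty result from `on_attrib` means the event was for a change the client does not track (or one that has already been reverted), which is exactly the ambiguity the bare event leaves open.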

In dCache, the IN_ATTRIB event is triggered for all those activities that (in Linux) will trigger an event. In addition, dCache will trigger an IN_ATTRIB event when other things change: a new checksum value is known (with a different algorithm) and QoS changes, such as tape operations.

So, the IN_ATTRIB events you described are (quite likely) QoS changes. However, to know that they really are QoS transitions and to learn the new QoS, your client would need to double-check (e.g., by querying the file through the REST API). This is similar to how Linux clients stat a file after receiving an IN_ATTRIB event.
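To make the first half of that concrete, a client could pick the affected file names out of the event stream and then re-query each one. The sketch below parses lines in the shape shown in the curl transcript earlier in this thread; it assumes one `data:` line per event, as in that transcript, and the function name `attrib_targets` is made up for this example.

```python
import json


def attrib_targets(lines):
    """Yield file names from inotify events whose mask includes IN_ATTRIB.

    Assumes each event carries exactly one 'data:' line, as in the
    transcript above; a general SSE parser would have to join
    multiple data lines per event.
    """
    event_type = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:") and event_type == "inotify":
            payload = json.loads(line[len("data:"):])
            event = payload.get("event", {})
            if "IN_ATTRIB" in event.get("mask", []):
                yield event.get("name")
```

Each name this yields is only a hint that something changed; the follow-up query is still needed to find out whether it was a QoS transition.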

Second, you mentioned QoS events.

Although QoS monitoring is possible by monitoring inotify's IN_ATTRIB events, there are a number of limitations. First, an inotify subscription targets only a directory and its children (or a specific file), while users typically want to monitor an entire sub-tree. Second, the IN_ATTRIB event doesn't say what has changed, forcing the client to remember the file's current status and stat the file after receiving the event, which is also undesirable.

Therefore, I imagine adding additional monitoring support in Frontend/SSE, independent from inotify, that is more targeted at this kind of monitoring: QoS monitoring. Clients should be able to monitor an entire subtree within a single subscription. The events should provide the full path of the file, describe what changed and (possibly) carry some identifier for the client action that triggered the change. This last point is to allow a client that initiates a QoS transition (with a corresponding identifier) to select only the corresponding QoS events based on that identifier. For example, an srmBringOnline request would trigger QoS events that include the SRM request ID.
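Purely as a sketch of what such an event and its client-side selection might look like (this feature does not exist yet, so every name here, including `QosEvent`, its fields and `select`, is hypothetical):

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class QosEvent:
    """Hypothetical event shape; not an existing dCache type."""
    path: str                         # full path of the affected file
    change: str                       # free-form description of what changed
    request_id: Optional[str] = None  # ID of the client action, if known


def in_subtree(path: str, root: str) -> bool:
    """True if path is root itself or lies anywhere below it."""
    root = root.rstrip("/")
    return path == root or path.startswith(root + "/")


def select(events: Iterable[QosEvent], root: str,
           request_id: Optional[str] = None) -> Iterator[QosEvent]:
    """Keep events under root, optionally only those for one request."""
    for event in events:
        if not in_subtree(event.path, root):
            continue
        if request_id is not None and event.request_id != request_id:
            continue
        yield event
```

The trailing-slash handling in `in_subtree` keeps a subscription on `/data` from matching a sibling such as `/database`.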

For the most part, this is fairly straightforward to implement. The problem is in the authorisation model: which events are users allowed to see? In some scenarios, it is very important that file names and paths remain secret, something known only to authorised users. Providing support for monitoring an entire sub-tree makes this difficult, as that subtree could contain directories the user is not allowed to access. Doing this check for each event would be expensive.

One possibility is to authorise users to see only QoS events that are a result of their own actions (I would see QoS events for my own activities, but wouldn't see QoS events for files staged by Onno).

Another possibility is to authorise QoS events so users can see events for files they own. Under this model, I would see QoS events for my files, even if Onno stages them. This isn't necessarily safe (my file could be moved into a directory that I cannot access) but I think these edge cases are fairly unlikely, making this probably "good enough".

So, my plan is that a user can see QoS events that they triggered or that target files they own.
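Written out as a predicate, the combined rule is small. The function name and parameters below are hypothetical and only restate the plan; nothing here reflects an existing dCache interface.

```python
def may_see(user: str, triggered_by: str, file_owner: str) -> bool:
    """A user sees a QoS event if they caused it or own the target file."""
    return user == triggered_by or user == file_owner


# The cases discussed above, using the participants as example users:
print(may_see("paul", triggered_by="paul", file_owner="onno"))  # own action
print(may_see("paul", triggered_by="onno", file_owner="paul"))  # own file
print(may_see("paul", triggered_by="onno", file_owner="onno"))  # neither
```

Note that neither check needs a directory permission lookup per event, which is what makes this cheaper than authorising on the path.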

There are some alternative approaches (which are more tricky, but more flexible), so it would be interesting to learn if this basic authorisation model would be sufficient for your use-cases.

Cheers, Paul.

onnozweers commented 4 years ago

Hi Paul,

> So, my plan is that a user can see QoS events that they triggered or that target files they own.

Thanks! Your plan sounds good.

Cheers, Natalie & Onno

paulmillar commented 4 years ago

Are the inotify IN_ATTRIB events at all useful for you for monitoring QoS transitions?

Using them would require more work, but it does have the advantage of being available "right now".

onnozweers commented 4 years ago

I think the IN_ATTRIB approach would be too cumbersome. Polling the locality of a list of files would be easier and less work. If many users poll, that may put load on the system, but we'll see when we get there.
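A minimal sketch of that polling loop, under stated assumptions: `get_locality` is a hypothetical callable that returns the locality string for one path (for example a thin wrapper around a REST query; the exact endpoint is not specified in this thread), and any locality value starting with "ONLINE" is taken to mean the file is on disk.

```python
import time


def wait_until_online(paths, get_locality, interval=30.0,
                      max_polls=None, sleep=time.sleep):
    """Poll each pending path until it is online or max_polls is reached.

    Returns (online, pending) as two sets of paths.
    """
    pending = set(paths)
    online = set()
    polls = 0
    while pending and (max_polls is None or polls < max_polls):
        if polls:
            sleep(interval)  # wait between rounds, not before the first
        for path in sorted(pending):
            # Assumption: values starting with "ONLINE" mean a disk copy.
            if get_locality(path).startswith("ONLINE"):
                online.add(path)
        pending -= online
        polls += 1
    return online, pending
```

Only files that are still pending are queried in each round, so the load per user shrinks as files come online; the `interval` is the main knob for limiting the load mentioned above.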

paulmillar commented 4 years ago

Should I close this issue for now, or leave it open as inspiration for future QoS events?

onnozweers commented 4 years ago

I think I'd leave it open to keep track of it.