bluesky / tiled

API to structured data
https://blueskyproject.io/tiled
BSD 3-Clause "New" or "Revised" License

RSS feeds #377

Open danielballan opened 1 year ago

danielballan commented 1 year ago

I wonder if it would be worthwhile to make it possible to view a node's contents as an RSS feed. For some nodes that would be an exotic use case, but for "scans at beamline XYZ" it might be useful to be able to pull it up in a feed reader app.
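For illustration, a minimal sketch of rendering a node's entries as an RSS 2.0 feed using only the Python standard library. The `build_rss` helper, the entry dictionaries, and the `example.org` URLs are hypothetical and are not part of tiled.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_rss(title, link, description, entries):
    """Render a list of entry dicts as an RSS 2.0 document (returned as str)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for entry in entries:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = entry["title"]
        ET.SubElement(item, "link").text = entry["link"]
        # Reuse the link as the unique identifier for feed readers.
        ET.SubElement(item, "guid").text = entry["link"]
        # RSS 2.0 expects RFC 822 style dates.
        ET.SubElement(item, "pubDate").text = format_datetime(entry["created"])
    return ET.tostring(rss, encoding="unicode")


# Hypothetical entries standing in for the children of a node.
entries = [
    {
        "title": "scan 1234",
        "link": "https://example.org/node/scan-1234",
        "created": datetime(2023, 1, 5, 12, 0, tzinfo=timezone.utc),
    },
]
feed = build_rss(
    "Scans at beamline XYZ", "https://example.org/node", "Recent scans", entries
)
```

A server could return `feed` with the `application/rss+xml` content type so that feed readers pick it up.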

padraic-shafer commented 1 year ago

Oooh, that's a fun idea.

I've been kicking around ideas for notifying me and/or beamline users about the status of a beamline or experiment. RSS seems less intrusive than email or Slack notifications, and even lighter weight than checking a web page to monitor the "health" of a beamline or experiment.

If this concept comes to fruition, I would probably also hook a beamline (BL) alarm monitor into the same feed. I'm thinking of a simple wrapper around a PV monitor with some predefined out-of-range conditions (a warning range and an error range).
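A rough sketch of the range check such a wrapper might perform. It assumes the warning range sits inside the error range, and that a reading outside a range triggers the corresponding state. The names `classify` and `AlarmMonitor` and the `publish` callable are hypothetical. The connection to an actual PV monitor is left out.

```python
def classify(value, warn_range, error_range):
    """Return "ok", "warning", or "error" for a single reading.

    Both ranges are (low, high) tuples. Assumption: the warning range lies
    inside the error range, and leaving a range raises that alarm level.
    """
    warn_low, warn_high = warn_range
    err_low, err_high = error_range
    if value < err_low or value > err_high:
        return "error"
    if value < warn_low or value > warn_high:
        return "warning"
    return "ok"


class AlarmMonitor:
    """Publish a message only when the alarm state changes."""

    def __init__(self, name, warn_range, error_range, publish):
        self.name = name
        self.warn_range = warn_range
        self.error_range = error_range
        self.publish = publish  # any callable, e.g. one that appends to a feed
        self.state = "ok"

    def update(self, value):
        """Call this from the PV monitor whenever a new value arrives."""
        state = classify(value, self.warn_range, self.error_range)
        if state != self.state:
            self.publish({"pv": self.name, "state": state, "value": value})
            self.state = state
        return state
```

Publishing only on state changes keeps a slowly polled feed from filling up with repeated readings.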

padraic-shafer commented 1 year ago

I suppose it could be a fairly universal/open pub-sub mechanism for kicking off data processing events when a (completed) new file arrives. But there is, of course, a distinction between "new file created" and "new file complete"--which might not generally be knowable.

danielballan commented 1 year ago

That sounds useful to me. RSS is polling-based, so it would be suitable for updates that are slow (minutes, not seconds) and not urgent. That probably fits some but not all use cases. Other technologies like server-sent events, WebSockets, and WebSub are also interesting.

To your last point, about created vs. complete, I think for several reasons tiled will need to grow some concept of "complete / committed" to distinguish a dataset in the process of being written from a dataset that is "ready".

padraic-shafer commented 1 year ago

> RSS is polling-based, so it would be suitable for updates that are slow (minutes, not seconds) and not urgent. That probably fits some but not all use cases.

Ah, ok. Thanks, that's good to keep in mind.

> Other technologies like server-sent events, websockets, WebSub are also interesting.

I recently stumbled across WebSub. It looks like a reasonable fit to the use case I was describing, and quite widely used for push-based updates. One article suggests some drawbacks in terms of exposing private data over the feeds and the complexity of building an in-house hub...but I suspect they are sowing a bit of FUD to sell their services. Nevertheless, it appears some customization would be needed to add authorization and keep data access restricted, which seems to go against my motivation for using an open protocol publishing mechanism.

A while back, I made a demo for "real-time" publishing of beamline data to a toy data viewer. It subclasses bluesky.callbacks.CallbackBase to POST new (or replayed) events to a URL, which in turn uses WebSockets to broadcast the event data to all clients of a Plotly Dash app. That Dash app is mounted alongside a FastAPI interface that expects to be notified of new events or new runs.

Perhaps using an OpenAPI spec in this way is the right compromise for having an open, published interface for handling the push notifications? WebSockets could have been used directly between the publisher and viewer, but I like having some interface for interchangeability and perhaps discoverability of the services. To that end, I would probably have needed to add (un)subscribe functionality to the data publisher as well; the idea here was to subscribe the ApiCallback to a bluesky RunEngine.
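The demo code itself is not shown in this thread. As a sketch of the general shape only, the callback below forwards documents to an HTTP endpoint. To stay runnable without bluesky installed, it is written as a plain callable taking `(name, doc)` rather than as a `CallbackBase` subclass. The `post_json` helper, the payload layout, and the choice of which document types to forward are assumptions.

```python
import json
import urllib.request


def post_json(url, payload):
    """POST a JSON payload and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        # default=str is a crude fallback for values json cannot serialize.
        data=json.dumps(payload, default=str).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


class ApiCallback:
    """Forward selected bluesky documents to an HTTP endpoint.

    Assumption: only "start", "event", and "stop" documents are forwarded,
    so that the receiver hears about new runs and new events.
    """

    forwarded = ("start", "event", "stop")

    def __init__(self, url, send=post_json):
        self.url = url
        self.send = send  # injectable, which makes the callback easy to test

    def __call__(self, name, doc):
        if name in self.forwarded:
            self.send(self.url, {"name": name, "doc": doc})
```

With a RunEngine instance `RE`, something like `RE.subscribe(ApiCallback("http://localhost:8000/documents"))` would attach it; the URL here is a placeholder.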

OK, I've moved too far from the original question about RSS, so I'll stop here. :)

padraic-shafer commented 1 year ago

> To your last point, about created vs. complete, I think for several reasons tiled will need to grow some concept of "complete / committed" to distinguish a dataset in the process of being written from a dataset that is "ready".

That would be great. Is there already a thread on this topic?

I could imagine that for many cases (but certainly not all), the creation of a new file in the directory might signal that all previous files are complete. Or a simple timeout with no new file activity could indicate that the file is _probably_ complete.
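Both heuristics can be sketched with file modification times. This is only an illustration: the function names, the 30-second default, and the file names are hypothetical, and neither check can prove that a writer is actually finished.

```python
import os
import tempfile
import time


def probably_complete(path, quiet_seconds=30.0, now=None):
    """Guess that a file is complete if it has not been modified recently."""
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) >= quiet_seconds


def superseded_files(directory):
    """Return every file in `directory` except the most recently modified one."""
    paths = [entry.path for entry in os.scandir(directory) if entry.is_file()]
    paths.sort(key=os.path.getmtime)
    return paths[:-1]


# Small demonstration with two empty files and artificial timestamps.
with tempfile.TemporaryDirectory() as tmp:
    older = os.path.join(tmp, "scan_001.h5")
    newer = os.path.join(tmp, "scan_002.h5")
    for path in (older, newer):
        open(path, "w").close()
    now = time.time()
    os.utime(older, (now - 120, now - 120))  # last touched two minutes ago
    os.utime(newer, (now - 1, now - 1))      # touched one second ago
    idle_result = (
        probably_complete(older, now=now),
        probably_complete(newer, now=now),
    )
    superseded = [os.path.basename(p) for p in superseded_files(tmp)]
```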

Otherwise, maybe there is a certain metadata field that tiled should look for in each file-like node to know the status? That would leave it to the creator of the adaptor to determine whether a file is in progress or committed/complete. Then tiled needs to decide if the default policy is to consider all nodes complete, unless the adaptor indicates otherwise (node marked as in progress); or the other way around.
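The default-policy question can be made concrete with a small sketch. The `status` metadata key, the `Status` values, and the `node_status` function are hypothetical; tiled does not define them.

```python
from enum import Enum


class Status(Enum):
    IN_PROGRESS = "in_progress"
    COMPLETE = "complete"


def node_status(metadata, key="status", default=Status.COMPLETE):
    """Read a node's status from its metadata, falling back to a default.

    `default` encodes the policy choice: Status.COMPLETE treats unmarked
    nodes as complete, Status.IN_PROGRESS treats them as still being written.
    An unrecognized value raises ValueError.
    """
    raw = metadata.get(key)
    if raw is None:
        return default
    return Status(raw)
```

Under this sketch, an adaptor author would only need to set the field on nodes that deviate from the default.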

danielballan commented 1 year ago

I know I had some discussions about "committed/complete" with @tacaswell and @dylanmcreynolds (separately) months ago but I think the idea was never captured in writing. Added here: https://github.com/bluesky/tiled/issues/386

danielballan commented 3 months ago

Related, an overview of various "real-time" client-server communication modes: https://rxdb.info/articles/websockets-sse-polling-webrtc-webtransport.html