bluesky / tiled

API to structured data
https://blueskyproject.io/tiled
BSD 3-Clause "New" or "Revised" License

Layouts for Bluesky data in Tiled #767

Open danielballan opened 5 days ago

danielballan commented 5 days ago

This was worked out through a conversation with @whs92.

Current Status

Here is an example that works today. We start a tiled server with a database (SQLite or Postgres) and a writable directory. For simplicity, here we use a single-user server.

$ tiled catalog init catalog.db
$ tiled catalog serve catalog.db -w data/ --api-key=secret

The experimental TiledWriter, created as part of the recent flyscanning effort, consumes Bluesky documents and makes API calls into Tiled. These calls can:

from bluesky import RunEngine
from bluesky.callbacks.tiled_writer import TiledWriter
from bluesky.plans import count, scan
from ophyd.sim import det, motor
from tiled.client import from_uri

RE = RunEngine()

# Connect to the server started above, using the same API key
client = from_uri('http://localhost:8000', api_key='secret')

# Forward every document emitted by the RunEngine to Tiled
tw = TiledWriter(client)
RE.subscribe(tw)

# Acquire data
RE(count([det]))

The metadata and data can now be accessed via curl or via that Python client object.

Design Goals

We want to represent metadata and data from Bluesky documents in Tiled structures (container, array, table) in a consistent, generic way for all Bluesky runs so that the process is reversible. That is, we want to be able to "replay" a semantically equivalent document stream in order to simulate what happened after the fact. This is useful for developing and testing streaming tools on old data.
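To make "reversible" concrete, here is a toy round trip in plain Python. It is only a sketch: the document shapes are heavily simplified, and the `store` and `replay` helpers and the nested dict layout are assumptions made for illustration, not what TiledWriter actually does.

```python
# Toy round trip: documents -> nested layout -> documents.
# `store`, `replay`, and the layout keys are hypothetical.

def store(documents):
    """Fold a (name, doc) stream into a nested dict keyed by stream name."""
    run = {"start": None, "stop": None, "streams": {}}
    stream_of = {}  # descriptor uid -> stream name
    for name, doc in documents:
        if name == "start":
            run["start"] = doc
        elif name == "descriptor":
            stream_of[doc["uid"]] = doc["name"]
            run["streams"][doc["name"]] = {"descriptor": doc, "events": []}
        elif name == "event":
            run["streams"][stream_of[doc["descriptor"]]]["events"].append(doc)
        elif name == "stop":
            run["stop"] = doc
    return run


def replay(run):
    """Regenerate a (name, doc) stream from the nested layout."""
    yield "start", run["start"]
    events = []
    for stream in run["streams"].values():
        yield "descriptor", stream["descriptor"]
        events.extend(stream["events"])
    # Emit events from all streams in time order.
    for event in sorted(events, key=lambda e: e["time"]):
        yield "event", event
    yield "stop", run["stop"]


docs = [
    ("start", {"uid": "run-1", "time": 0.0}),
    ("descriptor", {"uid": "desc-1", "run_start": "run-1", "name": "primary",
                    "data_keys": {"I0": {"dtype": "number"}}}),
    ("event", {"uid": "ev-1", "descriptor": "desc-1", "seq_num": 1,
               "time": 1.0, "data": {"I0": 0.5}, "timestamps": {"I0": 1.0}}),
    ("event", {"uid": "ev-2", "descriptor": "desc-1", "seq_num": 2,
               "time": 2.0, "data": {"I0": 0.6}, "timestamps": {"I0": 2.0}}),
    ("stop", {"uid": "stop-1", "run_start": "run-1", "exit_status": "success"}),
]
replayed = list(replay(store(docs)))
```

With several streams, this sketch emits all descriptors before any events, so the replayed order can differ from the original interleaving while carrying the same content.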

This requirement unavoidably leads to a nested and rather "busy" structure that has to hold data, timestamps, and configuration for all the streams in the BlueskyRun. We end up with URL paths like:

/{uuid}/primary/data/I0
/{uuid}/primary/config/quadem1/quadem1_integration_time

(Nexus has the same problem: this is an unavoidable consequence of collecting and organizing a lot of context.)
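As a rough illustration of how the nesting turns into paths like the ones above, the sketch below walks a nested dict and lists every leaf. The `leaf_paths` helper and the `timestamps` key are assumptions for the example; only the `data` and `config` paths come from the listing above.

```python
def leaf_paths(node, prefix=""):
    """Yield a slash-separated path for every leaf in a nested dict."""
    for key, child in node.items():
        path = f"{prefix}/{key}"
        if isinstance(child, dict):
            yield from leaf_paths(child, path)
        else:
            yield path


# Hypothetical contents of one run with a single "primary" stream.
run = {
    "primary": {
        "data": {"I0": [0.5, 0.6]},
        "timestamps": {"I0": [1.0, 2.0]},  # assumed key name
        "config": {"quadem1": {"quadem1_integration_time": 0.1}},
    },
}

paths = list(leaf_paths(run, prefix="/{uuid}"))
for p in paths:
    print(p)
```

Even with one detector and one configuration value, the run already yields three paths that are three to five segments deep.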

In some contexts, we need to present (a subset of) this information in a flatter form. When navigating the data in a UI, it should not take more than one or two clicks to get to the data of interest. Likewise, it should be possible to quickly get to the data in an interactive IPython or Jupyter session.

We also want to be able to present the metadata and data in layouts that adhere to defined standards, such as Nexus application definitions.

Possible Approaches

Client-side

We could arrange the data and metadata in Tiled "the Bluesky way" and use client code (in Python, React, etc.) to fetch the data of interest and "rearrange" it into the desired layout. This has a couple of downsides:

Server-side

We could add to the Tiled server a concept of "views", where the metadata and data are stored once but presented in a variety of layouts. It might look something like this:

/{uuid}/streams/primary/data/I0
/{uuid}/streams/primary/config/quadem1/quadem1_integration_time

# direct access to primary stream, which is what people want most of the time
/{uuid}/simple
/{uuid}/simple/I0

# Nexus application definition layout
/{uuid}/NxXAS/{...}

The TiledWriter would create /{uuid}/streams/, a consistent "ground truth" layout generated for all BlueskyRuns. Then, siblings like /{uuid}/simple/ and /{uuid}/NxXAS/ could be registered as "views". This could be done by a separate client or perhaps by extending/configuring TiledWriter.
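One way to picture "stored once, presented in several layouts" is a small lookup that rewrites a view path to its ground-truth path before reading. This is a minimal sketch in plain Python, not a proposal for how Tiled would implement it: `storage`, `views`, `resolve`, and `read` are all hypothetical names, and it ignores access control and performance entirely.

```python
# Ground-truth store: each piece of data lives at exactly one path
# (paths are relative to /{uuid}/).
storage = {
    "streams/primary/data/I0": [0.5, 0.6, 0.7],
    "streams/primary/config/quadem1/quadem1_integration_time": 0.1,
}

# Hypothetical view definitions: view name -> ground-truth prefix.
views = {
    "simple": "streams/primary/data",
}


def resolve(path):
    """Translate a view path to its ground-truth path; pass others through."""
    head, _, rest = path.partition("/")
    if head in views:
        target = views[head]
        return f"{target}/{rest}" if rest else target
    return path


def read(path):
    """Read data through either a view path or a ground-truth path."""
    return storage[resolve(path)]


print(resolve("simple/I0"))
print(read("simple/I0") is read("streams/primary/data/I0"))
```

Both paths return the same stored object, so nothing is copied when a view is added.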

I am wary of adding this concept to Tiled (something like a "view" or "alias" or "soft-link"). It would have to be scoped very carefully, with implications for performance and access control taken into account from the start. But I am coming around to thinking that this is the best way to address these use cases:

danielballan commented 5 days ago

I should add that @dylanmcreynolds introduced the suggestion of adding "views" to Tiled in a lengthy PR discussion with @padraic-shafer and me in March. All of us were generally in favor of it. We set it aside to focus on delivering a TiledWriter prototype. It's time to revisit it and decide whether we want to move forward.

callumforrester commented 2 days ago

This looks interesting! A few comments: