NSLS2 / pymca

PyMca Toolkit git repository

Define what constitutes a DataSource and a Key for Bluesky data #11

Open padraic-shafer opened 2 months ago

padraic-shafer commented 2 months ago

PyMca is rather flexible about what constitutes a DataSource, a sourceName, and a Key, as long as they can be used to retrieve the data that a user selects in the (x, y, m) table. Therefore it is up to us to define these in a way that is practical for our anticipated use cases.

I think that one helpful simplification we could make is to focus the user toward selecting data streams within a CatalogOfBlueskyRuns, rather than selecting data from arbitrary Tiled nodes. This is in keeping with the overall aim of exploring and visualizing data from one or more Bluesky runs.

It then seems natural to associate the catalog (e.g. “…/smi/raw”, “…/smi/sandbox”, etc.) with the name of the DataSource rather than the Key. The Key should certainly contain the name of the data stream (e.g., primary, baseline, dark images, etc.). There is then perhaps some ambiguity in whether each run UUID is considered a separate source (part of the name) or if instead each run is part of the Key within the same source catalog.
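To make the two options concrete, here is a rough sketch of how a Key might be parsed under each layout. Everything in it is hypothetical: `BlueskyKey`, the `"<run_uid>/<stream>"` string format, and the `abc123` run identifier are made up for illustration and are not part of PyMca or Tiled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BlueskyKey:
    """Hypothetical parsed form of a Key (illustration only)."""

    stream: str
    run_uid: Optional[str] = None  # None when the run is part of the source name

    @classmethod
    def parse(cls, key: str) -> "BlueskyKey":
        # Assumed formats: "<stream>" (run in the source name)
        # or "<run_uid>/<stream>" (run in the Key).
        run_uid, sep, stream = key.rpartition("/")
        return cls(stream=stream, run_uid=run_uid if sep else None)


# Option A: one DataSource per run; the Key holds only the stream name.
key_a = BlueskyKey.parse("primary")

# Option B: one DataSource per catalog; the Key holds the run and the stream.
key_b = BlueskyKey.parse("abc123/primary")
```

Either way the catalog stays in the source name; the only thing that moves is the run identifier.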

Having the run be part of the DataSource name rather than the Key is closer to some of the existing file-based DataSources, where there is one DataSource per filename (or perhaps a list of file names supplied as the “name” of a single source). However, this means a DataSource object is created for each run, which adds overhead, although perhaps this is not a significant resource drain in practice. If needed, we could make these per-run data sources lightweight __slots__-based objects that delegate common functionality to a shared helper object.
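A minimal sketch of what that lightweight per-run object could look like. The class names, the `fetch` method, and the `abc123`/`def456` run identifiers are all hypothetical, and `getDataObject(key, selection)` is only meant to resemble the kind of call a DataSource receives; the real retrieval from the catalog is replaced by a placeholder.

```python
class CatalogHelper:
    """Hypothetical shared helper holding the per-catalog state."""

    def __init__(self, catalog_name):
        self.catalog_name = catalog_name
        self.fetch_count = 0

    def fetch(self, run_uid, key, selection=None):
        # Placeholder for the real data retrieval; returns what was asked for.
        self.fetch_count += 1
        return (self.catalog_name, run_uid, key, selection)


class RunDataSource:
    """Hypothetical per-run source: two slots, no per-instance __dict__."""

    __slots__ = ("_helper", "run_uid")

    def __init__(self, helper, run_uid):
        self._helper = helper
        self.run_uid = run_uid

    @property
    def sourceName(self):
        # The run is part of the source name in this layout.
        return f"{self._helper.catalog_name}/{self.run_uid}"

    def getDataObject(self, key, selection=None):
        # Delegate the common work to the shared helper.
        return self._helper.fetch(self.run_uid, key, selection)


helper = CatalogHelper("…/smi/raw")
sources = [RunDataSource(helper, uid) for uid in ("abc123", "def456")]
result = sources[0].getDataObject("primary")
```

Each run then costs only two references, and anything expensive (client connection, caches) would live once in the helper.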

For simplicity we should probably avoid associating the DataSource with more than one catalog; that is, not use a list of sourceNames. One reason is that when the QDispatcher fetches data for a new selection, it sends only the Key and the Selection to the DataSource object. The sourceName or an index into the list would need to be included with the Key to reconstruct which sourceName was active during the selection. OTOH, the list of sourceNames (and list of Keys?) used by some of the file-based DataSources might have been an optimization to minimize the number of DataSource objects in memory(?).
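For comparison, this is roughly the bookkeeping a multi-catalog DataSource would need if only the Key and the Selection arrive at fetch time. The `"<index>:<run_uid>/<stream>"` format and both helper functions are assumptions made up for this sketch, not an existing PyMca convention.

```python
# Hypothetical list of catalogs held by a single multi-catalog DataSource.
source_names = ["…/smi/raw", "…/smi/sandbox"]


def make_key(source_index, run_uid, stream):
    """Embed the catalog index in the Key (assumed format, illustration only)."""
    return f"{source_index}:{run_uid}/{stream}"


def split_key(key):
    """Recover (source_index, run_uid, stream) from a Key built by make_key."""
    index_text, _, rest = key.partition(":")
    run_uid, _, stream = rest.rpartition("/")
    return int(index_text), run_uid, stream


key = make_key(1, "abc123", "baseline")
source_index, run_uid, stream = split_key(key)
active_source = source_names[source_index]
```

With one catalog per DataSource, the index and this extra parsing step go away.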

padraic-shafer commented 2 months ago

@AbbyGi @hyperrealist @danielballan What opinions do you have on how we should organize the name(s) and key(s) of DataSource(s)?