bluesky / ophyd-async

Hardware abstraction for bluesky written using asyncio
https://blueskyproject.io/ophyd-async
BSD 3-Clause "New" or "Revised" License

PandA dataset naming #330

Closed · coretl closed this issue 3 months ago

coretl commented 4 months ago

At the moment the PandA HDF writer will always write datasets like "INENC1.VAL.Mean", "COUNTER1.OUT.Diff". These are exposed in the descriptor as "panda1-inenc-1-val-Mean" and "panda1-counter-1-out-Diff". These don't make any scientific sense, so we should allow scientific names to be given to the datasets and set the ophyd name accordingly.

The options:

  1. Pass in HDFPanda(prefix, dp, dataset_names={"inenc-1-val": "sample-x", "counter-1-out": "izero"})
  2. Post-init but pre-connect: we can get the blocks, but we don't know how many there are, and fields like "INENC" are difficult as they are target-specific
  3. Post-init and post-connect: we have connected blocks and can do panda1.set_datasets({panda1.inenc[1].val: "sample-x"}) in the plan
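A minimal sketch of what option 3 could look like from the caller's side. All names here (`set_datasets`, the block layout, the stand-in classes) are hypothetical and only illustrate the shape of the API, not real ophyd-async code:

```python
# Hypothetical sketch of option 3: renaming datasets post-connect.
# FakeSignal/FakeInEncBlock/FakePanda are illustrative stand-ins.

class FakeSignal:
    """Stand-in for a signal whose HDF dataset has a default raw name."""
    def __init__(self, default_name: str) -> None:
        self.dataset_name = default_name

class FakeInEncBlock:
    def __init__(self) -> None:
        # Default name mirrors the raw PandA field name from the issue.
        self.val = FakeSignal("INENC1.VAL.Mean")

class FakePanda:
    def __init__(self) -> None:
        self.inenc = {1: FakeInEncBlock()}

    def set_datasets(self, mapping: dict) -> None:
        # Override each signal's HDF dataset name with a scientific one.
        for signal, name in mapping.items():
            signal.dataset_name = name

panda1 = FakePanda()
panda1.set_datasets({panda1.inenc[1].val: "sample-x"})
```

The appeal of this shape is that the mapping is expressed against already-connected, introspectable blocks rather than string keys.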
coretl commented 4 months ago

Decision is:

coretl commented 4 months ago

@callumforrester has my favourite suggestion so far:

evalott100 commented 4 months ago

As part of this issue, I'm going to move save and load to the ophyd level. It doesn't really make sense to me that they're currently plan stubs - they currently assume Locatable devices too.

I think Saveable and Loadable make sense as protocols.
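A sketch of what such protocols could look like using `typing.Protocol` (hypothetical; as the thread below notes, the decision went the other way and these were never added):

```python
# Hypothetical Saveable/Loadable protocols, sketched with typing.Protocol.
from typing import Any, Dict, Protocol, runtime_checkable

@runtime_checkable
class Saveable(Protocol):
    async def save(self) -> Dict[str, Any]:
        """Return a serializable snapshot of this device's settings."""
        ...

@runtime_checkable
class Loadable(Protocol):
    async def load(self, settings: Dict[str, Any]) -> None:
        """Restore this device's settings from a snapshot."""
        ...

# A device satisfying both protocols structurally (no inheritance needed):
class SettingsDevice:
    def __init__(self) -> None:
        self._settings: Dict[str, Any] = {}

    async def save(self) -> Dict[str, Any]:
        return dict(self._settings)

    async def load(self, settings: Dict[str, Any]) -> None:
        self._settings = dict(settings)
```

Note that `runtime_checkable` only verifies method presence, not signatures, so `isinstance` checks against these protocols are structural.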

callumforrester commented 4 months ago

The current save/load functions don't need any protocols, they just walk the device and set signals. It's only when we need some intelligence in a device (i.e. it knows the special way to save/load itself) that we need a protocol. Is there a use case for that here?
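The protocol-free approach can be sketched as a plain tree walk. The `Signal`/`Device` classes below are simplified stand-ins, not the real ophyd-async ones:

```python
# Minimal sketch of protocol-free save/load: walk a device's tree of
# signals and get/set their values. Stand-in classes for illustration.

class Signal:
    def __init__(self, value=None):
        self.value = value

class Device:
    """A device is just a namespace of Signals and sub-Devices."""

def walk_signals(device, prefix=""):
    # Recursively yield (dotted_name, signal) pairs.
    for name, attr in vars(device).items():
        path = f"{prefix}{name}"
        if isinstance(attr, Signal):
            yield path, attr
        elif isinstance(attr, Device):
            yield from walk_signals(attr, f"{path}.")

def save_device(device):
    # Snapshot every signal value, keyed by dotted path.
    return {path: sig.value for path, sig in walk_signals(device)}

def load_device(device, settings):
    # Restore any signal whose path appears in the snapshot.
    for path, sig in walk_signals(device):
        if path in settings:
            sig.value = settings[path]
```

Nothing here needs the device to declare anything about itself, which is the point of the question above.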

evalott100 commented 4 months ago

Hmmm, true. I suppose I could do Device::load_device and Device::save_device

coretl commented 4 months ago

> As part of this issue, I'm going to move save and load to the ophyd level. It doesn't really make sense to me that they're currently plan stubs - they currently assume Locatable devices too.
>
> I think Saveable and Loadable make sense as protocols.

Please keep these as plan stubs. We considered new protocols, but decided to add as few new protocols as possible. Either saving or loading will require application-specific logic, which is better placed in a plan.
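The plan-stub shape argued for here can be sketched as a plain generator, so callers can wrap it with application-specific logic. Message handling is simplified to tuples below; the real bluesky `Msg` objects and RunEngine are not modeled:

```python
# Hypothetical sketch of load-as-plan-stub: a generator of messages
# that a RunEngine-like consumer executes. Names are illustrative.

def load_settings_plan(signals, settings):
    """Plan stub: yield a ('set', signal, value) message per setting.

    Being a generator means callers can filter, reorder, or interleave
    these messages with their own application-specific logic.
    """
    for name, value in settings.items():
        if name in signals:
            yield ("set", signals[name], value)

def run(plan):
    """Trivial stand-in for a RunEngine that consumes set messages."""
    for msg, sig, value in plan:
        if msg == "set":
            sig["value"] = value
```

The protocol version would bake the iteration order into the device; the plan version leaves it to the caller.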

evalott100 commented 4 months ago

I'll be moving the current plan stubs from core

evalott100 commented 4 months ago

Yesterday @coretl and I agreed that the dataset names should be set through the IOC. This turns this into three Issues...

Then down the line we can...

evalott100 commented 3 months ago

@callumforrester Instead of searching through device children for things ending in _capture,

https://github.com/bluesky/ophyd-async/blob/0b05b9c0c7c25417cb76432de0eae067fc7a86ce/src/ophyd_async/panda/writers/_hdf_writer.py#L38-L54

we can just look for devices with a DATASET field in their children:

            string rw PANDA1:COUNTER1:OUT:DATASET # will have a `dataset: SignalRW(str)` attr

Additionally, putting to this will give it a different scientific name in the file output
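The proposed discovery can be sketched as follows; `Signal`/`Block` are illustrative stand-ins for ophyd-async devices, not the real classes:

```python
# Sketch of the proposed discovery: instead of matching attribute names
# ending in "_capture", look for blocks carrying a `dataset` signal and
# map its value (the scientific name) back to the parent block.

class Signal:
    def __init__(self, value):
        self.value = value

class Block:
    pass

def dataset_mapping(panda_blocks):
    """Map dataset name -> block name for blocks with a `dataset` signal."""
    mapping = {}
    for block_name, block in panda_blocks.items():
        dataset_sig = getattr(block, "dataset", None)
        # Skip blocks with no dataset signal or an empty dataset name.
        if isinstance(dataset_sig, Signal) and dataset_sig.value:
            mapping[dataset_sig.value] = block_name
    return mapping
```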

callumforrester commented 3 months ago

@evalott100 so to clarify: There will be a new PV PANDA1:COUNTER1:OUT:DATASET, and ophyd-async will automatically make a new signal on the correct block called "dataset", so we are now just looking for signals called "dataset" and making a mapping of the dataset name (the value of the PV) to those signals' parents?

evalott100 commented 3 months ago

> ophyd-async will automatically make a new signal on the correct block called "dataset", so we are now just looking for signals called "dataset"

Yep, once using https://github.com/PandABlocks/PandABlocks-client/pull/91 and https://github.com/PandABlocks/PandABlocks-ioc/pull/118

callumforrester commented 3 months ago

Based on discussion with @abbiemery @evalott100 and @coretl, we are going to slightly change the plan and push more logic down to the IOC. The previous plan has ophyd-async putting dataset names to the panda and then reconstructing a data structure for .describe() from the various signals. It is simpler to build that data structure once, in the IOC and export it via a PV. Especially because the relationships between datasets and their min/max datasets are not actually important to ophyd-async, it just needs to know what datasets the panda is writing to HDF5 (and their data types).

So the new plan:

Most of the complex logic now goes in the IOC, it needs to produce the DATASETS table and keep it up-to-date. The table will probably look something like this:

| dataset | hdf5_type |
| --- | --- |
| sample_x | uint |
| sample_y | float |
| sample_z_min | float |
| sample_z_max | float |
| sample_z_mean | float |

In this case, sample_x, sample_y and sample_z are labels set via the DATASET PV on particular signals. The sample_z signal's capture PV is Max Min Mean, so all three datasets are included. Other datasets are other capture signals. If the capture signal for a dataset is No, it should not be included in the table at all. The table only shows datasets that will be captured if the panda is armed.
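The table-building rules above can be sketched as follows. In the real plan this logic lives in the IOC; the field representation here is an illustrative assumption:

```python
# Sketch of the DATASETS table construction described above.

def build_datasets_table(fields):
    """fields: list of (dataset_label, capture_mode, hdf5_type) tuples.

    capture_mode is the PandA capture setting, e.g. "No", "Value", or a
    combination like "Max Min Mean". A multi-statistic capture expands
    into one dataset per statistic; "No" is omitted entirely.
    """
    rows = []
    for label, capture, hdf5_type in fields:
        if capture == "No":
            continue  # not captured: excluded from the table
        stats = capture.split()
        if stats == ["Value"]:
            rows.append((label, hdf5_type))
        else:
            # e.g. "Max Min Mean" -> label_max, label_min, label_mean
            for stat in stats:
                rows.append((f"{label}_{stat.lower()}", hdf5_type))
    return rows
```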

The table also has to be kept up-to-date live, i.e. if a new value is put to a CAPTURE or a DATASET PV, the DATASETS PV should update and produce a new table. This requires a bit of extra logic in the IOC. The panda can report datatypes that map to float and uint; exactly how and what the mapping is I'll leave to @coretl.

coretl commented 3 months ago

> The panda can report datatypes that map to float and uint

I suggest we use numpy datatypes, so float64 and uint32.

> exactly how and what the mapping is I'll leave to @coretl

Looking at the code, timestamps are all now floating point, so the logic is:

  • ext_out bits is uint32
  • Everything else is float64
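That two-rule mapping is small enough to sketch directly. The function name and the `field_type`/`field_subtype` parameters are assumptions about how the field kind would be passed in:

```python
# Sketch of the suggested dtype mapping: ext_out "bits" -> uint32,
# everything else (including timestamps) -> float64.

def hdf5_dtype(field_type: str, field_subtype: str = "") -> str:
    """Return the numpy dtype name for a captured PandA field."""
    if field_type == "ext_out" and field_subtype == "bits":
        return "uint32"
    return "float64"
```

Returning numpy dtype names as strings keeps the mapping trivially comparable across the IOC and ophyd-async sides.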