bluesky / ophyd-async

Hardware abstraction for bluesky written using asyncio
https://blueskyproject.io/ophyd-async
BSD 3-Clause "New" or "Revised" License

PandA dataset naming #330

Closed · coretl closed this issue 3 months ago

coretl commented 4 months ago

At the moment the PandA HDF writer will always write datasets like "INENC1.VAL.Mean", "COUNTER1.OUT.Diff". These are exposed in the descriptor as "panda1-inenc-1-val-Mean" and "panda1-counter-1-out-Diff". These don't make any scientific sense, so we should allow scientific names to be given to the datasets and set the ophyd name accordingly.

The options:

  1. Pass in HDFPanda(prefix, dp, dataset_names={"inenc-1-val": "sample-x", "counter-1-out": "izero"})
  2. Post-init but pre-connect: we can get the blocks, but we don't know how many there are, and fields like "INENC" are difficult as they are target-specific
  3. Post-init and post-connect: we have connected blocks and can do panda1.set_datasets({panda1.inenc[1].val: "sample-x"}) in the plan
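A minimal sketch of what option 3 could look like from the caller's side. All names here (`set_datasets`, the block layout, the stand-in classes) are hypothetical and only illustrate the shape of the API, not real ophyd-async code:

```python
# Hypothetical sketch of option 3: renaming datasets post-connect.
# FakeSignal/FakeInEncBlock/FakePanda are illustrative stand-ins.

class FakeSignal:
    """Stand-in for a signal whose HDF dataset has a default raw name."""
    def __init__(self, default_name: str) -> None:
        self.dataset_name = default_name

class FakeInEncBlock:
    def __init__(self) -> None:
        # Default name mirrors the raw PandA field name from the issue.
        self.val = FakeSignal("INENC1.VAL.Mean")

class FakePanda:
    def __init__(self) -> None:
        self.inenc = {1: FakeInEncBlock()}

    def set_datasets(self, mapping: dict) -> None:
        # Override each signal's HDF dataset name with a scientific one.
        for signal, name in mapping.items():
            signal.dataset_name = name

panda1 = FakePanda()
panda1.set_datasets({panda1.inenc[1].val: "sample-x"})
```

The appeal of this shape is that the mapping is expressed against already-connected, introspectable blocks rather than string keys.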
coretl commented 4 months ago

Decision is:

coretl commented 4 months ago

@callumforrester has my favourite suggestion so far:

evalott100 commented 4 months ago

As part of this issue, I'm going to move save and load to the ophyd level. It doesn't really make sense to me that they're currently plan stubs - they currently assume Locatable devices too.

I think Saveable and Loadable make sense as protocols.
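A sketch of what such protocols could look like using `typing.Protocol` (hypothetical; as the thread below notes, the decision went the other way and these were never added):

```python
# Hypothetical Saveable/Loadable protocols, sketched with typing.Protocol.
from typing import Any, Dict, Protocol, runtime_checkable

@runtime_checkable
class Saveable(Protocol):
    async def save(self) -> Dict[str, Any]:
        """Return a serializable snapshot of this device's settings."""
        ...

@runtime_checkable
class Loadable(Protocol):
    async def load(self, settings: Dict[str, Any]) -> None:
        """Restore this device's settings from a snapshot."""
        ...

# A device satisfying both protocols structurally (no inheritance needed):
class SettingsDevice:
    def __init__(self) -> None:
        self._settings: Dict[str, Any] = {}

    async def save(self) -> Dict[str, Any]:
        return dict(self._settings)

    async def load(self, settings: Dict[str, Any]) -> None:
        self._settings = dict(settings)
```

Note that `runtime_checkable` only verifies method presence, not signatures, so `isinstance` checks against these protocols are structural.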

callumforrester commented 4 months ago

The current save/load functions don't need any protocols, they just walk the device and set signals. It's only when we need some intelligence in a device (i.e. it knows the special way to save/load itself) that we need a protocol. Is there a use case for that here?
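The protocol-free approach can be sketched as a plain tree walk. The `Signal`/`Device` classes below are simplified stand-ins, not the real ophyd-async ones:

```python
# Minimal sketch of protocol-free save/load: walk a device's tree of
# signals and get/set their values. Stand-in classes for illustration.

class Signal:
    def __init__(self, value=None):
        self.value = value

class Device:
    """A device is just a namespace of Signals and sub-Devices."""

def walk_signals(device, prefix=""):
    # Recursively yield (dotted_name, signal) pairs.
    for name, attr in vars(device).items():
        path = f"{prefix}{name}"
        if isinstance(attr, Signal):
            yield path, attr
        elif isinstance(attr, Device):
            yield from walk_signals(attr, f"{path}.")

def save_device(device):
    # Snapshot every signal value, keyed by dotted path.
    return {path: sig.value for path, sig in walk_signals(device)}

def load_device(device, settings):
    # Restore any signal whose path appears in the snapshot.
    for path, sig in walk_signals(device):
        if path in settings:
            sig.value = settings[path]
```

Nothing here needs the device to declare anything about itself, which is the point of the question above.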

evalott100 commented 4 months ago

Hmmm, true. I suppose I could do Device::load_device and Device::save_device

coretl commented 4 months ago

> As part of this issue, I'm going to move save and load to the ophyd level. It doesn't really make sense to me that they're currently plan stubs - they currently assume Locatable devices too.
>
> I think Saveable and Loadable make sense as protocols.

Please keep these as plan stubs. We considered new protocols, but decided to add as few new protocols as possible. Either saving or loading will require application-specific logic, which is better placed in a plan.
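The plan-stub shape argued for here can be sketched as a plain generator, so callers can wrap it with application-specific logic. Message handling is simplified to tuples below; the real bluesky `Msg` objects and RunEngine are not modeled:

```python
# Hypothetical sketch of load-as-plan-stub: a generator of messages
# that a RunEngine-like consumer executes. Names are illustrative.

def load_settings_plan(signals, settings):
    """Plan stub: yield a ('set', signal, value) message per setting.

    Being a generator means callers can filter, reorder, or interleave
    these messages with their own application-specific logic.
    """
    for name, value in settings.items():
        if name in signals:
            yield ("set", signals[name], value)

def run(plan):
    """Trivial stand-in for a RunEngine that consumes set messages."""
    for msg, sig, value in plan:
        if msg == "set":
            sig["value"] = value
```

The protocol version would bake the iteration order into the device; the plan version leaves it to the caller.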

evalott100 commented 4 months ago

I'll be moving the current plan stubs from core

evalott100 commented 4 months ago

Yesterday @coretl and I agreed that the dataset names should be set through the IOC. This turns this into three Issues...

Then down the line we can...

evalott100 commented 3 months ago

@callumforrester Instead of searching through device children for things ending in _capture,

https://github.com/bluesky/ophyd-async/blob/0b05b9c0c7c25417cb76432de0eae067fc7a86ce/src/ophyd_async/panda/writers/_hdf_writer.py#L38-L54

we can just look for devices with a DATASET field in their children:

            string rw PANDA1:COUNTER1:OUT:DATASET # will have a `dataset: SignalRW(str)` attr

Additionally, putting to this will give it a different scientific name in the file output
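The proposed discovery can be sketched as follows; `Signal`/`Block` are illustrative stand-ins for ophyd-async devices, not the real classes:

```python
# Sketch of the proposed discovery: instead of matching attribute names
# ending in "_capture", look for blocks carrying a `dataset` signal and
# map its value (the scientific name) back to the parent block.

class Signal:
    def __init__(self, value):
        self.value = value

class Block:
    pass

def dataset_mapping(panda_blocks):
    """Map dataset name -> block name for blocks with a `dataset` signal."""
    mapping = {}
    for block_name, block in panda_blocks.items():
        dataset_sig = getattr(block, "dataset", None)
        # Skip blocks with no dataset signal or an empty dataset name.
        if isinstance(dataset_sig, Signal) and dataset_sig.value:
            mapping[dataset_sig.value] = block_name
    return mapping
```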

callumforrester commented 3 months ago

@evalott100 so to clarify: There will be a new PV PANDA1:COUNTER1:OUT:DATASET, and ophyd-async will automatically make a new signal on the correct block called "dataset", so we are now just looking for signals called "dataset" and making a mapping of the dataset name (the value of the PV) to those signals' parents?

evalott100 commented 3 months ago

> ophyd-async will automatically make a new signal on the correct block called "dataset", so we are now just looking for signals called "dataset"

Yep, once using https://github.com/PandABlocks/PandABlocks-client/pull/91 and https://github.com/PandABlocks/PandABlocks-ioc/pull/118

callumforrester commented 3 months ago

Based on discussion with @abbiemery @evalott100 and @coretl, we are going to slightly change the plan and push more logic down to the IOC. The previous plan has ophyd-async putting dataset names to the panda and then reconstructing a data structure for .describe() from the various signals. It is simpler to build that data structure once, in the IOC and export it via a PV. Especially because the relationships between datasets and their min/max datasets are not actually important to ophyd-async, it just needs to know what datasets the panda is writing to HDF5 (and their data types).

So the new plan:

Most of the complex logic now goes in the IOC, it needs to produce the DATASETS table and keep it up-to-date. The table will probably look something like this:

| dataset | hdf5_type |
| --- | --- |
| sample_x | uint |
| sample_y | float |
| sample_z_min | float |
| sample_z_max | float |
| sample_z_mean | float |

In this case, sample_x, sample_y and sample_z are labels set via the DATASET PV on particular signals. The sample_z signal's capture PV is Max Min Mean, so all three datasets are included. Other datasets are other capture signals. If the capture signal for a dataset is No, it should not be included in the table at all. The table only shows datasets that will be captured if the panda is armed.
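The table-building rules above can be sketched as follows. In the real plan this logic lives in the IOC; the field representation here is an illustrative assumption:

```python
# Sketch of the DATASETS table construction described above.

def build_datasets_table(fields):
    """fields: list of (dataset_label, capture_mode, hdf5_type) tuples.

    capture_mode is the PandA capture setting, e.g. "No", "Value", or a
    combination like "Max Min Mean". A multi-statistic capture expands
    into one dataset per statistic; "No" is omitted entirely.
    """
    rows = []
    for label, capture, hdf5_type in fields:
        if capture == "No":
            continue  # not captured: excluded from the table
        stats = capture.split()
        if stats == ["Value"]:
            rows.append((label, hdf5_type))
        else:
            # e.g. "Max Min Mean" -> label_max, label_min, label_mean
            for stat in stats:
                rows.append((f"{label}_{stat.lower()}", hdf5_type))
    return rows
```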

The table also has to be kept up-to-date live, i.e. if a new value is put to a CAPTURE or a DATASET PV, the DATASETS PV should update and produce a new table. This requires a bit of extra logic in the IOC. The panda can report datatypes that map to float and uint; exactly how and what the mapping is I'll leave to @coretl.

coretl commented 3 months ago

> The panda can report datatypes that map to float and uint

I suggest we use numpy datatypes, so float64 and uint32.

> exactly how and what the mapping is I'll leave to @coretl

Looking at the code, timestamps are all now floating point, so the logic is:

  • ext_out bits is uint32
  • Everything else is float64
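That two-rule mapping is small enough to sketch directly. The function name and the `field_type`/`field_subtype` parameters are assumptions about how the field kind would be passed in:

```python
# Sketch of the suggested dtype mapping: ext_out "bits" -> uint32,
# everything else (including timestamps) -> float64.

def hdf5_dtype(field_type: str, field_subtype: str = "") -> str:
    """Return the numpy dtype name for a captured PandA field."""
    if field_type == "ext_out" and field_subtype == "bits":
        return "uint32"
    return "float64"
```

Returning numpy dtype names as strings keeps the mapping trivially comparable across the IOC and ophyd-async sides.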