How to find a run uid given an image file name?

BCDA-APS / bdp_controls

APS-U Beam line Data Pipelines - experiment controls with EPICS and Bluesky

How to find a run uid given an image file name? #20

Closed prjemian closed 2 years ago

prjemian commented 2 years ago

This is a high priority question for the BDP project.

tacaswell commented 2 years ago

You would have to build the inverse lookup table. We have talked about writing the code to do this, but never have.

prjemian commented 2 years ago

For now, it sounds like a custom mongoquery might be the most efficient approach.

prjemian commented 2 years ago

An easy way to harvest this information at acquisition time is from the take_image() plan, which captures the run's uid and learns the HDF5 file name from the resource document: https://github.com/BCDA-APS/bdp_controls/blob/bef79d12d59bcbf78ccca1d8b8214d4c95181a70/qserver/instrument/plans/image_acquisition.py#L86-L98

This is how we can capture this information going forward. Suggestions from the bluesky developers on Slack for where to save it include:

- redis key:value database (already included with the queueserver installation)
- new collection in the MongoDB server
- local file

Of these, a local text file seems the easiest.

prjemian commented 2 years ago

The text file could be structured, for example as YAML, so that new entries are quick to append and the file is easy to load in Python:

In [30]: import yaml

In [31]: s = """
    ...: a: 1
    ...: b: 2
    ...: """

In [33]: yaml.load(s, yaml.Loader)
Out[33]: {'a': 1, 'b': 2}

prjemian commented 2 years ago

Given an HDF5 file name /tmp/docker_ioc/iocbdpad/tmp/adsimdet/2022/03/29/a4700b27-2666-44cf-a86f_000.h5 from run uid=155d3536-f225-4c17-852a-6367792830f4, the entry would be:

a4700b27-2666-44cf-a86f_000: 155d3536-f225-4c17-852a-6367792830f4

We assume here that each HDF5 file appears in only one run. If we further assume that these identifiers are truly unique UUIDs, then we can also record the swapped pair, so that given either the run uid or the HDF5 file base name, we can find the other:

a4700b27-2666-44cf-a86f_000: 155d3536-f225-4c17-852a-6367792830f4
155d3536-f225-4c17-852a-6367792830f4: a4700b27-2666-44cf-a86f_000
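A minimal sketch of producing both entries for one run, assuming the base name is the HDF5 file name with its directory and ".h5" extension removed (as in the example above). The helpers xref_entries and add_to_table are hypothetical names; add_to_table enforces the one-file-one-run assumption by refusing to remap an existing key to a different value.

```python
import os


def xref_entries(run_uid: str, hdf5_path: str) -> dict:
    """Return both directions of the cross-reference for one run and one HDF5 file."""
    # Base name: drop the directory and the ".h5" extension
    base = os.path.splitext(os.path.basename(hdf5_path))[0]
    # Record the pair both ways so either identifier can be looked up
    return {base: run_uid, run_uid: base}


def add_to_table(table: dict, entries: dict) -> None:
    """Merge entries into table, refusing to remap a key to a different value."""
    # Check every entry first so a conflict leaves the table unchanged
    for key, value in entries.items():
        if key in table and table[key] != value:
            raise ValueError(f"{key!r} already maps to {table[key]!r}, not {value!r}")
    table.update(entries)


table = {}
add_to_table(table, xref_entries(
    "155d3536-f225-4c17-852a-6367792830f4",
    "/tmp/docker_ioc/iocbdpad/tmp/adsimdet/2022/03/29/a4700b27-2666-44cf-a86f_000.h5",
))
for key, value in table.items():
    print(f"{key}: {value}")
```

Printed as "key: value" lines, the table matches the two YAML entries above.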

prjemian commented 2 years ago

If proceeding with a mongoquery, see: https://docs.mongodb.com/manual/reference/operator/query/
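One way to fill in such a query is to match the file name against the resource_path field of the resource documents, which also carry the run_start uid (see the resource document dump later in this thread). The sketch below only builds the filter dictionary; resource_filter is a hypothetical helper, and it assumes the spec value AD_HDF5 and a path ending in the base name plus ".h5". With pymongo, such a filter could be passed to find_one() on the collection that holds resource documents (assumed, not confirmed here, to be named resource), and the run_start field of the match would give the run uid.

```python
import re


def resource_filter(hdf5_base_name: str, spec: str = "AD_HDF5") -> dict:
    """Build a MongoDB-style filter for resource documents that point to one HDF5 file."""
    # re.escape stops "." and other special characters from acting as regex syntax;
    # "$" anchors the match at the end of resource_path
    pattern = re.escape(hdf5_base_name + ".h5") + "$"
    return {"spec": spec, "resource_path": {"$regex": pattern}}


query = resource_filter("a4700b27-2666-44cf-a86f_000")
print(query)
```

An unanchored regex cannot use an index efficiently, so on a large collection a separate cross-reference may still be faster.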

prjemian commented 2 years ago

In [27]: dl = list(cat.v1[-1].documents())

In [28]: dl[2]
Out[28]: 
('resource',
 {'spec': 'AD_HDF5',
  'root': '/',
  'resource_path': 'tmp/docker_ioc/iocbdpad/tmp/adsimdet/2022/03/29/a4700b27-2666-44cf-a86f_000.h5',
  'resource_kwargs': {'frame_per_point': 1},
  'path_semantics': 'posix',
  'uid': '51d30cff-4580-4dda-a58a-2e05ea724886',
  'run_start': '155d3536-f225-4c17-852a-6367792830f4'})

In [29]: dl[3]
Out[29]: 
('datum',
 {'datum_id': '51d30cff-4580-4dda-a58a-2e05ea724886/0',
  'datum_kwargs': {'point_number': 0},
  'resource': '51d30cff-4580-4dda-a58a-2e05ea724886'})
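Since the resource document carries both resource_path and run_start, one run's documents can be scanned directly. A minimal sketch, assuming documents() yields (name, doc) pairs as shown above; run_uid_for_image is a hypothetical helper. Searching a whole catalog this way means reading every run's documents, which is the cost an inverse lookup table avoids.

```python
import os
from typing import Iterable, Optional, Tuple


def run_uid_for_image(documents: Iterable[Tuple[str, dict]], hdf5_base_name: str) -> Optional[str]:
    """Return the run_start uid of the resource that wrote the named HDF5 file, if any."""
    for name, doc in documents:
        if name != "resource":
            continue  # only resource documents carry the file path
        base = os.path.splitext(os.path.basename(doc["resource_path"]))[0]
        if base == hdf5_base_name:
            return doc.get("run_start")
    return None


# The resource and datum documents shown above, as (name, doc) pairs
docs = [
    ("resource", {
        "spec": "AD_HDF5",
        "root": "/",
        "resource_path": "tmp/docker_ioc/iocbdpad/tmp/adsimdet/2022/03/29/a4700b27-2666-44cf-a86f_000.h5",
        "resource_kwargs": {"frame_per_point": 1},
        "path_semantics": "posix",
        "uid": "51d30cff-4580-4dda-a58a-2e05ea724886",
        "run_start": "155d3536-f225-4c17-852a-6367792830f4",
    }),
    ("datum", {
        "datum_id": "51d30cff-4580-4dda-a58a-2e05ea724886/0",
        "datum_kwargs": {"point_number": 0},
        "resource": "51d30cff-4580-4dda-a58a-2e05ea724886",
    }),
]
print(run_uid_for_image(docs, "a4700b27-2666-44cf-a86f_000"))
```

With a live catalog, the same call could take cat.v1[-1].documents() in place of the docs list.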

prjemian commented 2 years ago

Fill in the mongoquery search dictionary (the {} below) here:

In [51]: from apstools.utils import db_query

In [52]: db_query(cat, {})
Out[52]: bdp2022:
  args:
    asset_registry_db: mongodb://dbbluesky4.xray.aps.anl.gov:27017/bdp2022-bluesky
    metadatastore_db: mongodb://dbbluesky4.xray.aps.anl.gov:27017/bdp2022-bluesky
    name: bdp2022
  description: ''
  driver: databroker._drivers.mongo_normalized.BlueskyMongoCatalog
  metadata:
    catalog_dir: /home/beams/JEMIAN/.local/share/intake/

prjemian commented 2 years ago

Example of the YAML file written by the take_image() plan:

(bdp2022) jemian@wow ~/.../bdp_controls/qserver $ tail -f xref_image_run.yml 
# file: xref_image_run.yml
# created: 2022-03-29 16:03:06.119378
# purpose: cross-reference bluesky run uid and HDF5 file name

00714a91-c33e-4e7b-90fd-2e8f385bebc9: add9e2d0-7f20-419d-a6a8_000
add9e2d0-7f20-419d-a6a8_000: 00714a91-c33e-4e7b-90fd-2e8f385bebc9
c96b08be-bf17-4623-9ee7-062effddbde9: 32b8278b-eded-42c1-85e2_000
32b8278b-eded-42c1-85e2_000: c96b08be-bf17-4623-9ee7-062effddbde9
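A minimal sketch of how such a file could be appended to and read back, assuming the header and entry layout shown above. The names append_xref and load_xref are hypothetical, not the code in the take_image() plan. Appending avoids rewriting the file, and because each entry is a plain "key: value" line, it can be read without PyYAML (yaml.safe_load should produce the same dict). No file locking is done, so concurrent writers are not handled.

```python
import datetime
import os
import tempfile


def append_xref(path: str, run_uid: str, hdf5_base_name: str) -> None:
    """Append one run uid <-> HDF5 base name pair, writing a header if the file is new."""
    is_new = not os.path.exists(path)
    with open(path, "a") as f:
        if is_new:
            f.write(f"# file: {os.path.basename(path)}\n")
            f.write(f"# created: {datetime.datetime.now()}\n")
            f.write("# purpose: cross-reference bluesky run uid and HDF5 file name\n\n")
        # Both directions, matching the layout of xref_image_run.yml
        f.write(f"{run_uid}: {hdf5_base_name}\n")
        f.write(f"{hdf5_base_name}: {run_uid}\n")


def load_xref(path: str) -> dict:
    """Read the 'key: value' lines back into a dict, skipping comments and blanks."""
    table = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            key, value = line.split(": ", 1)
            table[key] = value
    return table


xref_file = os.path.join(tempfile.mkdtemp(), "xref_image_run.yml")
append_xref(xref_file, "00714a91-c33e-4e7b-90fd-2e8f385bebc9", "add9e2d0-7f20-419d-a6a8_000")
append_xref(xref_file, "c96b08be-bf17-4623-9ee7-062effddbde9", "32b8278b-eded-42c1-85e2_000")
xref = load_xref(xref_file)
print(xref["32b8278b-eded-42c1-85e2_000"])
```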