bluesky / databroker

Unified API pulling data from multiple sources
https://blueskyproject.io/databroker
BSD 3-Clause "New" or "Revised" License
35 stars, 46 forks

Update shape fixer and add example #819

Closed: danielballan closed this 2 months ago

danielballan commented 2 months ago

Databroker 1.x does not validate that the shape metadata in the descriptor is correct. Databroker 2.x (i.e. Tiled-backed Databroker) requires it to be correct.
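To make the difference concrete, here is a minimal sketch of the kind of check Databroker 2.x performs. It assumes the descriptor records a per-event shape under `data_keys[key]["shape"]` and that the per-event shape of the stored array is already known. `validate_shape` and the local `BadShapeMetadata` class are hypothetical stand-ins modeled on the error message quoted in the docstring below, not Databroker's actual code.

```python
class BadShapeMetadata(Exception):
    """Hypothetical stand-in for databroker.mongo_normalized.BadShapeMetadata."""


def validate_shape(key, data_key, actual_shape):
    """Raise if the descriptor's recorded shape disagrees with the real data.

    `data_key` is one entry of a descriptor's `data_keys`, e.g.
    {"dtype": "array", "shape": [1, 11, 3], "source": "sim"}.
    `actual_shape` is the per-event shape of the array actually stored.
    """
    expected = tuple(data_key.get("shape", ()))
    actual = tuple(actual_shape)
    if actual != expected:
        raise BadShapeMetadata(
            f"For data key {key} shape {actual} "
            f"does not match expected shape {expected}."
        )


# Usage: the descriptor claims (1, 11, 3), but the stored images are 5x7.
img_key = {"dtype": "array", "shape": [1, 11, 3], "source": "sim"}
try:
    validate_shape("img", img_key, (5, 7))
except BadShapeMetadata as err:
    print(err)
    # prints: For data key img shape (5, 7) does not match expected shape (1, 11, 3).
```

Databroker 1.x performs no check like this, which is how incorrect shapes can sit unnoticed in existing descriptors until the data is read through Tiled.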

Databroker provides a "shape fixer" CLI for retroactively fixing shape metadata. On the main branch, this utility updates the documents directly in MongoDB. This PR changes it to make a PATCH request through Tiled instead. This has several advantages:

This PR also adds a demo script and updates docker-compose.yml to support testing the fix. Quoting the script's docstring:

# Generate data with wrong shape in a demo MongoDB and Tiled server.
TILED_SINGLE_USER_API_KEY=secret podman-compose up
python examples/generate_data_with_wrong_shape.py

# Try to load the data in the Tiled Python client.
# It will fail because the shape metadata is wrong.
from tiled.client import from_uri
c = from_uri('http://localhost:8000', api_key='secret')
c['raw'].values().last()['primary']['data']['img'][:]  # ERROR!

# The server logs should show:
# databroker.mongo_normalized.BadShapeMetadata: For data key img shape (5, 7) does not match expected shape (1, 11, 3).

# Run the shape-fixer CLI. Start with a dry run.
# The `--strict` mode ensures that errors are raised, not skipped.
databroker admin shape-fixer mongodb://localhost:27017/example_database --strict --dry-run
databroker admin shape-fixer mongodb://localhost:27017/example_database --strict

# The output should include something like this.
# (Of course, the uid(s) will be different.)
Edited 90b7ffa8-ba02-4163-a2aa-5f47d1eb322b primary: {'img': [1, 11, 3]} -> {'img': [5, 7]}
Migrating... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00

# Back in the Python client, try loading again.
# There is no need to reconnect; just run this line again:
c['raw'].values().last()['primary']['data']['img'][:]  # Now it works!
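The "Edited" line in the expected output above reports, per stream, the old and new shapes for each corrected data key. The sketch below shows one way to compute and describe such a diff in plain Python, assuming the observed per-event shapes have already been read from the data. It deliberately leaves out the network step: per the PR, the real CLI submits the correction to Tiled as a PATCH request, and that request format is not reproduced here. `compute_shape_fixes`, `apply_shape_fixes`, and `describe_fixes` are hypothetical names.

```python
import copy


def compute_shape_fixes(data_keys, observed_shapes):
    """Return {key: (recorded, observed)} for every key whose shape is wrong."""
    fixes = {}
    for key, observed in observed_shapes.items():
        recorded = list(data_keys[key].get("shape", []))
        if recorded != list(observed):
            fixes[key] = (recorded, list(observed))
    return fixes


def apply_shape_fixes(data_keys, fixes):
    """Return a corrected deep copy of data_keys; the input is not modified."""
    patched = copy.deepcopy(data_keys)
    for key, (_, new_shape) in fixes.items():
        patched[key]["shape"] = new_shape
    return patched


def describe_fixes(stream_name, fixes):
    """Format fixes like the CLI's summary line: '<stream>: {old} -> {new}'."""
    old = {key: recorded for key, (recorded, _) in fixes.items()}
    new = {key: observed for key, (_, observed) in fixes.items()}
    return f"{stream_name}: {old} -> {new}"


# Usage with the wrong 'img' shape from the demo.
data_keys = {"img": {"dtype": "array", "shape": [1, 11, 3], "source": "sim"}}
fixes = compute_shape_fixes(data_keys, {"img": (5, 7)})
print(describe_fixes("primary", fixes))
# primary: {'img': [1, 11, 3]} -> {'img': [5, 7]}

# A dry run would stop after reporting; a real run would send `patched` onward.
patched = apply_shape_fixes(data_keys, fixes)
```

Returning a copy rather than mutating in place keeps the "report first, then write" split of `--dry-run` easy to follow.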

Kezzsim commented 2 months ago

This worked for me on a Debian laptop, with the caveat that I had to specify --handler 'NPY_SEQ = ophyd.sim:NumpySeqHandler'
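The `--handler` value above pairs a spec name with an import path in the form `NAME = module:attribute`. The sketch below assumes that format (taken from this comment, not from Databroker's documentation) and shows how such a string can be split and resolved to a class with `importlib`. `parse_handler_spec` and `load_handler` are hypothetical helpers, not Databroker's own parser; the usage resolves a standard-library class so it runs without ophyd installed.

```python
import importlib


def parse_handler_spec(spec):
    """Split 'NAME = package.module:Attr' into (name, module, attr)."""
    name, _, target = spec.partition("=")
    module, sep, attr = target.strip().partition(":")
    name, module, attr = name.strip(), module.strip(), attr.strip()
    if not (name and sep and module and attr):
        raise ValueError(f"expected 'NAME = module:attr', got {spec!r}")
    return name, module, attr


def load_handler(spec):
    """Import the module named in the spec and return (name, handler_class)."""
    name, module, attr = parse_handler_spec(spec)
    return name, getattr(importlib.import_module(module), attr)


print(parse_handler_spec("NPY_SEQ = ophyd.sim:NumpySeqHandler"))
# ('NPY_SEQ', 'ophyd.sim', 'NumpySeqHandler')

# Stand-in target from the standard library, so this runs without ophyd.
name, cls = load_handler("JSON = json:JSONDecoder")
```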

danielballan commented 2 months ago

If ophyd is installed in the environment where databroker admin shape-fixer ... is run, the NPY_SEQ handler (ophyd.sim:NumpySeqHandler) will be discovered via entry points, so --handler is not needed.