bluesky / event-model

data model for event-based data collection and analysis
https://blueskyproject.io/event-model
BSD 3-Clause "New" or "Revised" License
13 stars 29 forks source link

Fix minor bug in event_model's rechunk_event_pages #297

Closed cjtitus closed 3 months ago

cjtitus commented 4 months ago

rechunk_event_pages has a bug if you try to re-chunk an EventPage that has an empty "filled" key, because the dictionary comprehension to create the new "filled" dictionaries uses keys from "data". This means that you can't rechunk the EventPages that are emitted by the RunEngine.

Description

I have just changed one word -- instead of "filled": {key: page["filled"][key][start:stop] for key in page["data"].keys()}, I have written "filled": {key: page["filled"][key][start:stop] for key in page["filled"].keys()}, which correctly leaves filled empty, if it was empty to begin with.

Motivation and Context

It is necessary to fix this bug in order to re-chunk EventPages that are currently too large to go into Kafka after long fly-scans.

How Has This Been Tested?

I have tested this by manually calling rechunk_event_pages on some generated EventPages, and also by running the built-in tests.

cjtitus commented 4 months ago

Ahh, I spoke too soon! It turns out that rechunk_event_pages still fails if you increase the chunk size. So, if you input a list of EventPages and want to increase the chunk size to decrease the number of pages, the constructor for EventPages also needs to have a non-empty "filled" key.

I need to fix this as well, so don't merge yet.

cjtitus commented 4 months ago

Ok, now I can chunk up or down. I wonder if some tests should be implemented for this...

cjtitus commented 4 months ago

Ok, formatting fixed, and I added parametrization to the rechunk_event_pages test so that it now tests documents that have filled: {}.

I have checked that this test fails on the master branch (and passes for this PR)