DiamondLightSource / tickit

Event-based hardware simulation framework
Apache License 2.0

Investigate generating simulated data #194

Open callumforrester opened 1 year ago

callumforrester commented 1 year ago

Making an issue to document our various thoughts on this...

Current State

The Eiger detector in tickit-devices includes a single frame taken by a real Eiger, which it just repeatedly spits out. The frame has been pre-compressed, so the simulation doesn't even need to understand bslz4.

Use Cases

@DominicOram to comment...

Fixed Output

Simulated detectors should be able to output a series of predetermined frames that look like real data (and probably are real data originally) and can be piped into the same analysis pipelines that take real data as a form of end-to-end system validation. The emphasis here is on performance, since a tickit detector is probably already slower than the real thing and we don't want to slow it down further.
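A minimal sketch of the fixed-output case, assuming the frames are already compressed and held in memory as bytes, so that serving one costs little more than a lookup. `FixedFrameSource` and `next_frame` are hypothetical names for illustration, not part of tickit or tickit-devices.

```python
from itertools import cycle
from typing import Iterator, Sequence


class FixedFrameSource:
    """Hypothetical helper: replays pre-compressed frames in a loop."""

    def __init__(self, frames: Sequence[bytes]) -> None:
        if not frames:
            raise ValueError("at least one frame is required")
        # Frames are kept as opaque bytes; nothing is decompressed per request.
        self._frames: Iterator[bytes] = cycle(frames)

    def next_frame(self) -> bytes:
        """Return the next predetermined frame, wrapping around at the end."""
        return next(self._frames)


source = FixedFrameSource([b"frame-0", b"frame-1", b"frame-2"])
replayed = [source.next_frame() for _ in range(5)]
```

Because the frames are treated as opaque, the simulation still does not need to understand the compression format.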

Custom Output

There should be a facility to customize the detector data. At the expense of speed we may wish to output random data to test the system further, or vary the data quality for testing with an adaptive scan.

N.B. tickit is not a physics simulator. Its job is not to do the maths that shapes the beam or works out how it is scattered by a sample etc. There are technique-specific packages for this, such as Geant4.

Design Ideas

The simplest possible design is to include a facility for generating data inside each detector, possibly following a protocol or ABC for interoperability. You can potentially change/compose different data sources/generation methods in the config.
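As a rough illustration of the "protocol inside each detector" idea, the sketch below uses `typing.Protocol` so that any object with a `next_frame` method can act as a data source. Every name here (`FrameSource`, `ConstantSource`, `AlternatingSource`, `SOURCES`, `make_source`) is an assumption made for the example; none of it is existing tickit API, and a real design might look quite different.

```python
from typing import Callable, Dict, Protocol


class FrameSource(Protocol):
    """Hypothetical interface: anything that can hand a detector its next frame."""

    def next_frame(self) -> bytes: ...


class ConstantSource:
    """Always returns the same frame (roughly what the Eiger simulation does today)."""

    def __init__(self, frame: bytes) -> None:
        self._frame = frame

    def next_frame(self) -> bytes:
        return self._frame


class AlternatingSource:
    """Composes two sources by taking a frame from each in turn."""

    def __init__(self, first: FrameSource, second: FrameSource) -> None:
        self._sources = (first, second)
        self._count = 0

    def next_frame(self) -> bytes:
        source = self._sources[self._count % 2]
        self._count += 1
        return source.next_frame()


# Hypothetical registry mapping a name used in config to a source factory.
SOURCES: Dict[str, Callable[[], FrameSource]] = {
    "good": lambda: ConstantSource(b"good-frame"),
    "mixed": lambda: AlternatingSource(
        ConstantSource(b"good-frame"), ConstantSource(b"bad-frame")
    ),
}


def make_source(name: str) -> FrameSource:
    """Look up the source named in the (hypothetical) detector config."""
    return SOURCES[name]()


mixed = make_source("mixed")
frames = [mixed.next_frame() for _ in range(3)]
```

A structural protocol rather than an ABC means sources do not have to inherit from anything, which may make it easier to share them between detectors.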

We could also take it out into the tickit graph, i.e. making separate "devices" to produce data and wiring them to detectors in many composable ways. The below examples show various possible levels of granularity.

[image: examples of data-producing devices wired to detectors at various levels of granularity]

Unsure if the design would require any framework changes. @abbiemery to comment...

Data Sources

Below are some potential ideas; they may or may not be good ones...

Existing Data File

A detector could stream data out of an HDF5 file, probably captured by the real thing at some point.

Data Simulation Framework

Sirepo is a synchrotron beam data simulation framework developed at NSLS-II. It already has integration with bluesky/ophyd; similar integration could be done with tickit. Alternatively, it could be used to pre-generate data for the "Existing Data File" case.

Python Function

It would be nice to be able to write an arbitrary Python function that returns a NumPy array to generate frames, as the ultimate level of customization.
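For example, a user-supplied function might look something like the sketch below. The signature (frame index in, 2D array out), the `FrameFunction` alias and the `generate_frames` helper are all assumptions for illustration, and the Gaussian spot is just a stand-in for "something that looks vaguely like data", not a physical model.

```python
from typing import Callable

import numpy as np

# Hypothetical signature: frame index in, 2D image out.
FrameFunction = Callable[[int], np.ndarray]


def noisy_spot(index: int, shape: tuple = (64, 64)) -> np.ndarray:
    """Illustrative generator: a Gaussian spot on Poisson background noise."""
    rng = np.random.default_rng(seed=index)  # seeded so each frame is reproducible
    rows, cols = np.indices(shape)
    centre_row, centre_col = shape[0] / 2, shape[1] / 2
    spot = 1000.0 * np.exp(
        -((rows - centre_row) ** 2 + (cols - centre_col) ** 2) / (2 * 5.0**2)
    )
    background = rng.poisson(lam=10.0, size=shape)
    return (spot + background).astype(np.uint32)


def generate_frames(function: FrameFunction, count: int) -> list:
    """Call the user-supplied function once per frame."""
    return [function(i) for i in range(count)]


frames = generate_frames(noisy_spot, count=3)
```

Frames produced this way would presumably still need encoding into whatever form the detector's output stream expects, which is part of the speed cost mentioned under "Custom Output".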

DominicOram commented 1 year ago

I'm going to strongly advocate for just the detector spitting out a fixed output based on an existing data file. It's very simple and it will give us a lot. The specific file it spits out should be runtime configurable so that we can have some tests with good data and some with bad.

callumforrester commented 1 year ago

Indeed, not saying we should support all of these, but I think we can avoid designing them out.