Closed sneakers-the-rat closed 2 weeks ago
OK does where i'm going with this make sense @t-sasatani @MarcelMB @phildong ?
Described it a bit in the slack, but for the sake of recordkeeping:
Here's a draft structure for miniscope-io that I think, once implemented, should get us to v1. After we merge this I'll split out issues to that effect and make a project board.
The current structure is mostly ad-hoc, with the intention of doing a switch-up like this once we saw what we need. There are a few things that are undesirable about it: `miniscope-io` should not have an `io` module, because the whole thing is io, and so on.

I'm just setting the models in place and trying not to touch anything else, so that I can start transitioning first the SDCard IO stuff and then the streamdaq stuff to it. Sorry for the bigass PR, but it's mostly boilerplate and the important part is the structure of it.
This is a combination of a few ideas, specifically pipelining (https://github.com/Aharoni-Lab/miniscope-io/issues/39) and reworking the config (https://github.com/Aharoni-Lab/miniscope-io/issues/52).
We can refactor all our current processing code as processing graphs with a `Source`, some `ProcessingNode`s, and a `Sink`. For example, for the StreamDaq, we have an implicit processing pipeline like this:
```mermaid
flowchart TB
    classDef Source fill:aliceblue;
    classDef Sink fill:orange;
    SerialSource@{shape: tri} --> uart_recv
    OpalKellySource@{shape: tri} --> fpga_recv
    uart_recv --> buffer_to_frame
    fpga_recv --> buffer_to_frame
    buffer_to_frame --> format_frame
    format_frame --> StreamPlotter((StreamPlotter))
    format_frame --> BufferedCSVWriter((BufferedCSVWriter))
    format_frame --> OpenCVDisplay((OpenCVDisplay))
    format_frame --> VideoWriter((VideoWriter))
    fpga_recv --> BinarySink((BinarySink))
    class SerialSource,OpalKellySource Source;
    class BinarySink,StreamPlotter,BufferedCSVWriter,OpenCVDisplay,VideoWriter Sink;
```
What are devices but wrappers around processing graphs anyway?
So that's what i'm proposing here.
Software-edge contact with hardware devices should be `Source`s and `Sink`s that are generic across modes of communication - e.g. we should have a `UARTSource` and `UARTSink` rather than a `GRINiscopeSource` and a `WireFreeSource` and so on. There is probably some conceptual smoothing to be had by making composite Source/Sink nodes - bidirectional things like UART and Serial are good examples - but let's handle that once we start implementing them.
Each node should implement a `process` method that behaves like a pure function - it can use state from the instance attributes, but it knows nothing about the sources and sinks that are connected to it except their names and their parameterization. All the orchestration of starting/stopping them and passing input and output between the nodes is the responsibility of the `Pipeline` container. This allows us to make super clean, reusable processing methods, but will also allow us to progressively rewrite them as C and Rust extension modules.
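As a rough illustration of the split between isolated `process` methods and a `Pipeline` orchestrator (all class names and the single-threaded `step` loop here are simplifying assumptions for the sketch, not the actual implementation):

```python
from __future__ import annotations
from abc import ABC, abstractmethod
from typing import Any

class Node(ABC):
    """A node only knows its own parameters; wiring lives in the Pipeline."""
    @abstractmethod
    def process(self, data: Any = None) -> Any:
        """Behave like a pure function over the input (plus instance state)."""

class CounterSource(Node):
    def __init__(self) -> None:
        self.n = 0
    def process(self, data: Any = None) -> int:
        self.n += 1
        return self.n

class DoubleNode(Node):
    def process(self, data: int) -> int:
        return data * 2

class ListSink(Node):
    def __init__(self) -> None:
        self.received: list[int] = []
    def process(self, data: int) -> None:
        self.received.append(data)

class Pipeline:
    """Orchestrates nodes: owns start/stop and moves data between them."""
    def __init__(self, source: Node, transforms: list[Node], sinks: list[Node]):
        self.source, self.transforms, self.sinks = source, transforms, sinks
    def step(self) -> None:
        data = self.source.process()
        for t in self.transforms:
            data = t.process(data)
        for s in self.sinks:
            s.process(data)

sink = ListSink()
pipeline = Pipeline(CounterSource(), [DoubleNode()], [sink])
for _ in range(3):
    pipeline.step()
print(sink.received)  # -> [2, 4, 6]
```

Because nodes never call each other directly, the orchestrator is free to run them in threads or processes later, or swap a pure-Python node for a compiled one.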
Each device has a pipeline that handles most of its heavy lifting. What defines a device is the template/config model that parameterizes a default pipeline, but we'll get to configuration in a sec. The Device class is mostly for providing a clear handle for end-users of the SDK: it provides a conceptual hierarchy (e.g. Miniscopes are a type of Camera) to make all the parameterization a bit easier to manage, convenience methods that are specific to that type of device (we don't want to be fishing around in deeply nested config models to tell what the fps is, we should be able to go `camera.fps`), and otherwise any device-specific functionality that doesn't fit well into a source/sink/transform node, like error handling, initialization/deinit, value/type checking, etc.
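For instance, a minimal sketch of the convenience-wrapper idea (the config shapes and class names here are made up for illustration, not miniscope-io's real models):

```python
from dataclasses import dataclass, field

@dataclass
class FrameConfig:
    frame_width: int = 200
    frame_height: int = 200
    fps: int = 20

@dataclass
class PipelineConfig:
    # stand-in for a deeply nested pipeline configuration tree
    format_frame: FrameConfig = field(default_factory=FrameConfig)

class Camera:
    """Device classes surface deeply nested config as flat, friendly attrs."""
    def __init__(self, config: PipelineConfig):
        self.config = config

    @property
    def fps(self) -> int:
        # end users write camera.fps instead of digging into the config tree
        return self.config.format_frame.fps

class Miniscope(Camera):
    """Miniscopes are a type of Camera in the conceptual hierarchy."""

scope = Miniscope(PipelineConfig())
print(scope.fps)  # -> 20
```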
Each device and node has a companion configuration object, a pattern that is increasingly familiar in the streamdaq module. We will replace the awkward "Format/Model" system with one where we use yaml configs for all static configuration and models as serializers/deserializers and containers. This allows us to be super general with how devices work while also making them not devolve into soup:
For example, see the current structure of the streamdaq config: https://github.com/Aharoni-Lab/miniscope-io/blob/0455e57ca81d7bba67a2ea921a50ce860cac232f/miniscope_io/data/config/WLMS_v02_200px.yml
That would get written as something like this, with annotations within:
```yaml
# Identifying information, all configs should have these
device: "OK"
id: "some-unique-identifier"
version: "0.0.1"
created_at: 1900-01-01
updated_at: 1901-01-01

# Some convenience parameters for the device can be defined at the root.
# The Device class handles feeding them into the pipeline configuration,
# e.g. if multiple nodes should receive the same value.

# bitstream file to upload to Opal Kelly board
bitstream: "USBInterface-8mhz-3v3-INVERSE.bit"
# Preamble for each data buffer.
preamble: 0x12345678
# Image format. StreamDaq will calculate buffer size, etc. based on these parameters
frame_width: 200
frame_height: 200
pix_depth: 8

# ------------------
# Most of the configuration is the configuration of a pipeline:
nodes:
  opalkelly:
    # All nodes have a type and can declare their inputs and outputs
    type: okDev
    outputs:
      # each input/output matches to a key in the `nodes` mapping
      - fpga_recv
    # And then also accept parameters that define their operation,
    # e.g. we could also define bitstream here:
    # bitstream: "USBInterface-8mhz-3v3-INVERSE.bit"
  fpga_recv:
    # a node can be named exactly what the type is named, and omit the type
    # type: fpga_recv
    inputs:
      # explicitly specify an expected input rather than inferring,
      # to keep input/output implementation independent
      - opalkelly
    outputs:
      - buffer_to_frame
      # we can attach any sinks whose `input_type` matches our `output_type`
      - binary_sink
    # A node can request parameters from its input classes,
    # so e.g. if the okDev source was able to provide these
    # directly from the bitfile or by querying the FPGA,
    # then we wouldn't need to specify them here
    # ---
    # header_len: 384 # 12 * 32 (in bits)
    # buffer_block_length: 10
    # block_size: 512
    # num_buffers: 32
    # dummy_words: 10
  binary_sink:
    inputs:
      - fpga_recv
    # we should use jinja templates to be able to fill in values
    # from the template and handle some consts like datetimes
    filename: "{{ experiment_name }}/recording_{{ now.isoformat() }}.bin"
  buffer_to_frame:
    inputs:
      - fpga_recv
    outputs:
      - format_frame
      # we can write the metadata out from here rather than passing it through.
      # this will need an additional forking/merging system to declare
      # multiple types of output/input slots, but we'll get there.
      - csv_sink
    reverse_header_bits: True
    reverse_header_bytes: True
    reverse_payload_bits: True
    reverse_payload_bytes: True
    adc_scale:
      ref_voltage: 1.1
      bitdepth: 8
      battery_div_factor: 5
      vin_div_factor: 11.3
  format_frame:
    inputs:
      - buffer_to_frame
    outputs:
      - plot_sink
      - video_sink
      # - ...
# ... etc you get the point
```
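As a sketch of how the templated `filename` in `binary_sink` could get filled in (a tiny stdlib-only renderer for illustration; the real thing would use jinja2, and `experiment_name`/`now` are assumed context values, not a defined API):

```python
import datetime
import re

def render_filename(template: str, context: dict) -> str:
    """Replace {{ expr }} placeholders by evaluating expr against the context."""
    def fill(match: re.Match) -> str:
        # eval is fine for a trusted, local sketch; jinja2 would handle
        # sandboxing, filters, and error reporting properly
        return str(eval(match.group(1), {}, context))
    return re.sub(r"\{\{\s*(.+?)\s*\}\}", fill, template)

context = {
    "experiment_name": "exp01",
    "now": datetime.datetime(2024, 1, 2, 3, 4, 5),
}
path = render_filename(
    "{{ experiment_name }}/recording_{{ now.isoformat() }}.bin", context
)
print(path)  # -> exp01/recording_2024-01-02T03:04:05.bin
```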
So the Device can set a default skeleton; we can then take that skeleton with a cli command like `cli generate config wireless_miniscope ./my_config.yaml` and then customize. We can stick any sort of sinks we want in there, and if we want to, hey, we can customize the whole pipeline too until it's not recognizable as the device - the device wrappers are mostly there for convenience, and we could just rip and run with the node classes themselves if we wanted to. That means that as we develop different types of sources and sinks, it would take exactly zero code to be able to use them anywhere (in an ideal case, after a few rounds of finessing the pipelining system, famous last words).
So you've seen the `from_config` methods elsewhere already; the pipelines, nodes, and devices behave similarly, so those coupled config models give us a uniform system of parameters: defaults, full customization as people want it, through to accepting parameters as args at runtime. By giving in to the coupled class/config model pattern, we can stop keeping all the relevant models scattered around everywhere and just put them in the same place as they're used.
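As a sketch of that coupled class/config pattern (the node and its config model live side by side; all names here are illustrative, not the actual miniscope-io models):

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class ADCScaleConfig:
    """Config model kept in the same module as the node that uses it."""
    ref_voltage: float = 1.1
    bitdepth: int = 8

class ADCScaleNode:
    def __init__(self, config: ADCScaleConfig):
        self.config = config

    @classmethod
    def from_config(cls, config: dict | ADCScaleConfig) -> "ADCScaleNode":
        # accept either a parsed mapping (e.g. from yaml) or the model itself,
        # so defaults, file-based config, and runtime args all flow one way
        if isinstance(config, dict):
            config = ADCScaleConfig(**config)
        return cls(config)

    def scale(self, raw: int) -> float:
        """Convert a raw ADC reading to volts."""
        return raw * self.config.ref_voltage / (2 ** self.config.bitdepth)

node = ADCScaleNode.from_config({"ref_voltage": 1.1, "bitdepth": 8})
print(round(node.scale(128), 4))  # -> 0.55
```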
Finally, this also sets us up nicely for a plugin system, but let's hold off on that for the future :)
To make review easier, here's a lil summary/walkthrough:

- `docs/conf.py` - added a thing to do light mode/dark mode with mermaid diagrams, don't worry about that
- `docs/` - all the rest except `docs/reference/object_model`
- moved `miniscope_io.devices` to `miniscope_io.sources`, and made a new `miniscope_io.devices`: this is the one place where i touched existing code, because I needed to make stuff in `devices`
- `pipeline` - with the `from_config` methods and the `connect_nodes` function, which is untested, but we also don't have any nodes yet so i can't write tests lol

actually, here's what i'm gonna do: pull this into a `pipeline` branch and then do all the conversion stuff on there, and then we can release that all at once instead of having a bunch of incomplete stuff hanging out in released code.
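For reference, a rough guess at what name-based wiring in a `connect_nodes`-style helper could look like (this is not the actual function, just a sketch of validating the declared inputs/outputs from the config):

```python
def connect_nodes(nodes: dict) -> dict:
    """Resolve name-based outputs into an adjacency map, checking that
    every referenced node actually exists in the `nodes` mapping."""
    edges: dict = {name: [] for name in nodes}
    for name, spec in nodes.items():
        for out in spec.get("outputs", []):
            if out not in nodes:
                raise KeyError(f"{name!r} outputs to unknown node {out!r}")
            edges[name].append(out)
    return edges

# a trimmed-down version of the pipeline from the yaml above
nodes = {
    "opalkelly": {"outputs": ["fpga_recv"]},
    "fpga_recv": {"inputs": ["opalkelly"], "outputs": ["buffer_to_frame"]},
    "buffer_to_frame": {"inputs": ["fpga_recv"], "outputs": []},
}
print(connect_nodes(nodes))
# -> {'opalkelly': ['fpga_recv'], 'fpga_recv': ['buffer_to_frame'], 'buffer_to_frame': []}
```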
one more thing on this one before i merge it (to the `pipeline` branch, not to `main`) - i moved the bitfiles into `miniscope_io/data/bitfiles`. I was going back and forth for a while on whether we want to keep them with the opalkelly source class or have a specific directory for bitfiles, and here's how my thought process shook out.

Reasons to keep accessory data files with the classes that use them:

Reasons to make a specific `data` subdirectory for types of accessory files:
- we can have `mio data ls` to show the types of data files that are packaged, and then e.g. something like `mio data bitfiles ls` to list the available data files of that type. Otherwise we'd have to do this per-device class. Ditto for being able to provide programmatic access via `get_data('bitfile/whatever.bit')`/`list_data('bitfile/*')`, and a `get_bitfile(device_name, frequency, pins, ...)` function to specify them semantically rather than by path
- `my_bitfile.bit`, a relative path (from somewhere), or an absolute path (which would vary each time you installed mio in a different venv)

merging this since it's a partial change to the `pipeline` branch, not to `main`, and is just a first draft of the models that will evolve during the transformation. going to get started on that now.
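As an aside, the proposed `get_data`/`list_data` helpers could be sketched like this. This is a toy with a temp directory standing in for `miniscope_io/data`; a real version would resolve files with `importlib.resources`, and the helper names and `bitfiles` layout are proposals from above, not an existing API:

```python
from pathlib import Path
import tempfile

# stand-in for the packaged miniscope_io/data directory
DATA_DIR = Path(tempfile.mkdtemp()) / "data"
(DATA_DIR / "bitfiles").mkdir(parents=True)
(DATA_DIR / "bitfiles" / "USBInterface-8mhz-3v3-INVERSE.bit").write_bytes(b"\x00")

def get_data(relpath: str) -> Path:
    """Resolve a packaged data file by type-scoped relative path."""
    path = DATA_DIR / relpath
    if not path.exists():
        raise FileNotFoundError(relpath)
    return path

def list_data(pattern: str) -> list[str]:
    """Glob packaged data files, e.g. list_data('bitfiles/*')."""
    return sorted(
        p.relative_to(DATA_DIR).as_posix() for p in DATA_DIR.glob(pattern)
    )

print(list_data("bitfiles/*"))
# -> ['bitfiles/USBInterface-8mhz-3v3-INVERSE.bit']
```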
(Opening as a draft to take notes as I work, will mark not draft when it's ready for review)
At long last, we need to define some ABCs to deliver on the promise of a "uniform interface for all miniscopes." To get there we need to do a bit of tidying in the rest of the package.
This PR just creates the abstract classes that will serve as the public API of miniscope IO camera classes, and does not handle converting the rest of the library to use them. Those we'll handle in future PRs.
## Outline

We already have ...

### Pipelines

So to make that formal and unify things a bit, we can think of a generic set of `PipelineModel`s like this:

- `PipelineModel` - abstract parent class that just indicates the children are in some way part of a pipeline
- `Node` - abstract node within a pipeline
- `Source` - `Node` with output(s) but no input(s)
- `Sink` - `Node` with input(s) but no output(s)
- `ProcessingNode` - a `Node` with both input(s) and output(s)
- `Pipeline` - a collection of connected `Node`s

So that would make a rough class diagram like this (where arrows indicate inheritance like `child --> parent` and diamonds indicate composition like `contained --* container`) - s.t. a `Pipeline` class is the orchestrator that contains a set of nodes, and the nodes don't directly call each other, but have an isolated `process` method that yields, consumes, or transforms some data (depending on node type), and the `Pipeline` class is responsible for orchestrating the nodes. This arrangement allows us to treat all the nodes as independent and run them in separate threads/processes/etc.

### Cameras
This design for a pipeline maps onto the existing classes s.t.:

- `Device`s (which currently don't have a class hierarchy) like the OpalKelly device or an SD card are `Source`s - they produce raw data
- `Miniscope`s (which also don't currently exist as a type) like StreamDaq or the SDCard miniscope are `Pipeline`s because they contain one or many data sources (e.g. the ephys miniscope, which will have an ephys input as well as a microscope input), zero or many intermediate processing stages (the methods within StreamDaq), and one or many data sinks (video output, binary output, live plotting)

edit: Actually, rather than inheriting from `Pipeline`, it might be better for maintainability and expansion if we make `Device` be a separate class tree that uses a pipeline via composition - i can see inheritance being a big pain in the ass, it's not all that obvious semantically, and it's less flexible as far as the design of individual `Device` objects goes.

So then we might have a class hierarchy like this, where we change the meaning of "Device" from being a reference to one of the data sources to being a full pipeline class like a `Camera` and its children. This lets us have separable/composable behavior, like e.g. having an independent `EphysDevice` that just has an `EphysDaq` as its `source`, but then also having a `WirelessEphysMiniscope` that has the `EphysDaq` and `StreamDaq`/`OkDev` for combined ephys and microscope recordings.
The `Control` class (for setting capture parameters on e.g. the wireless miniscope) could be a `Sink` that is not hooked up to the rest of the capture graph, so it just receives the input of some `set_*()` method rather than receiving the input stream (which wouldn't make sense). This implies the existence of something we might call an `Interface` class (open to a different name) that has both a `Source` and a `Sink` node - aka it can read and write data to the same location, but the read/write operations are independent. E.g. it wouldn't make much sense to need two different objects to read and write to an SDCard, and even if they are separate modalities (via the FPGA vs. transmitting over IR), the abilities to read from and control a streaming device are interdependent.

(Not annotating composition relationships here because it makes the graph super messy, but hopefully it's obvious that the `WirelessMiniscope` would use an `OkDev`, the `SDCardMiniscope` would use the `SDCard`, the `BehaviorCam` could use either the `USBWebCam` source or a `GStreamer` input, and so on.)

## Implementation Details
### Generic Types
It is formally not allowed to use `TypeVar`s in `ClassVar`s: https://peps.python.org/pep-0526/#class-and-instance-variable-annotations

This is seen as a sort of pointless restriction, as it makes perfect sense from a type modeling POV to specify that an abstract class has some generic classvar type that is then made concrete by the concrete child classes. There is rough agreement in this `mypy` issue that this needs to be changed (https://github.com/python/mypy/issues/5144), but we are currently waiting on someone to push that through. Pydantic handles them fine, so we will be using them even if it might make type checkers complain.
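For concreteness, the pattern in question looks something like this (the class names are illustrative; it runs fine at runtime, but a type checker may flag the annotated line per PEP 526):

```python
from typing import ClassVar, Generic, Type, TypeVar

T = TypeVar("T")

class Node(Generic[T]):
    # generic classvar: concrete children pin down the type; mypy may
    # complain here, which is the restriction discussed above
    output_type: ClassVar[Type[T]]  # type: ignore[misc]

class IntSource(Node[int]):
    output_type = int

class BytesSource(Node[bytes]):
    output_type = bytes

print(IntSource.output_type, BytesSource.output_type)
# -> <class 'int'> <class 'bytes'>
```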