Aharoni-Lab / miniscope-io

Data formatting, reading, and writing from miniscopes
https://miniscope-io.readthedocs.io
GNU Affero General Public License v3.0

Make pipeline graph skeleton #39

Open sneakers-the-rat opened 2 months ago

sneakers-the-rat commented 2 months ago

Following: https://github.com/Aharoni-Lab/miniscope-io/pull/35#discussion_r1719442457

OK so i'm not suggesting we write a whole pipelining framework within miniscope-io, but i just want to structure the notion of chained processing stages a bit.

Original comments, so they're in the same place:

what do you think about making a Pipeline class to structure this a bit? So each stage of this is a very short function with a strongly typed input and output (we can use numpydantic for this!). each stage specifies the format of its output, and each of these 'sink' classes like BufferedCSVWriter also declares what types it accepts, so then someone can supply a path and a writer from the set of available writers for each stage. The pipeline creates a multiprocessing.Pool and dispatches tasks s.t. each stage runs, deposits its output in read-only/buffered shared memory, and triggers a signal in the next stage's read method and in whatever writers are attached.

so that way we don't have to keep fussing about adding more params to propagate through from the top and also separate perf concerns to some degree from correctness concerns (or we multiply them, who knows).
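A rough sketch of the typed stage/sink idea above, to make the shape concrete. Everything here is hypothetical: `Stage`, `Sink`, `ListSink`, and this `Pipeline` are illustrative names, not existing miniscope-io classes, and plain Python types stand in for numpydantic annotations. It runs the stages sequentially and leaves out the `multiprocessing.Pool` and shared-memory parts entirely; it only shows type declarations being checked when the pipeline is built.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple, Type


class Sink:
    """Hypothetical base class for writers; declares the types it accepts."""

    accepts: Tuple[Type, ...] = (object,)

    def write(self, value: Any) -> None:
        raise NotImplementedError


class ListSink(Sink):
    """Toy stand-in for a writer like BufferedCSVWriter: keeps rows in memory."""

    accepts = (list,)

    def __init__(self) -> None:
        self.rows: List[Any] = []

    def write(self, value: Any) -> None:
        self.rows.append(value)


@dataclass
class Stage:
    """One short processing function with declared input and output types."""

    name: str
    fn: Callable[[Any], Any]
    input_type: Type
    output_type: Type
    sinks: List[Sink] = field(default_factory=list)


class Pipeline:
    """Chains stages and checks the declared types once, at construction."""

    def __init__(self, stages: List[Stage]) -> None:
        self.stages = list(stages)
        self._validate()

    def _validate(self) -> None:
        # Each stage's output must be acceptable as the next stage's input.
        for prev, nxt in zip(self.stages, self.stages[1:]):
            if not issubclass(prev.output_type, nxt.input_type):
                raise TypeError(f"{prev.name} -> {nxt.name}: type mismatch")
        # Each attached sink must accept its stage's output type.
        for stage in self.stages:
            for sink in stage.sinks:
                if not issubclass(stage.output_type, sink.accepts):
                    raise TypeError(f"{stage.name}: sink rejects output type")

    def run(self, value: Any) -> Any:
        # Sequential for clarity; a real version would dispatch to workers.
        for stage in self.stages:
            value = stage.fn(value)
            for sink in stage.sinks:
                sink.write(value)
        return value


split_sink = ListSink()
pipeline = Pipeline(
    [
        Stage("split", lambda raw: list(raw), bytes, list, sinks=[split_sink]),
        Stage("count", len, list, int),
    ]
)
result = pipeline.run(b"\x01\x02\x03")
```

With the checks done at construction, a mismatched writer or stage order fails before any data is read, and no extra parameters have to be threaded through from the top.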

eventually i think we should rewrite some of these perf-sensitive routines as Rust or C extension modules, so the more we can split them up into pure functions, the better off we'll be down the line

we will want the notion of priority among sinks, and that would let us simplify the structure further: what is the next method in the Queue pipeline if not a very high priority sink?
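One way to read the priority idea, as a minimal sketch: every consumer of a stage's output is a sink with a priority, and the hand-off to the next stage is just the sink with the highest priority. `PrioritizedSink` and `dispatch` are made-up names for illustration, and the convention that a lower number means higher priority is an assumption.

```python
import queue
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass(order=True)
class PrioritizedSink:
    """Hypothetical sink wrapper; only `priority` takes part in ordering."""

    priority: int  # assumption: lower number = served first
    name: str = field(compare=False)
    handler: Callable[[Any], None] = field(compare=False)


def dispatch(value: Any, sinks: List[PrioritizedSink]) -> List[str]:
    """Send `value` to every sink in priority order; return the order used."""
    served = []
    for sink in sorted(sinks):
        sink.handler(value)
        served.append(sink.name)
    return served


next_stage_queue: "queue.Queue[Any]" = queue.Queue()
csv_rows: List[Any] = []
plotted: List[Any] = []

sinks = [
    PrioritizedSink(10, "plot", plotted.append),
    PrioritizedSink(0, "next_stage", next_stage_queue.put),
    PrioritizedSink(5, "csv", csv_rows.append),
]
order = dispatch({"frame": 1}, sinks)
```

Here the next stage receives the value first, and the slower writer and plotter follow, so the queue hand-off needs no special case in the pipeline structure.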

"dynamic TODO"/notes for a morning jonny:

t-sasatani commented 2 months ago

Dear morning Jonny:

I added some relevant questions to https://github.com/Aharoni-Lab/miniscope-io/pull/35#discussion_r1719654897 if you can have a look sometime. As I wrote in other channels, the remaining must-haves in my mind are displaying (plotting) metadata and callback handles, so it would help if you could consider where these should go during stripping. Also, it might not be directly related, but I think some methods, like display stuff, complain if they're not on the main thread, so it would be nice if what goes to the main thread can be controlled a bit when structuring.

I'll add these somewhere soon if you don't, so there's no pressure, but it would be nice to know a good place to plug these in.
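For the main-thread concern, one common pattern is to have worker threads post display callbacks onto a queue that only the calling (main) thread drains. The sketch below assumes threads rather than processes, and `plot_metadata`, `acquisition_worker`, and `drain` are placeholder names; the "plot" just records which thread it ran on instead of drawing anything.

```python
import functools
import queue
import threading
from typing import Callable, List, Optional, Tuple


def plot_metadata(frame_index: int) -> Tuple[int, str]:
    """Stand-in for a display call that must run on the main thread."""
    return (frame_index, threading.current_thread().name)


def acquisition_worker(indices, tasks: "queue.Queue[Optional[Callable]]") -> None:
    """Runs off the main thread; posts display work instead of doing it."""
    for i in indices:
        tasks.put(functools.partial(plot_metadata, i))
    tasks.put(None)  # sentinel: no more work


def drain(tasks: "queue.Queue[Optional[Callable]]") -> List[Tuple[int, str]]:
    """Called from the main thread; executes posted callbacks in order."""
    results = []
    while (task := tasks.get()) is not None:
        results.append(task())
    return results


tasks: "queue.Queue[Optional[Callable]]" = queue.Queue()
worker = threading.Thread(
    target=acquisition_worker, args=(range(3), tasks), name="acquisition"
)
worker.start()
results = drain(tasks)  # display callbacks run here, on the calling thread
worker.join()
caller_name = threading.current_thread().name
```

If display and callback handles are modelled as sinks, a flag on the sink could decide whether its work is posted to this kind of queue or run in the worker.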

t-sasatani commented 2 months ago

Related note on data sharing between processes: https://github.com/Aharoni-Lab/miniscope-io/pull/35#discussion_r1720721214