Closed metazool closed 3 months ago
@Kzra your feedback would be particularly appreciated. If you're still generating new images on a regular basis, this should be directly useful to you now. It should also be faster, as we're no longer re-reading the large TIFF every time we extract a small window.
If not, I guess the next step is to add a binary classifier that attaches a probably-junk flag, and maybe a confidence metric, to each output image, with the option of just not sending anything on to the cloud at this stage, and to see if we can do that with the new object_store_api.
See #21 for the context and links to the original. This moves a rough script out of the internal project and refactors it for use in a future pipeline, as yet unspecified.
To test
Run unit tests
Run from the commandline (stopgap)
The last argument there is an "experiment name" used to name the output files. This is a stop-gap set of changes; I didn't want to go any further as it's still not completely clear how the workflow fits together (#9).
What this doesn't cover
One discovery here is that there's a lot of metadata for individual images, based on segmentation and shape analysis that happens onboard the FlowCam. That's a lot more detail than I thought we'd have access to.
Given we don't have a really clear use case for it, I haven't attempted to do anything with it here. I can see the output being usefully either dropped into the object store and picked up for use with dask via intake, or indexed in a lightweight database like sqlite/datasette.
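
To make the sqlite option a bit more concrete, here's a minimal sketch of what indexing the per-image metadata could look like. It isn't part of this PR: the table name, the `index_metadata` helper and the `area` / `aspect_ratio` columns are all hypothetical stand-ins, since the real field names depend on what the FlowCam actually exports.

```python
import sqlite3


def index_metadata(conn, rows):
    """Store one row of metadata per extracted image.

    The column names are assumptions for illustration only; swap them
    for the fields the instrument really provides.
    """
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS image_metadata (
            image_name   TEXT PRIMARY KEY,
            experiment   TEXT NOT NULL,
            area         REAL,
            aspect_ratio REAL
        )
        """
    )
    # INSERT OR REPLACE keeps re-runs for the same experiment idempotent.
    conn.executemany(
        """
        INSERT OR REPLACE INTO image_metadata
        VALUES (:image_name, :experiment, :area, :aspect_ratio)
        """,
        rows,
    )
    conn.commit()


# Made-up example rows, only here to exercise the sketch.
example_rows = [
    {"image_name": "exp1_0001.tif", "experiment": "exp1", "area": 42.0, "aspect_ratio": 0.8},
    {"image_name": "exp1_0002.tif", "experiment": "exp1", "area": 310.5, "aspect_ratio": 0.4},
]

conn = sqlite3.connect(":memory:")  # a file path here would give datasette something to serve
index_metadata(conn, example_rows)

large_objects = conn.execute(
    "SELECT image_name FROM image_metadata WHERE area > ? ORDER BY image_name",
    (100.0,),
).fetchall()
```

Writing to a file rather than `:memory:` would let the same database be browsed with datasette, but whether that is better than the object store plus intake route is still an open question.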