NERC-CEH / plankton_ml

A project for image processing and analysis pipelines for plankton sampling
GNU General Public License v3.0

Add the "decollage" process for raw microscope output to the package #22

Closed: metazool closed this 3 months ago

metazool commented 3 months ago

See #21 for the context and links to the original. This moves a rough script from the internal project into the package and refactors it for use in a future pipeline (as yet unspecified).

To test

Run unit tests

export PYTHONPATH=.
py.test cyto_ml/tests/test_decollage.py

Run from the command line (stop-gap)

python cyto_ml/data/decollage.py fixtures/MicrobialMethane_MESO_Tank10_54.0143_-2.7770_04052023_1 test

The last argument is an "experiment name", used to name the output files. This is a stop-gap set of changes; I didn't want to go any further while it's still not completely clear how the workflow fits together (#9).
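The stop-gap CLI takes two positional arguments: the directory of raw output and the experiment name. A minimal sketch of that interface with `argparse`, assuming hypothetical argument names and a hypothetical `<experiment>_<index>.tif` naming scheme for the outputs (the real script may name things differently):

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the two positional arguments of the stop-gap CLI.

    Hypothetical sketch: argument names are illustrative, not taken
    from cyto_ml/data/decollage.py.
    """
    parser = argparse.ArgumentParser(
        description="Split FlowCam collage TIFFs into individual images"
    )
    parser.add_argument("directory", type=Path,
                        help="folder of raw FlowCam output")
    parser.add_argument("experiment_name",
                        help="label used to name the output files")
    return parser.parse_args(argv)


def output_filename(experiment_name, index, suffix=".tif"):
    """Hypothetical naming scheme: <experiment>_<zero-padded index><suffix>."""
    return f"{experiment_name}_{index:05d}{suffix}"


# Same arguments as the command-line example, with experiment name "test"
args = parse_args(
    ["fixtures/MicrobialMethane_MESO_Tank10_54.0143_-2.7770_04052023_1", "test"]
)
```

Passing an explicit `argv` list keeps the parser testable without touching `sys.argv`.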

What this doesn't cover

One discovery here is that there's a lot of metadata for individual images, based on segmentation and shape analysis that happens onboard the FlowCam: a lot more detail than I thought we'd have access to.

Given we don't have a clear use case for it yet, I haven't attempted to do anything with it here. I can see the output being useful either dropped into the object store and picked up for use with dask via intake, or indexed in a lightweight database like SQLite/Datasette.
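Not part of this PR: a minimal sketch of the SQLite option using only the standard library `sqlite3` module. The column names (`image_x`, `image_y`, `image_w`, `image_h`, `area`) and the sample rows are hypothetical stand-ins, not the FlowCam's actual metadata fields:

```python
import sqlite3

# Hypothetical per-image records; field names and values are illustrative only
records = [
    {"filename": "test_00000.tif", "image_x": 0, "image_y": 0,
     "image_w": 40, "image_h": 32, "area": 812.0},
    {"filename": "test_00001.tif", "image_x": 40, "image_y": 0,
     "image_w": 25, "image_h": 30, "area": 410.5},
]


def index_metadata(records, path=":memory:"):
    """Load per-image metadata into an `images` table and return the connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS images (
               filename TEXT PRIMARY KEY,
               image_x INTEGER, image_y INTEGER,
               image_w INTEGER, image_h INTEGER,
               area REAL)"""
    )
    # Named placeholders are filled from each dict's keys
    conn.executemany(
        "INSERT OR REPLACE INTO images VALUES "
        "(:filename, :image_x, :image_y, :image_w, :image_h, :area)",
        records,
    )
    conn.commit()
    return conn


conn = index_metadata(records)
```

Written to a file path instead of `:memory:`, the same database could be browsed with `datasette images.db`.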

metazool commented 3 months ago

@Kzra your feedback would be particularly appreciated. If you're still generating new images on a regular basis, this should be directly useful to you now, and faster, as we're no longer re-reading the large TIFF every time we extract a small window.
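One way to get the read-once behaviour, sketched with the standard library only: cache the decoded collage with `functools.lru_cache` so every window cut from it reuses the same in-memory copy. The loader below is a hypothetical stand-in that synthesises pixel values instead of reading a real TIFF, and the bounding boxes are made up; this is not necessarily how `decollage.py` does it.

```python
from functools import lru_cache

READS = {"count": 0}  # counts how often the stand-in collage is "decoded"


@lru_cache(maxsize=8)
def load_collage(path):
    """Hypothetical stand-in for decoding a large collage TIFF.

    A real version would read the file (e.g. with PIL or tifffile); here a
    100x200 grid is synthesised so the sketch runs anywhere. lru_cache means
    each path is decoded once, however many windows are cut from it.
    """
    READS["count"] += 1
    height, width = 100, 200
    return tuple(tuple((x + y) % 256 for x in range(width)) for y in range(height))


def extract_window(path, x, y, w, h):
    """Cut one particle image out of the cached collage."""
    image = load_collage(path)
    # Rows y..y+h-1, columns x..x+w-1; with a NumPy array: image[y:y+h, x:x+w]
    return [list(row[x:x + w]) for row in image[y:y + h]]


# Hypothetical bounding boxes (x, y, width, height) for three particles
boxes = [(0, 0, 10, 5), (50, 20, 8, 8), (190, 90, 10, 10)]
windows = [extract_window("collage_1.tif", *box) for box in boxes]
```

Returning tuples keeps the cached collage immutable, so one crop can't accidentally alter the pixels seen by the next.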

If not, I guess the next step is to add a binary classifier, attach a "probably junk" flag and maybe a confidence metric to each output image, with the option of simply not sending junk on to the cloud at this stage, and see if we can do that with the new object_store_api.
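The classifier itself isn't specified yet, so this is only a minimal sketch of the bookkeeping, assuming the classifier returns a junk probability per image. The `ImageRecord` class, the 0.5 threshold, and the distance-from-threshold confidence measure are all hypothetical choices:

```python
from dataclasses import dataclass

JUNK_THRESHOLD = 0.5  # hypothetical cut-off; would need tuning on labelled images


@dataclass
class ImageRecord:
    filename: str
    junk_probability: float  # hypothetical output of a binary classifier

    @property
    def probably_junk(self) -> bool:
        return self.junk_probability >= JUNK_THRESHOLD

    @property
    def confidence(self) -> float:
        # Distance from the decision boundary, rescaled to the range 0..1
        span = max(JUNK_THRESHOLD, 1 - JUNK_THRESHOLD)
        return abs(self.junk_probability - JUNK_THRESHOLD) / span


def images_to_upload(records):
    """Keep only the images not flagged as probably junk."""
    return [r.filename for r in records if not r.probably_junk]
```

For example, `images_to_upload([ImageRecord("a.tif", 0.9), ImageRecord("b.tif", 0.1)])` keeps only `b.tif`; the filtered list is what would be handed on to the object store.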