NERC-CEH / plankton_ml

A project for image processing and analysis pipelines for plankton sampling
GNU General Public License v3.0

Packaged way of adding a detritus classifier to image processing #32

Open metazool opened 2 months ago

metazool commented 2 months ago

Workflow for generating a classifier: S3 image collection -> extract and store embeddings -> fit a clustering model -> save the resulting artifact for reuse in the annotation workflow
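As a rough illustration of those four stages, here is a minimal standard-library sketch. Everything in it is an assumption for illustration only: `extract_embedding` is a hypothetical stand-in for a real embedding model, the `images` dict stands in for an S3 image collection, the tiny k-means loop stands in for a library clustering model, and the JSON file layout is made up.

```python
import json
import math
import random
from pathlib import Path


def extract_embedding(image_bytes: bytes, dims: int = 4) -> list[float]:
    """Hypothetical stand-in for a real embedding model: folds bytes into a small vector."""
    vec = [0.0] * dims
    for i, b in enumerate(image_bytes):
        vec[i % dims] += b
    n = max(len(image_bytes), 1)
    return [v / n for v in vec]


def fit_kmeans(points: list[list[float]], k: int, iterations: int = 20, seed: int = 0) -> list[list[float]]:
    """Plain k-means (Lloyd's algorithm); returns the k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def save_model(centroids: list[list[float]], path: str) -> None:
    """Save the fitted artifact so a later annotation step can reload it."""
    Path(path).write_text(json.dumps({"centroids": centroids}))


def run_pipeline(images: dict[str, bytes], k: int, model_path: str) -> list[list[float]]:
    """images (stand-in for the S3 collection) -> embeddings -> fitted model -> saved artifact."""
    embeddings = {name: extract_embedding(data) for name, data in images.items()}
    centroids = fit_kmeans(list(embeddings.values()), k)
    save_model(centroids, model_path)
    return centroids
```

In a real pipeline each function would be its own stage, so that a tool like Luigi or DVC can cache the embeddings and only refit the model when they change.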

This could be done with Luigi, or it could be an opportunity to get started with pyorderly, or to test this walkthrough of DVC and work with CML

Outline:

metazool commented 1 month ago

Taking intake out means changing a few places where intake_xarray.ImageSource was being used to load images for the scivision model, but it looks worth doing: the results will be much more readable

metazool commented 1 month ago

This is partly completed in #36: the simplest possible DVC pipeline that fits a K-means model to an image collection and saves it for reuse, with a web interface for exploring the contents of the different clusters to judge by eye which are primarily detritus
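To make the "explore the clusters by eye" step concrete, here is a small sketch of how a saved set of centroids could be reused to bucket images by cluster before display. It is not the code in #36; the function names, file names and centroid values are hypothetical, and nearest-centroid assignment is written out by hand rather than taken from a library.

```python
import math
from collections import defaultdict


def assign_cluster(embedding: list[float], centroids: list[list[float]]) -> int:
    """Return the index of the centroid closest to this embedding."""
    return min(range(len(centroids)), key=lambda c: math.dist(embedding, centroids[c]))


def group_by_cluster(embeddings: dict[str, list[float]], centroids: list[list[float]]) -> dict[int, list[str]]:
    """Bucket image names by their nearest centroid, ready for browsing cluster by cluster."""
    groups = defaultdict(list)
    for name, emb in embeddings.items():
        groups[assign_cluster(emb, centroids)].append(name)
    return dict(groups)


# Made-up centroids and embeddings, standing in for a reloaded model and stored vectors
centroids = [[0.0, 0.0], [10.0, 10.0]]
embeddings = {
    "img_001.tif": [0.5, -0.2],
    "img_002.tif": [9.0, 11.0],
    "img_003.tif": [1.0, 1.0],
}
groups = group_by_cluster(embeddings, centroids)
```

A viewer would then show the images in each bucket side by side, and whoever is reviewing notes which cluster indices are mostly detritus.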

You can see there's still an open question about where the metadata goes. I thought about adding a tag right into the EXIF headers, or into the metadata the microscope exports, which describes each image's properties in a lot of detail. It depends on what is most useful to the ongoing application! It also depends on how this will be used: is the tagging an extra stage in a Luigi pipeline that's processing and uploading images to an object store, or is it a distinct pipeline that's indexing and analysing images once they've been uploaded?

So I've left it open for now. It probably needs another use case, like the phenocam images, to show the wider picture
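For the option of writing the tag into per-image metadata rather than EXIF, a sketch might look like the following. It assumes the metadata lives in a JSON file next to each image, which may not match what the microscope actually exports; the field names `cluster_id` and `is_detritus` are hypothetical.

```python
import json
from pathlib import Path


def tag_sidecar(metadata_path: str, cluster_id: int, detritus_clusters: set[int]) -> dict:
    """Add cluster and detritus fields to a JSON metadata file, keeping existing fields."""
    path = Path(metadata_path)
    # Start from the existing metadata if the file is already there
    metadata = json.loads(path.read_text()) if path.exists() else {}
    metadata["cluster_id"] = cluster_id  # hypothetical field name
    metadata["is_detritus"] = cluster_id in detritus_clusters  # hypothetical field name
    path.write_text(json.dumps(metadata, indent=2))
    return metadata
```

The same function could be called from an extra stage in an upload pipeline or from a separate indexing pipeline, so it does not settle which of the two this should be.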

cc @albags @Kzra