NERC-CEH / plankton_ml

A project for image processing and analysis pipelines for plankton sampling
GNU General Public License v3.0
0 stars 1 forks source link

Drop scivision #24

Closed jmarshrossney closed 3 months ago

jmarshrossney commented 3 months ago

SORRY - I thought I could open a PR on my fork and then change the base repository to after the fact. It turns out I can't (as far as I can see) and so I've had a whole conversation with myself over here on the original PR.


I am proposing to drop scivision as a dependency, but to keep an eye on how it develops in case we want to adopt their model-loading protocol in future.

Why?

Bluntly, there are no functional benefits to using scivision, for us, at this point in time.

As far as I can tell the only reason to keep using it is because we support what they are trying to do and want to inject momentum into the project rather than remove it. That is a good reason, but given that we are in the very early stages of a project and don't know quite the direction we want to take it, I think it's reasonable for us to make this decision down the line.

Another reason for waiting is that intake (which scivision uses heavily) is currently going through a full-scale rewrite, and the maintainers say (see docs)

All old “sources”, whether still working or not, should be considered deprecated.

My instinct is to wait and see how this plays out and if/when scivision gets updated. [EDIT: my worry is that it just won't.. see observation at the end of this comment ]

Finally, the single model in the scivision catalogue that we are interested in locks us into a very old version of Python (3.9).

What changes

I forked the cefas/turing model, updated it to Python 3.12 and stripped out all the stuff that either didn't work or that we didn't need.

So at first approximation the only changes in the code should be that instead of loading the pretrained model from the original repo via scivision.load_pretrained_model, we instead just pip install the fork.

I also found that scivision.load_dataset can be trivially replaced by intake.open_catalog (see comment)

metazool commented 3 months ago

I concur with the "remove scivision, for now" take, and here's the longer, philosophical essay version of why: thank you for the PR. I'm happy to merge, waiting to do this in sequence starting before the layout changes.

Regarding the sweeping changes to intake though, I thought the scivision codebase and by extension this one was already on the newer Take2. I've got mixed feelings about intake as well; it seems very powerful but almost too generic, in a way that you have to relearn how to configure it slightly every time you work with it? It seemed full of promise at the start as a alternative to STAC for data without a spatio-temporal component (and I'd not come across it other than its connection to scivision).

We do have spatio-temporal metadata available for this project now after #22 - and I honestly can't think of other projects in the image machine learning line of work here that don't- so it could be a lot healthier to look again at the STAC approach, try to keep it standards-oriented, and there are others within CEH whose expertise could be drawn on for an environmental sample STAC extension - some notes and links here https://github.com/NERC-CEH/plankton_ml/issues/4