CAREamics as a community partner

melisande-c commented 3 weeks ago

We would like to add CAREamics as a community partner to the bioimage model zoo!

About

CAREamics is a PyTorch library aimed at simplifying the use of Noise2Void and its many variants and cousins (CARE, Noise2Noise, N2V2, P(P)N2V, HDN, muSplit etc.).

Resources

CAREamics has functions to export to and load from bio-image-zoo archive files. This allows users to easily package models for upload to the zoo.

Maintenance

CAREamics has a permanent engineering team, and we are committed to staying compatible with the bioimage model zoo.

Links

CAREamics organisation: https://github.com/CAREamics
CAREamics source code: https://github.com/CAREamics/careamics

FynnBe commented 5 days ago

Let's get CAREamics onboard!

Here you can find details on the technical steps of how to become a community partner (I will help you complete these steps): https://github.com/bioimage-io/collection?tab=readme-ov-file#add-community-partner

[ ] add your info to the bioimageio_collection_config.json, analogous to this example
[ ] compatibility check script
[ ] compatibility check workflow

melisande-c commented 1 day ago

Hi @FynnBe, thanks for getting back to this. I will make the PR adding the CAREamics info to the json file shortly.

For the compatibility script, is there a way to test that it is working as expected before we make a PR? I gather we just need to save a compatibility report file using the CompatibilityReport class. Is there anything else we need to do?

FynnBe commented 1 day ago

No, that's pretty much it. You can also use the CompatibilityReport(TypedDict) as a typed dict in your code if you have issues with the dependencies of the collection_backoffice.
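As a rough illustration of the typed-dict route, the sketch below defines a stand-in report type and saves it as JSON. The class name `CompatibilityReportSketch`, its field names, and the `write_report` helper are assumptions made for illustration only; the real fields are defined by `CompatibilityReport` in the collection repository and should be copied from there.

```python
import json
from pathlib import Path
from typing import List, Literal, Optional, TypedDict


class CompatibilityReportSketch(TypedDict):
    """Illustrative stand-in; field names are assumptions, not the real schema."""

    tool: str
    status: Literal["passed", "failed", "not-applicable"]
    error: Optional[str]
    details: str
    links: List[str]


def write_report(report: CompatibilityReportSketch, path: Path) -> None:
    """Serialize the report dict to a JSON file, creating parent folders."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(report, indent=2), encoding="utf-8")
```

Because a `TypedDict` is a plain `dict` at runtime, this approach needs nothing beyond the standard library, which is the point of avoiding the backoffice dependencies.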

FynnBe commented 1 day ago

Small update: I have refactored the ilastik example to provide script_utils.check_tool_compatibility: https://github.com/bioimage-io/collection/blob/5327dac33314f6817f07e88c7b099110011d5831/scripts/script_utils.py#L48-L72

This simplifies the compatibility script needed from a partner (see the ilastik example). To implement the compatibility check, a partner now needs little more than an analog of https://github.com/bioimage-io/collection/blob/5327dac33314f6817f07e88c7b099110011d5831/scripts/check_compatibility_ilastik.py#L16-L25.

Hope this makes things easier now and more maintainable in the future!

melisande-c commented 1 day ago

CAREamics will also need to check that a CAREamics config.yaml is included and can be used to instantiate our pydantic classes. To save me looking through the source code, what is the best way to retrieve the URL for this file?

Additional question: should we also check that the model is loadable (i.e. has a compatible architecture), or will downloading the model weights be too costly or time-consuming?

FynnBe commented 1 day ago

I suppose your models add this config.yaml using the attachments field then? And is there some additional information in the rdf.yaml under config.careamics to indicate its presence? Either way you probably want to go through a ModelDescr object (returned by bioimageio.spec.load_model_description(rdf_url)).

You should use bioimageio.spec to download the models (e.g. by simply loading them with model = bioimageio.spec.load_model_description(rdf_url)). This way all files will be cached.

If you want to only deal with v0_5.ModelDescr you can simply check the model.format_version attribute.
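A minimal sketch of that version filter follows, written against a plain version string so that it runs without bioimageio.spec installed. The helper name `is_format_0_5` is hypothetical; in a real script the string would come from `model.format_version` on the object returned by `bioimageio.spec.load_model_description(rdf_url)`, as described above.

```python
def is_format_0_5(format_version: str) -> bool:
    """Return True for format versions in the 0.5.x series, e.g. "0.5.3"."""
    parts = format_version.split(".")
    return len(parts) >= 2 and parts[0] == "0" and parts[1] == "5"


# Intended use (assumes bioimageio.spec is installed; not executed here):
#   model = bioimageio.spec.load_model_description(rdf_url)
#   if not is_format_0_5(str(model.format_version)):
#       ...  # report the model as not applicable for this check
```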

melisande-c commented 1 day ago

> I suppose your models add this config.yaml using the attachments field then? And there is some additional information in the rdf.yaml under config.careamics to indicate its presence??

Yep, it is added in the attachments field, but there is no additional info in the rdf file.

> You should use bioimageio.spec to download the models (e.g. by simply loading them with model = bioimageio.spec.load_model_description(rdf_url)). This way all files will be cached.

So this means, with regard to my previous question, that I will have access to the model weights, and so I might as well check that the model architecture is compatible?

FynnBe commented 1 day ago

> Yep it is added in attachments field, but there is no additional info in the rdf file.

Hmm, config.yaml is not a very distinctive name. This might lead to confusion. Maybe you could consider renaming this file (in the context of model descriptions), e.g. to careamics_config.yaml, and/or referencing it under config.careamics so you know what file to look for (then you could name it arbitrarily). If these files are not hundreds of lines long, you could also just insert their content into the rdf.yaml at config.careamics.

> So this means, in regards to my previous question, I will have access to the model weights and so I might as well check that the model architecture is compatible?

Yes, you should in fact. Ideally even run one training iteration (not epoch) and an inference test. CI only has CPU, but the time limit is pretty generous, and we could make sure not to test everything at once if this becomes a bottleneck.
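One way to organise such staged checks is sketched below: each stage is a callable, stages run in order, and the first failure is recorded so that later, more expensive stages are skipped. The `run_checks` helper and the stage names in the usage note are hypothetical; the actual loading, training and inference calls would come from CAREamics and are not shown.

```python
from typing import Callable, List, Optional, Tuple

# A check is a (name, callable) pair; the callable raises on failure.
Check = Tuple[str, Callable[[], None]]


def run_checks(checks: List[Check]) -> Tuple[str, Optional[str]]:
    """Run checks in order and stop at the first one that raises."""
    for name, check in checks:
        try:
            check()
        except Exception as e:  # any failure marks the model as incompatible
            return "failed", f"{name}: {e}"
    return "passed", None
```

A script could pass stages such as "load config", "load weights", "one training iteration" and "inference" in that order, and copy the returned status and error message into the compatibility report.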

melisande-c commented 1 day ago

Our config files can be ~60 lines long, and we already have 3 models uploaded with a separate configuration file. I would rather not check for both cases, so if we change how we export to bmz, I would like to update these existing models.

If we do not insert the CAREamics config into the rdf.yaml file, what extra info needs to be added under config.careamics? The file name is already included in the attachments section.

melisande-c commented 1 day ago

For developing the script, I would like to test locally. How can I get access to an example rdf_url (from one of the uploaded CAREamics models)?

FynnBe commented 1 day ago

> Our config files can be ~60 lines long, we already have 3 models uploaded with a separate configuration file, I would rather not check for both cases so if we change how we export to bmz then I would like to update these existing models.

Yeah, updating 3 models isn't a big deal 👍

> In the case we do not insert the CAREamics config into the rdf.yaml file, what extra info needs to be added under config.careamics? the file name is already included in the attachments section.

Short answer: nothing. Long answer: nothing, if you make the file name careamics-specific. If you do not, then you rely on no other tool ever attaching a config.yaml. In addition, I find the config.yaml name confusing, as we already have the config field inside the rdf.yaml. Therefore I suggest going with some version of careamics_config.yaml. Then there is no need to specify under config.careamics that the ubiquitous config.yaml is a careamics config file.
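With a tool-specific file name, locating the config among a model's attachments becomes a simple name match. The sketch below works on a plain list of attachment file names. The `find_careamics_config` helper and the "starts with careamics, ends with .yaml or .yml" rule are assumptions for illustration, and extracting the names from the ModelDescr attachments is left out.

```python
from pathlib import PurePosixPath
from typing import Iterable, Optional


def find_careamics_config(attachment_names: Iterable[str]) -> Optional[str]:
    """Return the first attachment whose file name looks CAREamics-specific."""
    for name in attachment_names:
        # Compare only the final path component, case-insensitively.
        file_name = PurePosixPath(name).name.lower()
        if file_name.startswith("careamics") and file_name.endswith((".yaml", ".yml")):
            return name
    return None
```

A generic `config.yaml` is deliberately not matched here, since another tool could attach a file with that name.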

FynnBe commented 1 day ago

> For developing the script, I would like to test locally, how can I get access to an example rdf_url? (from one of the uploaded CAREamics models).

Hmm, there are a few options. The first that comes to mind: search for the model id in https://uk1s3.embassy.ebi.ac.uk/public-datasets/bioimage.io/all_versions.json
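That search can be scripted roughly as follows. The layout of all_versions.json is not described in this thread, so the sketch makes no assumption about it and walks the whole document, yielding every JSON object that has a string value containing the model id. The helper names `fetch_json` and `find_entries` are hypothetical.

```python
import json
from typing import Any, Iterator
from urllib.request import urlopen

ALL_VERSIONS_URL = (
    "https://uk1s3.embassy.ebi.ac.uk/public-datasets/bioimage.io/all_versions.json"
)


def fetch_json(url: str = ALL_VERSIONS_URL) -> Any:
    """Download and parse a JSON document."""
    with urlopen(url) as response:
        return json.load(response)


def find_entries(node: Any, model_id: str) -> Iterator[dict]:
    """Yield every dict with a string value containing model_id."""
    if isinstance(node, dict):
        if any(isinstance(v, str) and model_id in v for v in node.values()):
            yield node
        for value in node.values():
            yield from find_entries(value, model_id)
    elif isinstance(node, list):
        for item in node:
            yield from find_entries(item, model_id)
```

Calling `list(find_entries(fetch_json(), "<model id>"))` and inspecting the matches should reveal where the rdf URL for that model is stored.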
