bioimage-io / core-bioimage-io-python

Python libraries for loading, running and packaging bioimage.io models
https://bioimage-io.github.io/core-bioimage-io-python/
MIT License

Unzip local bioimage model zoo models #255

Closed esgomezm closed 2 years ago

esgomezm commented 2 years ago

Hi!

1.- When I load a model from a zip file using the following code, is there any way from the bioimage.io library to unzip the model? Or should I hard code it?

# load model from path to the zipped model files
model_resource = bioimageio.core.load_resource_description(rdf_path)

I'm still working on the integration in the ZeroCostDL4Mic notebooks, and I realised that all model loading cases work well except this one, in which I have a local BMZ model (.zip). This is the equivalent of sharing models privately.

2.- Is there any way of manipulating the information in config/deepImageJ/postprocessing? In this case the model would be of run_mode: deepImageJ.

constantinpape commented 2 years ago

1.- When I load a model from a zip file using the following code, is there any way from the bioimage.io library to unzip the model? Or should I hard code it?

# load model from path to the zipped model files
model_resource = bioimageio.core.load_resource_description(rdf_path)

Calling load_resource_description with a path to a zipped file works. Can you please try it and check the contents of the zip if it does not work? Maybe there's something wrong with it. If it doesn't work, but you think it should then please upload the zip somewhere so we can check it.
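To check the contents of the zip as suggested above, a minimal sketch using only the Python standard library could look like the following. The helper name `inspect_model_zip` is hypothetical, and the assumption that the package contains a file named `rdf.yaml` is mine rather than something stated in this thread.

```python
import zipfile
from pathlib import Path


def inspect_model_zip(zip_path):
    """Return (all member names, members that look like an RDF file)."""
    zip_path = Path(zip_path)
    if not zipfile.is_zipfile(zip_path):
        raise ValueError(f"{zip_path} is not a valid zip archive")
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
    # Assumption: the resource description file is named (or ends with) 'rdf.yaml'
    rdf_candidates = [name for name in names if name.endswith("rdf.yaml")]
    return names, rdf_candidates


# Usage (the path is a placeholder):
# names, rdf_candidates = inspect_model_zip("my_model.zip")
# print(names, rdf_candidates)
```

If `rdf_candidates` comes back empty, or the RDF sits inside a nested folder, that could be a reason for loading to fail.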

2.- Is there any way of manipulating the information in config/deepImageJ/postprocessing? In this case the model would be of run_mode: deepImageJ.

Sorry, I am not quite sure I understand: do you want to modify config:deepImageJ:postprocessing or do you want to set the run_mode field in the main spec to deepImageJ?

esgomezm commented 2 years ago

Calling load_resource_description with a path to a zipped file works. Can you please try it and check the contents of the zip if it does not work? Maybe there's something wrong with it. If it doesn't work, but you think it should then please upload the zip somewhere so we can check it.

It's not working for me. Actually, I don't see the difference between load_resource_description and load_raw_resource_description:

bioimageio_model_id = "/content/gdrive/MyDrive/.../B-subtilis semantic segmentation.bioimage.io.model/B-subtilis semantic segmentation.zip"
model_spec = load_resource_description(bioimageio_model_id)
model_spec
> Model(format_version='0.4.5', name='B-subtilis semantic segmentation', type='model', version=<marshmallow.missing>, root_path=PosixPath('/tmp/bioimageio_cache/extracted_packages/2fcc80bdf0f6cdaeca332128cb568fde11e243a7d180d44312fb752e6db850ea'), attachments=Attachments(files=[PosixPath('/tmp/bioimageio_cache/extracted_packages/2fcc80bdf0f6cdaeca332128cb568fde11e243a7d180d44312fb752e6db850ea/per_sample_scale_range.ijm'), PosixPath('/tmp/bioimageio_cache/extracted_packages/2fcc80bdf0f6cdaeca332128cb568fde11e243a7d180d44312fb752e6db850ea/Contours2InstanceSegmentation.ijm'), PosixPath('/tmp/bioimageio_cache/extracted_packages/2fcc80bdf0f6cdaeca332128cb568fde11e243a7d180d44312fb752e6db850ea/training_evaluation.csv')], unknown={}), authors=[Author(name='Author 1 name', affiliation='Author affiliation', email=<marshmallow.missing>, github_user=<marshmallow.missing>, orcid=<marshmallow.missing>), Author(name=' Author 2 name', affiliation=' Author 2 affiliation', email=<marshmallow.missing>, github_user=<marshmallow.missing>, orcid=<marshmallow.missing>)], badges=<marshmallow.missing>, cite=[CiteEntry(text='Falk et al. Nature Methods 2019', doi='https://doi.org/10.1038/s41592-018-0261-2', url=<marshmallow.missing>), CiteEntry(text='Ronneberger et al. arXiv in 2015', doi='https://doi.org/10.1007/978-3-319-24574-4_28', url=<marshmallow.missing>), CiteEntry(text='Lucas von Chamier et al. 
biorXiv 2020', doi='https://doi.org/10.1101/2020.03.20.000133', url=<marshmallow.missing>)], config={'deepimagej': {'allow_tiling': True, 'model_keys': None, 'prediction': {'postprocess': [{'spec': None}], 'preprocess': [{'kwargs': 'per_sample_scale_range.ijm', 'spec': 'ij.IJ::runMacroFile'}]}, 'pyramidal_model': False, 'test_information': {'inputs': [{'name': 'test_input.npy', 'pixel_size': {'x': 1, 'y': 1, 'z': 1.0}, 'size': '256 x 256 x 1 x 1'}], 'memory_peak': None, 'outputs': [{'name': 'test_output.npy', 'size': '256 x 256 x 1 x 3', 'type': 'image'}], 'runtime': None}}}, covers=[PosixPath('/tmp/bioimageio_cache/extracted_packages/2fcc80bdf0f6cdaeca332128cb568fde11e243a7d180d44312fb752e6db850ea/cover.png')], description='The model detects the background, the cell boundary and the inner part of the cell. For this, it uses a similar architecture to the 2D U-Net but with the number of output channels changed.', documentation=PosixPath('/tmp/bioimageio_cache/extracted_packages/2fcc80bdf0f6cdaeca332128cb568fde11e243a7d180d44312fb752e6db850ea/README.md'), download_url=<marshmallow.missing>, git_repo=<marshmallow.missing>, id=<marshmallow.missing>, icon=<marshmallow.missing>, license='MIT', links=['deepimagej/deepimagej'], maintainers=<marshmallow.missing>, rdf_source=<marshmallow.missing>, source=<marshmallow.missing>, tags=['zerocostdl4mic', 'deepimagej', 'segmentation', 'tem', 'unet'], inputs=[InputTensor(name='input', data_type='uint8', axes=('b', 'x', 'y', 'c'), shape=[1, 256, 256, 1], preprocessing=[Preprocessing(name='scale_range', kwargs={'axes': 'xyc', 'max_percentile': 99.8, 'min_percentile': 1, 'mode': 'per_sample'})], description=<marshmallow.missing>, data_range=(0.0, 255.0))], outputs=[OutputTensor(name='output', data_type='float32', axes=('b', 'x', 'y', 'c'), shape=ImplicitOutputShape(reference_tensor='input', scale=[1.0, 1.0, 1.0, 3.0], offset=[0.0, 0.0, 0.0, 0.0]), halo=<marshmallow.missing>, postprocessing=<marshmallow.missing>, 
description=<marshmallow.missing>, data_range=(-inf, inf))], packaged_by=<marshmallow.missing>, parent=<marshmallow.missing>, run_mode=<marshmallow.missing>, sample_inputs=[PosixPath('/tmp/bioimageio_cache/extracted_packages/2fcc80bdf0f6cdaeca332128cb568fde11e243a7d180d44312fb752e6db850ea/sample_input_0.tif')], sample_outputs=[PosixPath('/tmp/bioimageio_cache/extracted_packages/2fcc80bdf0f6cdaeca332128cb568fde11e243a7d180d44312fb752e6db850ea/sample_output_0.tif')], test_inputs=[PosixPath('/tmp/bioimageio_cache/extracted_packages/2fcc80bdf0f6cdaeca332128cb568fde11e243a7d180d44312fb752e6db850ea/test_input.npy')], test_outputs=[PosixPath('/tmp/bioimageio_cache/extracted_packages/2fcc80bdf0f6cdaeca332128cb568fde11e243a7d180d44312fb752e6db850ea/test_output.npy')], timestamp=datetime.datetime(2022, 3, 29, 0, 2, 5, 611625), training_data=LinkedDataset(id='zero/dataset_u-net_2d_multilabel_deepbacs'), weights={'keras_hdf5': KerasHdf5WeightsEntry(authors=<marshmallow.missing>, attachments=<marshmallow.missing>, parent=<marshmallow.missing>, sha256='38039a20e0d8c6e0677aa9dd349736c81cd8025e01e9f50e48748ac94ce45161', source=PosixPath('/tmp/bioimageio_cache/extracted_packages/2fcc80bdf0f6cdaeca332128cb568fde11e243a7d180d44312fb752e6db850ea/keras_weights.hdf5'), tensorflow_version=<Version('1.15.2')>, dependencies=<marshmallow.missing>), 'tensorflow_saved_model_bundle': TensorflowSavedModelBundleWeightsEntry(authors=<marshmallow.missing>, attachments=<marshmallow.missing>, parent=<marshmallow.missing>, sha256='ef5da1dca835193bacb27b03eba74aaea71d41fb33be82a913ed6709a6723844', source=PosixPath('/tmp/bioimageio_cache/extracted_packages/2fcc80bdf0f6cdaeca332128cb568fde11e243a7d180d44312fb752e6db850ea/tf_weights.zip'), tensorflow_version=<Version('1.15.2')>, dependencies=<marshmallow.missing>)})

Sorry, I am not quite sure I understand: do you want to modify config:deepImageJ:postprocessing or do you want to set the run_mode field in the main spec to deepImageJ?

Both :) There's a model for which I would like to add an argmax postprocessing step (not yet defined in the BMZ as far as I know) if possible. If there's any way to change config:deepImageJ:postprocessing, I will do it like this. If not, I will create a model similar to StarDist and write about it in the documentation of the model.

constantinpape commented 2 years ago

It's not working for me. Actually, I don't see the difference between load_resource_description and load_raw_resource_description:

I can't reproduce this with some other model locally. Can you please upload this model somewhere so that we can test it?

Both :) There's a model for which I would like to add an argmax postprocessing step (not yet defined in the BMZ as far as I know) if possible.

I see. I will follow up on this.

FynnBe commented 2 years ago

It's not working for me.

I'm not sure what you are expecting; loading the model seems to have worked. E.g. the tf weights are at '/tmp/bioimageio_cache/extracted_packages/2fcc80bdf0f6cdaeca332128cb568fde11e243a7d180d44312fb752e6db850ea/tf_weights.zip' (they happen to be a zip, but this .zip is not the bioimageio model zip you loaded). At /tmp/bioimageio_cache/extracted_packages/2fcc80bdf0f6cdaeca332128cb568fde11e243a7d180d44312fb752e6db850ea you should find the unzipped model package.
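To see what ended up in the extracted package folder, a small sketch like the one below may help. It only uses `pathlib`; the helper name `list_package_files` is hypothetical, and passing `model_spec.root_path` assumes the `root_path` attribute shown in the printed model description.

```python
from pathlib import Path


def list_package_files(root_path):
    """Return the sorted relative paths of all files below an extracted package folder."""
    root = Path(root_path)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


# Usage, assuming `model_spec` was returned by load_resource_description:
# for name in list_package_files(model_spec.root_path):
#     print(name)
```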

There's a model for which I would like to add ....

After manipulating a model you should be able to call https://github.com/bioimage-io/core-bioimage-io-python/blob/261ff923dde2fbd77da5a41af0ac5ffc465ba33b/bioimageio/core/resource_io/io_.py#L95-L118 to create a new model package. Note that this only works with the raw representation (what load_raw_resource_description returns). For fully loaded models this is not yet implemented.

constantinpape commented 2 years ago

I'm not sure what you are expecting; loading the model seems to have worked. E.g. the tf weights are at '/tmp/bioimageio_cache/extracted_packages/2fcc80bdf0f6cdaeca332128cb568fde11e243a7d180d44312fb752e6db850ea/tf_weights.zip' (they happen to be a zip, but this .zip is not the bioimageio model zip you loaded). At /tmp/bioimageio_cache/extracted_packages/2fcc80bdf0f6cdaeca332128cb568fde11e243a7d180d44312fb752e6db850ea you should find the unzipped model package.

If I understood @esgomezm correctly the return value of load_resource_description and load_raw_resource_description is the same for this model. Since we only see the output from load_resource_description I am not sure if this is actually the case. In any case, I tried with a zipped model locally and I get different results for the 2 functions there. So in order to check this out we would need to have access to the same model @esgomezm is using.

@esgomezm regarding the run_mode: you can already set it via build_model, see https://github.com/bioimage-io/core-bioimage-io-python/blob/main/bioimageio/core/build_spec/build_model.py#L634

And regarding changing the deepimagej config: indeed, that should be done as outlined by @FynnBe. To make this a bit more concrete, something like this should work (I haven't actually tested the code, so it may need a few adjustments).

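import bioimageio.core

model_desc = bioimageio.core.load_raw_resource_description(model_path)
# the config key is lowercase "deepimagej" in the model description printed above
dij_conf = model_desc.config["deepimagej"]
# update the config here (post-processing lives under "prediction" -> "postprocess")
dij_conf["prediction"]["postprocess"] = ...
model_desc.config["deepimagej"] = dij_conf
# save the updated model (choose a different output path to save it as a new model instead of over-writing the old one)
# export_resource_package is assumed to be the function linked by @FynnBe above
bioimageio.core.export_resource_package(model_desc, output_path=model_path)

esgomezm commented 2 years ago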

At /tmp/bioimageio_cache/extracted_packages/2fcc80bdf0f6cdaeca332128cb568fde11e243a7d180d44312fb752e6db850ea you should find the unzipped model package.

This is what I need. The model description is loaded correctly in python, but I need to know where the keras model is located in the colab session, so I can load the weights in my architecture.

@esgomezm regarding the run_mode: you can already set it via build_model

Yes, I saw that one. The question was more about the postprocessing. I will try what you wrote and see.

Thank you!

constantinpape commented 2 years ago

This is what I need. The model description is loaded correctly in python, but I need to know where the keras model is located in the colab session, so I can load the weights in my architecture.

It's listed in the output you copied here:

model_resource.weights["keras_hdf5"].source
/tmp/bioimageio_cache/extracted_packages/2fcc80bdf0f6cdaeca332128cb568fde11e243a7d180d44312fb752e6db850ea/keras_weights.hdf5

Is there anything wrong with this?
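For picking up that path programmatically, a rough sketch could look as follows. The helper `find_weights_source` is hypothetical; the format keys are the two that appear in the printed model description, and the `SimpleNamespace` object is only a stand-in for a real weights entry so the sketch runs without bioimageio installed.

```python
from types import SimpleNamespace


def find_weights_source(weights, preferred=("keras_hdf5", "tensorflow_saved_model_bundle")):
    """Return (format key, source path) for the first preferred format that is present."""
    for key in preferred:
        entry = weights.get(key)
        if entry is not None:
            return key, entry.source
    raise KeyError(f"none of {preferred} found in {sorted(weights)}")


# Stand-in for `model_resource.weights` (an assumption made for illustration only)
demo_weights = {"keras_hdf5": SimpleNamespace(source="keras_weights.hdf5")}
fmt, source = find_weights_source(demo_weights)
print(fmt, source)
```

With a real model, `source` would be the extracted file path, which could then be passed as a string to the weight-loading call of a Keras model with the matching architecture.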

esgomezm commented 2 years ago

Everything is fine. I hadn't seen it, and the difference between loading the description one way or the other was confusing for me. Now everything should work well, I hope.

For the deepImageJ configuration, is there any way to include it when using build_model? Otherwise, I will leave it as an additional macro, and the model will then be fully compatible with the BMZ.

constantinpape commented 2 years ago

For the deepImageJ configuration, is there any way to include it when using build_model?

Currently not, but I could add an argument for it. What do you think would be the best way to do it? My idea would be to add a new argument overwrite_deepimagej_config, where you can pass a dict and this will then be used to over-write the generated deepImageJ config. E.g. overwrite_deepimagej_config={"postprocessing": ....} to over-write the post-processing config.
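To illustrate what such an over-write could mean in practice, here is a minimal sketch of a recursive dict merge on plain dictionaries. It is not the library's implementation: `overwrite_deepimagej_config` is only a proposal at this point, the helper `overwrite_config` is hypothetical, and replacing lists wholesale (instead of extending them) is a design choice made for this sketch.

```python
import copy


def overwrite_config(generated, overwrite):
    """Return a copy of `generated` with the values from `overwrite` merged in recursively."""
    result = copy.deepcopy(generated)
    for key, value in overwrite.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Merge nested dicts key by key
            result[key] = overwrite_config(result[key], value)
        else:
            # Scalars and lists are replaced as a whole
            result[key] = copy.deepcopy(value)
    return result


# Example values modelled on the config printed earlier in this thread
generated = {
    "allow_tiling": True,
    "prediction": {
        "preprocess": [{"kwargs": "per_sample_scale_range.ijm", "spec": "ij.IJ::runMacroFile"}],
        "postprocess": [{"spec": None}],
    },
}
overwrite = {
    "prediction": {
        "postprocess": [{"kwargs": "custom-macro.ijm", "spec": "ij.IJ::runMacroFile"}]
    }
}
merged = overwrite_config(generated, overwrite)
```

Only the `postprocess` list changes in `merged`; the other keys of the generated config are kept, and `generated` itself is left untouched.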

esgomezm commented 2 years ago

Hi @constantinpape, I agree it would be nice, but maybe it's better not to do it? xD Basically because such a feature would then also have to be available for other software. On the other hand, if we do it, shouldn't it make sure that the run_mode is deepImageJ or similar, since some of the standard features are modified?

constantinpape commented 2 years ago

I agree it would be nice, but maybe it's better not to do it? xD Basically because such a feature would then also have to be available for other software.

Well, we have the add_deepimagej option already, so I think it's fair to also add another argument to over-write it.

In hindsight it would probably have been better to separate the functionality a bit and add a different function for getting the deepimagej config, so that it can just be passed to the config argument; but I wouldn't change this right now.

constantinpape commented 2 years ago

@esgomezm just fyi, we are working on a general purpose solution for this now. I will ping you as soon as this is ready to be used.

constantinpape commented 2 years ago

@esgomezm: ok, this is working already! You will need the latest version of bioimageio.spec (0.4.5post7). And then you can run the following code:

from pathlib import Path
import bioimageio.spec

rdf = Path("./esti/Fine-tuning of the pretrained 3D UNet/rdf.yaml")  # filepath to the rdf
inp = bioimageio.spec.load_raw_resource_description(rdf)

# get the current postprocessing config and append the extra item with your custom post-processing
pp = inp.config["deepimagej"]["prediction"]["postprocess"]
pp.append({"kwargs": "custom-macro.ijm", "spec": "ij.IJ::runMacroFile"}) 

update = {
    "config": {
        "deepimagej": {
            "prediction": {
                "postprocess": pp               
            }
        }
    } 
}

out = bioimageio.spec.commands.update_rdf(inp, update=update)
# this will over-write the rdf in place. If you don't want this, select a different 'path'
bioimageio.spec.io_.save_raw_resource_description(out, path=rdf)
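Since the snippet above over-writes the rdf in place, running it twice would append the same macro twice. A small sketch of a guard against that, on plain Python lists, could look like this. The helper `append_once` is hypothetical and not part of bioimageio.spec.

```python
import copy


def append_once(entries, new_entry):
    """Return a copy of `entries` with `new_entry` appended unless it is already present."""
    updated = copy.deepcopy(list(entries))
    if new_entry not in updated:
        updated.append(copy.deepcopy(new_entry))
    return updated


macro = {"kwargs": "custom-macro.ijm", "spec": "ij.IJ::runMacroFile"}
pp_first = append_once([{"spec": None}], macro)
pp_second = append_once(pp_first, macro)  # second run leaves the list unchanged
```

The result could then be used as the `postprocess` value in the `update` dict instead of the list modified via `pp.append(...)`.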