java runner in Fiji - Githubissues

bioimage-io / JDLL

The Java library to run Deep Learning models

https://github.com/bioimage-io/JDLL/wiki

Apache License 2.0


java runner in Fiji #28

Open esgomezm opened 1 year ago

esgomezm commented 1 year ago

Hi!

This issue is to follow the discussion with @uschmidt83 in issue https://github.com/stardist/stardist/issues/68 and see if there's something that would be nice to consider for all the plugins.

I include @ivan-ea @carlosuc3m @lmoyasans and @cfusterbarcelo as they are working on the integration of the library in deepImageJ and how to deal with the dependencies inside Fiji.

The Java library can deal with different versions of TF, PyTorch and ONNX. Our plan was to provide all of them in a specific folder inside Fiji.
The Java library is being integrated into deepImageJ, but most probably in the future it will be provided as a dependency through the Update Sites (similar to the TF manager).

Please note that we're trying to move quickly on this to update the paper status and rebuttal, but it would be nice if, in the longer term, we could think about this carefully to make things cleaner and easier for everyone.

uschmidt83 commented 1 year ago

The Java library can deal with different versions of TF, PyTorch and ONNX. Our plan was to provide all of them in a specific folder inside Fiji.

All of them? Wouldn't that be huge? I was wondering about that, i.e. what is a user supposed to do to get this installed?

The Java library is being integrated into deepImageJ, but most probably in the future it will be provided as a dependency through the Update Sites (similar to the TF manager).

I thought that's how I'd use it from StarDist, i.e. add a dependency to my pom.xml.

Please note that we're trying to move quickly on this to update the paper status and rebuttal, but it would be nice if, in the longer term, we could think about this carefully to make things cleaner and easier for everyone.

No rush from my side, for now I'm simply trying to understand what model-runner-java is and isn't going to do, how installation will work, and what I have to do to integrate this with StarDist. (For example, I will have to write tile prediction code, which thus far is managed by CSBDeep for us.)

carlosuc3m commented 1 year ago

Hello,

All of them? Wouldn't that be huge? I was wondering about that, i.e. what is a user supposed to do to get this installed?

The core of the model runner can be used as a pom.xml dependency to your code with:

<dependency>
  <groupId>io.bioimage</groupId>
  <artifactId>dl-modelrunner</artifactId>
  <version>0.1.0</version>
</dependency>

(note that a new version with some improvements will be released in the upcoming days).

This dependency provides the core methods to communicate agnostically with any DL framework. However, the actual Java APIs for each of the DL frameworks (what we call the engines) have to be installed separately and "on demand". If you only need to run TensorFlow 2 models, you only need to install the TF2 dependencies. The dependencies for each supported DL framework, by API version and OS, are listed in a JSON file: https://github.com/bioimage-io/model-runner-java/blob/main/src/main/resources/availableDLVersions.json

In the case of StarDist, if you only need TensorFlow 2, you just need to download one of the TF2 engines that is compatible with the model and place it in a folder called engines inside Fiji, within a subfolder that follows the naming convention.

However, note that the JARs needed for Linux, Mac Intel and Mac Arm64 are different, so different JARs will be required depending on the computer. For deepImageJ we are going to download one engine per DL framework (the latest version) and per OS, meaning that TF 1.15, TF 2.7 (2.10 is not supported on Java 8) and PyTorch 1.13 for Mac, Windows and Linux will all be downloaded regardless of the user's OS. This is a quick solution to adopt the model runner fast in deepImageJ. In the future there will be a better, cleaner and more robust solution for the engine management.
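Since the engine JARs differ per platform, a plugin that wanted to download only the matching engine would first have to identify the platform it runs on. The following is a minimal sketch of that step using only standard JVM system properties. The class name, method name and the returned tags ("windows", "macos-arm64", and so on) are hypothetical and are not JDLL's actual naming convention.

```java
import java.util.Locale;

class EnginePlatform {

    /**
     * Maps the JVM's os.name / os.arch values to a HYPOTHETICAL platform tag.
     * The tag strings are illustrative only; they are not JDLL's folder names.
     */
    static String platformTag(String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        String arch = osArch.toLowerCase(Locale.ROOT);
        // Apple Silicon JVMs report "aarch64"; Intel Macs report "x86_64"
        boolean arm = arch.equals("aarch64") || arch.startsWith("arm");
        if (os.startsWith("windows")) return "windows";
        if (os.contains("mac")) return arm ? "macos-arm64" : "macos-intel";
        if (os.contains("linux")) return "linux";
        return "unknown";
    }

    public static void main(String[] args) {
        // Print the tag for the machine this code is running on
        System.out.println(platformTag(
                System.getProperty("os.name"),
                System.getProperty("os.arch")));
    }
}
```

With such a tag, only the engine for the current machine would need to be fetched instead of the engines for every OS.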

(For example, I will have to write tile prediction code, which thus far is managed by CSBDeep for us.)

The model runner does not do tiling at the moment, although it should do so in the future. However, you can still use CSBDeep to do the tiling and then feed those tiles to the model runner (it uses ImgLib2 for the Tensors).

uschmidt83 commented 1 year ago

The core of the model runner can be used as a pom.xml dependency

👍

In the future there will be a better, cleaner and more robust solution for the engine management.

I think that would really be useful to help with adoption.

The model runner does not do tiling at the moment, although it should do so in the future.

Good to know.

By the way, can I choose to disable pre/post-processing when using the model runner? As this is necessary when predicting on tiles. Also, can I call the pre/post-processing manually instead?

However, you can still use CSBDeep to do tiling and then feed those tiles to the model runner (it uses ImgLib2 for the Tensors).

Yes, I could use/take parts of the CSBDeep plugin, or port my own Python tiling code to Java. Will have to see what's less work. Either way, it's going to take a substantial amount of work to migrate away from CSBDeep...

Is the model runner already mature enough that it is recommended to be used by other regular plugins?

Is there a place where you announce news about the model runner? Because I only knew about it by accident through @esgomezm.

carlosuc3m commented 1 year ago

By the way, can I choose to disable pre/post-processing when using the model runner? As this is necessary when predicting on tiles. Also, can I call the pre/post-processing manually instead?

Yes, you can use the model runner just to run a model. Pre- and post-processing are optional. In addition, only the bioimage.io pre- and post-processing operations are available, so the StarDist post-processing is not available.

Yes, I could use/take parts of the CSBDeep plugin, or port my own Python tiling code to Java. Will have to see what's less work. Either way, it's going to take a substantial amount of work to migrate away from CSBDeep...

If you need any help let me know!

Is the model runner already mature enough that it is recommended to be used by other regular plugins? Is there a place where you announce news about the model runner? Because I only knew about it by accident through @esgomezm.

The first stable version (0.2.0) is going to be released next week. It will be ready to be used by any software on any OS (deepImageJ and the plugin I am preparing for Icy will use it). And yes! It will be announced on Twitter and on the image.sc forum!

carlosuc3m commented 9 months ago

Hello @uschmidt83, I wanted to come back to this issue because JDLL is finally able to do tiling automatically. You just need to provide the size of the tiles and the model to run. The code would be something like the following:

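import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Load a StarDist model stored in the Bioimage.io format
String stardistModelPath = "/path/to/bioimageio/stardist/model";
Model model = Model.createBioimageioModel(stardistModelPath);
model.loadModel();

// Wrap the input image (an ImgLib2 RandomAccessibleInterval) in a tensor with axes "byxc"
List<Tensor<T>> inputs = new ArrayList<Tensor<T>>();
RandomAccessibleInterval<T> rai = ...;
inputs.add(Tensor.build("input", "byxc", rai));

// Tile size per input tensor, in the same axis order ("byxc"); a batch size of 1 is assumed here
Map<String, int[]> tilingMap = new HashMap<String, int[]>();
tilingMap.put(inputs.get(0).getName(), new int[]{1, 256, 256, 3});

List<Tensor<T>> outputs = model.runBioimageioModelOnImgLib2WithTiling(inputs, tilingMap);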

I can help you if you encounter any problems. Regards, Carlos

uschmidt83 commented 9 months ago

Hi @carlosuc3m, thanks for letting me know about this. (Sorry for the late reply.)

You just need to provide the size of the tiles and the model to run.

In my experience, tiling isn't that simple and requires knowledge of the neural network architecture:

Tiles can only start at certain pixels (e.g. multiples of 8 for a U-Net with 3 down/up-sampling levels).
Tiles should overlap sufficiently (depending on the receptive field of the CNN).
How to handle boundaries (padding, etc.).

My goal was always to get the same output, whether tiled prediction is used or not. In my opinion, it's a technical implementation detail that a user ideally shouldn't have to care about, let alone know that it can affect the results (even if slightly).

Finally, we noticed a while ago that TensorFlow is also very slow when using a particular tile size for the very first time (not sure if this is still the case). Hence, we changed our code to always produce tiles of the same size to speed up inference.
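The constraints described above (tile starts on a grid, sufficient overlap, and tiles that always have the same size) can be sketched along one axis as follows. This is a simplified illustration, not the CSBDeep or JDLL implementation: the class and method names are hypothetical, and it assumes the image length has already been padded to a multiple of the grid size and is at least one tile long.

```java
import java.util.ArrayList;
import java.util.List;

class TileLayout {

    /**
     * Start offsets of equally sized tiles covering [0, length) along one axis.
     * Every start is a multiple of `block`, and neighbouring tiles share at
     * least `overlap` pixels. Hypothetical helper for illustration only.
     */
    static List<Integer> tileStarts(int length, int tileSize, int overlap, int block) {
        if (length % block != 0 || tileSize % block != 0) {
            throw new IllegalArgumentException("length and tileSize must be multiples of block");
        }
        if (tileSize > length) {
            throw new IllegalArgumentException("tileSize must not exceed length");
        }
        // Distance between tile starts, rounded down so starts stay on the grid
        int stride = ((tileSize - overlap) / block) * block;
        if (stride <= 0) {
            throw new IllegalArgumentException("overlap too large for this tile size");
        }
        List<Integer> starts = new ArrayList<>();
        for (int s = 0; s + tileSize < length; s += stride) {
            starts.add(s);
        }
        // Shift the last tile back instead of shrinking it, so all tiles have
        // the same size (avoids slow first-time runs for new tile shapes)
        starts.add(length - tileSize);
        return starts;
    }

    public static void main(String[] args) {
        // 64-pixel axis, 32-pixel tiles, at least 8 pixels overlap, grid of 8
        System.out.println(tileStarts(64, 32, 8, 8));
    }
}
```

Because both the axis length and the tile size are multiples of the grid size, the shifted last tile also starts on the grid.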

How is tiling implemented in JDLL? What is your "philosophy" about it?

carlosuc3m commented 4 months ago

Hello @uschmidt83. First of all, sorry for the super late reply. I have now activated notifications for GitHub issues, so I will be faster!

Currently, JDLL is closely tied to Bioimage.io, so tiling is handled following the requirements specified in the Bioimage.io rdf.yaml specs file.

It is still possible to define your own tiling conditions without an rdf.yaml if the model is not in the Bioimage.io format.

My goal was always to get the same output, whether tiled prediction is used or not. In my opinion, it's a technical implementation detail that a user ideally shouldn't have to care about, let alone know that it can affect the results (even if slightly).

Is this possible for every model?

How is tiling implemented in JDLL? What is your "philosophy" about it?

Tiling is currently implemented in the same way as in deepImageJ. A minimum size and a step define the valid input sizes. As you said, if the minimum size is 8 and the step is 16, only inputs of size 8 + 16 * n will be used. If the model has a fixed input size, that size is always used.
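As a small illustration of the minimum-plus-step rule, the sketch below computes the smallest valid size that can hold a given image length along one axis. The class and method names are hypothetical, and treating a step of 0 as "fixed input size" is an assumption made for this example, not a statement about JDLL's code.

```java
class ValidInputSize {

    /**
     * Smallest size of the form min + step * n (n >= 0) that is >= length.
     * A step of 0 is treated here as a fixed input size (assumption).
     */
    static int smallestValidSize(int length, int min, int step) {
        if (min <= 0 || step < 0) {
            throw new IllegalArgumentException("min must be > 0 and step >= 0");
        }
        if (step == 0 || length <= min) {
            return min;
        }
        // Ceiling division: number of steps needed to reach at least `length`
        int n = (length - min + step - 1) / step;
        return min + n * step;
    }

    public static void main(String[] args) {
        // With min = 8 and step = 16, valid sizes are 8, 24, 40, ...
        System.out.println(smallestValidSize(100, 8, 16));
    }
}
```

An image that is not already a valid size would then be padded up to the returned size, or split into tiles of a valid size.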

The output size also depends on several parameters: the scale, the offset and the halo. The output tensor always has size input_size * scale + 2 * offset, unless the output size is fixed.
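The size relation can be written as a one-line helper. This is only a sketch of the formula stated above for a single axis: the names are hypothetical, and rounding to the nearest integer is an assumption to cope with fractional scale or offset values.

```java
class OutputSize {

    /**
     * Output size along one axis: input * scale + 2 * offset.
     * Rounding to the nearest integer is an assumption of this sketch.
     */
    static long outputSize(long inputSize, double scale, double offset) {
        return Math.round(inputSize * scale + 2 * offset);
    }

    public static void main(String[] args) {
        // Same-size output, half-resolution output, and output shrunk by 8 pixels per side
        System.out.println(outputSize(256, 1.0, 0));
        System.out.println(outputSize(256, 0.5, 0));
        System.out.println(outputSize(256, 1.0, -8));
    }
}
```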

Then the halo is used to remove pixels that might be affected by border artifacts. The padding is done by mirroring.
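To make the two operations concrete, here is a one-dimensional sketch of mirror padding followed by halo cropping. It is not JDLL's code: the names are hypothetical, and the mirroring variant shown (reflection that does not repeat the edge sample) is an assumption, since the exact border mode is not stated here.

```java
import java.util.Arrays;

class HaloMirror {

    /** Reflects any index into [0, n) without repeating the edge sample. */
    static int mirrorIndex(int i, int n) {
        if (n == 1) return 0;
        int period = 2 * (n - 1);
        int m = Math.floorMod(i, period);
        return m < n ? m : period - m;
    }

    /** Pads `a` with `pad` mirrored samples on each side. */
    static double[] mirrorPad(double[] a, int pad) {
        double[] out = new double[a.length + 2 * pad];
        for (int i = 0; i < out.length; i++) {
            out[i] = a[mirrorIndex(i - pad, a.length)];
        }
        return out;
    }

    /** Drops `halo` samples on each side, where border artifacts may occur. */
    static double[] cropHalo(double[] a, int halo) {
        return Arrays.copyOfRange(a, halo, a.length - halo);
    }

    public static void main(String[] args) {
        double[] padded = mirrorPad(new double[] {1, 2, 3}, 2);
        System.out.println(Arrays.toString(padded));
        System.out.println(Arrays.toString(cropHalo(padded, 2)));
    }
}
```

In a real pipeline the model would run on the padded tile, and the halo crop would be applied to the model output rather than to the padded input as in this toy round trip.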

Here is more information about the parameters: https://github.com/bioimage-io/spec-bioimage-io/blob/gh-pages/user_docs/model_descr_v0-4.md

uschmidt83 commented 4 months ago

It is still possible to define your own tiling conditions without an rdf.yaml if the model is not in the Bioimage.io format.

Good to know.

Is this possible for every model?

Should be possible for every model with limited receptive field, i.e. most CNNs.

Then the halo is used to remove pixels that might be affected by border artifacts. The padding is done by mirroring.

We only do padding in csbdeep/stardist if the image is not directly usable for the model (e.g. size must be divisible by 8). Here's an example image (from this notebook) of the tiling done in csbdeep (non-purple regions are overlap/context):

carlosuc3m commented 3 months ago

Yes, it should be the same in JDLL. I will run some comparisons and get back to you!