bioimage-io / JDLL

The Java library to run Deep Learning models
https://github.com/bioimage-io/JDLL/wiki
Apache License 2.0

Rework imglib2 pixel types and a new tensor transform hierarchy. #3

Closed tinevez closed 1 year ago

tinevez commented 2 years ago

1. We no longer depend on RaiArrayUtils. We use the imglib2 utility class RealTypeConverters to perform backend copies or conversions on the fly, without any assumption on how the backend is stored (it can be an array, a list of arrays, a cell array, or something stored on disk).

2. The pixel types we deal with are restricted to RealType (scalar values only, no complex-valued pixels) and native types (those that can be represented by a Java primitive: double, float, etc.). This restriction allows us to use more specific methods and classes in imglib2.

3. Fixed the type of Tensor created or returned by copy methods.

4. A simple hierarchy for tensor transformation.

This hierarchy defines an interface and an abstract utility class for tensor transformations. The tensor transformations will be instantiated with any hyperparameters they need (e.g. the threshold for a binarize transform).

It then defines several methods to apply the transformation to a tensor: either in place, to a specified output tensor, or by creating a new output. Pixel type conversion on the fly is supported.

Implemented:

Multithreaded when possible, but without offering control over the multithreading.
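A minimal pure-Java sketch of what such a transformation hierarchy might look like. The toy FloatTensor class and the method names below are illustrative assumptions standing in for JDLL's imglib2-backed Tensor type; the actual interface is not shown in this thread.

```java
// Toy tensor backed by a flat float array (stand-in for the imglib2 Tensor).
final class FloatTensor {
    final float[] data;
    FloatTensor(float[] data) { this.data = data; }
}

// The three application styles described above: new output, given output, in place.
interface TensorTransformation {
    // Create and return a new tensor holding the transformed values.
    FloatTensor apply(FloatTensor input);
    // Write the transformed values of 'input' into 'output'.
    void applyTo(FloatTensor input, FloatTensor output);
    // Transform 'input' in place, reusing its backing storage.
    void applyInPlace(FloatTensor input);
}

// Example concrete transform, instantiated with its hyperparameter (the threshold).
final class BinarizeTransformation implements TensorTransformation {
    private final float threshold;
    BinarizeTransformation(float threshold) { this.threshold = threshold; }

    @Override public FloatTensor apply(FloatTensor input) {
        FloatTensor out = new FloatTensor(new float[input.data.length]);
        applyTo(input, out);
        return out;
    }
    @Override public void applyTo(FloatTensor input, FloatTensor output) {
        for (int i = 0; i < input.data.length; i++)
            output.data[i] = input.data[i] >= threshold ? 1f : 0f;
    }
    @Override public void applyInPlace(FloatTensor input) { applyTo(input, input); }
}
```

The in-place variant simply reuses the input as its own output, which is what makes the later discussion about memory duplication relevant.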

carlosuc3m commented 2 years ago

Thanks a lot for this pull request @tinevez ! Great job. I only have a couple of concerns, and both of them are related to the transformations. The setters were used to set the arguments for the transformation because their values are specified in the model rdf.yaml file. In this rdf.yaml file the arguments are stored in a dictionary, and we cannot be sure which of the arguments comes first, second, or third. Thus, using a setter with the name of the argument as specified in the rdf.yaml file ensures that we assign the correct value to the correct argument.

For example, for the pre-processing of the model specified in this rdf.yaml, we would need setMode, setAxes, setMinPercentile, and setMaxPercentile, which would ensure that all the arguments are used correctly. What do you think?

Also, I was wondering about the input and output tensors of the transformation. In your example the apply method takes the input and output tensors as arguments. I was wondering if it makes sense to get rid of the output tensor and just modify the input tensor, to avoid replicating the data. Would this affect the display of the image the tensor wraps?

Also, in case we needed both an input and an output tensor, what would each of them be called? The name of the tensor is what is used to feed it to the model, so this might introduce some difficulties.

tinevez commented 2 years ago

Hello @carlosuc3m

My bad! I completely neglected the interface required to set input and output.

Can you draft an interface of the required setters and getters? Then I will try to integrate it in my proposal. I see you listed:

It would help me if you could explain how the yaml should be interpreted with arbitrary operations.

As for the input and output, the interface actually gives 3 methods the user can choose from:

I will update the hierarchy so that it is easier to write in-place operations that do not create intermediary storage.

carlosuc3m commented 2 years ago

Hello @tinevez ! The only argument common to every transformation should be mode, so setMode should be the only method in the common interface. Regarding how the rdf.yaml is interpreted, take the following snippet as an example:

inputs:
- axes: byxzc
  data_range:
  - -.inf
  - .inf
  data_type: float32
  name: input
  preprocessing:
  - kwargs:
      axes: xyzc
      max_percentile: 100
      min_percentile: 0
      mode: per_sample
    name: scale_range
  shape:
  - 1
  - 256
  - 256
  - 8
  - 1

The input tensor is named input and is going to be modified by the pre-processing called scale_range. For this pre-processing there should exist a Java class called ScaleRangeTensorTransformation with a setter for each of the kwargs listed, in this case setAxes, setMaxPercentile, setMinPercentile, and setMode. Once all the arguments are set, we should be able to call the method apply on the corresponding tensor to execute the pre-processing correctly.
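The setter-based configuration described here could be sketched in plain Java like this. The setter names follow the ones listed above; the fromKwargs helper and the plain data types are assumptions for illustration. The point is that the kwargs come out of the rdf.yaml as an unordered dictionary, so each value is routed to the matching setter by name rather than by constructor position.

```java
import java.util.Map;

// Hypothetical sketch of a setter-configured transformation, built from
// the unordered kwargs dictionary parsed out of the rdf.yaml.
final class ScaleRangeTensorTransformation {
    private String axes;
    private String mode;
    private double minPercentile;
    private double maxPercentile;

    public void setAxes(String axes) { this.axes = axes; }
    public void setMode(String mode) { this.mode = mode; }
    public void setMinPercentile(double p) { this.minPercentile = p; }
    public void setMaxPercentile(double p) { this.maxPercentile = p; }

    String getAxes() { return axes; }
    String getMode() { return mode; }
    double getMinPercentile() { return minPercentile; }
    double getMaxPercentile() { return maxPercentile; }

    // Route each kwarg to its setter; iteration order does not matter.
    static ScaleRangeTensorTransformation fromKwargs(Map<String, Object> kwargs) {
        ScaleRangeTensorTransformation t = new ScaleRangeTensorTransformation();
        for (Map.Entry<String, Object> e : kwargs.entrySet()) {
            switch (e.getKey()) {
                case "axes": t.setAxes((String) e.getValue()); break;
                case "mode": t.setMode((String) e.getValue()); break;
                case "min_percentile":
                    t.setMinPercentile(((Number) e.getValue()).doubleValue()); break;
                case "max_percentile":
                    t.setMaxPercentile(((Number) e.getValue()).doubleValue()); break;
                default:
                    throw new IllegalArgumentException("Unknown kwarg: " + e.getKey());
            }
        }
        return t;
    }
}
```

Because every value is matched by key, the dictionary from the yaml can be consumed in any order, which is exactly the problem the setters are meant to solve.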

With respect to the input and output, sorry, I missed that!! Perfect then!!

carlosuc3m commented 2 years ago

Also @tinevez, I forgot to mention that all the inputs to the transformations have to be converted to float32 before applying the transformation. Do you think it makes sense to do that directly inside the apply method?

tinevez commented 2 years ago

Also @tinevez, I forgot to mention that all the inputs to the transformations have to be converted to float32 before applying the transformation. Do you think it makes sense to do that directly inside the apply method?

Hello @carlosuc3m I will try to address that. But I need some info from you.

We can decide that the output of all the transforms is going to be float32 regardless of the input. But then, what is our policy regarding in-place processing? Do we only keep a signature like this:

 Tensor< FloatType > apply( final Tensor< R > input )

and always create a new tensor for each apply()?

carlosuc3m commented 2 years ago

@tinevez

We can decide that the output of all the transforms is going to be float32 regardless of the input. But then, what is our policy regarding in-place processing? Do we only keep a signature like this:

Tensor< FloatType > apply( final Tensor< R > input )

I think that this makes sense; even though it is less fancy, I would say it is less prone to errors. We could leave it as it is now, with:

< R extends RealType< R > & NativeType< R > > Tensor< R > 

which would be fancier in my opinion, but not really useful or precise, because the final tensor is always going to be float32, since that is the data type that neural networks use. Another option would be to restrict the input type and have:

Tensor< FloatType > apply( final Tensor< FloatType > input )

In my opinion the first one is the best but I don't have a hard preference. What do you think?

and always create a new tensor for each apply()?

With respect to this: I think that if the transformations we apply on the RandomAccessibleInterval might be reflected in a displayed image, it makes sense to create a copy, to avoid distorting it and to be transparent to the user. However, creating a new tensor duplicates the memory needed, doesn't it? And as using Deep Learning methods on CPU is already very memory intensive, maybe we should try to save some memory. Taking both issues into account, do you think it is possible to support both modifying the tensor in place and duplicating it? For example:

Tensor< FloatType > apply( final Tensor< R > input, boolean inplace )

and

Tensor< FloatType > apply( final Tensor< R > input )
{
    boolean inplace = false;
    ...
}

If this is too complex I would prioritize transparency for the user and go with the replication option, because the memory size of the image is not so important compared to the memory of the model. From my experience it becomes difficult to run models on CPU on float32 images bigger than 3 million pixels, which is only about 12 MB, because the real problem comes from loading hundreds of megabytes of the model into memory. Having another copy of the image open would not affect the loading of the model much unless the images are much bigger.
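The memory trade-off behind the proposed inplace flag can be shown with a toy example. Plain float arrays stand in for tensors here, and the class and method names are made up for illustration.

```java
// Toy demo of an apply(input, inplace) style method: when inplace is true
// the input's backing storage is reused, otherwise a copy is made first.
final class InplaceDemo {
    static float[] scaleByTwo(float[] input, boolean inplace) {
        // Either work directly on the input or on a fresh copy.
        float[] target = inplace ? input : input.clone();
        for (int i = 0; i < target.length; i++)
            target[i] *= 2f;
        return target;
    }
}
```

With inplace = false the caller's array is left untouched (safe for displayed images, at the cost of a duplicate buffer); with inplace = true no extra memory is allocated but the original data is overwritten.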

tinevez commented 2 years ago

Hello @carlosuc3m

Could you have a look at this iteration? If it goes ok, the next step for me would be to

carlosuc3m commented 2 years ago

@tinevez Thanks a lot again for your work! Again, the only concern that I have is that setters will be needed to define the arguments for the transformations. They cannot be constructor parameters because, when we retrieve them from the yaml file, we will not know the correct order in which to feed them to the constructor.

With respect to the percentile transformation: it sets a maximum percentile and a minimum percentile and normalizes the image between both. It can be thought of as clipping a cumulative histogram and then scaling the clipped values between 0 and 1.

A max percentile of 90% means that the value which is bigger than the values of 90% of the pixels becomes the maximum value of the image. The same goes for the min percentile, but it has to be smaller. I hope this explanation helps.
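The percentile normalization explained above can be sketched in plain Java as follows. This uses a nearest-rank percentile estimate on a float array; the actual scale_range implementation may differ, e.g. by interpolating between ranks.

```java
import java.util.Arrays;

// Sketch of percentile-based range scaling: values at or below the
// min-percentile value map to 0, values at or above the max-percentile
// value map to 1, everything in between is scaled linearly.
final class PercentileNormalizer {
    // Nearest-rank estimate of the value below which 'percentile' percent
    // of the pixels fall.
    static float percentileValue(float[] pixels, double percentile) {
        float[] sorted = pixels.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(percentile / 100.0 * sorted.length) - 1;
        return sorted[Math.max(rank, 0)];
    }

    static float[] scaleRange(float[] pixels, double minPct, double maxPct) {
        float lo = percentileValue(pixels, minPct);
        float hi = percentileValue(pixels, maxPct);
        float[] out = new float[pixels.length];
        for (int i = 0; i < pixels.length; i++) {
            float v = (pixels[i] - lo) / (hi - lo);
            out[i] = Math.min(1f, Math.max(0f, v)); // clip to [0, 1]
        }
        return out;
    }
}
```

The clipping step is what makes this robust to outliers: any pixel brighter than the max-percentile value simply saturates at 1 instead of stretching the whole range.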

Regarding the setters for the arguments: if you want, I can approve this pull request so we can finally merge the code; then I will modify the architecture of the setters, and finally you can review it. What do you think?

tinevez commented 2 years ago

Thanks! It is very helpful.

I still think we could specify everything in the constructor of the transforms, even the mode. This would have the advantage of making the transforms immutable objects.

Then what I would do is another set of classes that can deserialize the yaml file into a transform sequence. The YAML deserializer would be able to produce a list of transforms, initialized with the right parameters; then we could just have:

sequence = deserialize( yaml_file );
output = sequence.process( input);

I would like to test whether this is feasible. Could you take a YAML file and annotate it manually (pen and paper, then scan, if you feel like it) so that I can understand what needs to be done, and I will propose a prototype.
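The proposed two-line usage could be backed by something as simple as a list of ready-configured transforms applied in order. Below is a rough pure-Java sketch with plain float arrays standing in for tensors; the TransformSequence name is an assumption, and the deserializer that would populate the list is omitted.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// A sequence of pre-configured transforms, applied one after another.
// A YAML deserializer would build the list from the preprocessing
// entries of the rdf.yaml, each transform already holding its kwargs.
final class TransformSequence {
    private final List<UnaryOperator<float[]>> transforms;
    TransformSequence(List<UnaryOperator<float[]>> transforms) {
        this.transforms = transforms;
    }

    // Feed the output of each transform into the next.
    float[] process(float[] input) {
        float[] current = input;
        for (UnaryOperator<float[]> t : transforms)
            current = t.apply(current);
        return current;
    }
}
```

With such a class, the usage collapses to building the sequence once from the yaml and calling process on each input tensor.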

carlosuc3m commented 2 years ago

@tinevez

Then what I would do is another set of classes that can deserialize the yaml file into a transform sequence. The YAML deserializer would be able to produce a list of transforms, initialized with the right parameters; then we could just have:

sequence = deserialize( yaml_file );
output = sequence.process( input);

Something similar is already done, but on the DeepIcy side. The yaml is read into a ModelDescriptor object and that is used to create the transformation. However, I like your idea of creating the transformation directly from the yaml object, and I think it should be implemented. Then there is the question of where the whole yaml parsing classes should live. DeepIcy has its own classes in the DeepIcy repo, but I think there was an effort to externalize that code: https://github.com/bioimage-io/core-bioimage-io-java/tree/master/src/main/java/io/bioimage/specification

I think it makes sense to do what you suggest, but then we have to decide whether or not to move the whole yaml parser from DeepIcy to the common yaml parser. If we do, it would mean recoding several parts of the DeepIcy plugin; if not, it would mean duplicating functionality.

I would like to test whether this is feasible. Could you take a YAML file and annotate it manually (pen and paper then scan if you feel like it) so that I could understand what needs to be done, and I will propose a prototype.

Yes, will do. In the meantime you can look at the official documentation if you want: https://github.com/bioimage-io/spec-bioimage-io/blob/gh-pages/model_spec_latest.md

carlosuc3m commented 2 years ago

@tinevez Hello Jean-Yves, just one question related to this library but affecting the JEP/Python project. Is there a way to obtain a Java array from the RandomAccessibleInterval? This would be necessary to create numpy arrays in Python using JEP.
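In imglib2 the usual approach is to iterate a Cursor over Views.flatIterable(rai) and copy each value into a primitive array. Since imglib2 itself cannot be shown self-contained here, the sketch below does the same flattening on a plain 2D array standing in for the RandomAccessibleInterval; the class and method names are made up.

```java
// Analogue of copying a RandomAccessibleInterval into a flat Java array:
// visit the pixels in flat (row-major) iteration order and write them
// into a primitive float[] that JEP can hand to numpy.
final class FlattenExample {
    static float[] flatten(float[][] image) {
        int h = image.length, w = image[0].length;
        float[] flat = new float[h * w];
        int i = 0;
        for (float[] row : image)      // same visiting order as flatIterable
            for (float v : row)
                flat[i++] = v;
        return flat;
    }
}
```

A primitive array like this is exactly the shape of buffer that can be passed across the JEP bridge and wrapped as a numpy array on the Python side.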

carlosuc3m commented 1 year ago

Hello @tinevez, I am going to commit your changes into my branch so I can work with your improvements plus some changes I have made in the meantime (related to version management). As the yaml stuff is maybe not of the highest priority right now, I will create another branch from the merged branch, where I will finalize the first iteration of the code. Is that fine with you?