Real size of the output (definition of scale and offset)

esgomezm commented 4 years ago

While trying to adapt DeepImageJ to the YAML file, we had some doubts about how to specify the shape of the output. Here I let you the questions we come up with:

Is there any specific definition for scale and offset when defining the shape of the output?
What is the reference for scale? Is it output_shape = input_shape * scale or input_shape = output_shape*scale?
For the offset, does it refer to difference between the shape of the input and the output? after or before the scaling? What is the main difference between halo and offset?

If we want to do tiling, we should consider both the size of input and output tiles, and the size of the final output after tiling. A good example could be doing super-resolution (the output is larger than the input) with an encoder-decoder. Is there any consensus about this? If not, I suggest the following:

Shape of the model's output: output_shape = input_shape - offset
Shape of the receptive field of the output (of the output tile): receptive_field_shape = output_shape - 2*halo
Shape of the resulting image: result_shape = (tamaño_imagen)*scale

oeway commented 4 years ago

@FynnBe and @constantinpape may give clarification for these question, but I started to think is there "smarter" way of doing the shape calculation.

For example, instead of defining scale, offset, halo, how about allow a math equation, say, output_shape: 8*(input_shape*2) + 31? Or even point to a function just like the preprocessing transforms?

constantinpape commented 4 years ago

@esgomezm

* What is the reference for `scale`? Is it `output_shape = input_shape * scale` or `input_shape = output_shape*scale`?

It is output_shape = input_shape * scale + offset. You can think about output_shape (y) being a linear function of the input_shape (x):

y = a * x + b

Then a = scale and b = offset.

* For the `offset`, does it refer to difference between the shape of the input and the output? after or before the scaling? What is the main difference between `halo` and `offset`?

For offset see above. halo is a parameter for prediction and specifies how much if the predicted tensor should be cropped in order to avoid border artifacts. E.g. if you have a network that returns images of size [256, 256] and you specify halo = [32, 32], this tells the application that runs prediction to crop away 32 pixels on each side; so the output will be [256 - 2 * 32, 256 - 2 * 32] = [192, 192].

I think this covers all the use cases you describe later in the issue.

@oeway

For example, instead of defining scale, offset, halo, how about allow a math equation, say,

scale and offset define a linear equation and the example you give corresponds to scale = 16 and offset = 31.

We can think about generalizing this to higher orders. Another option would be to allow specifying a function in the native language, e.g. python that computes the output shape from the input shape; this would be the most flexible solution.

oeway commented 4 years ago

Right, my point of using equation rather than the scale, offset, halo definition is more about readability. One wouldn't need to read the documentation to understand how to calculate the output shape from the input for example.

Exactly, I think we should definitely go for a function, since we already support reading model definition, transforms from a callable function. If framework like Keras, the model object already keep track of the input shape and output relation, providing an extra function in the model definition file would be very straightforward.

constantinpape commented 4 years ago

Right, my point of using equation rather than the scale, offset, halo definition is more about readability. One wouldn't need to read the documentation to understand how to calculate the output shape from the input for example.

I don't think it's a good idea to allow people to allow writing arbitrary equations, because then we need to implement a lot of parsing logic for this and there can always be corner cases. Instead we could generalize this and give a list of values that correspond to the order of the polynomial and replace scale and offset with this. I.e. we have something like scale_factors = [a0, a1, a2, ...] and

y = a0 + a1 * x + a2 * x^2 + ...

where y = output_shape, x = input_shape.

Also note that halo is more or less unrelated to this; it is just a hint for the consumer software how much to crop during prediction.

Exactly, I think we should definitely go for a function, since we already support reading model definition, transforms from a callable function. If framework like Keras, the model object already keep track of the input shape and output relation, providing an extra function in the model definition file would be very straightforward.

Yes, @FynnBe started implementing something like this for the pytorch side as well, but I am not quite sure what's the latest on this.

oeway commented 4 years ago

I don't think it's a good idea to allow people to allow writing arbitrary equations, because then we need to implement a lot of parsing logic for this and there can always be corner cases.

Well, I wouldn't parse the equation myself, but use eval(expression, context={...})-like implementation, they exist in Python, JS, and in Fiji/Java there are also different script engines. You can think of this is an inline function without the need of saving it into a separate python module or script file.

Using eval in general is a bad practice, however, here we already allow executing arbitrary code for the model definition, using eval here won't make any difference in terms of security.

Yes, @FynnBe started implementing something like this for the pytorch side as well, but I am not quite sure what's the latest on this.

Great!

constantinpape commented 4 years ago

Well, I wouldn't parse the equation myself, but use eval(expression, context={...})-like implementation, they exist in Python, JS, and in Fiji/Java there are also different script engines. You can think of this is an inline function without the need of saving it into a separate python module or script file.

Using eval in general is a bad practice, however, here we already allow executing arbitrary code for the model definition, using eval here won't make any difference in terms of security.

Then the syntax becomes language specific and one could even evoke other dependencies. I would much rather separate this more strictly, either it's possible to express this relation as a (potentially higher-order) polynomial or it needs to be defined in a separate function.

oeway commented 4 years ago

Then the syntax becomes language specific and one could even evoke other dependencies. I would much rather separate this more strictly, either it's possible to express this relation as a (potentially higher-order) polynomial or it needs to be defined in a separate function.

Yes, it will be language specific. Since we have a framework and language key to specify the context for the yaml file, it seems natural to write an inline script with the syntax of current language. In terms of the complexity, if we already allow custom function for computing output_shape, supporting inline function would require minimal effort.

constantinpape commented 4 years ago

Yes, it will be language specific. Since we have a framework and language key to specify the context for the yaml file, it seems natural to write an inline script with the syntax of current language

Yeah, I see that point. But on the other hand it might still be confusing to see different syntax in there for new users, e.g. when porting a config to another language.

In terms of the complexity, if we already allow custom function for computing output_shape, supporting inline function would require minimal effort.

Yep, if we go with a eval solution that is true.

esgomezm commented 4 years ago

You can think about output_shape (y) being a linear function of the input_shape (x):
y = a * x + b
Then a = scale and b = offset.

This is great. I assume then, that the offset can be either positive or negative. Could it be possible to have it in the main description?

The previous definition solves the problem with respect to the input and output of the network, but it does not determine what is the shape of the final output image, after tiling / patch reconstruction. Should we assume that final_image = scale*input_image?

I like the idea of having a function and avoid different interpretations of scale and offset parameters, but @constantinpape is also right. Especially if you want to prepare your model to be compatible with different software.

While it might be a biased opinion, from a didactic point of view, I like to have those parameters in the YAML file (as long as they are explicitly defined). Sometimes, they support user understanding about how the processing is happening.

constantinpape commented 4 years ago

I assume then, that the offset can be either positive or negative.

Yes it can be negative.

Could it be possible to have it in the main description?

Yes, we should definitely update this, it is unfortunately a bit out of date now.

I like the idea of having a function and avoid different interpretations of scale and offset parameters, but @constantinpape is also right. Especially if you want to prepare your model to be compatible with different software.

Yeah, I think there are some pro's and con's here. We will get the discussion on a next iteration of the format rolling soon and this will be a point to include.

While it might be a biased opinion, from a didactic point of view, I like to have those parameters in the YAML file (as long as they are explicitly defined). Sometimes, they support user understanding about how the processing is happening.

My preference would also be to have this in the YAML if possible, but have support for functions for use cases that are too complicated for this.

FynnBe commented 4 years ago

I summed up the current version in https://github.com/bioimage-io/configuration/issues/14#issuecomment-603742862

I also updated the configuration readme (https://github.com/bioimage-io/configuration/commit/bfd0ae359c3ba3fcce073bd6a800ce58ab00e726) without the dynamic parts though, as they are not that vetted yet and clearly still under discussion

FynnBe commented 3 years ago

please reopen if this needs further discussion

constantinpape commented 3 years ago

@esgomezm @FynnBe I reopened this, seeing that there is still some need for clarification about the output shape description.

First, I think we all agree for the case halo = [0] (I am assuming 1D everywhere to keep this readible). In that case output_shape = input_shape * scale + offset.

Now the question is what happens for halo = [32] (or some other non-zero value). So my understanding is that this does not change how we compute output_shape, because specifying the halo does not change what the model does internally.

Instead, it means that the consumer software should crop away 32 pixels from both side in the offset to have "valid" output. (So this is rather a hint for the consumer software.)

FynnBe commented 3 years ago

I think I remember some more of our previous discussion on this topic and agree with your statements. Following points I want to clarify:

the relative position of the output is concentric to the reference input. we do not allow for a 'shifted' output.
offset contains only even integers, thus the origin of the output lies at -offset/2 wrt to the input.*
- thus offset covers all valid convolutions/postprocess croppings that change the output shape (symmetricaly).
- (as @constantinpape stated: halo has no effect on output shape, but rather denotes the border region that may contain artefacts, etc.)

* maybe it is more intuitive to define the origin of the output at +offset/2 wrt to the input and output_shape = input_shape * scale - offset. Ah, both make sense in their own way. In tiktorch I refered to offset with shrinkage, so output_shape = input_shape * scale - 2 * shrinkage and output_origin is at shrinkage...

constantinpape commented 3 years ago

Fixed in #59.

esgomezm commented 3 years ago

Hi all,

While working on the definition of model's output shape and the resulting image shape we find some problems with the current formula: output_shape = input_shape * scale + 2 * offset. This is something that we can think about for the next version, but in the case of deepImageJ at least, some use-cases may not be properly defined in the plugin or not covered.

So, we want to make the difference between the output tensor of the model and the output image that we want to finally display. Here the problem comes, especially, when defining the tiling:

There could be a case in which the output tensor shape is different from the input tensor, but the input image and the output image have the same size. For example, when we do not use padding in the convolutions of the CNN. In that case, output_shape = input_shape * scale + 2 * offset refers to a tensor but not to the final image. And this offset is not a shift w.r.t. the input origin. In this case, the offset < 0.
What about if offset > 0?
There could be also the case, for example in super-resolution, for which scale affects both the output tensor and the output image in the same way.

The main problem we have is that sometimes this formula can be used with both the output tensor and the image, and others, not.

constantinpape commented 3 years ago

@esgomezm I don't quite understand what the difference between output_tensor and output_image is.

constantinpape commented 3 years ago

I had a call on this with @esgomezm and I will try to summarize my understanding of this issue:

first terminology: output_image = the complete output for a given input_image. output_tensor = the output for a given input_tensor = input to the network(!), e.g. corresponding to a tile to be predicted
the issue is the following: output_shape = input_shape * scale + 2 * offset tells us clearly what size the output_tensor will be for a given input tensor. BUT it's unclear if that should also apply for computing the size of the output_image from the input_image, for example given a network with valid convolutions, we will have a negative offset, so the shape will be smaller, the desired output_image (!) should still have the same size as the input image. But in other cases the output_image should have a different size, e.g. if the network downscales by a factor of 2.

I think we should discuss this in more detail in one of the next Friday meetings. For now we will leave everything as it is, we don't have any models yet where this is really an issue (but these will certainly come, because valid convolutions are a common use case)

FynnBe commented 3 years ago

This definitely needs a closer look... My current understanding:

I am surprised that you expect an output_image of the same size for valid convolutions. Why is it a problem that it is smaller?
To compensate for smaller output images padding is obviously the answer, which so far we leave to the consumer software (as the model only get's tiles and how/if they overlap is only known by the consumer software using the model...). The software thus should pad the edge tiles accordingly if it wants the output image to match the input image. (And let the inner tiles overlap with eachother)

esgomezm commented 3 years ago

Hi @FynnBe ,

I am surprised that you expect an output_image of the same size for valid convolutions. Why is it a problem that it is smaller?

It can be smaller, but usually, we try to provide an image of the same size so it doesn't look weird to the user. Also, making it of the same size, allows you to directly compare it with ground truth for example. It's not mandatory but in many cases, it's nice to have it like that. We can of course let this to the consumer and just take as the correct one, the smaller output_image. The same question could be done about halo. If we have halo>0, why should the output_image be of the same shape as the input_image?

To compensate for smaller output images padding is obviously the answer, which so far we leave to the consumer software (as the model only get's tiles and how/if they overlap is only known by the consumer software using the model...). The software thus should pad the edge tiles accordingly if it wants the output image to match the input image. (And let the inner tiles overlap with each other)

It's true that each software should take care of it but for that, it needs to know what should be the correct output_image shape. Also, another reason why maybe this should be defined in detail is the cross-compatibility feature: we are providing some test images for which sometimes, we need to make the tiling. Does cross-compatibility mean that all software should run the model and get the same (plus insignificant error) output? If so, then this should be kind of well defined. I agree that not the tiling, but at least the expected output_image size.

Can be any case in which there's offset or scale (not for channels) and the output_image is of the same size?

FynnBe commented 3 years ago

The same question could be done about halo. If we have halo>0, why should the output_image be of the same shape as the input_image?

while offset != 0 results in a difference in pixels, halo only indicates the width of edge effects that are included in the output image, but should preferably cut off...

so common use cases are:

output tensor/image has same size and a halo specifies what area may suffer from edge effects (no offset)
output tensor/image is smaller (specified by offset), but no halo is zero as all output is valid

It's true that each software should take care of it but for that, it needs to know what should be the correct output_image shape. Also, another reason why maybe this should be defined in detail is the cross-compatibility feature: we are providing some test images for which sometimes, we need to make the tiling. Does cross-compatibility mean that all software should run the model and get the same (plus insignificant error) output? If so, then this should be kind of well defined. I agree that not the tiling, but at least the expected output_image size.

output image (size) should definitely be the same across different consumers, but how to tile should be considered an implementation detail and with suitable offset/halo settings not influence the output by overlapping the tiles appropriately. With some concrete examples on the table we should define this more precicely and have tests for it as well 👍

Can be any case in which there's offset or scale (not for channels) and the output_image is of the same size?

I suppose you can construct it with scaling up a fixed input size and offsetting the size increase with the offset... but practially, no

esgomezm commented 3 years ago

while offset != 0 results in a difference in pixels, halo only indicates the width of edge effects that are included in the output image, but should preferably cut off...

Yes but the artifacts in the edges of the output_image are overcome by doing some mirroring for example. That's why I say that someone could think that those pixels should be discarded as well, making the output_image smaller. I wouldn't do it but I could understand if anyone does.

so common use cases are:

output tensor/image has same size and a halo specifies what area may suffer from edge effects (no offset)

output tensor/image is smaller (specified by offset), but no halo is zero as all output is valid

And scale can be whatever, no?

output image (size) should definitely be the same across different consumers, but how to tile should be considered an implementation detail and with suitable offset/halo settings not influence the output by overlapping the tiles appropriately. With some concrete examples on the table we should define this more precicely and have tests for it as well :+1:

Yes, I think that having some examples will be very helpful.

FynnBe commented 3 years ago

Yes but the artifacts in the edges of the output_image are overcome by doing some mirroring for example.

I would say the edge effects can be reduced, not overcome in a sense of eliminated. so cutting a halo or not is left to the consumer software. The consumer also determines how to pad additionally, (as we have not added padding in any way to the spec (yet)), while the model itself may of course also deploy padding....

And scale can be whatever, no?

in these common use-cases I assume scale is often 1, but sure it could be anything.

FynnBe commented 7 months ago

I think this has been adressed with the 0.5 release, but please reopen or better yet, create a new issue, if there is anything left unclear.

bioimage-io / spec-bioimage-io

Real size of the output (definition of scale and offset) #22