k-dominik opened this issue 1 year ago
> Currently if we have some minimum shape in the rdf, it doesn't mean that the model will produce anything sensible - see e.g. ilastik/ilastik#2701.
I agree that this is not optimal, and our initial goal of using the parametrized shapes for a dry-run never panned out because none of the consumer software has implemented it...
> I think that's not very user-friendly, especially when targeting users with little to no experience.
I agree. And the initial idea was also that users with little experience would not need to care, because the software would automatically choose a good tile shape via a dry-run.
> Since we have the test/example image, can't we enforce that the minimum shape will produce something usable on that image?
This is difficult. The numerical result with tiling will differ from the test result (which is produced without tiling), so it's not clear how one could write a generic test for this.
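For illustration, a minimal sketch of what such a check could look like, and where it breaks down. `predict` here is a hypothetical stand-in for whatever a consumer uses to run the model, not an existing API; the tolerance is the part that has no generic answer:

```python
import numpy as np

def check_min_shape(predict, test_input, test_output, min_shape, atol=1e-2):
    # Crop the test image to the minimum shape, predict, and compare against
    # the same crop of the reference output. `predict` is a hypothetical
    # consumer-side prediction function.
    crop = tuple(slice(0, s) for s in min_shape)
    prediction = predict(test_input[crop])
    # The unresolved question: which atol/rtol is fair? Cropped or tiled
    # predictions legitimately deviate from the whole-image test output,
    # and the acceptable deviation is model-specific.
    return np.allclose(prediction, test_output[crop], atol=atol)
```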
> I'm not aware of other consumers doing processing tile-wise (maybe QuPath?!).
I think deepImageJ also always processes tile-wise, but to my knowledge it uses the shape of the test image to initialize the tile shape. That could also be a workaround for ilastik for now.
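A minimal sketch of that workaround, assuming the bioimageio.core Python API of the time (`load_resource_description` and the spec-0.4 `test_inputs` field); the path is a placeholder:

```python
import numpy as np
from bioimageio.core import load_resource_description

# Initialize the tile shape from the test input shipped with the model,
# as deepImageJ reportedly does.
model = load_resource_description("path/to/rdf.yaml")
test_input = np.load(str(model.test_inputs[0]))
tile_shape = test_input.shape  # includes batch/channel axes, e.g. (1, 1, z, y, x)
```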
Overall, I think the clean solution would be the dry-run functionality. Maybe it would be better to implement it in the core libraries (python and eventually also java), so that every consumer software could use it.
Okay, still - say you have clogged your GPU memory for some reason and the dry-run determines some small-ish shape that gives rubbish results - is that good? Or would it be better to have an actual minimum shape that gives "usable" results, such that if it doesn't fit on the GPU one can switch to the CPU?! I just think that, other than the model producer, there's basically no one we could get that information from.
- There is some hidden, preliminary support for per-dataset statistics in bioimageio.core: https://github.com/bioimage-io/core-bioimage-io-python/blob/d435fcdb38c8b2152ac0d20f61ee498d88e7f1d0/bioimageio/core/prediction_pipeline/_stat_state.py#L14-L15
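For context, per-dataset statistics would mean accumulating mean/variance across all samples seen so far instead of per tile. A minimal sketch using the Chan et al. merge formula (illustrative names, not the `_stat_state` API):

```python
import numpy as np

class RunningStats:
    """Accumulate dataset-wide mean/std across samples."""
    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, sample: np.ndarray):
        # Merge this sample's statistics into the running totals.
        m = sample.size
        sample_mean = float(sample.mean())
        sample_m2 = float(((sample - sample_mean) ** 2).sum())
        delta = sample_mean - self.mean
        total = self.n + m
        self.mean += delta * m / total
        self.m2 += sample_m2 + delta**2 * self.n * m / total
        self.n = total

    @property
    def std(self) -> float:
        return (self.m2 / max(self.n - 1, 1)) ** 0.5
```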
cool, is this already reflected in the spec somehow - are there models that use this?
After some discussion with @k-dominik, I think we have these issues:
and these proposed solutions:
We would still benefit from some form of dry-run to get better estimates for tiling, but that would mostly be a performance issue, not primarily a quality one. Having it in core makes sense. We might be able to reuse some lines from tiktorch.
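A rough sketch of what such a dry-run in core could look like (a hypothetical PyTorch-only helper, not an existing bioimageio.core or tiktorch function; assumes CUDA out-of-memory surfaces as a RuntimeError):

```python
import torch

def dry_run_tile_shape(model, min_shape, step, max_shape, device="cuda"):
    # Grow the spatial tile shape in `step` increments until the forward pass
    # no longer fits in GPU memory; return the last shape that worked.
    model = model.to(device).eval()
    best = None
    shape = list(min_shape)
    while all(s <= m for s, m in zip(shape, max_shape)):
        try:
            with torch.no_grad():
                model(torch.zeros(1, 1, *shape, device=device))
            best = tuple(shape)
        except RuntimeError:  # CUDA out-of-memory lands here
            break
        finally:
            torch.cuda.empty_cache()
        shape = [s + st for s, st in zip(shape, step)]
    return best
```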
What you outline sounds good @FynnBe.
(One minor comment: it would be nice to make model upload easier / more automatic, if this requires changing things in the model. But that shouldn't keep us from improving things ofc.)
> ...if this requires changing things in the model
the "per-sample"->"per-dataset" change could be overwritten in the collection only. no reupload needed -- it's just metadata.
> ...make model upload easier / more automatic
We (led by @oeway) plan to improve model upload anyway, with dedicated cloud space to upload a draft that can be validated directly and iterated on. Only when full validation passes would we publish to Zenodo. That's the rough plan.
This is a bit of a philosophical topic, and it is a little related to the use in ilastik - I'm not aware of other consumers doing processing tile-wise (maybe QuPath?!).

Currently if we have some minimum shape in the rdf, it doesn't mean that the model will produce anything sensible - see e.g. https://github.com/ilastik/ilastik/issues/2701. The "minimum" shape we enforced in ilastik, (16, 64, 64), is already 2 steps up from the minimum shape described in the model kind-seashell, (8, 32, 32). However, the output is far from optimal even with this shape. I know one of the arguments for this in the past was that producing something on a small GPU might be better than nothing. I think that's not very user-friendly, especially when targeting users with little to no experience. Since we have the test/example image, can't we enforce that the minimum shape will produce something usable on that image? The issue is probably caused by block-wise normalization; implementing per-dataset statistics could be another workaround.

cc: @akreshuk @constantinpape @FynnBe @oeway @esgomezm
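To make the block-wise normalization point concrete, here is a tiny sketch showing how per-tile zero-mean/unit-variance assigns different values to the same voxels than whole-image normalization; per-dataset statistics would avoid exactly this:

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((16, 256, 256), dtype=np.float32)

whole = (img - img.mean()) / img.std()        # whole-image statistics
tile = img[:8, :32, :32]                      # one small tile
per_tile = (tile - tile.mean()) / tile.std()  # per-tile statistics

# The same voxels end up with different normalized values depending on which
# tile they fall into, so small tiles can push inputs off the distribution
# the network was trained on.
print(np.abs(per_tile - whole[:8, :32, :32]).max())
```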