Furthermore, the nomenclature used within the transformations is ambiguous and doesn't quite follow the interactive segmentation literature.
I would suggest the following changes:
DeepEdit (which is derived from DeepGrow) is based on https://arxiv.org/abs/1903.08205 and is one kind of interaction. Actually, if you look at the interactions in DeepGrow, it is a little bit more than DeepEdit: we go over multiple clicks during eval mode for each iteration and compute the diff (the discrepancy between the current prediction and the label) to compute new guidance. This is to mimic user behaviour, where the user actually clicks multiple places on the same region/label to cover the uncovered portion, and the 2nd click will have a different priority than the 1st.
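For readers who have not gone through the DeepGrow/DeepEdit transforms, here is a minimal, hypothetical sketch of that diff-based click simulation; the function name and the positive-click-only logic are simplifications for illustration, not the actual MONAI implementation:

```python
import numpy as np


def sample_click_from_discrepancy(label: np.ndarray, pred: np.ndarray, rng=None):
    """Pick one voxel that belongs to the label but is missed by the current prediction."""
    rng = rng or np.random.default_rng()
    # Discrepancy: foreground voxels the current prediction has not covered yet.
    discrepancy = np.logical_and(label > 0, pred == 0)
    candidates = np.argwhere(discrepancy)
    if len(candidates) == 0:
        return None  # nothing left to correct for this label
    return tuple(candidates[rng.integers(len(candidates))])
```

Calling something like this after every forward pass, and appending the result to the existing guidance, is roughly the multi-click loop described above.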
And yes, some names can be confusing and can be corrected.
The objective of having DeepEdit is to showcase how you can train your own interaction models (or a hybrid concept of auto-segmentation + interaction) in monai/monailabel.
It is expected to be limited, and all the code under monai/apps/deepedit is very much tied to the paper and implements only one specific thing. Another example is NuClick in the pathology use case; it is a similar concept (with different naming), but it has a different paper and some tailoring. So I would not assume we have one thing that can support all possible options to train any interactive model.
However, this should not block anyone from implementing their own interactions while training an interaction model of their kind. We should only consider DeepEdit/DeepGrow as examples to showcase how a user can make use of interaction loops, or, in your terms, a special prepare_batch to add/update signals that can replicate some user behaviour (in this case, user clicks are converted to guidance)... Again, I can understand if anyone is confused between clicks/guidance/signal, and the naming can be standardized a little bit (when to call what).
I see you have some good ideas on extending the interaction. I have been asking people to think about such new ways to train a model keeping a human in the loop. You are always welcome to add such new interactive models for others to make use of, but I suggest considering it as another app/interactor/set of generic transforms and, at the end, showcasing an e2e workflow (for both train and infer) via tutorials or MONAI Label. I mean this need not be DeepEdit; it can be something more than that.
Thanks a lot for the detailed answer. So am I to understand that MONAI Label is not going to support a wide array of common tools (like, let's say, MONAI Core does for segmentation), but instead provide the frontend (interfaces) and some limited common functionality for people to build their own apps on top of?
At the moment I am working in Jorge's group, taking over @diazandr3s' work and enhancing it with better click ingestion and self-supervised learning based on @tangy5's work on Swin UNETR. That's why I am creating these very specific deep-dive issues. In my view, the current implementation is heavily limiting due to the concatenation, as well as due to assuming Gaussian blur is the default way of creating the interaction map (which extensive ablations showed to actually be the worst).
Please feel free to close this issue if it is not within MONAI Label's purview to have a general-purpose backbone.
If I am to be bold and direct, I believe that currently, MONAI Label is slowly diverging from MONAI in two key ways:
I would be more than happy to try and dissect the current MONAI Label, propose a new design and work together with you to maybe get it ready for 0.7 or later.
At the moment MONAI Label is too much of a mashup of paper implementations with no common backbone (excluding the server-client-interface serving one).
To draw a parallel, it feels like MONAI is moving towards a Linux-style ecosystem instead of an Apple-style one, where transitions between its subprojects are smooth and painless.
@SachidanandAlle @diazandr3s @tangy5 what are your opinions?
For models used in MONAI Label, people are integrating Bundles, and the bundles use Ignite or some standard design paradigms. But these are also case-by-case, and users can still design customized Python scripts for specific purposes.
Researchers and clinicians might want to contribute to MONAI Core / Model Zoo first, before the model can be used in MONAI Label. And we are working on some design ideas to support a general-purpose API for this.
I feel the mission of MONAI Label is to build the ecosystem for both end-users and developers to quickly deploy AI models. We might want to showcase some design examples (radiology, pathology, endoscopy, etc.) and provide easy-to-use APIs so that anyone can design their own app and use their own data. These examples are not only for end-users but also provide design templates for developers. Correct me if any of this is off :) @SachidanandAlle @diazandr3s
I feel it is better for MONAI Label not to be a common backbone, but a common design idea or set of standards for accelerating medical AI computing/deployment, or it will overlap with MONAI Core. This might be a trade-off: the more MONAI Label moves towards end-users, the less robust MONAI Label becomes.
It would be great to see more designs and ideas for MONAI Label, now that MONAI Label is being used in broader scenarios and by more developers.
Thanks @danieltudosiu
I agree with what you are saying @tangy5, but I feel that currently the common design idea or standards are missing from the design of the interactive segmentation, which falls under the purview of MONAI Label more than MONAI Core.
And diverging from MONAI/Ignite paradigms might hinder the adoption of MONAI Label, especially for people with little knowledge of programming/OOP: they get used to how MONAI works, and transitioning to an unstandardised MONAI Label with a high degree of customizability (mostly due to the lack of a common interactive segmentation API) would create another learning curve, even steeper than MONAI Core's, due to multiple elements:
While, from your explanation, MONAI Label is meant to reduce the barrier to labelling data, I feel that at this point it is not much more useful than running inference and checking the outputs, given a well-written codebase.
Its strength lies in the interactive component, into which a lot of work has been poured via the viewers, but this is not capitalised on in the backend due to a lack of basic building blocks from the literature.
Hope it makes sense :-?
You are always welcome to improve some of the design/standards. In the end, it's an open-source project, and there is enough scope to improve further w.r.t. developer and user experience. But start with smaller steps :)
Note, all these three (radiology, pathology and endoscopy) are only examples to showcase how a user can solve certain things (segmentation, classification, detection, interactive segmentation) via some concepts, and the challenges in each of these areas are different. Being part of the MONAI project, I am very much OK if we ask users to follow MONAI/Ignite paradigms; strategically it makes more sense to me as part of one bigger team. In fact, that's the reason MONAI Label supports the Model Zoo, and hopefully some day we won't define any tasks in MONAI Label and everything will come directly from bundles. But aligning with other sub-projects doesn't mean forcing it. As I mentioned, if users want to develop their own, the basic interfaces are sufficiently defined for them to define their Infer/Training actions, and they don't have to follow the simplified abstractions defined over MONAI Core.
And documentation, yes, things can be improved there. In fact, @tangy5 has contributed a lot during the last couple of months. Some docs for developers need to be simplified. Same thing for the REST API (even though some customers have developed their own viewers/plugins based on those REST APIs, it would be good to improve the docs to explain every request and its attributes better).
If you find any such gaps in docs, code or design, feel free to pitch in. But let's have one change at a time (w.r.t. PRs as well).
And feel free to sync with @diazandr3s (as he falls in your timezone) and @tangy5 over TEAMS/SLACK if you have any questions. There is a monailabel slack group where some of the specific discussions can happen better.
Agree, we definitely need more docs and more details. More e2e tutorials are on their way; hopefully 0.6.0 will include them.
And yes, I guess it would be great to have an interactive model design guide or e2e tutorials using DeepEdit. This is something we can act on immediately 😄.
Thanks so much for this feedback @danieltudosiu
@SachidanandAlle and @tangy5, thanks for the detailed answers. In regard to the small steps, I agree, as PRs should be atomic in nature.
I had a chat with @diazandr3s today and we were thinking that a clicking protocol class would be highly useful as a first step towards standardisation and ease of usage. This would also help decouple the Image Signal and Guidance Signal and would be the way to introduce dictionary key standardisation as well.
As a quick proposal, I would suggest creating a new submodule inside `monailabel` called `interactions`, which aims at hosting everything regarding interaction processing, built around the following concepts:

- Proto-segmentation - a segmentation; this can be obtained from the network or via unsupervised techniques such as super-pixels + random walkers.
- Interest Definition - defining the area of interest for seeding, with preprocessing based on things like connected components, uncertainty, edges, disagreements.
- Seeding - simulation of the clicks, scribbles, bounding boxes (and their entailed augmentations), or fetching the info from the viewers.
- Mapping - processing the seeds via Gaussian Blur, Distance Map, Geodesic Map, etc., or combinations of them.

This would be implemented as a MONAI/Ignite Handler. As a conceptual example, it would combine the following components from the DeepEdit pipeline:
If I were to write short pseudocode for DeepEdit, it would look something like this (names are meant to be as suggestive as possible):
```python
InteractionProtocol(
    proto_segmentation=NetworkPrediction(),
    interest_definition=Discrepancy(valid_slices_only=True),
    seeding=ClickSeeding(radius=1),
    mapping=GaussianMap(sigma=3),
)
```
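To make the pseudocode a bit more tangible, below is a sketch of what one such stage could look like as a dictionary-in/dictionary-out callable. The `ClickSeeding` name is taken from the pseudocode above, while the `interest_region` key, the `num_clicks` parameter and the sampling logic are purely illustrative assumptions, not part of any existing API:

```python
import numpy as np


class ClickSeeding:
    """Hypothetical seeding stage: sample click seeds from the area of interest."""

    def __init__(self, num_clicks: int = 1, seed=None):
        self.num_clicks = num_clicks
        self.rng = np.random.default_rng(seed)

    def __call__(self, data: dict) -> dict:
        d = dict(data)
        # The interest-definition stage is assumed to store a boolean volume here.
        candidates = np.argwhere(d["interest_region"])
        if len(candidates) == 0:
            d["guidance_seeds"] = []
            return d
        picks = self.rng.choice(
            len(candidates), size=min(self.num_clicks, len(candidates)), replace=False
        )
        d["guidance_seeds"] = [tuple(candidates[i]) for i in picks]
        return d
```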
As part of this, we would also standardise the keys and move away from the sequential interactions proposed by the DeepEdit implementation. The keys would be the following:
```python
class InteractionKeys:
    PROTO_SEGMENTATION = "proto_segmentation"
    GUIDANCE_SEEDS = "guidance_seeds"
    GUIDANCE_SIGNALS = "guidance_signals"
    GUIDANCE_MAPS = "guidance_maps"
```
The difference between guidance maps and guidance signals would be that guidance signals are the guidance seeds placed into the image space, while guidance maps are guidance signals processed by a further technique.
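As a toy illustration of that distinction (assuming numpy/scipy and the key names from the proposal; none of this is existing MONAI Label code):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

image_shape = (64, 64, 64)
guidance_seeds = [(32, 40, 12), (30, 38, 14)]  # click coordinates in voxel space

# Guidance signal: the seeds placed into the image space as a sparse volume.
guidance_signal = np.zeros(image_shape, dtype=np.float32)
for z, y, x in guidance_seeds:
    guidance_signal[z, y, x] = 1.0

# Guidance map: the signal processed by a mapping technique (here a Gaussian blur).
guidance_map = gaussian_filter(guidance_signal, sigma=3)
```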
This would be a very good first step towards a better, more standardised MONAI Label. Please bear in mind this is just me outlining what I have in mind; internally, we still have to schedule a meeting between me, @diazandr3s and @MichelA to better define this process and its components and to create a sketch of the classes and their methods.
@tangy5 @SachidanandAlle please tell me if this is a direction MONAI Label would be interested in moving towards, slowly adapting and standardising itself.
BTW, as interactive models are getting more attention, the Model Zoo now has two bundles for interactive labeling:
where interactive signals can be defined within the bundle:
Both are compatible with MONAI Label. These might be a better way to understand how to keep MONAI/Ignite and standardise things on the bundle side, while MONAI Label maintains the usage of Model Zoo bundles. If something needs to change on the MONAI Label backend, it would be better for the changes to be usable by general-purpose interactive models, instead of just one specific model.
Great. As always said, MONAI Label is open source; all ideas with implementations are welcome! Improving the interactive models is a good point; feel free to discuss designs and create a PR.
@tangy5 @SachidanandAlle should I consider the thumbs up as you agreeing with me to start refining and working on the suggestion's implementation?
We are looking for more contributors and contributions to improve the product. You are always welcome.
Thanks, then I will keep you updated here on the progress and when I have an MVP I will initiate the PR and link it to this issue.
Great ideas! Separating click/seed sampling from click representations is also an important point for my ongoing research. I have circumvented the lack of separation by writing my own custom transforms, but it would be more productive for everyone to have a standardised and modular pipeline for designing their own Interactive Model, maybe using DeepEdit as a template to begin with.
Out of curiosity, I was wondering in which research articles extensive ablation studies showed that Gaussian Heatmaps are the worst transforms for encoding clicks, @danieltudosiu? I could only find RITM and this CVPR 2019 paper making such comparisons, and they indeed conclude that Solid Disk encoding works better than Gaussian Heatmaps. However, I would not really consider these ablation studies extensive. I would be happy if you could point me toward the right literature for these comparisons.
Hi @Zrrr1997, thank you for your interest in this issue. I totally agree, and we at KCL have a project, still in its infancy, looking at the clicking protocol and its robustness to users' interactions. The initial design was inspired by the DeepEdit code, and we have currently realised that it still needs to be a Transform, not a Handler (even if design-wise that is "wrong"), due to the sample-wise vs batch-wise processing.
I was referring to the 2019 CVPR paper that you referenced. While their ablation study is not extensive in its choice of mappings, I believe it was at the time of writing. As far as I am aware, there are only two other mapping strategies: one is called click-and-drag, which is basically a Gaussian blur with varying std, and the other is exponentiated geodesic distance-based mapping.
Furthermore, I consider Gaussian and distance-map mappings to be worse because the information leaks into other regions of the image due to the blurring or distance processing. This was partly circumvented in the literature by passing prior segmentations or edge maps. By comparison, a disk is a very intuitive and robust way of interacting, since the user sees exactly what the system does, whereas a Gaussian can be misleading depending on the implementation, and a distance map is even worse.
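To make the leakage argument concrete, here is a small, self-contained sketch comparing the three families of encodings for a single click. It uses a Euclidean rather than a geodesic distance purely as a stand-in, and none of the names correspond to existing MONAI Label transforms:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt, gaussian_filter


def encode_click(shape, click, method="disk", radius=3, sigma=3, tau=10.0):
    """Encode a single click into a guidance channel (illustrative only)."""
    seed = np.zeros(shape, dtype=np.float32)
    seed[click] = 1.0
    # Euclidean distance of every voxel to the clicked voxel.
    dist = distance_transform_edt(1.0 - seed)
    if method == "disk":
        # Solid disk: non-zero only within `radius` of the click, so no leakage beyond it.
        return (dist <= radius).astype(np.float32)
    if method == "gaussian":
        # Gaussian heatmap: smooth, but its tails leak into neighbouring structures.
        blurred = gaussian_filter(seed, sigma=sigma)
        return blurred / blurred.max()
    if method == "distance":
        # Exponentiated distance map: non-zero everywhere, hence the strongest leakage.
        return np.exp(-dist / tau)
    raise ValueError(f"unknown method: {method}")
```

The disk stays strictly local to the click, while both the Gaussian tails and the exponentiated distance map assign non-zero guidance to voxels arbitrarily far from it.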
@Zrrr1997, I was wondering if you would be ever so kind as to share the current implementation of your custom transforms, to speed up the development? I am working only 1-2 days a week on MONAI Label directly, and it would help me substantially to see how a working approach is designed.
Hi @danieltudosiu, thank you so much for your quick response! Interesting project idea! It would be great to have a more comprehensive analysis of the benefits and potential pitfalls regarding ways of simulating user clicks. I opened a Discussion last week since there does not seem to be an implementation of many `Interactions`, which vary substantially in the literature. I was wondering why you would not adapt the `Interaction` class instead of using `Transforms` for the clicking protocol; it seems that the `Interaction` class is particularly suited for this. However, the click sampling itself and the seed->guidance mapping are both individual `Transforms`, so maybe that is what you meant.
I have also observed that using heatmaps, especially with a large variance, leads to a degradation in performance. Surprisingly, sometimes even adding more clicks decreases the performance of DeepEdit. I also suspect that this is caused by "information leakage", because when I reduced sigma to 0 (literally using only the center voxel), DeepEdit started to improve with each click.
I think that the ablation study in the CVPR 2019 paper above misses comparisons to Geodesic Distance, Exponentiated Geodesic Distance, Chebyshev Distance and Superpixel Guidance. Additionally, such ablation studies do not yet exist for medical data.
I will make my code publicly available as soon as my approach is stable enough. Thanks for your interest!
@Zrrr1997 I was not aware of Chebyshev Distance. Regarding Superpixel Guidance, I think it does not transfer one-to-one to medical imaging due to its need for pre-trained networks, but if we consider some of the Super Pixel + Random Walker works, we could conceivably use them for medical imaging.
As I started working on interactive segmentation only about 2 months ago, I would appreciate it if we could have an online chat to try and integrate your expertise into the design of the `InteractionProtocol`.
@michela the work that @Zrrr1997 is doing might be of interest to you.
@Zrrr1997 also, there is another dimension we need to be aware of here: in CV they are fine with, let's say, an 85% IoU, but in Medical Imaging we need 95%+ IoU to even be viable as a tool, since we are talking about speeding up gold-standard generation and not just getting a good-enough segmentation.
As an update.
Upon further research, the Handler strategy would add unwarranted complexity due to its requirement to decollate and collate the data internally, or to create batch-aware transformations.
I will go forward with a single transform, the `InteractionProtocolDict`, whose arguments will be `proto_segmentation`, `interest_definition`, `seeding` and `mapping`. Those will be stored in a separate folder inside MONAI Label.
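For concreteness, a minimal sketch of what that single transform could look like is shown below; the class and argument names are the ones proposed above, the stage callables are assumed to be dict-in/dict-out, and this is not an existing MONAI Label class:

```python
from typing import Callable, Dict, Sequence

from monai.transforms import Transform


class InteractionProtocolDict(Transform):
    """Hypothetical dictionary-based transform chaining the four proposed stages.

    Operating sample-wise as a Transform sidesteps the decollate/collate handling
    that a Handler-based implementation would require.
    """

    def __init__(
        self,
        proto_segmentation: Callable[[Dict], Dict],
        interest_definition: Callable[[Dict], Dict],
        seeding: Callable[[Dict], Dict],
        mapping: Callable[[Dict], Dict],
    ) -> None:
        self.stages: Sequence[Callable[[Dict], Dict]] = (
            proto_segmentation,
            interest_definition,
            seeding,
            mapping,
        )

    def __call__(self, data: Dict) -> Dict:
        d = dict(data)  # keep the caller's dictionary untouched
        for stage in self.stages:
            d = stage(d)
        return d
```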
The next step is to port the available transformations into the new framework starting with DeepEdit.
I will try to make a list of things that need porting and then we can discuss which we will support officially and which we leave within the sample apps.
I reckon that the sample apps don't need porting per se, since they are just research examples. I would advise creating a template app (for example, we could use the DeepEdit Radiology one) that is very well documented in a tutorial fashion.
@SachidanandAlle @tangy5 @diazandr3s what is your take?
Also, happy holidays <3
**Is your feature request related to a problem? Please describe.** At the moment there is no straightforward way to decouple the Image Signal and Guidance Signal in DeepEdit (and, I assume, the rest of the library).
The problem stems from the usage of the base SupervisedTrainer and bypassing its limitations via a new iteration method that internally calls the default iteration function.
**Describe the solution you'd like** I think DeepEdit should incorporate the Interaction logic into a new Trainer that allows the same flexibility, with additions such as a specialised prepare_batch for both the internal loop and the external loop.
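As a rough illustration of the kind of specialised prepare_batch I have in mind, here is a sketch that keeps the Guidance Signal out of the image channels; the dictionary keys are assumptions, and it relies on MONAI's SupervisedTrainer convention of optionally returning (inputs, targets, args, kwargs):

```python
def prepare_batch_with_guidance(batchdata, device=None, non_blocking=False, **kwargs):
    """Hypothetical prepare_batch keeping the Guidance Signal separate from the image.

    Assumes the batch dictionary carries "image", "guidance_signal" and "label" tensors.
    """
    image = batchdata["image"].to(device=device, non_blocking=non_blocking)
    guidance = batchdata["guidance_signal"].to(device=device, non_blocking=non_blocking)
    label = batchdata["label"].to(device=device, non_blocking=non_blocking)
    # Returning (inputs, targets, args, kwargs) lets the engine forward the guidance
    # to the network as an extra positional argument rather than concatenating channels.
    return image, label, (guidance,), {}
```

With this, the network would be called as network(image, guidance), which is the decoupling that the concatenation-based default does not allow.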
Furthermore, there is no default Key for the Signal similar to GanKeys or AdversarialKeys.
Lastly, DeepEdit development is currently hindered by using the default prepare_batch, which does not take into account the existence of the Guidance Signal and its usage in specialised blocks such as:
**Additional context** Currently trying to improve the click ingestion of MONAI Label together with Self-Supervised Learning.
Edit: Added more examples of recent techniques requiring the decoupling of signals. Edit 2: Added details about the lack of keys.