Project-MONAI / VISTA

MONAI Versatile Imaging Segmentation and Annotation
Apache License 2.0

An open question: how can this be a universal model? #39

Open argman opened 1 week ago

argman commented 1 week ago

Currently the model is trained only on CT images, but the need for MRI models is growing. How can the model be trained to do universal segmentation on any modality?

heyufan1995 commented 6 days ago

Hi, thanks for the question. We are working on the next release of VISTA3D, which is trained on a large cohort of MRI datasets plus CT. When you say universal segmentation, it can mean:

1. Automatic segmentation with predefined tasks (liver, spleen, etc.), covering all possible tasks from all possible modalities.
2. Zero-shot automatic segmentation with text prompts / open vocabulary / annotation-example prompts (none of the existing work succeeds here; all are far from expert models).
3. Zero-shot interactive segmentation, SAM-like work.

For 1 and 3, you can easily train a universal model with the VISTA codebase if you carefully curate an all-modality dataset with its label_dict/label_mapping and train the model (a rough sketch of that curation step is shown below). The real difficulty for a universal model in (1) is its performance compared with specialized expert models. Unlike the scaling laws in LLMs, including more irrelevant segmentation tasks may harm performance. For example, why should a model jointly trained on a brain MRI dataset plus a lung tumor CT dataset perform better on lung tumor than a model trained on the lung tumor CT alone? If the universal model is 10% lower in performance than a specialized model, it is not clinically applicable. Meanwhile, predefining the tasks is also hard: for example, should you treat mouse CT lung and human CT lung as the same task, "lung segmentation"?

For (3), after training on an all-modality dataset, the VISTA model will work (interactive segmentation) on the modalities in your curated dataset, but zero-shot performance is not guaranteed. VISTA distilled information from SAM using supervoxels, and similar methods are needed. The reason is that the diversity of curated medical data (annotation diversity) is not enough for the model to learn zero-shot. You may need SAM's data for training or start from their checkpoint (use it directly or distill it).
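For context, here is a minimal sketch of what curating a unified label space across modalities might look like. The class names, dataset names, and index values are hypothetical and only illustrate the idea; the actual VISTA label_dict/label_mapping files define their own format and global indices.

```python
import numpy as np

# Hypothetical global label_dict: every class gets one index, regardless of modality.
label_dict = {
    "liver": 1,
    "spleen": 2,
    "lung_tumor": 3,
    "brain_ventricle": 4,   # an MRI-only class shares the same global index space
}

# Hypothetical per-dataset label_mapping: local annotation index -> global index.
label_mapping = {
    "ct_abdomen_dataset": {1: label_dict["liver"], 2: label_dict["spleen"]},
    "ct_lung_tumor_dataset": {1: label_dict["lung_tumor"]},
    "mri_brain_dataset": {1: label_dict["brain_ventricle"]},
}

def to_global_labels(local_seg: np.ndarray, dataset_name: str) -> np.ndarray:
    """Remap a dataset's local label indices to the shared global indices."""
    remapped = np.zeros_like(local_seg)
    for local_idx, global_idx in label_mapping[dataset_name].items():
        remapped[local_seg == local_idx] = global_idx
    return remapped
```

The point of the shared index space is that partially annotated datasets from different modalities can be mixed in one training run, with each sample supervising only the classes its source dataset actually annotates.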

For (2), it's ongoing research with a long way to go; performance in 3D is far from nnU-Net/Auto3DSeg.

You can read the VISTA3D paper for more discussion.

argman commented 4 days ago

Even within MRI there are many different sequences/modalities, and it's impossible to have annotations for all of them. How do you solve this problem? Did you test on out-of-domain data?

heyufan1995 commented 3 days ago

Using synthetic data is the only way to go. The ongoing MAISI effort (https://developer.nvidia.com/blog/addressing-medical-imaging-limitations-with-synthetic-data-generation/) is one solution. Besides that, heavy intensity augmentation is also worth a try (see the sketch below): https://github.com/BBillot/SynthSeg
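As an illustration of the intensity-augmentation route, here is a minimal sketch using standard MONAI dictionary transforms, roughly in the spirit of SynthSeg-style domain randomization. The probabilities and parameter ranges are arbitrary starting points, not tuned values, and the "image" key is just an assumed data-dict convention.

```python
from monai.transforms import (
    Compose,
    NormalizeIntensityd,
    RandAdjustContrastd,
    RandBiasFieldd,
    RandGaussianNoised,
    RandHistogramShiftd,
    RandScaleIntensityd,
)

# Aggressive intensity augmentation to reduce sensitivity to the source modality.
intensity_aug = Compose([
    NormalizeIntensityd(keys="image", nonzero=True),                   # per-volume normalization
    RandBiasFieldd(keys="image", prob=0.5, coeff_range=(0.0, 0.3)),    # MRI-like bias field
    RandGaussianNoised(keys="image", prob=0.5, std=0.1),               # additive noise
    RandAdjustContrastd(keys="image", prob=0.5, gamma=(0.5, 2.0)),     # random gamma
    RandHistogramShiftd(keys="image", prob=0.3, num_control_points=8), # non-linear intensity remap
    RandScaleIntensityd(keys="image", prob=0.5, factors=0.3),          # global intensity scaling
])

# Usage: apply to a data dict holding a 3D image (and label), e.g.
# sample = intensity_aug({"image": image_3d, "label": label_3d})
```

This only randomizes intensity appearance, not anatomy, so it complements rather than replaces synthetic data generation such as MAISI.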