Details about segmentation

Itsanewday commented 7 months ago

Hi, thanks for your great work! In the paper, you showed that an AI model trained only on synthetic tumors can detect tumors as well as, or even better than, fully supervised models. Why not train the model on synthetic and real tumors together? Deep learning is data-driven: the more training data you have, the better the performance you may get. You already shared the code for training with both synthesized and real tumors; could you also share the comparison results?

qixinhu11 commented 3 months ago

Sorry for the late reply. Training the AI model with both real and synthetic tumors, as you suggest, is exactly what many previous papers did (you can check the references in the related work section), but in that setting the synthetic tumors might look like just another data augmentation method. In fact, we ran some similar experiments in the appendix, keeping the training set sizes similar. The real question is: if we have unlimited healthy CT scans and generate as many synthetic tumors as possible, can combining them with real liver tumors from LiTS benefit model training? I think this is an open question that demands a lot of further investigation. From a larger perspective, the question is: can synthetic data (whether from modeling-based, human-designed rules or from generative models like Stable Diffusion or Sora) always benefit model training? It is not easy to answer. Personally, I think it will cause other issues.
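To make the "similar training set size" idea concrete, here is a minimal sketch of how a size-controlled mix of real and synthetic cases could be drawn. It is only an illustration, not the code from the repository or the paper: `mix_training_set`, the case IDs, and the pool sizes are all hypothetical.

```python
import random


def mix_training_set(real_cases, synthetic_cases, total, real_fraction, seed=0):
    """Draw a fixed-size training list with a given share of real cases.

    Hypothetical helper: keeping `total` constant while varying
    `real_fraction` separates the effect of the real/synthetic ratio
    from the effect of simply having more data.
    """
    if not 0.0 <= real_fraction <= 1.0:
        raise ValueError("real_fraction must be in [0, 1]")
    n_real = round(total * real_fraction)
    n_syn = total - n_real
    if n_real > len(real_cases) or n_syn > len(synthetic_cases):
        raise ValueError("not enough cases for the requested mix")
    rng = random.Random(seed)  # seeded so every run draws the same split
    mixed = rng.sample(real_cases, n_real) + rng.sample(synthetic_cases, n_syn)
    rng.shuffle(mixed)
    return mixed


# Hypothetical case IDs standing in for real and synthetic-tumor CT scans.
real = [f"real_{i:03d}" for i in range(100)]
synthetic = [f"syn_{i:03d}" for i in range(500)]

for frac in (0.0, 0.5, 1.0):
    train = mix_training_set(real, synthetic, total=100, real_fraction=frac)
    n_real = sum(case.startswith("real_") for case in train)
    print(f"real_fraction={frac}: {len(train)} cases, {n_real} real")
```

Each resulting list would then be passed to the same training recipe, so that any difference in segmentation quality can be attributed to the mix rather than to the amount of data.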

Itsanewday commented 3 months ago

Thanks for your detailed reply! There is still a long way to go for real AI.

---Original--- Date: Tue, Mar 12, 2024 17:24 PM Subject: Re: [MrGiovanni/SyntheticTumors] Details about segmentation (Issue#7)


MrGiovanni / SyntheticTumors

Details about segmentation #7