M3DV / LeFusion

LeFusion: Controllable Pathology Synthesis via Lesion-Focused Diffusion Models
https://arxiv.org/abs/2403.14066
Apache License 2.0

Question on choice of conditioning #4

Open pedr0sorio opened 19 hours ago

pedr0sorio commented 19 hours ago

Hi guys,

Once again, congrats on the work! This is quite a cool idea, and something I had been thinking of implementing myself before realising you were already on it.

In my planning, I envisioned using the [additional lesion metadata coming with the LIDC-IDRI dataset](https://pylidc.github.io/tuts/annotation.html#:~:text=ann.print_formatted_feature_table()) as additional conditioning information for the lesion generation process. As I understand it, you currently condition only on the histogram information extracted from the dataset and on the specific lesion mask.
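To make the idea concrete, here is a rough sketch of how those ratings could be turned into a conditioning vector. It is only an illustration under stated assumptions: the helper `attributes_to_condition` is hypothetical, the attribute names and rating ranges follow the pylidc feature table as I understand it, and min-max scaling is a simplification (some attributes, such as internal structure and calcification, are categorical rather than ordinal).

```python
import numpy as np

# Assumed rating ranges (min, max) for the LIDC-IDRI annotation attributes.
LIDC_ATTRIBUTE_RANGES = {
    "subtlety": (1, 5),
    "internalStructure": (1, 4),
    "calcification": (1, 6),
    "sphericity": (1, 5),
    "margin": (1, 5),
    "lobulation": (1, 5),
    "spiculation": (1, 5),
    "texture": (1, 5),
    "malignancy": (1, 5),
}


def attributes_to_condition(ratings):
    """Hypothetical helper: scale each integer rating to [0, 1]."""
    values = []
    for name, (low, high) in LIDC_ATTRIBUTE_RANGES.items():
        rating = ratings[name]
        if not low <= rating <= high:
            raise ValueError(f"{name}={rating} outside [{low}, {high}]")
        values.append((rating - low) / (high - low))
    return np.asarray(values, dtype=np.float32)


# Example with made-up ratings for a single annotation.
example = attributes_to_condition({
    "subtlety": 5, "internalStructure": 1, "calcification": 6,
    "sphericity": 3, "margin": 4, "lobulation": 1,
    "spiculation": 1, "texture": 5, "malignancy": 3,
})
```

A vector like this could in principle be concatenated with the histogram before being fed to the model, though whether that helps is exactly what I am asking about.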

I wonder whether you considered using that lesion data and, if leaving it out was a design choice, what your rationale was for not using it.

Thanks again!

Best, Pedro

duducheng commented 16 hours ago

Hi Pedro,

Thanks for your interest! We actually did consider the other attributes from the LIDC dataset. Incorporating these attributes directly alongside the histogram as conditioning information would indeed be straightforward, but we opted not to use them for a few reasons:

  1. Our current focus with LeFusion is on general lesion segmentation tasks, and we wanted our method to be applicable across a wide range of lesion segmentation datasets. The histogram is something that can be derived from any dataset, which makes it a more generalizable choice compared to manually annotated labels like attenuation (texture) and other clinical attributes.

  2. Clinical attribute conditioning is more suited for projects with a clinical focus. We are planning a large-scale extension of this method specifically for nodules, and designing a NoduleBank for generating a broad range of lung lesions. However, evaluating the generation based on clinical attributes poses its own challenges, which is why we’ve chosen to separate this from the current work that focuses on improving segmentation.

  3. While LIDC is one of the largest publicly available lung lesion databases, it’s still relatively small, with only thousands of samples. This poses challenges for both training and evaluating generative models. Fortunately, we have internal datasets that are tens of times larger, which we’ll be using to scale up our efforts. Stay tuned!
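To illustrate point 1, the snippet below is a minimal sketch of why a histogram can be derived from any dataset that has an image and a lesion mask. It is not the actual LeFusion code: the function name `lesion_histogram`, the bin count, and the assumption that intensities are normalized to [0, 1] are all hypothetical choices made for the example.

```python
import numpy as np


def lesion_histogram(volume, mask, bins=16, value_range=(0.0, 1.0)):
    """Normalized intensity histogram of the voxels inside the lesion mask.

    Assumes `volume` and `mask` have the same shape and that intensities
    are roughly normalized to `value_range` (values outside are clipped).
    """
    voxels = volume[mask > 0]
    if voxels.size == 0:
        raise ValueError("mask contains no lesion voxels")
    voxels = np.clip(voxels, value_range[0], value_range[1])
    counts, _ = np.histogram(voxels, bins=bins, range=value_range)
    # Divide by the voxel count so the histogram sums to 1.
    return (counts / counts.sum()).astype(np.float32)


# Toy example: random volume with a cubic "lesion" mask.
rng = np.random.default_rng(0)
volume = rng.random((8, 8, 8))
mask = np.zeros((8, 8, 8), dtype=np.uint8)
mask[2:5, 2:5, 2:5] = 1
hist = lesion_histogram(volume, mask)
```

Nothing here needs manual annotation beyond the segmentation mask itself, which is the property that makes this kind of conditioning portable across datasets.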

Best,
JC