Medical diffusion on a budget

:earth_africa: Check out our project page for links to paper/poster & more!

Repository for the paper 'Medical Diffusion on a budget: Textual Inversion for medical image generation', presented at MIDL 2024.

Textual Inversion

Textual Inversion and image generation were performed with the AUTOMATIC1111 web UI, specifically the version of that repository at commit d050bb7.

To start generating with the embeddings, follow the installation instructions there and use the Stable Diffusion 2.0 checkpoint, specifically 512-base-ema.ckpt.

Trained embeddings

All trained embeddings are included in the embeddings folder.

Data pre-processing

To use Textual Inversion with your own images, you have to prepare RGB images of size 512x512. You can have a look at convert_to_2d.py for inspiration. This is (approximately) the script that was used for the PICAI dataset.
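As a rough illustration of that preparation step, here is a minimal sketch that turns a single 2D slice into a 512x512 RGB image. It is not the code from convert_to_2d.py: the function name `slice_to_rgb_512`, the min-max normalization, and the use of NumPy and Pillow are all assumptions made for this example.

```python
import numpy as np
from PIL import Image


def slice_to_rgb_512(slice_2d: np.ndarray, size: int = 512) -> Image.Image:
    """Hypothetical helper: min-max normalize a 2D array, resize it, convert to RGB."""
    arr = slice_2d.astype(np.float32)
    lo, hi = float(arr.min()), float(arr.max())
    if hi > lo:
        # Scale intensities to [0, 1] (assumed normalization, adapt to your modality).
        arr = (arr - lo) / (hi - lo)
    else:
        # Constant slice: map everything to black instead of dividing by zero.
        arr = np.zeros_like(arr)
    gray = Image.fromarray((arr * 255).round().astype(np.uint8))
    # Resize with Pillow's default resampling, then replicate the channel to get RGB.
    return gray.resize((size, size)).convert("RGB")


if __name__ == "__main__":
    demo = np.random.rand(384, 384)  # stand-in for a real image slice
    slice_to_rgb_512(demo).save("example_slice.png")
```

Min-max scaling is only one option. Depending on the modality, percentile clipping or a fixed intensity window may be more appropriate.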

Classifier training

The environment used to train the binary classifiers can be recreated from requirements.txt.

Models can be trained with the train.py script, which is based on PyTorch Lightning Flash and Hydra.

Default configuration can be set in conf/config.yaml.

FID scores

FID scores were calculated with evaluate_generation_quality.py, e.g.

python evaluate_generation_quality.py --generated_path /path/to/generated/images --reference_path /path/to/reference/images
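For readers unfamiliar with the metric, the sketch below shows the Fréchet distance that FID is built on, computed between two sets of feature vectors. It is an illustration only and does not claim to mirror evaluate_generation_quality.py. The function name `frechet_distance` is hypothetical, and the inputs are assumed to be pre-extracted features (in standard FID these come from an Inception network, which is left out here).

```python
import numpy as np


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Frechet distance between Gaussians fitted to two (n_samples, n_features) arrays."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^(1/2)) equals the sum of square roots of the eigenvalues
    # of cov_a @ cov_b, which are real and non-negative for covariance matrices.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(500, 8))            # placeholder "reference" features
    fake = rng.normal(loc=0.5, size=(500, 8))   # placeholder "generated" features
    print(frechet_distance(real, fake))
```

Lower values mean the two feature distributions are closer. Identical feature sets give a distance of approximately zero.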

StyleGAN3 baseline

For the StyleGAN3 baseline used in the paper, we refer to the original repository for details on installation and requirements.

The following command was used to train:

python train.py --cfg=stylegan3-t --data=/path/to/train/set --gpus=1 --batch=4 --gamma=8 --mirror=1 --kimg=100 --resume=https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan3/versions/1/files/stylegan3-t-afhqv2-512x512.pkl --snap=25 --tick=1 --mbstd-group 1 --metrics none

Poster template

Feel free to use the poster template, made with Quarto. To preview/render the poster:

cd poster
quarto preview poster.qmd  # or, to render instead: quarto render poster.qmd
