End-to-end modular machine learning framework for classification, segmentation and unsupervised learning. Yucca is designed to be plug-and-play while still allowing for effortless customization. This lets users employ the basic Yucca models as solid baselines, but it also lets them change and experiment with individual features in a robust and thoroughly tested research environment. The Yucca project is inspired by Fabian Isensee's nnUNet.
Create a python=3.10 or python=3.11 environment exclusively for Yucca to avoid conflicts with other projects.
IMPORTANT: First install PyTorch with GPU support following the appropriate instructions from e.g. https://pytorch.org/get-started/locally/. Then navigate to the Yucca directory and install the package from there.
For an Ubuntu system with CUDA >= 12.1 and python=3.11:
> git clone https://github.com/Sllambias/yucca.git
> conda create -n yuccaenv python=3.11
> conda activate yuccaenv
> conda install -c anaconda setuptools
> conda install -c "nvidia/label/cuda-12.1.1" cuda-toolkit
> conda install pytorch==2.1.2 torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
> cd yucca
> pip install -e .
To use other CUDA or PyTorch versions, refer to https://pytorch.org/get-started/locally/ for the current PyTorch installation, https://pytorch.org/get-started/previous-versions/ for previous PyTorch versions, and the nvidia channel on Anaconda for the appropriate CUDA toolkit. Note that the CUDA versions used in the PyTorch and CUDA-toolkit installations must match (in the example above both use 12.1).
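To confirm that the installation is consistent, a quick sanity check (run inside the activated yuccaenv environment):

```python
# Verify that PyTorch sees the GPU and reports the expected CUDA version.
import torch

print(torch.__version__)          # e.g. 2.1.2
print(torch.version.cuda)         # should match the installed CUDA toolkit, e.g. 12.1
print(torch.cuda.is_available())  # True if the GPU setup works
```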
If you just want to install Yucca locally on your computer, use
pip install git+https://github.com/Sllambias/yucca.git
Note that this installs the code from GitHub, not from any local clone.
Weights & Biases is the main tool for experiment tracking in Yucca. It is extremely useful for understanding how your models are behaving, and often also why. Although it can be disabled, installing and using it with Yucca is heavily encouraged.
Navigate to https://wandb.ai/home and log in or sign up for Weights and Biases. Activate the appropriate environment, install Weights and Biases and log in by following the instructions (i.e. paste the key from https://wandb.ai/authorize into the terminal).
> conda activate yuccaenv
> pip install wandb
> wandb login
wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:
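To verify that logging works end to end, a minimal smoke test (the project name yucca-test is an arbitrary placeholder, not a Yucca convention):

```python
# Minimal W&B smoke test: create a run, log one metric, close the run.
import wandb

run = wandb.init(project="yucca-test")  # placeholder project name
wandb.log({"loss": 0.5})
wandb.finish()
```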
The Yucca pipeline comprises the four processes illustrated in the diagram. In the first step, the user is expected to prepare the data for Yucca. In the remaining three steps, Yucca takes over file management.
First, the environment variables used by Yucca must be defined. To set these, see the Environment Variables guide.
Before preprocessing and training, all datasets must be converted to Yucca-compliant tasks. This is done to ensure reproducibility and eliminate data leakage. For a tutorial see the Task Conversion Guide.
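The exact folder structure is defined in the Task Conversion Guide; as a rough illustration, a converted task follows an nnUNet-style layout along these lines (the folder names below are assumptions for illustration, not the authoritative specification):

```python
# Illustrative sketch of a task-converted dataset layout (check the Task
# Conversion Guide for the authoritative structure; names here are assumed).
from pathlib import Path

task = Path("Task001_Brains")
for split in ("imagesTr", "labelsTr", "imagesTs", "labelsTs"):
    (task / split).mkdir(parents=True, exist_ok=True)
```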
Preprocessing is carried out using the yucca_preprocess command. For advanced usage see: run_scripts_advanced.py.
Basic Yucca preprocessing relies on three CLI flags:
- -t: the target task-converted dataset to preprocess.
- -pl: the Planner. Defaults to the YuccaPlanner, but it can also be any custom planner found or created in the Planner directory and its subdirectories.
- -pr: the Preprocessor. One of the YuccaPreprocessor (default), the ClassificationPreprocessor and the UnsupervisedPreprocessor. The only aspect in which they differ is how they expect the ground truth to look. The YuccaPreprocessor expects to find images, the ClassificationPreprocessor expects to find .txt files with image-level classes and the UnsupervisedPreprocessor expects not to find any ground truth.

An example of preprocessing a task called Task001_Brains with the default planner and the ClassificationPreprocessor:
> yucca_preprocess -t Task001_Brains -pr ClassificationPreprocessor
Training is carried out using the yucca_train command. For advanced usage see: run_scripts_advanced.py. Before training any models, a preprocessed dataset must be prepared using the yucca_preprocess command.
Basic Yucca training relies on five CLI flags:
- -t: the target preprocessed dataset to train on.
- -d: the dimensionality of the model, 3D or 2D.
- -m: the model architecture, including U-Net, UNetR, MultiResUNet and ResNet50.
- the Manager flag, which defaults to the YuccaManager.
- -pl: the Planner used to preprocess the data. Defaults to the YuccaPlanner.

An example of training a 2D MultiResUNet with the default Manager on a task called Task001_Brains that has been preprocessed using the default YuccaPlanner:
> yucca_train -t Task001_Brains -m MultiResUNet -d 2D
Inference is carried out using the yucca_inference command. For advanced usage see: run_scripts_advanced.py. Prior to inference, the model must be trained using the yucca_train command, and the target dataset must be task-converted.
Basic Yucca inference relies on six CLI flags:
- -t: the target task to run inference on.
- -s: the source task on which the model was trained.
- -m: the model architecture used in training.
- -d: the dimensionality of the trained model.
- -pl: the Planner used to preprocess the training data.
- the Manager flag, which defaults to the YuccaManager.

An example of running inference on the test set of a task called Task001_Brains, using a 3D MultiResUNet trained on the train set of the same task:
> yucca_inference -t Task001_Brains -s Task001_Brains -m MultiResUNet
An example of running inference on the test set of a task called Task002_NotBrains, using a 2D UNet trained on a task called Task001_Brains:
> yucca_inference -t Task002_NotBrains -s Task001_Brains -d 2D -m UNet
To train an ensemble of models we use the yucca_preprocess, yucca_train and yucca_inference commands. For advanced usage see: run_scripts_advanced.py. A common application of model ensembles is to train 2D models on each of the three axes of 3D data (denoted either as the X-, Y- and Z-axes or, in medical imaging, as the axial, sagittal and coronal views) and then fuse their predictions during inference.
To train three models, one per axis, on a 3D dataset called Task001_Brains, prepare three preprocessed versions of the dataset using the three Planners YuccaPlannerX, YuccaPlannerY and YuccaPlannerZ:
> yucca_preprocess -t Task001_Brains -pl YuccaPlannerX
> yucca_preprocess -t Task001_Brains -pl YuccaPlannerY
> yucca_preprocess -t Task001_Brains -pl YuccaPlannerZ
Then, train three 2D models, one on each version of the preprocessed dataset:
> yucca_train -t Task001_Brains -pl YuccaPlannerX -d 2D
> yucca_train -t Task001_Brains -pl YuccaPlannerY -d 2D
> yucca_train -t Task001_Brains -pl YuccaPlannerZ -d 2D
Then, run inference on the target dataset with each trained model.
> yucca_inference -t Task001_Brains -pl YuccaPlannerX -d 2D
> yucca_inference -t Task001_Brains -pl YuccaPlannerY -d 2D
> yucca_inference -t Task001_Brains -pl YuccaPlannerZ -d 2D
Finally, fuse their results and evaluate the predictions.
> yucca_ensemble --in_dirs /path/to/predictionsX /path/to/predictionsY /path/to/predictionsZ --out_dir /path/to/ensemble_predictionsXYZ
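Conceptually, the fusion step averages the per-class probabilities predicted by the three models before taking the argmax. A minimal sketch of that idea (illustrative only, not Yucca's actual implementation; the file layout and the probs key are assumptions):

```python
# Illustrative softmax-averaging over per-model probability maps.
# Assumes each model saved a class-probability array per case; the file
# naming and the "probs" key are hypothetical, not Yucca's on-disk format.
import numpy as np

def fuse_case(paths):
    probs = [np.load(p)["probs"] for p in paths]  # each: (num_classes, *spatial)
    mean_probs = np.mean(probs, axis=0)           # average over the ensemble
    return np.argmax(mean_probs, axis=0)          # fused label map

segmentation = fuse_case([
    "predictionsX/case_001.npz",
    "predictionsY/case_001.npz",
    "predictionsZ/case_001.npz",
])
```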
Training classification models is carried out by:
- Task-converting the dataset with image-level labels stored as .txt files. See the Task Conversion guide for instructions on how to convert your datasets.
- Preprocessing with a Planner that uses the ClassificationPreprocessor, such as the ClassificationPlanner. This preprocessor expects to find .txt files rather than image files in the label folders and it does not perform any preprocessing on the labels. Alternatively, the ClassificationPreprocessor can be selected using the -pr ClassificationPreprocessor flag in yucca_preprocess.
- Using a Planner that resizes samples to a fixed size, such as the YuccaPlanner_224x224. Having a fixed image size enables training models on full images, rather than patches of images. This is often necessary in classification, where we want one (or very few) image-level predictions.
- Training with a Manager that sets patch_based_training=False, such as the YuccaManager_NoPatches.
- Using a model architecture that supports classification, currently the ResNet50, although most networks can be adapted to support this with limited changes (in essence, by adding a Linear layer with input channels equal to the flattened output of the penultimate layer and output channels equal to the number of classes in the dataset; see the sketch after this section).
- Running yucca_inference with the --task_type classification flag.

Training segmentation models is carried out by following the standard procedure introduced in the Introduction to Yucca.
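Returning to the model-adaptation note in the classification steps above: a minimal PyTorch sketch of adding such a classification head, using torchvision's ResNet-50 as a stand-in (this illustrates the idea, not Yucca's internal model code):

```python
# Adapt a backbone for classification by replacing its final Linear layer
# so the output dimension equals the number of classes in the dataset.
import torch.nn as nn
from torchvision.models import resnet50

num_classes = 3  # e.g. three image-level classes in the dataset
model = resnet50()
# The penultimate layer's flattened output has model.fc.in_features channels;
# the new head maps these to num_classes logits.
model.fc = nn.Linear(model.fc.in_features, num_classes)
```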
Training unsupervised models is carried out by:
- Preprocessing with a Planner that uses the UnsupervisedPreprocessor, such as the UnsupervisedPlanner. This preprocessor expects to find no label files. Alternatively, the UnsupervisedPreprocessor can be selected using the -pr UnsupervisedPreprocessor flag in yucca_preprocess.

When models are trained on a dataset preprocessed with the UnsupervisedPreprocessor, Yucca will use the unsupervised preset in the YuccaAugmentationComposer. This sets (1) skip_label to True, which means we don't expect a label in the array, (2) copy_image_to_label to True, which means the image data is copied to also serve as the label data (the image is copied after the normal augmentations are applied), and finally (3) mask_image_for_reconstruction to True, which means we randomly mask the image data (this is applied AFTER the image is copied to the label).
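The order of those three steps matters: augment, then copy, then mask. A schematic sketch of the resulting data flow (illustrative only, not the YuccaAugmentationComposer's actual code; augment and mask_ratio are hypothetical stand-ins):

```python
# Schematic flow of the unsupervised preset: the target is the augmented
# image, and the network input is a masked copy of it (reconstruction task).
import numpy as np

def unsupervised_sample(image, augment, mask_ratio=0.5):
    # skip_label=True: no label array is expected or used here.
    image = augment(image)              # normal augmentations first
    label = image.copy()                # copy_image_to_label: target = augmented image
    mask = np.random.rand(*image.shape) < mask_ratio
    image = np.where(mask, 0.0, image)  # mask_image_for_reconstruction, AFTER the copy
    return image, label
```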