Project-MONAI / research-contributions

Implementations of recent research prototypes/demonstrations using MONAI.
https://monai.io/
Apache License 2.0

Applying Swin UNETR to a new segmentation task #66

Closed: charlesmoatti closed this issue 2 years ago

charlesmoatti commented 2 years ago

Hi,

Thank you for the great work and open source code!

I am trying to use Swin UNETR for my own segmentation task: T2 brain images to be segmented into 9 labels. I would like to train Swin UNETR on my own data and then run inference on my test data. I am working with the BTCV repository, as it seemed to have the structure required for such training and inference.

I put my training/validation/test data in the same folder structure as required by the decathlon datasets, with a dataset.json file describing the data (imagesTr, imagesTs, labelsTr, labelsTs, all .nii.gz files). I then launch `python main.py --data_dir DATA_DIR --json_list dataset.json --roi_x 32 --roi_y 32 --roi_z 32 --batch_size 2` and get the following error, which I do not quite understand how to fix.

[Screenshot of the traceback; the final error is shown below.]

ValueError: Expected more than 1 spatial element when training, got input size torch.Size([8, 768, 1, 1, 1])

For comparison, my data ran perfectly with nnUNet (https://github.com/MIC-DKFZ/nnUNet); note that some of my NIfTI images have different sizes.
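For reference, my dataset.json follows the Medical Segmentation Decathlon convention. A minimal sketch of how such a file can be generated (the case file names are placeholders, and the exact keys the loader reads should be checked against get_loader in the BTCV repo):

```python
import json

# Minimal decathlon-style dataset description; file names are placeholders.
dataset = {
    "name": "T2Brain",
    "labels": {str(i): f"class_{i}" for i in range(9)},  # "0" = background
    "training": [
        {"image": "imagesTr/case_001.nii.gz", "label": "labelsTr/case_001.nii.gz"},
    ],
    "test": ["imagesTs/case_100.nii.gz"],
}
with open("dataset.json", "w") as f:
    json.dump(dataset, f, indent=2)
```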

davidkvcs commented 2 years ago

Hi, I am no expert, just another user of this repo, but I was attempting something similar, so maybe this helps.

The script is adapted specifically to the BTCV dataset so it is different from nnUNet.

I can recommend checking out the BTCV tutorial.

There it is clear that the train_transforms and val_transforms are adapted to the BTCV dataset.

That means you have to adapt these for your dataset specifically.

For example, I have a 3D dataset with 2 channels. I changed my NIfTI files to have the following dimensions (output of fslinfo):

dim1 2
dim2 512
dim3 512
dim4 175

I then removed the line AddChanneld(keys=["image", "label"]) from train_transforms = Compose(...), since my data already has the channels as its first dimension. After these changes, I got the expected dimensions when running the scripts on my data.

In my case I also normalized the data beforehand and wrote my own script to check that all orientations are correct, so I removed those lines from the script as well. Of course, what you should do depends on your data. :) A rough sketch of my trimmed pipeline is below.
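This is only a sketch of what my trimmed train_transforms ended up looking like, assuming channel-first, pre-normalized, pre-oriented inputs; the transform names are MONAI's dictionary transforms, and the crop size and keys are examples to adapt:

```python
from monai.transforms import (
    Compose,
    LoadImaged,
    RandCropByPosNegLabeld,
    RandFlipd,
    ToTensord,
)

# AddChanneld, Orientationd and the intensity normalization are omitted
# because the data is already channel-first, reoriented and normalized offline.
train_transforms = Compose([
    LoadImaged(keys=["image", "label"]),
    RandCropByPosNegLabeld(
        keys=["image", "label"],
        label_key="label",
        spatial_size=(96, 96, 96),  # sub-volume fed to the network
        pos=1,
        neg=1,
        num_samples=4,
        image_key="image",
    ),
    RandFlipd(keys=["image", "label"], prob=0.5, spatial_axis=0),
    ToTensord(keys=["image", "label"]),
])
```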

If you are in doubt about what dimensions to expect, I recommend downloading the BTCV dataset and printing the tensor shapes, e.g. under "Check data shape and visualize" in the tutorial (see the snippet below). You should then be able to deduce what the corresponding dimensions should be for your specific case.
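Something along these lines works for the shape check; train_loader here stands in for whatever loader your get_loader returns:

```python
from monai.utils import first

# Pull one batch from the training loader and inspect the tensor shapes;
# for Swin UNETR you want [batch, channels, roi_x, roi_y, roi_z].
batch = first(train_loader)
print("image:", batch["image"].shape)
print("label:", batch["label"].shape)
```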

Note that if you use the test.py script later on, it uses the transforms from get_loader in data_utils. So any changes you make to your transforms for training should also be mirrored in get_loader in data_utils, so that you keep loading the data as expected; one way to keep the two in sync is sketched below.
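A hypothetical helper (not something in the BTCV repo) that builds the shared deterministic transforms once and reuses them in both places:

```python
from monai.transforms import Compose, LoadImaged, Orientationd, ToTensord

def base_transforms(keys):
    # Deterministic preprocessing shared by training and testing; extend
    # with whatever your data needs (spacing, normalization, ...).
    return [
        LoadImaged(keys=keys),
        Orientationd(keys=keys, axcodes="RAS"),
    ]

train_transforms = Compose(
    base_transforms(["image", "label"]) + [ToTensord(keys=["image", "label"])]
)
test_transforms = Compose(
    base_transforms(["image"]) + [ToTensord(keys=["image"])]
)
```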

tangy5 commented 2 years ago

Hi @davidkvcs, thanks for the question and the interest in this work. The problem is related to the patch size: the original configuration used 96x96x96 sub-volumes for all experiments, and the Swin UNETR model contains several downsampling operations. Not sure this is the best solution, but 64 per dimension is the minimum input size for training Swin UNETR. Hope this helps you re-design your data transformations.
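To make the arithmetic concrete (assuming the default Swin UNETR configuration behind the error above): the encoder halves each spatial dimension five times in total, a factor of 2^5 = 32, so a 32^3 ROI collapses to the 1x1x1 bottleneck seen in torch.Size([8, 768, 1, 1, 1]) and InstanceNorm fails. A quick sanity check:

```python
# Each spatial dim shrinks by 2**5 = 32 at the bottleneck (patch embedding
# plus four downsampling stages) in the default Swin UNETR configuration.
for roi in (32, 64, 96):
    print(f"roi={roi} -> bottleneck {roi // 32}^3")
# roi=32 -> bottleneck 1^3  (fails: "Expected more than 1 spatial element")
# roi=64 -> bottleneck 2^3  (minimum that trains)
# roi=96 -> bottleneck 3^3  (default configuration)
```

So re-running with --roi_x 96 --roi_y 96 --roi_z 96 (or at least 64 per axis) should avoid the error.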