
UniverSeg: Universal Medical Image Segmentation

Integrating MegaMedical to train UniverSeg #31

Closed: Tianananana closed this 1 month ago

Tianananana commented 1 month ago

I have a question regarding how you integrate a huge dataset like MegaMedical to train UniverSeg.

In your supplementary material, you mention using LMDB datastores (also related to issue #29) for fast I/O via Thunderpack. I am interested in the details of how Thunderpack is integrated into the training pipeline, i.e.:

Thanks for your great work! It would be really helpful if you could share any additional technical details/code regarding the training pipeline. Looking forward to hearing from you!

VictorButoi commented 1 month ago

Greetings!

Indeed, we did use LMDB stores (publicly available at https://github.com/JJGO/thunderpack), which allowed us to have super fast I/O despite having tons of files. In response to your questions:

1) Instead of a single LMDB for the entire dataset, we implemented our MegaMedical class object as a collection of LMDBs, one corresponding to each task.

2) While these augmentations are done online, we didn't do them in our __getitem__ call but rather in the forward loop. This lets you perform the augmentations on the GPU rather than the CPU, which makes them significantly faster. We used Kornia (https://kornia.readthedocs.io/en/v0.4.1/tutorials/data_augmentation.html) for this.
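To make that concrete, here is a rough sketch of what moving the augmentations into the forward loop can look like (the specific Kornia ops, parameters, and the `loader` variable are illustrative placeholders rather than our exact training configuration):

```python
import torch
import kornia.augmentation as K

# Rough sketch (placeholder augmentations/parameters, not the exact config).
# With data_keys=["input", "mask"], geometric ops are applied to both the image
# and the segmentation, while intensity ops touch only the image.
gpu_aug = K.AugmentationSequential(
    K.RandomHorizontalFlip(p=0.5),
    K.RandomAffine(degrees=15, translate=(0.1, 0.1), p=0.5),
    K.RandomBrightness(brightness=(0.9, 1.1), p=0.5),
    K.RandomContrast(contrast=(0.9, 1.1), p=0.5),
    data_keys=["input", "mask"],
).to("cuda")

for images, segs in loader:                      # (B, 1, 128, 128) float tensors; `loader` is assumed
    images = images.to("cuda", non_blocking=True)
    segs = segs.to("cuda", non_blocking=True)
    images, segs = gpu_aug(images, segs)         # augmentation runs on the GPU
    # ... forward pass / loss / backward ...
```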

Hope this helps!

Tianananana commented 1 month ago

Thank you for your reply! So during the forward pass, do you use a conventional dataloader that iteratively loads the whole collection of LMDBs (tasks), or is only a single LMDB (task) sampled from the MegaMedical class object at each forward loop? After selecting the task, are the query and support data sampled in the __getitem__ of that LMDB? And are the augmentations then performed afterwards?

Thank you !!

VictorButoi commented 1 month ago

We do the latter: at each iteration we sample an LMDB task and then use the __getitem__ of that LMDB. (And yes, the augmentations are performed afterwards, once the input images/segs are on the GPU.)
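Roughly, one iteration of sampling looks like the sketch below (simplified; `task_readers` and the helper function are placeholders, not our actual code):

```python
import random
import torch

# Rough sketch of the per-iteration sampling described above. `task_readers` is a
# hypothetical dict mapping task name -> read-only LMDB reader (e.g. built with
# thunderpack); each reader returns an (image, segmentation) array pair by index.
def sample_episode(task_readers, support_size, query_size):
    task = random.choice(list(task_readers))                  # pick one LMDB/task
    reader = task_readers[task]
    idxs = random.sample(range(len(reader)), support_size + query_size)
    imgs, segs = zip(*(reader[i] for i in idxs))              # __getitem__ on that LMDB
    imgs = torch.stack([torch.as_tensor(x, dtype=torch.float32) for x in imgs])
    segs = torch.stack([torch.as_tensor(x, dtype=torch.float32) for x in segs])
    support = (imgs[:support_size], segs[:support_size])
    query = (imgs[support_size:], segs[support_size:])
    return support, query                                     # augment later, on the GPU
```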

Tianananana commented 1 month ago

Can I also confirm: during the data processing with MegaMedical, do you perform the augmentation before or after converting the images to greyscale/single-channel? Some of the augmentations change brightness and contrast, so applying them before vs. after the conversion would give different results.

Also, when you benchmarked against the other models (PANet, ALPNet, SENet and nnUNet), did you also use the greyscale versions of size (128, 128) across all models for training?

(I'm assuming here that a conversion to greyscale is done somewhere in the pipeline, since UniverSeg takes in a single-channel image and uses the channel dimension for the cross-convolution.)

Apologies for the many questions; your insights so far have been really helpful. Thank you!!

VictorButoi commented 1 month ago

Can I also confirm: during the data processing with MegaMedical, do you perform the augmentation before or after converting the images to greyscale/single-channel? Some of the augmentations change brightness and contrast, so applying them before vs. after the conversion would give different results.

We converted the images into greyscale/single-channel before training, to avoid having to do that conversion during training.
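For illustration, the offline conversion amounts to something like this sketch (placeholder code, not our exact preprocessing):

```python
import numpy as np
from PIL import Image

# Illustrative offline preprocessing (placeholder, not the exact pipeline):
# each slice is converted to a single-channel 128x128 float array before being
# written into its task's LMDB, so no conversion is needed at training time.
def preprocess_slice(path):
    img = Image.open(path).convert("L")              # greyscale / single channel
    img = img.resize((128, 128), Image.BILINEAR)
    arr = np.asarray(img, dtype=np.float32) / 255.0  # normalize to [0, 1]
    return arr[None, ...]                            # shape (1, 128, 128)
```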

Also, when you benchmarked against the other models (PANet, ALPNet, SENet and nnUNet), did you also use the greyscale versions of size (128, 128) across all models for training?

Yep! We made sure that all models trained on the same data.

Let me know if you have any more questions!

Tianananana commented 1 month ago

Ok, got it, thank you!