UNETR, ViT as Encoder, CNN as Decoder
UNETR consists of a transformer encoder that operates directly on 3D patches and is connected to a CNN-based decoder via skip connections.
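To make the "directly utilizes 3D patches" part concrete, here is a minimal NumPy sketch (my own simplification, not the authors' code; the 16×16×16 patch size matches the paper, but the 96³ input size and all names are illustrative) of how a volume is carved into non-overlapping 3D patches and flattened into the token sequence a ViT-style encoder embeds:

```python
import numpy as np

def patchify_3d(volume, patch=16):
    """Split a (H, W, D) volume into flattened non-overlapping 3D patches.

    Returns an array of shape (num_patches, patch**3): the token sequence
    that a ViT-style encoder would linearly project into its embedding space.
    """
    H, W, D = volume.shape
    assert H % patch == 0 and W % patch == 0 and D % patch == 0
    # Carve the volume into a grid of patch-sized blocks.
    grid = volume.reshape(H // patch, patch, W // patch, patch, D // patch, patch)
    # Bring the three block indices to the front, then flatten each block.
    blocks = grid.transpose(0, 2, 4, 1, 3, 5)
    return blocks.reshape(-1, patch ** 3)

vol = np.random.rand(96, 96, 96)
tokens = patchify_3d(vol)
print(tokens.shape)  # (216, 4096): (96/16)**3 patches, 16**3 voxels each
```

This is the 3D analogue of ViT's 2D patch tokenization; the sequence length grows with the cube of the inverse patch size, which is why patch resolution matters so much for memory (see the ablation below).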
UNETR: Transformers for 3D Medical Image Segmentation, by NVIDIA and Vanderbilt University, 2022 WACV, over 340 citations. Medical Imaging, Medical Image Analysis, Image Segmentation, U-Net, Transformer, Vision Transformer, ViT
Biomedical Image Segmentation 2015 … 2021 [Expanded U-Net] [3-D RU-Net] [nnU-Net] [TransUNet] [CoTr] [TransBTS] [Swin-Unet] ==== My Other Paper Readings Also Over Here ====
At each resolution, the reshaped tensors are projected from the embedding space back into the input space using consecutive $3\times 3\times 3$ convolutional layers, each followed by batch normalization.
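The "reshaping" step can be sketched as follows (a NumPy illustration under my own assumptions: a 6×6×6 token grid from a 96³ input with 16³ patches, and a 768-dimensional embedding as in ViT-B; the subsequent 3×3×3 convolutions and batch norm are only noted, not implemented):

```python
import numpy as np

def seq_to_volume(tokens, grid=(6, 6, 6)):
    """Reshape a transformer token sequence (L, K) into a (K, h, w, d) feature map.

    L must equal h*w*d. The channel axis is moved first so the result is in the
    layout that 3x3x3 convolutions (followed by batch norm, as in UNETR's
    decoder path) expect.
    """
    L, K = tokens.shape
    h, w, d = grid
    assert L == h * w * d
    return tokens.reshape(h, w, d, K).transpose(3, 0, 1, 2)

tokens = np.random.rand(216, 768)  # 216 tokens, embedding dim 768 (assumed)
fmap = seq_to_volume(tokens)
print(fmap.shape)  # (768, 6, 6, 6)
```

After this reshape, each transformer feature sequence behaves like an ordinary low-resolution 3D feature map that the CNN decoder can fuse through skip connections.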
The loss function is a combination of soft dice loss and cross-entropy loss:
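A minimal sketch of such a combined loss, in NumPy (my own formulation: the equal 1:1 weighting of the two terms and the exact Dice normalization are assumptions; the paper's equation may differ in detail):

```python
import numpy as np

def dice_ce_loss(probs, target, eps=1e-6):
    """Combined soft-Dice + cross-entropy loss (hedged sketch).

    probs:  (C, N) softmax probabilities over C classes for N voxels.
    target: (C, N) one-hot ground-truth labels.
    """
    # Soft Dice term, averaged over classes.
    inter = (probs * target).sum(axis=1)
    denom = probs.sum(axis=1) + target.sum(axis=1)
    dice = 1.0 - (2.0 * inter / (denom + eps)).mean()
    # Voxel-wise cross-entropy term.
    ce = -(target * np.log(probs + eps)).sum(axis=0).mean()
    return dice + ce

# Near-perfect prediction: both terms approach zero.
t = np.eye(3)[:, [0, 1, 2, 0]]         # one-hot targets for 4 voxels, 3 classes
p = np.clip(t, 1e-4, 1 - 1e-4)
p = p / p.sum(axis=0)                  # renormalize to valid probabilities
print(dice_ce_loss(p, t))
```

The Dice term counteracts class imbalance (small organs contribute as much as large ones per class), while cross-entropy provides smooth per-voxel gradients.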
Quantitative comparison of segmentation performance on the BTCV test set. The top and bottom sections show the Standard and Free Competition benchmarks, respectively.
Qualitative comparison of different baselines on BTCV cross-validation.
Quantitative comparison of segmentation performance on the brain tumor and spleen segmentation tasks of the MSD dataset.
Effect of the decoder architecture on segmentation performance. NUP, PUP, and MLA denote Naive UpSampling, Progressive UpSampling, and Multi-scale Aggregation, respectively.
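The difference between the NUP and PUP decoder variants can be sketched shape-wise (a NumPy illustration under my own assumptions: nearest-neighbour upsampling stands in for learned deconvolutions, the convolutions between steps are omitted, and the 6³-to-96³ resolutions are illustrative):

```python
import numpy as np

def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a (C, h, w, d) feature map."""
    return fmap.repeat(2, axis=1).repeat(2, axis=2).repeat(2, axis=3)

def naive_upsample(fmap, target=96):
    """NUP-style: jump from bottleneck to full resolution in a single step."""
    factor = target // fmap.shape[1]
    return fmap.repeat(factor, axis=1).repeat(factor, axis=2).repeat(factor, axis=3)

def progressive_upsample(fmap, target=96):
    """PUP-style: repeated 2x upsampling stages (intermediate convs omitted)."""
    while fmap.shape[1] < target:
        fmap = upsample2x(fmap)
    return fmap

bottleneck = np.random.rand(32, 6, 6, 6)
print(naive_upsample(bottleneck).shape)        # (32, 96, 96, 96)
print(progressive_upsample(bottleneck).shape)  # (32, 96, 96, 96)
```

Both reach the same output resolution; the ablation compares how much the intermediate processing (and, in UNETR's full decoder, the multi-resolution skip connections) contributes.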
Effect of patch resolution on segmentation performance.
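Why patch resolution matters: halving the patch size increases the token count eightfold, and self-attention cost scales quadratically with sequence length. A quick check (assuming a 96³ input crop; the exact input size used in each experiment is not restated here):

```python
def seq_len(vol=96, patch=16):
    """Number of transformer tokens for a cubic volume and cubic patch size."""
    return (vol // patch) ** 3

print(seq_len(patch=16))  # 216 tokens
print(seq_len(patch=32))  # 27 tokens
# Self-attention cost ratio ~ (216/27)**2 = 64x between the two settings.
```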
Comparison of the number of parameters, FLOPs, and average inference time for various models in the BTCV experiments.
Sik-Ho Tsang. Review — UNETR: Transformers for 3D Medical Image Segmentation.