Project-MONAI / research-contributions

Implementations of recent research prototypes/demonstrations using MONAI.
https://monai.io/
Apache License 2.0

A problem came up when I was using SwinUNETR to train my own data #350

Open wangbaoyuanGUET opened 10 months ago

wangbaoyuanGUET commented 10 months ago

I'm a graduate student from China interested in deep learning. Recently I have been using the MONAI framework on my project, and I find it very efficient and convenient. But I ran into a problem that bothered me for a long time when using SwinUNETR to train my own data. Let me explain what I did.

Firstly, I know that SwinUNETR can be trained directly on BTCV, which has 13 classes, but my data contains 2 classes (background and liver tumor), so I changed out_channels to 2. The in-plane size of my data is 512x512 and the z-axis size is roughly between 40 and 60 slices, so I changed roi_size to (64, 64, 64). I don't know what happens if the z-axis of roi_size is larger than my data, but it didn't raise any error, so I kept it. These modifications allowed me to train SwinUNETR on my own data successfully. To compute the loss between the output and the ground truth, I modified some code in test.py, as shown in the figure below.

image

I also turned on save_checkpoint to save the .pt files so that I could test SwinUNETR and my own network on the test set. But something strange happened in the predicted image.

image

A rectangular prediction box appears around the predicted image, but this region is blank in the original image and does not have any labels. I think it may be caused by the values of roi_size and feature_size, because the z-axis of my own data doesn't match the BTCV dataset. This situation does not appear with my own network. The figure below shows my own network's prediction; I still used the SwinUNETR training framework but replaced the network with my own, so I guess some of SwinUNETR's parameters are wrong.

image

I don't know what's wrong. I changed a lot of SwinUNETR's hyperparameters, but the problem wasn't solved.
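For reference, a minimal sketch of the configuration described above. This is not the thread's exact training script; it assumes single-channel CT input, and feature_size=48 follows the SwinUNETR BTCV recipe.

```python
# A minimal sketch of the setup described above, assuming single-channel CT input.
# feature_size=48 follows the SwinUNETR BTCV recipe; adjust to your own config.
from monai.networks.nets import SwinUNETR

roi_size = (64, 64, 64)  # the Swin backbone downsamples by 32, so each
                         # dimension of the ROI must be divisible by 32

model = SwinUNETR(
    img_size=roi_size,    # size of the cropped patches fed to the network
    in_channels=1,        # one CT channel
    out_channels=2,       # background + liver tumor, as described above
    feature_size=48,
    use_checkpoint=True,  # gradient checkpointing to reduce GPU memory
)
```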

wangbaoyuanGUET commented 10 months ago

@tangy5 Hi!

tangy5 commented 10 months ago

Hi @wangbaoyuanGUET, great to see you are training on your own data. It looks like the model checkpoint is not good enough; this is probably not related to the Swin-UNETR parameters, or maybe there are errors in the pre-processing steps. You could check:

  1. Whether the image is padded to 64x64x64 (add a spatial pad transform; see the sketch after this list).
  2. Whether the validation metrics show that training has converged; transformer models are typically harder to converge.
  3. Whether the loss function is properly set up for 2 classes.
  4. Whether the input/output channel counts are correct.
  5. Whether the model checkpoint is actually loaded when doing inference.
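A minimal sketch of checks 1, 3 and 5, assuming the checkpoint was saved as a dict under a "state_dict" key (the exact key depends on how save_checkpoint wrote the .pt file):

```python
# A minimal sketch of checks 1, 3 and 5; the "state_dict" key is an assumption
# and depends on how save_checkpoint wrote the .pt file.
import torch
from monai.losses import DiceCELoss
from monai.networks.nets import SwinUNETR
from monai.transforms import SpatialPadd

# 1. pad image and label to at least the ROI size before random cropping
pad = SpatialPadd(keys=["image", "label"], spatial_size=(64, 64, 64))

# 3. a 2-class segmentation loss: one-hot the label, softmax the 2-channel logits
loss_fn = DiceCELoss(to_onehot_y=True, softmax=True)

# 5. make sure the trained weights are actually loaded before inference
model = SwinUNETR(img_size=(64, 64, 64), in_channels=1, out_channels=2)
checkpoint = torch.load("model.pt", map_location="cpu")
model.load_state_dict(checkpoint["state_dict"])  # key name is an assumption
model.eval()
```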

Starting from your successful prediction, you could also check whether everything else is the same except the backbone.

Hope the above helps, and let me know if you have more details or findings.

wangbaoyuanGUET commented 10 months ago

@tangy5 Thank you very much for your reply! I only started deep learning a few months ago, and although I don't have a strong foundation, I have kept working on my coding skills. So I was pleasantly surprised to hear from you so quickly. I will check my code following your valuable suggestions. Thank you again!

wangbaoyuanGUET commented 10 months ago

Hi Mr. Tang @tangy5, I'm glad to tell you that I've solved the problem I mentioned before. I guessed that the problem was caused by the roi_size of (64, 64, 64): when the preprocessing transform RandCropByPosNegLabeld crops the data into 4 patches, a roi_size of (64, 64, 64) may not cover the entire image, and the same thing happens at test time. So I changed the roi_size to (96, 96, 32) and the problem stopped coming up. Right now I'm running a new personal dataset to test the validity of my own models. I appreciate your help, which was important to me. By the way, the MONAI framework is great and I like it!
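For anyone who hits the same issue, a sketch of the adjusted crop and the matching sliding-window inference; the pos/neg/image_threshold values follow the BTCV tutorial defaults and are assumptions here.

```python
# A sketch of the fix described above: crop size (96, 96, 32) at training time
# and the same roi_size at inference. pos/neg/image_threshold follow the BTCV
# tutorial defaults and are assumptions here.
from monai.inferers import sliding_window_inference
from monai.transforms import RandCropByPosNegLabeld

crop = RandCropByPosNegLabeld(
    keys=["image", "label"],
    label_key="label",
    spatial_size=(96, 96, 32),  # matches the new roi_size
    pos=1,
    neg=1,
    num_samples=4,              # the 4 patches mentioned above
    image_key="image",
    image_threshold=0,
)

# at test time, use the same roi_size for the sliding window:
# logits = sliding_window_inference(
#     inputs=image, roi_size=(96, 96, 32), sw_batch_size=4,
#     predictor=model, overlap=0.5,
# )
```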

wangbaoyuanGUET commented 6 months ago

Hi Mr. Tang! I am the Chinese postgraduate student who asked you a question before, and you gave me a great answer; I have since finished that work. Now I have a question about my next project. Specifically: I want to train a backbone encoder (a ViT encoder) to perform segmentation on venous phase images, and I need the encoder to learn extra knowledge from the arterial phase images. So I pass the two different phase images through two identical encoders to produce embeddings. Before the venous phase embedding is upsampled by the decoder, I would like to compute the similarity between the two phases' embeddings to constrain the venous phase encoder. Do you think this is feasible? And is there a good way to compute their similarity? Thank you very much, Mr. Tang. 🥰
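For concreteness, here is one possible way to implement such a constraint: a cosine-similarity consistency loss between the two flattened embeddings. All names below (the encoders, the loss weight) are hypothetical placeholders, not code from this thread.

```python
# A hedged sketch: cosine-similarity consistency loss between the bottleneck
# embeddings of the two phase encoders. All names here are hypothetical.
import torch
import torch.nn.functional as F

def embedding_similarity_loss(venous_emb: torch.Tensor,
                              arterial_emb: torch.Tensor) -> torch.Tensor:
    """Flatten each embedding to (batch, features) and penalize low cosine similarity."""
    v = venous_emb.flatten(start_dim=1)
    a = arterial_emb.flatten(start_dim=1)
    return (1.0 - F.cosine_similarity(v, a, dim=1)).mean()

# Hypothetical usage inside a training loop:
# venous_emb = venous_encoder(venous_image)        # ViT encoder for venous phase
# arterial_emb = arterial_encoder(arterial_image)  # identical encoder, arterial phase
# total_loss = seg_loss + sim_weight * embedding_similarity_loss(venous_emb, arterial_emb)
```

An InfoNCE-style contrastive loss would be an alternative if you also want to push apart embeddings from non-matching cases in the batch.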
