Project-MONAI / MONAI

AI Toolkit for Healthcare Imaging
https://monai.io/
Apache License 2.0

ViT ValueError #3836

Closed ouyuxuanbridge closed 2 years ago

ouyuxuanbridge commented 2 years ago

Describe the bug
ValueError: optimizer got an empty parameter list

To Reproduce
Steps to reproduce the behavior: follow your instructions for implementing UNETR, but change the model to ViT.

Screenshots: (error screenshot attached)

P.S. I am using multi-GPU, so I'm wondering whether that has something to do with it. Thank you so much!

Nic-Ma commented 2 years ago

Hi @Ouyuxuan623 ,

Thanks for your interest. Could you please share your full test program so we can reproduce it? @ahatamiz, I think you may be able to help check it later?

Thanks in advance.

ahatamiz commented 2 years ago

Hi @Ouyuxuan623

Thanks for your interest. Would you please share more details (and possibly code snippets)?

Thanks

ouyuxuanbridge commented 2 years ago

I really appreciate your work on MONAI, it's amazing!

I am a huge fan of your work.

Thank you very much!

I followed the instructions here: https://github.com/Project-MONAI/tutorials/blob/master/3d_segmentation/unetr_btcv_segmentation_3d.ipynb

The only change I made is the model code, because I would like to implement ViT.

The code is here:

model = ViT(
    in_channels=1,
    img_size=(96, 96, 96),
    patch_size=(32, 32, 32),
    hidden_size=768,
    mlp_dim=3072,
    num_layers=12,
    num_heads=12,
    pos_embed="perceptron",
    classification=False,
    dropout_rate=0.0,
    spatial_dims=3,
)
model = nn.DataParallel(model.cuda, device_ids=[0, 1, 2, 3, 4, 5])
model.to(device)

The error is this: (error screenshot attached)

I have no idea how to deal with it.

Could you help me with this?

No words can express how thankful I am.

ouyuxuanbridge commented 2 years ago

Ohh, sorry, I figured out where I was wrong.

It should be like this:

model = nn.DataParallel(model.cuda(), device_ids=[0, 1, 2, 3, 4, 5])
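
For context, here is a sketch of why the parentheses matter (the device ids are illustrative, and this assumes multiple GPUs are available): model.cuda without parentheses is a bound method, not a module, so nn.DataParallel wraps an object with no registered parameters and model.parameters() comes back empty, which is exactly the ValueError above.

import torch
import torch.nn as nn

# any nn.Module shows the same behavior; a tiny layer stands in for ViT here
net = nn.Linear(4, 2)

# buggy: net.cuda is a bound method, so the wrapper registers no parameters
# wrapped = nn.DataParallel(net.cuda, device_ids=[0, 1])

# correct: net.cuda() returns the module itself, moved onto the GPU
wrapped = nn.DataParallel(net.cuda(), device_ids=[0, 1])
optimizer = torch.optim.AdamW(wrapped.parameters(), lr=1e-4)  # no longer empty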

ahatamiz commented 2 years ago

Hi @Ouyuxuan623

Thanks for the comment, and I'm glad the issue is resolved. Also, I tried to reproduce this issue without nn.DataParallel, and everything works as expected:

import torch
from monai.networks.nets.vit import ViT

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# stand-alone ViT backbone (no classification head)
model = ViT(
    in_channels=1,
    img_size=(96, 96, 96),
    patch_size=(32, 32, 32),
    hidden_size=768,
    mlp_dim=3072,
    num_layers=12,
    num_heads=12,
    pos_embed="perceptron",
    classification=False,
    dropout_rate=0.0,
    spatial_dims=3,
).to(device)

torch.backends.cudnn.benchmark = True

# model.parameters() is non-empty here, so the optimizer builds fine
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-5)

In the MONAI Research Contributions repository (link here), we provide support for multi-GPU training, which may also be helpful. I recommend checking it out.

Thanks

ouyuxuanbridge commented 2 years ago

Hi @ahatamiz , thanks so much!

I will definitely check the link about multi-gpu.

There is another bug occurring and I have no idea what causes it. Could you help me? (error screenshot attached)

ahatamiz commented 2 years ago

Sure. The error you mentioned is due to the fact that the output of the ViT model is a tuple, consisting of the logits and the hidden states, as shown here.

I would recommend passing the first element as logit_map to your loss function.
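
A minimal sketch of that (x, y, and loss_function are placeholders for the tutorial's training-loop tensors and loss):

# ViT returns a tuple: (prediction, hidden states)
pred, hidden_states = model(x)
loss = loss_function(pred, y)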

Thanks

ouyuxuanbridge commented 2 years ago

Also, I would like to ask how to implement multi-GPU training. It says "To initiate distributed multi-gpu training, --distributed needs to be added to the training command."

Could you give me an example of the command?

What's more, how can I run it, since I am using Jupyter, not a command line?

Thanks again!

ahatamiz commented 2 years ago

Sure. We only need to pass --distributed in the command line to indicate multi-GPU training. For example:

python main.py \
    --feature_size=32 \
    --batch_size=1 \
    --logdir=unetr_test \
    --fold=0 \
    --optim_lr=1e-4 \
    --lrschedule=warmup_cosine \
    --infer_overlap=0.5 \
    --save_checkpoint \
    --data_dir=/dataset/dataset0/ \
    --distributed
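
If you are working in Jupyter rather than a terminal, the same command can be run from a notebook cell with the shell-escape prefix (a generic Jupyter feature, shown here with the same arguments):

!python main.py --feature_size=32 --batch_size=1 --logdir=unetr_test \
    --fold=0 --optim_lr=1e-4 --lrschedule=warmup_cosine --infer_overlap=0.5 \
    --save_checkpoint --data_dir=/dataset/dataset0/ --distributed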

Thanks

ouyuxuanbridge commented 2 years ago

Thank you so much for your help and patience!

I am so thankful!

Still, I think I already passed the first element as "logit_map"; see this: ![201645510413 pic](https://user-images.githubusercontent.com/56747596/155073515-56b7f584-9705-4541-8e34-040bf3d53b72.jpg)

ahatamiz commented 2 years ago

In the above, you should pass the following:

loss = loss_function(logit_map[0], y)

Essentially, logit_map[0] is the logits predicted by ViT, as shown above.

ouyuxuanbridge commented 2 years ago

Thanks a lot! But it seems there is another problem. (error screenshot attached)

ahatamiz commented 2 years ago

I see. This error is caused by an incompatibility between the ground-truth labels and the predictions (most likely one is one-hot encoded while the other is not).

Thanks

ouyuxuanbridge commented 2 years ago

Yes, do you have any idea how to deal with this problem?

Do I have to preprocess my dataset?

Why is there no such problem when I use UNETR?

Thank you very much!

ahatamiz commented 2 years ago

For UNETR, we have already applied the correct transform to the labels. I recommend using one_hot (link here) to convert the labels to the right format.
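
A minimal sketch of that conversion (the class count and tensor shape are illustrative, e.g. a 14-class BTCV-style setup):

import torch
from monai.networks.utils import one_hot

# labels: integer class indices with a channel dimension, e.g. (B, 1, H, W, D)
labels = torch.randint(0, 14, (2, 1, 96, 96, 96))

# expand to one channel per class along dim=1: (B, 14, H, W, D)
labels_onehot = one_hot(labels, num_classes=14, dim=1)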

Thanks

ouyuxuanbridge commented 2 years ago

Sure, I'll check it out.

Thanks a lot!!

ouyuxuanbridge commented 2 years ago

The code for ViT is just the encoder, not the whole end-to-end model?

But UNETR is a complete model.

Am I right?

ouyuxuanbridge commented 2 years ago

@Nic-Ma @ahatamiz Hi, in the experiment section of your paper there are SETR models. I am wondering whether you could offer the code for these models, because the official release only supports two dimensions.

lilipj commented 2 years ago

Hello, I don't see how I can solve this error (ViT output tuple) when using a supervised evaluator:

evaluator_loss_val = create_supervised_evaluator(
    net,
    loss_to_log,
    device,
    non_blocking=True,
    output_transform=lambda x, y, y_pred: (y_pred, y),
    prepare_batch=prepare_batch,
)

I tried to replace (y_pred, y) with (y_pred[0], y), but I still get the same error, in the _compute_pred_loss() function in trainer.py.

Thanks