PengchengShi1220 / NexToU

NexToU: Efficient Topology-Aware U-Net for Medical Image Segmentation
Apache License 2.0

Code on nnUNetV2 #3

Closed. Overflowu7 closed this issue 1 year ago.

Overflowu7 commented 1 year ago

Have you tried porting your code to nnUNetV2?

PengchengShi1220 commented 1 year ago

Thank you for the suggestion. We’re actively considering migrating to nnUNetV2. Updates will be rolled out soon. Stay tuned.

Overflowu7 commented 1 year ago

Thank you. If you don't mind, I'd like to ask my questions in Chinese. I have some doubts about how DC_and_CE_and_BTI_Loss and MultipleOutputLoss2(self.loss, self.ds_loss_weights) are used: is it simply a fusion with nnUNet's built-in CE+Dice loss? Another question: the V2 version no longer has the parameter self.net_num_pool_op_kernel_sizes; what is it supposed to represent?

PengchengShi1220 commented 1 year ago

You are correct in your understanding of DC_and_CE_and_BTI_Loss. It combines ce_loss and dc_loss (the standard nnUNet losses) with bti_loss in a balanced manner, with weights set as self.weight_ce=1, self.weight_dice=1, and self.weight_ti=1e-6 (for 3D) or self.weight_ti=1e-4 (for 2D). The MultipleOutputLoss2(self.loss, self.ds_loss_weights) you mention is used for deep supervision: it allows learning signals to reach the intermediate layers of deep networks by assigning a weight to each resolution level, and the weight of each level's loss is controlled by ds_loss_weights. Whether you use DC_and_CE_and_BTI_Loss on its own or wrapped in MultipleOutputLoss2(self.loss, self.ds_loss_weights) depends on your specific requirements. The NexToU_nnunetv2 version has been revised and no longer requires the parameter self.net_num_pool_op_kernel_sizes (in nnUNet V1 it held the per-stage pooling kernel sizes used to build the network and its deep-supervision levels). For additional details, see the revised script NexToU_Encoder_Decoder.py. Don't hesitate to ask if you have more questions.
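
Conceptually, the two pieces fit together as in the following minimal sketch (an illustration, not the exact repository code; the three loss callables are assumed to be given):

from torch import nn

class CombinedLossSketch(nn.Module):
    """Weighted sum of CE, Dice, and BTI terms."""
    def __init__(self, ce_loss, dc_loss, bti_loss,
                 weight_ce=1.0, weight_dice=1.0, weight_ti=1e-6):  # 1e-6 for 3D, 1e-4 for 2D
        super().__init__()
        self.ce, self.dc, self.ti = ce_loss, dc_loss, bti_loss
        self.w_ce, self.w_dc, self.w_ti = weight_ce, weight_dice, weight_ti

    def forward(self, net_output, target):
        return (self.w_ce * self.ce(net_output, target)
                + self.w_dc * self.dc(net_output, target)
                + self.w_ti * self.ti(net_output, target))

class DeepSupervisionWrapperSketch(nn.Module):
    """Mirrors MultipleOutputLoss2: one weighted loss per resolution level."""
    def __init__(self, loss, ds_loss_weights):
        super().__init__()
        self.loss, self.weights = loss, ds_loss_weights

    def forward(self, outputs, targets):
        # outputs/targets are lists ordered from full resolution downwards
        return sum(w * self.loss(o, t)
                   for w, o, t in zip(self.weights, outputs, targets))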

Overflowu7 commented 1 year ago

Thanks for your great work! I have only two more questions. 1. If I use my own dataset, are inclusion_list = [] and exclusion_list = [] necessary? 2. How do I replace nnUNetTrainer with my own trainer? I tried many ways, such as -tr XXX, but failed.

PengchengShi1220 commented 1 year ago
  1. The appropriate choice depends on the number of classes in your data. If you have two or more foreground classes, BTI loss may serve your purpose. The inclusion_list and exclusion_list should be dictated by the topological interaction relationships among the anatomical classes, organized on a binary tree infrastructure.
  2. A new nnUNetTrainer class name is needed, such as "nnUNetTrainer_NexToU_BTI_Synapse". You can then specify it in the command: "nnUNetv2_train 111 3d_fullres_nextou 0 -tr nnUNetTrainer_NexToU_BTI_Synapse" (see the sketch below). My environment runs Python 3.10.6, PyTorch 2.1.0, and CUDA 12.1 on a V100.
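
For point 2, the usual pattern looks roughly like this (a sketch under the standard nnUNetv2 layout; the overrides are placeholders, not the actual NexToU trainer body):

from nnunetv2.training.nnUNetTrainer.nnUNetTrainer import nnUNetTrainer

# Place this in a file under nnunetv2/training/nnUNetTrainer/ so that
# nnUNetv2_train can discover the trainer by class name via -tr.
class nnUNetTrainer_NexToU_BTI_Synapse(nnUNetTrainer):
    # Override build_network_architecture and/or _build_loss here to plug in
    # the NexToU network and the BTI loss; everything else is inherited.
    pass
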
Overflowu7 commented 1 year ago

Thank you for your prompt reply! 1. I still have a question about "The inclusion_list and exclusion_list should be dictated by the topological interaction relationship of various anatomical classes, which should be organized on a binary tree infrastructure." Should I consult a professional doctor to analyze the anatomical classes and organize them into a binary tree structure?

  2. I added the new nnUNetTrainer class name in the nnUNetTrainer folder and changed run_training, but some errors happened; I will look into them.
  3. I have used the default 3d_fullres/3d_lowres before. After reading your answer, do I still have to write a 3d_fullres_XXX for each different model?
Overflowu7 commented 1 year ago

In addition, I also found some mistakes that were unique to my environment.

1. If I set the params in the model like features_per_stage=[min(configuration_manager.UNet_base_num_features * 2 ** i, configuration_manager.unet_max_num_features) for i in range(num_stages)], errors happen, but if I set features_per_stage = 24 it works. I guess this error occurs for certain patch_size values because the dims in patch_size usually can only be divisible by either 2 or 3, not both.

Traceback (most recent call last):
  File "/home/wu/wyc/nnUNet/nnunetv2/run/run_training.py", line 285, in <module>
    run_training_entry()
  File "/home/wu/wyc/nnUNet/nnunetv2/run/run_training.py", line 279, in run_training_entry
    run_training(args.dataset_name_or_id, args.configuration, args.fold, args.tr, args.p, args.pretrained_weights,
  File "/home/wu/wyc/nnUNet/nnunetv2/run/run_training.py", line 204, in run_training
    nnunet_trainer.run_training()
  File "/home/wu/wyc/nnUNet/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 1365, in run_training
    self.on_train_start()
  File "/home/wu/wyc/nnUNet/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 920, in on_train_start
    self.initialize()
  File "/home/wu/wyc/nnUNet/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 235, in initialize
    self.network = self.build_network_architecture(self.plans_manager, self.dataset_json,
  File "/home/wu/wyc/nnUNet/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 399, in build_network_architecture
    model = network_class(
  File "/home/wu/wyc/nnUNet/nnunetv2/training/nnUNetTrainer/SwinMM/Neo/network_architecture/NexToU.py", line 48, in __init__
    self.encoder = NexToU_Encoder(input_channels, patch_size, n_stages, features_per_stage, conv_op, kernel_sizes, strides,
  File "/home/wu/wyc/nnUNet/nnunetv2/training/nnUNetTrainer/SwinMM/Neo/network_architecture/NexToU_Encoder_Decoder.py", line 142, in __init__
    Efficient_ViG_blocks(features_per_stage[s], img_shape_list[s], s - conv_layer_d_num,
  File "/home/wu/wyc/nnUNet/nnunetv2/training/nnUNetTrainer/SwinMM/Neo/network_architecture/NexToU_Encoder_Decoder.py", line 1023, in __init__
    PoolGrapher(channels, img_shape, k[i], min(idx // 4 + 1, max_dilation), conv, act, norm,
  File "/home/wu/wyc/nnUNet/nnunetv2/training/nnUNetTrainer/SwinMM/Neo/network_architecture/NexToU_Encoder_Decoder.py", line 870, in __init__
    self.graph_conv = PoolDyGraphConv(in_channels, in_channels * 2, kernel_size, dilation, conv,
  File "/home/wu/wyc/nnUNet/nnunetv2/training/nnUNetTrainer/SwinMM/Neo/network_architecture/NexToU_Encoder_Decoder.py", line 494, in __init__
    super(PoolDyGraphConv, self).__init__(in_channels, out_channels, conv, act, norm, bias, conv_op, dropout_op)
  File "/home/wu/wyc/nnUNet/nnunetv2/training/nnUNetTrainer/SwinMM/Neo/network_architecture/NexToU_Encoder_Decoder.py", line 434, in __init__
    self.gconv = MRConv(in_channels, out_channels, act, norm, bias, conv_op, dropout_op)
  File "/home/wu/wyc/nnUNet/nnunetv2/training/nnUNetTrainer/SwinMM/Neo/network_architecture/NexToU_Encoder_Decoder.py", line 402, in __init__
    self.nn = BasicConv([in_channels * 2, out_channels], act=act, norm=norm, bias=bias, drop=0., conv_op=conv_op,
  File "/home/wu/wyc/nnUNet/nnunetv2/training/nnUNetTrainer/SwinMM/Neo/network_architecture/torch_nn.py", line 85, in __init__
    m.append(conv_op(channels[i - 1], channels[i], 1, bias=bias, groups=self.groups_num))
  File "/home/wu/.conda/envs/nnUNet/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 591, in __init__
    super().__init__(
  File "/home/wu/.conda/envs/nnUNet/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 90, in __init__
    raise ValueError('in_channels must be divisible by groups')
ValueError: in_channels must be divisible by groups

2."omega = np.arange(embed_dim // 2, dtype=np.float)" in pos_embed leads to mistakes like File "/home/wu/wyc/nnUNet/nnunetv2/training/nnUNetTrainer/SwinMM/Neo/network_architecture/NexToU_Encoder_Decoder.py", line 910, in init relative_pos_tensor = torch.from_numpy(np.float32(get_3d_relative_pos_embed(in_channels, File "/home/wu/wyc/nnUNet/nnunetv2/training/nnUNetTrainer/SwinMM/Neo/network_architecture/pos_embed.py", line 38, in get_3d_relative_pos_embed pos_embed = get_3d_sincos_pos_embed(embed_dim, grid_size) File "/home/wu/wyc/nnUNet/nnunetv2/training/nnUNetTrainer/SwinMM/Neo/network_architecture/pos_embed.py", line 78, in get_3d_sincos_pos_embed pos_embed = get_3d_sincos_pos_embed_from_grid(embed_dim, grid) File "/home/wu/wyc/nnUNet/nnunetv2/training/nnUNetTrainer/SwinMM/Neo/network_architecture/pos_embed.py", line 99, in get_3d_sincos_pos_embed_from_grid emb_h = get_1d_sincos_pos_embed_from_grid(embed_dim // 3, grid[0]) # (HWD, Dim/3) File "/home/wu/wyc/nnUNet/nnunetv2/training/nnUNetTrainer/SwinMM/Neo/network_architecture/pos_embed.py", line 112, in get_1d_sincos_pos_embed_from_grid omega = np.arange(embed_dim // 2, dtype=np.float) File "/home/wu/.conda/envs/nnUNet/lib/python3.9/site-packages/numpy/init.py", line 284, in getattr raise AttributeError("module {!r} has no attribute " AttributeError: module 'numpy' has no attribute 'float' So i change to float16 and it works good

PengchengShi1220 commented 1 year ago
  1. You don't necessarily need to consult a professional doctor to understand the "topological interaction relationships" of the anatomical classes. You can use 3D visualization software such as ITK-SNAP to inspect the spatial relationships of the anatomical regions yourself. Organizing these classes on a binary tree infrastructure means encoding those topological relationships in a binary tree format: essentially a "map" of the relationships that logically dictates which anatomical classes "include" or "exclude" which others, hence the names inclusion_list and exclusion_list (see the illustrative snippet after this list). The NexToU paper is a valuable resource here; in particular, figures 4 and 5 of the supplementary material illustrate how these lists are built in binary tree form.

  2. Note that changes to the run_training.py file are generally not needed. It appears that you've modified the nnUNetTrainer classname but are experiencing some errors. To successfully update the classname, please follow the example given in this nnUNetTrainer implementation.

  3. If you plan to use the NexToU architecture, the default 3d_fullres or 3d_lowres settings will not suffice because the channel count must be divisible by 3, as required in the NexToU codebase (pos_embed.py). For NexToU, the paper suggests "UNet_base_num_features": 24 and "unet_max_num_features": 312.
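
To make the list format concrete, here is a purely hypothetical sketch (the labels and pairings are invented for illustration; build the real trees from your own anatomy, following figures 4 and 5 of the paper):

# Hypothetical labels: 1 = organ wall, 2 = lumen inside it,
# 3 and 4 = neighboring structures that should never touch.
inclusion_list = [[1, 2]]  # class 2 lies anatomically inside class 1
exclusion_list = [[3, 4]]  # classes 3 and 4 must not be adjacent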

PengchengShi1220 commented 1 year ago

  1. It appears to be related to the number of groups in the 3D group convolution. NexToU currently uses a value of 6. You might want to experiment with other values, such as 8 (both points are reproduced in the snippet after this list). Reference code line: torch_nn.py.

  2. The second issue is likely due to the version of NumPy you are using. Thank you for pointing it out; I have made the necessary changes. For further details, you can refer to this StackOverflow post.
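
For reference, a minimal reproduction of both points (the channel numbers here are illustrative):

import numpy as np
import torch.nn as nn

# 1) groups must evenly divide the convolution's channel counts:
nn.Conv3d(48, 48, kernel_size=1, groups=6)    # fine: 48 % 6 == 0
# nn.Conv3d(32, 32, kernel_size=1, groups=6)  # ValueError: in_channels must be divisible by groups

# 2) np.float was deprecated in NumPy 1.20 and removed in 1.24; the builtin
#    float (an alias of np.float64 here) is the drop-in replacement:
embed_dim = 24
omega = np.arange(embed_dim // 2, dtype=float)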

Overflowu7 commented 1 year ago

Well, thank you for your valuable advice. I haven't touched graph networks or topology before, so I'll do my best to figure them out. And do files like 3d_fullres_nextou follow the same implementation approach as nnUNetTrainer_NexToU?

Overflowu7 commented 1 year ago

Thank you, bro. Unfortunately, the value must be divisible by 3, so 8 doesn't seem to work. I've tried other values and the same problem occurs; I'll do some debugging later to see what the problem is.

PengchengShi1220 commented 1 year ago

You can add the following JSON snippet to your existing nnUNetPlans.json. Have a look at the example provided at this link: nnUNetPlans.json:

"3d_fullres_nextou": {
    "inherits_from": "3d_fullres",
    "UNet_base_num_features": 24,
    "unet_max_num_features": 312
}
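
With this entry in place, the new configuration can be selected by name, e.g. "nnUNetv2_train 111 3d_fullres_nextou 0 -tr nnUNetTrainer_NexToU_BTI_Synapse"; "inherits_from": "3d_fullres" keeps every other setting from the default plan.
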
PengchengShi1220 commented 1 year ago

The issue you encountered appears to be triggered primarily by pos_embed.py. I recommend setting "UNet_base_num_features" to 24 and "unet_max_num_features" to 312 in either the default 3d_fullres or 3d_lowres settings.

Overflowu7 commented 1 year ago

The issue still exists after I changed "UNet_base_num_features" to 24 and "unet_max_num_features" to 312; that's why it confuses me. Thank you for your help; I will check my data and code.

Overflowu7 commented 1 year ago

Hi bro, I've been having some problems lately that I'd like to ask you about. I want to change the BTI module to the TI module in bti_loss.py, but just swapping the commented-out code gives me an error:

  File "/home/wu/wyc/nnUNet/nnunetv2/training/nnUNetTrainer/SwinMM/Neo/loss_functions/bti_loss.py", line 145, in forward
    critical_voxels_map = self.binary_topological_interaction_module(P)
  File "/home/wu/wyc/nnUNet/nnunetv2/training/nnUNetTrainer/SwinMM/Neo/loss_functions/bti_loss.py", line 97, in binary_topological_interaction_module
    mask_A = torch.where(P == label_A, 1.0, 0.0).double()  # TI module
RuntimeError: The size of tensor a (160) must match the size of tensor b (3) at non-singleton dimension 4

I found that the problem was the old exclusion_list in the BTI module: after changing it to exclusion_list=[[1, 2, 3, 4]], the error disappeared. So I want to ask specifically: if I use the TI module, how should I set up inclusion_list and exclusion_list to get better results?

PengchengShi1220 commented 1 year ago

Thanks for your query. I have implemented the TI_loss process, which first derives all pairwise combinations of the foreground classes. You can find the specific code details at these links: nnUNetTrainer_nextou_ti.py and TI_Loss.py.
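
As a minimal illustration of that first step (with hypothetical foreground labels; the real labels come from your dataset):

from itertools import combinations

foreground_classes = [1, 2, 3, 4]  # hypothetical label IDs
pairs = list(combinations(foreground_classes, 2))
print(pairs)  # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]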