askerlee / segtran

Medical Image Segmentation using Squeeze-and-Expansion Transformers

Great project! But I encountered some problems about test #35

Closed Lemonweier closed 2 years ago

Lemonweier commented 2 years ago

When I run test2d.py, the following error occurs:

python3 test2d.py --task fundus --split all --ds valid2 --net segtran --bb resnet101 --translayers 3 --layercompress 1,1,2,2 --cpdir ../model/segtran-fundus-train,valid,test,drishti,rim-05011826 --iters 9500 --outorigsize

'fundus' mean/std loaded from 'fundus-cropped-gray0.5-stats.json'
'all' 400 samples of size 576 chosen (total 800) in '../data/fundus/valid2'
'args' orig in-feat: 2048, in-feat: 2048, out-feat: 512, in-scheme: AN, out-scheme: AN, translayer_dims: [2048, 2048, 1024, 512]
Namespace(ablate_multihead=False, attn_clip=500, backbone_type='resnet101', batch_size=8, bb_feat_upsize=True, binarize=False, calc_flop=False, checkpoint_dir='../model/segtran-fundus-train,valid,test,drishti,rim-05011826', debug=False, device='cuda', do_remove_frag=False, ds_class='SegCrop', ds_name='valid2', ds_split='all', eval_robustness=False, gpu='0', gray_alpha=0.5, has_FFN_in_squeeze=False, in_fpn_layers='34', in_fpn_scheme='AN', in_fpn_use_bn=False, iters='9500', job_name='fundus-valid2', mean=[0.578, 0.429, 0.318], mid_type='shared', mince_channel_props=None, mince_scales=None, net='segtran', num_attractors=256, num_classes=3, num_modalities=0, num_modes=4, num_translayers=3, num_workers=4, orig_input_size=(576, 576), out_fpn_layers='1234', out_fpn_scheme='AN', out_origsize=True, output_upscale=2.0, patch_size=(288, 288), polyformer_mode=None, pos_bias_radius=7, pos_code_type='lsinu', pos_code_weight=1.0, qk_have_bias=True, reload_mask=False, reshape_mask_type=None, robust_aug_degrees=[0.5, 1.5], robust_aug_types=None, robust_ref_cp_path=None, robust_sample_num=120, robustness_augs=None, sample_num=-1, save_ext='png', save_features_img_count=0, save_results=True, std=[0.184, 0.162, 0.144], task_name='fundus', test_interp=None, tie_qk_scheme='none', trans_output_type='private', translayer_compress_ratios=[1.0, 1.0, 2.0, 2.0], use_exclusive_masks=False, use_global_bias=False, use_mince_transformer=False, use_pretrained=True, use_squeezed_transformer=True, verbose_output=False, vis_layers=None, vis_mode=None)
Segtran Fusion Encoder with 3 trans-layers
Learnable Sinusoidal positional encoding
Fusion0-in-squeeze: v_has_bias: False, has_FFN: False, has_input_skip: False
Fusion0-in-squeeze in_feat_dim: 2048, feat_dim: 2048, qk_have_bias: True
Fusion0-squeeze-out: v_has_bias: False, has_FFN: True, has_input_skip: False
Fusion0-squeeze-out in_feat_dim: 2048, feat_dim: 2048, qk_have_bias: True
Fusion1-in-squeeze: v_has_bias: False, has_FFN: False, has_input_skip: False
Fusion1-in-squeeze in_feat_dim: 2048, feat_dim: 2048, qk_have_bias: True
Fusion1-squeeze-out: v_has_bias: False, has_FFN: True, has_input_skip: False
Fusion1-squeeze-out in_feat_dim: 2048, feat_dim: 1024, qk_have_bias: True
Fusion2-in-squeeze: v_has_bias: False, has_FFN: False, has_input_skip: False
Fusion2-in-squeeze in_feat_dim: 1024, feat_dim: 1024, qk_have_bias: True
Fusion2-squeeze-out: v_has_bias: False, has_FFN: True, has_input_skip: False
Fusion2-squeeze-out in_feat_dim: 1024, feat_dim: 512, qk_have_bias: True
Downloading: "https://download.pytorch.org/models/resnet101-5d3b4d8f.pth" to /root/.cache/torch/hub/checkpoints/resnet101-5d3b4d8f.pth
**resnet101 created**
Parameter Count: 172737073
**args[backbone_type]=resnet101, checkpoint args[backbone_type]=eff-b4, inconsistent!**

In test2d.py I found this line: parser.add_argument('--bb', dest='backbone_type', type=str, default='eff-b4', help='Segtran backbone'). I also found that, when test2d.py runs, the code in resnet.py downloads the resnet101 weights from the web. But it still ends with this error, and I don't know how to solve it. Could you tell me how to fix this problem? Thank you!

askerlee commented 2 years ago

This means the checkpoint was using eff-b4 as the backbone, but during test time, you specified to use resnet101 as the backbone. You can train a model with resnet101 backbone, and then do the test above.
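A minimal sketch of the kind of consistency check that produces a message like the one in the log above. This is not the repository's actual code: the function name `find_inconsistent_args`, the idea that the checkpoint stores its training arguments as a dict, and the list of checked keys are all assumptions made for illustration. Only the message format is taken from the log.

```python
from argparse import Namespace


def find_inconsistent_args(test_args, cp_args, keys):
    """Return one message per key whose test-time value differs from the checkpoint's.

    Hypothetical helper: `cp_args` stands in for the arguments saved at training time.
    """
    messages = []
    for key in keys:
        if key not in cp_args:
            # Nothing was saved for this key, so there is nothing to compare.
            continue
        test_val = getattr(test_args, key)
        cp_val = cp_args[key]
        if test_val != cp_val:
            messages.append(
                f"args[{key}]={test_val}, checkpoint args[{key}]={cp_val}, inconsistent!"
            )
    return messages


# Test-time arguments ask for resnet101, but the (assumed) checkpoint was trained with eff-b4.
test_args = Namespace(backbone_type="resnet101", qk_have_bias=True)
cp_args = {"backbone_type": "eff-b4", "qk_have_bias": True}

messages = find_inconsistent_args(test_args, cp_args, ["backbone_type", "qk_have_bias"])
for message in messages:
    print(message)
```

Under these assumptions, either passing the same `--bb` value that was used for training or retraining with the desired backbone makes the list of messages empty.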

Lemonweier commented 2 years ago

I will try. Thank you for your patient explanation!

Lemonweier commented 2 years ago

I did what you explained. I followed the REFUGE training command line you mentioned (with --noqkbias the trained model performed slightly better), so I trained the model with:

bash ./train2d.sh --task fundus --split all --translayers 3 --layercompress 1,1,2,2 --net segtran --bb resnet101 --maxiter 10000 --bs 6 --**noqkbias**

Then I ran the test with:

python3 test2d.py --task fundus --split all --ds valid2 --net segtran --bb resnet101 --translayers 3 --layercompress 1,1,2,2 --cpdir ../model/segtran-fundus-train,valid,test,drishti,rim-05101448 --iters 7000 --outorigsize

The output is something I don't understand. Can you help? Thank you!

'fundus' mean/std loaded from 'fundus-cropped-gray0.5-stats.json'
'all' 400 samples of size 576 chosen (total 800) in '../data/fundus/valid2'
'args' orig in-feat: 2048, in-feat: 2048, out-feat: 512, in-scheme: AN, out-scheme: AN, translayer_dims: [2048, 2048, 1024, 512]
Namespace(ablate_multihead=False, attn_clip=500, backbone_type='resnet101', batch_size=8, bb_feat_upsize=True, binarize=False, calc_flop=False, checkpoint_dir='../model/segtran-fundus-train,valid,test,drishti,rim-05031754', debug=False, device='cuda', do_remove_frag=False, ds_class='SegCrop', ds_name='valid2', ds_split='all', eval_robustness=False, gpu='0', gray_alpha=0.5, has_FFN_in_squeeze=False, in_fpn_layers='34', in_fpn_scheme='AN', in_fpn_use_bn=False, iters='5000', job_name='fundus-valid2', mean=[0.578, 0.429, 0.318], mid_type='shared', mince_channel_props=None, mince_scales=None, net='segtran', num_attractors=256, num_classes=3, num_modalities=0, num_modes=4, num_translayers=3, num_workers=4, orig_input_size=(576, 576), out_fpn_layers='1234', out_fpn_scheme='AN', out_origsize=True, output_upscale=2.0, patch_size=(288, 288), polyformer_mode=None, pos_bias_radius=7, pos_code_type='lsinu', pos_code_weight=1.0, qk_have_bias=True, reload_mask=False, reshape_mask_type=None, robust_aug_degrees=[0.5, 1.5], robust_aug_types=None, robust_ref_cp_path=None, robust_sample_num=120, robustness_augs=None, sample_num=-1, save_ext='png', save_features_img_count=0, save_results=True, std=[0.184, 0.162, 0.144], task_name='fundus', test_interp=None, tie_qk_scheme='none', trans_output_type='private', translayer_compress_ratios=[1.0, 1.0, 2.0, 2.0], use_exclusive_masks=False, use_global_bias=False, use_mince_transformer=False, use_pretrained=True, use_squeezed_transformer=True, verbose_output=False, vis_layers=None, vis_mode=None)
Segtran Fusion Encoder with 3 trans-layers
Learnable Sinusoidal positional encoding
Fusion0-in-squeeze: v_has_bias: False, has_FFN: False, has_input_skip: False
Fusion0-in-squeeze in_feat_dim: 2048, feat_dim: 2048, qk_have_bias: True
Fusion0-squeeze-out: v_has_bias: False, has_FFN: True, has_input_skip: False
Fusion0-squeeze-out in_feat_dim: 2048, feat_dim: 2048, qk_have_bias: True
Fusion1-in-squeeze: v_has_bias: False, has_FFN: False, has_input_skip: False
Fusion1-in-squeeze in_feat_dim: 2048, feat_dim: 2048, qk_have_bias: True
Fusion1-squeeze-out: v_has_bias: False, has_FFN: True, has_input_skip: False
Fusion1-squeeze-out in_feat_dim: 2048, feat_dim: 1024, qk_have_bias: True
Fusion2-in-squeeze: v_has_bias: False, has_FFN: False, has_input_skip: False
Fusion2-in-squeeze in_feat_dim: 1024, feat_dim: 1024, qk_have_bias: True
Fusion2-squeeze-out: v_has_bias: False, has_FFN: True, has_input_skip: False
Fusion2-squeeze-out in_feat_dim: 1024, feat_dim: 512, qk_have_bias: True
resnet101 created
Parameter Count: 172737073
**args[qk_have_bias]=True, checkpoint args[qk_have_bias]=False, inconsistent!**

Process finished with exit code 0

askerlee commented 2 years ago

If you specify --noqkbias during training, you have to also specify --noqkbias during test. This check is to make sure the test settings and training settings are consistent.
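To illustrate why leaving out the flag at test time yields `qk_have_bias=True`, here is a small argparse sketch. The exact definition of `--noqkbias` in the repository is not shown in this thread, so the `store_false` wiring below is an assumption; it is simply one common way to get a `qk_have_bias` field that defaults to `True`, which is what the printed Namespace shows.

```python
import argparse

parser = argparse.ArgumentParser()
# Assumed wiring: the flag switches qk_have_bias off; without it the value stays True.
parser.add_argument('--noqkbias', dest='qk_have_bias', action='store_false')

train_args = parser.parse_args(['--noqkbias'])      # training command used the flag
test_args_without_flag = parser.parse_args([])      # test command forgot the flag
test_args_with_flag = parser.parse_args(['--noqkbias'])

print(train_args.qk_have_bias)              # False
print(test_args_without_flag.qk_have_bias)  # True, so it disagrees with training
print(test_args_with_flag.qk_have_bias)     # False, so it agrees with training
```

With this kind of flag, a value that was changed at training time is never carried over automatically, so the same flag has to be repeated on the test command line.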

Lemonweier commented 2 years ago

It works. This is my first time running code like this. It's an honor to meet you, thank you!

askerlee commented 2 years ago

It's a pleasure to help 😃

Lemonweier commented 2 years ago

I have something I want to share with you. Following your guidance, I used resnet101 as the backbone and then tested the models. I got a good result, but I'm not sure whether it is real. I have tried twice and plan to try more. At test time, checkpoint 10000 gave dice=0.968 for class 1 and 0.936 for class 2, an average dice of 0.952, and checkpoint 7500 gave similar results. The accuracy looks very good. How should I interpret this? Have you seen this before? Looking forward to your reply. Thank you!

askerlee commented 2 years ago

Oh, what dataset did you use for the test? The accuracy can vary greatly across datasets. For example, on REFUGE2 it is very hard to get the cup dice score above 0.9.
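For reference, the per-class Dice score discussed here is 2|A∩B| / (|A| + |B|), where A and B are the predicted and ground-truth pixel sets of one class. The sketch below is a plain-Python illustration on made-up label lists, not the evaluation code of the repository or of the challenge server; the helper name `dice_score` and the toy data are assumptions. It also checks that the reported average of 0.952 is the mean of the two reported class scores.

```python
def dice_score(pred, gt, cls):
    """Dice score of class `cls` between two flat lists of integer labels."""
    pred_count = sum(1 for p in pred if p == cls)
    gt_count = sum(1 for g in gt if g == cls)
    overlap = sum(1 for p, g in zip(pred, gt) if p == cls and g == cls)
    if pred_count + gt_count == 0:
        # Class absent from both masks: treated as a perfect match in this sketch.
        return 1.0
    return 2.0 * overlap / (pred_count + gt_count)


# Toy flattened masks with labels 0 (background), 1 and 2.
pred = [0, 1, 1, 2, 2, 2]
gt = [0, 1, 1, 1, 2, 2]

class_scores = [dice_score(pred, gt, cls) for cls in (1, 2)]
mean_dice = sum(class_scores) / len(class_scores)
print(class_scores, mean_dice)

# The average reported in the thread is the mean of the two class scores.
reported_mean = (0.968 + 0.936) / 2
print(round(reported_mean, 3))
```

A high score computed this way only says how well predictions match the masks of that particular dataset, which is why the choice of evaluation set matters so much for the numbers above.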

Lemonweier commented 2 years ago

I used the dataset from the Baidu Cloud link in your reply to one of the issues. It is the one named valid in refuge2020, and I tested on it. Both classes scored over 0.9. Yesterday I trained a model with bb=eff-b4, and its scores were also mostly over 0.9. Given what you said, maybe I am using the code incorrectly? Here is the process I followed.

First, train the model:

bash ./train2d.sh --task fundus --split all --translayers 3 --layercompress 1,1,2,2 --net segtran --bb resnet101 --maxiter 10000 --bs 6 --noqkbias

Then test or validate it:

python3 test2d.py --task fundus --split all --ds test --net segtran --bb resnet101 --translayers 3 --layercompress 1,1,2,2 --cpdir ../model/segtran-fundus-train,valid,test,drishti,rim-05041012 --iters 10000 --outorigsize --noqkbias

Here are some screenshots I recorded. Thank you! https://sumptuous-pawpaw-08b.notion.site/train-val-test-fe0a7a2c8fb14cd29c9741f25f2d969b

askerlee commented 2 years ago

I see. The "valid" dataset is actually the REFUGE 2019 validation data. It was added to the REFUGE 2020 challenge as part of the training data. For REFUGE 2020 validation, you need to use "valid2" and then submit the obtained masks to the evaluation server: https://refuge.grand-challenge.org/

Lemonweier commented 2 years ago

OK, thank you. I also noticed that I set --split all when training. Does it need to be --split train?

askerlee commented 2 years ago

This is because for REFUGE 2020, "all" means using the full training sets, including the whole "valid" dataset, which just as I said, is REFUGE 2019 validation data (but included in REFUGE 2020 training data). If you use --split train, you are only using part of the training data.

Lemonweier commented 2 years ago

Thank you for your guidance. I will read your paper again for more information.

Lemonweier commented 1 year ago

> This is because for REFUGE 2020, "all" means using the full training sets, including the whole "valid" dataset, which just as I said, is REFUGE 2019 validation data (but included in REFUGE 2020 training data). If you use --split train, you are only using part of the training data.

Hi, we are using your project on the BraTS dataset. At the training stage, should we change --split all to --split train? Thank you!

askerlee commented 1 year ago

It's been a while and my memory is a bit rusty. But I think you should use all. The evaluation is done by the evaluation server on some held-out data.

Lemonweier commented 1 year ago

Thank you.

Lemonweier commented 1 year ago

Do you still follow this competition? I ran into a problem when evaluating the accuracy of the model: the official website seems to have closed the submission page for that year. If you know anything about this, I look forward to your reply.

askerlee commented 1 year ago

Are you visiting https://refuge.grand-challenge.org/ ? It seems someone was still uploading results a few days ago: https://refuge.grand-challenge.org/evaluation/challenge/leaderboard/

But it requires manual approval by the competition admins first.

Lemonweier commented 1 year ago

Thank you for the reply.