Haochen-Wang409 / U2PL

[CVPR'22 & IJCV'24] Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels & Using Unreliable Pseudo-Labels for Label-Efficient Semantic Segmentation
Apache License 2.0

how to resume the program from the checkpoint #108

Closed victorshengwgit closed 1 year ago

victorshengwgit commented 1 year ago

This is my first time using multiple GPUs. What do I need to change to resume training from a checkpoint? [screenshot: Snipaste_2023-03-03_09-30-40]

Haochen-Wang409 commented 1 year ago

Just set auto_resume: True in config.yaml. Auto-resume is already enabled by default.

victorshengwgit commented 1 year ago

Indeed it is. Thank you!

victorshengwgit commented 1 year ago

Hello, may I ask whether I always need to run suponly training before running train_semi.py for semi-supervised training?

Haochen-Wang409 commented 1 year ago

You can adjust this flexibly based on your own experimental results.

victorshengwgit commented 1 year ago

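One more question, then: in my three-class segmentation task, why is the IoU 0 for every class except background, in both suponly and semi training? What should I change? (In semi, low_rank = 1 and high_rank = 2.)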

Haochen-Wang409 commented 1 year ago

Please check your data, and get the suponly results right first.
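One thing "check your data" could mean in practice is confirming that the label masks only contain the class IDs the config expects. The sketch below is a hypothetical helper, not part of U2PL; it assumes a mask is available as a 2-D list of integer IDs and that classes are numbered 0 to num_classes - 1, with 255 as the ignore label as in the config shared in this thread.

```python
from collections import Counter


def audit_mask(mask, num_classes, ignore_label=255):
    """Count label IDs in one mask and report unexpected or missing classes.

    mask: 2-D list of integer label IDs (hypothetical input format).
    """
    counts = Counter(value for row in mask for value in row)
    valid = set(range(num_classes)) | {ignore_label}
    # IDs that are neither a class index nor the ignore label.
    unexpected = sorted(v for v in counts if v not in valid)
    # Class indices that never appear in this mask.
    missing = [c for c in range(num_classes) if counts[c] == 0]
    return counts, unexpected, missing


# Hypothetical 3-class mask saved with grey levels 0/127/255 instead of IDs 0/1/2.
bad_mask = [[0, 0, 127], [0, 255, 255]]
counts, unexpected, missing = audit_mask(bad_mask, num_classes=3)
print(dict(counts))   # {0: 3, 127: 1, 255: 2}
print(unexpected)     # [127]
print(missing)        # [1, 2]
```

In this made-up example the foreground was stored as grey levels, so classes 1 and 2 never appear and value 255 would be silently ignored. Whether that matches the situation in this thread is unknown; it is only one possible cause of zero foreground IoU.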

victorshengwgit commented 1 year ago

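Hello, my dataset has three classes. With resnet101, the IoU for class 1 and class 2 is fairly low, so I would like to try resnet18 or another architecture from your code. However, I ran into the problem below and do not know what to change. Looking forward to your reply! [screenshot: snipaste_20230316_104427]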

Haochen-Wang409 commented 1 year ago

Could you share your ResNet-18 config?

victorshengwgit commented 1 year ago

    dataset: # Required.
      type: ukbb
      train:
        data_root: ./data/UKB_DATASETS
        data_list: ./data/splits/ukbb/16/labeled.txt
        flip: True
        GaussianBlur: False
        rand_resize: [0.5, 2.0]
        rand_rotation: [-10.0, 10.0]
        crop:
          type: rand
          size: [769, 769] # crop image with HxW size
      val:
        data_root: ./data/UKB_DATASETS
        data_list: ./data/splits/ukbb/val.txt
        crop:
          type: center
          size: [769, 769] # crop image with HxW size
      batch_size: 2
      n_sup: 16
      noise_std: 0.1
      workers: 1
      # mean,std may need to modify
      mean: [123.675, 116.28, 103.53]
      std: [58.395, 57.12, 57.375]
      ignore_label: 255

    trainer: # Required.
      epochs: 400 #ori:200
      start_epochs: 0
      eval_on: True
      optimizer:
        type: SGD
        kwargs:
          lr: 0.005 # 4GPUs
          momentum: 0.9
          weight_decay: 0.0005
      lr_scheduler:
        mode: poly
        kwargs:
          power: 0.9

    saver:
      auto_resume: True
      snapshot_dir: checkpoints
      pretrain: ''

    criterion:
      type: ohem
      kwargs:
        thresh: 0.7
        min_kept: 100000

    net: # Required.
      num_classes: 3
      sync_bn: False
      ema_decay: 0.99
      aux_loss:
        aux_plane: 1024
        loss_weight: 0.4
      encoder:
        type: u2pl.models.resnet.resnet18
        kwargs:
          multi_grid: True
          zero_init_residual: True
          fpn: True
          replace_stride_with_dilation: [False, True, True] #layer0...1 is fixed, layer2...4
      decoder:
        type: u2pl.models.decoder.dec_deeplabv3_plus
        kwargs:
          rep_head: False
          inner_planes: 256
          dilations: [12, 24, 36]

Haochen-Wang409 commented 1 year ago

You could try setting replace_stride_with_dilation: [False, False, False], because ResNet-18 and ResNet-34 are built on BasicBlock (see https://github.com/Haochen-Wang409/U2PL/blob/main/u2pl/models/resnet.py#L323), whereas the other variants are built on Bottleneck (see https://github.com/Haochen-Wang409/U2PL/blob/main/u2pl/models/resnet.py#L367).

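The figure below shows the network architectures from the original ResNet paper. The first two (ResNet-18 and ResNet-34) have no bottleneck structure.

[image: network architectures from the original ResNet paper]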

If you want dilated convolutions in ResNet-18 or ResNet-34, you can modify BasicBlock by following the way Bottleneck implements them.
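Two pieces of arithmetic sit behind this advice, sketched here in plain Python under the assumption that U2PL's resnet.py follows the standard (torchvision-style) ResNet layout. First, a 3x3 convolution with dilation d keeps the spatial size only if its padding is also d, which is how the standard Bottleneck builds its dilated 3x3 convolution. Second, BasicBlock has channel expansion 1 while Bottleneck has expansion 4, so per-stage output channels differ between ResNet-18/34 and ResNet-50/101/152. The helper names below are hypothetical.

```python
def conv_out_size(size, kernel=3, stride=1, padding=0, dilation=1):
    """Spatial output size of a convolution (the formula documented for Conv2d)."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


# Channel expansion per block type in the standard ResNet design.
EXPANSION = {"BasicBlock": 1, "Bottleneck": 4}
BLOCK_FOR_DEPTH = {
    18: "BasicBlock", 34: "BasicBlock",
    50: "Bottleneck", 101: "Bottleneck", 152: "Bottleneck",
}


def stage_channels(depth):
    """Output channels of layer1..layer4 for a standard ResNet of this depth."""
    expansion = EXPANSION[BLOCK_FOR_DEPTH[depth]]
    return [planes * expansion for planes in (64, 128, 256, 512)]


# padding == dilation keeps a 3x3 convolution size-preserving at stride 1.
for d in (1, 2, 4):
    print(d, conv_out_size(97, padding=d, dilation=d))  # 97 each time

print(stage_channels(18))   # [64, 128, 256, 512]
print(stage_channels(101))  # [256, 512, 1024, 2048]
```

A dilated BasicBlock would therefore pass dilation to both of its 3x3 convolutions with matching padding. Separately, channel-dependent settings tuned for a Bottleneck backbone may not carry over unchanged to ResNet-18; this sketch does not diagnose the error in the screenshot above.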

victorshengwgit commented 1 year ago

Actually, I did try setting all of them to False, and the error is the same. I will look into how to modify this resnet.

victorshengwgit commented 1 year ago

May I ask what this line, which replaces strides with dilated convolutions, is for: "replace_stride_with_dilation: [False, True, True] #layer0...1 is fixed, layer2...4"?

Haochen-Wang409 commented 1 year ago

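The original ResNet has a final output stride of 32: for an HxW input image, the output is (H/32)x(W/32). A feature map that small is not well suited to segmentation, so we replace the stride with dilated convolutions, which gives a final output size of (H/8)x(W/8).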

See the code here: https://github.com/Haochen-Wang409/U2PL/blob/main/u2pl/models/resnet.py#L193-L207
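The relationship between the three flags and the output stride can be sketched as follows. This is a minimal illustration that assumes the standard ResNet layout (a stem and max-pool that together downsample by 4, layer1 at stride 1, and layer2 to layer4 each either striding by 2 or doubling the dilation); it ignores options such as multi_grid, and the function name is hypothetical.

```python
def output_stride(replace_stride_with_dilation):
    """Return (output stride, dilation of layer1..layer4) for the given flags.

    The three flags apply to layer2, layer3 and layer4 in that order.
    """
    stride = 4          # stem conv (stride 2) followed by max-pool (stride 2)
    dilation = 1
    dilations = [1]     # layer1 is never dilated
    for replace in replace_stride_with_dilation:
        if replace:
            dilation *= 2   # keep resolution, widen the receptive field instead
        else:
            stride *= 2     # regular downsampling
        dilations.append(dilation)
    return stride, dilations


print(output_stride([False, False, False]))  # (32, [1, 1, 1, 1])
print(output_stride([False, True, True]))    # (8, [1, 1, 2, 4])
```

With [False, True, True], layer3 and layer4 stop downsampling, so the feature map stays at 1/8 of the input resolution, matching the explanation above.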

yjq767579182 commented 1 year ago

Hello, I could not find an auto_resume option in my config.yaml.

Haochen-Wang409 commented 1 year ago

Just add auto_resume: True under saver.
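As a rough illustration of what such a flag usually controls, the sketch below picks a checkpoint to load from a parsed config. It is not U2PL's actual code: the function name, the checkpoint file name ckpt.pth, and the directory layout are assumptions for illustration, and only the keys saver, auto_resume, snapshot_dir, and pretrain come from the config shown in this thread.

```python
import os


def pick_checkpoint(cfg, exp_dir, ckpt_name="ckpt.pth"):
    """Return the checkpoint path to load, or None to start from scratch.

    cfg: the parsed config as a nested dict.
    ckpt_name: assumed file name of the latest checkpoint (hypothetical).
    """
    saver = cfg.get("saver", {})
    if saver.get("auto_resume", False):
        snapshot_dir = saver.get("snapshot_dir", "checkpoints")
        candidate = os.path.join(exp_dir, snapshot_dir, ckpt_name)
        if os.path.exists(candidate):
            return candidate  # resume from the latest saved state
    # Otherwise fall back to explicit pretrained weights, if any were given.
    return saver.get("pretrain") or None


cfg = {"saver": {"auto_resume": True, "snapshot_dir": "checkpoints", "pretrain": ""}}
print(pick_checkpoint(cfg, "."))  # a path if a checkpoint exists there, else None
```

When the flag is on but no checkpoint has been written yet, this sketch simply starts fresh, so leaving auto_resume enabled is harmless for a first run.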

gufan-d commented 1 year ago

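"Actually, I did try setting all of them to False, and the error is the same. I will look into how to modify this resnet."

Hello, did you manage to get the modification working?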