victorshengwgit commented 1 year ago

This is my first time using multiple GPUs. What do I need to change to resume training from a checkpoint? (Screenshot: Snipaste_2023-03-03_09-30-40)

Haochen-Wang409 commented 1 year ago

Set auto_resume: True in config.yaml. Auto-resume is already enabled by default.
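For readers wondering what auto-resume amounts to, here is a minimal sketch of the lookup step, not the repository's actual code. It assumes the parsed config is a dict with a `saver` section holding `auto_resume` and `snapshot_dir`; the helper name `find_resume_checkpoint` and the file name `ckpt.pth` are hypothetical.

```python
import os


def find_resume_checkpoint(cfg, ckpt_name="ckpt.pth"):
    """Return the checkpoint path to resume from, or None to start fresh.

    `ckpt_name` is an assumed file name, not taken from the repository.
    """
    saver = cfg.get("saver", {})
    if not saver.get("auto_resume", False):
        return None  # auto-resume disabled: always start from scratch
    path = os.path.join(saver.get("snapshot_dir", "checkpoints"), ckpt_name)
    # Only resume if a previous run actually left a checkpoint behind.
    return path if os.path.isfile(path) else None
```

With multiple GPUs, each process would typically load the returned file with `torch.load(path, map_location="cpu")` before restoring the model and optimizer state.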

victorshengwgit commented 1 year ago

That works, thank you!

victorshengwgit commented 1 year ago

Hello, one more question: do I always need to run suponly first before running train_semi.py for semi-supervised training?

Haochen-Wang409 commented 1 year ago

You can adjust that flexibly based on your own experimental results.

victorshengwgit commented 1 year ago

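Another question, then: in my three-class segmentation task, why is the IoU zero for every class except background in both suponly and semi training? What should I change? (In semi, low_rank = 1 and high_rank = 2.)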

Haochen-Wang409 commented 1 year ago

Check your data, and get the suponly results right first.
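One common cause of zero IoU on every foreground class is label masks whose pixel values do not match the class indices, for example masks stored as grey levels 0/128 instead of 0/1/2. Whether that applies here is an assumption, since the thread does not show the data. The sketch below counts the values in a mask (given as nested lists, e.g. from a NumPy array's `tolist()`) and reports any outside the expected set; `check_mask_labels` is a hypothetical helper, with `num_classes` and `ignore_label` mirroring the config keys.

```python
from collections import Counter


def check_mask_labels(mask, num_classes=3, ignore_label=255):
    """Count pixel values in a 2-D mask and list the unexpected ones.

    Valid values are 0..num_classes-1 plus ignore_label.
    """
    counts = Counter(v for row in mask for v in row)
    valid = set(range(num_classes)) | {ignore_label}
    unexpected = sorted(v for v in counts if v not in valid)
    return counts, unexpected


# A mask saved with grey levels 0/128 instead of class indices 0/1/2.
counts, unexpected = check_mask_labels([[0, 0, 128], [0, 128, 128]])
print(unexpected)  # [128]
```

Note that a value equal to `ignore_label` passes this check silently, so a mask that uses 255 for a real class would still need a manual look at `counts`.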

victorshengwgit commented 1 year ago

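Hello, my dataset has three classes. With ResNet-101, the IoU for class 1 and class 2 is fairly low, so I wanted to try ResNet-18 or another architecture from your code. I ran into the following problem and am not sure what to modify. Looking forward to your reply! (Screenshot: snipaste_20230316_104427)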

Haochen-Wang409 commented 1 year ago

Could you provide your ResNet-18 config?

victorshengwgit commented 1 year ago

dataset: # Required.
  type: ukbb
  train:
    data_root: ./data/UKB_DATASETS
    data_list: ./data/splits/ukbb/16/labeled.txt
    flip: True
    GaussianBlur: False
    rand_resize: [0.5, 2.0]
    # rand_rotation: [-10.0, 10.0]
    crop:
      type: rand
      size: [769, 769] # crop image with HxW size
  val:
    data_root: ./data/UKB_DATASETS
    data_list: ./data/splits/ukbb/val.txt
    crop:
      type: center
      size: [769, 769] # crop image with HxW size
  batch_size: 2
  n_sup: 16
  noise_std: 0.1
  workers: 1
  # mean,std may need to modify
  mean: [123.675, 116.28, 103.53]
  std: [58.395, 57.12, 57.375]
  ignore_label: 255

trainer: # Required.
  epochs: 400 #ori:200
  start_epochs: 0
  eval_on: True
  optimizer:
    type: SGD
    kwargs:
      lr: 0.005 # 4GPUs
      momentum: 0.9
      weight_decay: 0.0005
  lr_scheduler:
    mode: poly
    kwargs:
      power: 0.9

saver:
  auto_resume: True
  snapshot_dir: checkpoints
  pretrain: ''

criterion:
  type: ohem
  kwargs:
    thresh: 0.7
    min_kept: 100000

net: # Required.
  num_classes: 3
  sync_bn: False
  ema_decay: 0.99
  aux_loss:
    aux_plane: 1024
    loss_weight: 0.4
  encoder:
    type: u2pl.models.resnet.resnet18
    kwargs:
      multi_grid: True
      zero_init_residual: True
      fpn: True
      replace_stride_with_dilation: [False, True, True] #layer0...1 is fixed, layer2...4
  decoder:
    type: u2pl.models.decoder.dec_deeplabv3_plus
    kwargs:
      rep_head: False
      inner_planes: 256
      dilations: [12, 24, 36]

Haochen-Wang409 commented 1 year ago

You could try setting replace_stride_with_dilation: [False, False, False], because ResNet-18 and ResNet-34 are built on BasicBlock (see https://github.com/Haochen-Wang409/U2PL/blob/main/u2pl/models/resnet.py#L323), while the other variants are built on Bottleneck (see https://github.com/Haochen-Wang409/U2PL/blob/main/u2pl/models/resnet.py#L367).

The figure below shows the network architectures from the original ResNet paper; the first two have no bottleneck structure.

If you want dilated convolution in ResNet-18 or ResNet-34, you can adapt the implementation used in Bottleneck.
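When adapting BasicBlock, the main thing to keep consistent is padding: a 3x3 convolution with dilation d needs padding d to preserve spatial size at stride 1. The sketch below checks this with the standard output-size formula documented for PyTorch's `Conv2d`. `conv_out_size` is a hypothetical helper, and this is arithmetic only, not a modification of the repository's blocks.

```python
def conv_out_size(n, kernel=3, stride=1, padding=1, dilation=1):
    """Output length of a convolution along one spatial dimension."""
    return (n + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


# padding == dilation keeps a 3x3, stride-1 convolution size-preserving.
for d in (1, 2, 4):
    assert conv_out_size(97, padding=d, dilation=d) == 97

# Keeping padding=1 while dilating shrinks the map, which would break
# the residual addition inside the block.
print(conv_out_size(97, padding=1, dilation=2))  # 95
```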

victorshengwgit commented 1 year ago

Actually, I already tried setting all of them to False and got the same error. I'll look into how to modify this resnet.

victorshengwgit commented 1 year ago

May I ask what replacing stride with dilated convolution is for, i.e. what this line does: "replace_stride_with_dilation: [False, True, True] #layer0...1 is fixed, layer2...4"?

Haochen-Wang409 commented 1 year ago

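The original ResNet has a final output stride of 32: for an HxW input, the output is (H/32)x(W/32). A feature map that small is poorly suited to segmentation, so we replace the stride with dilated convolution, and the final output size becomes (H/8)x(W/8).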

For the code, see: https://github.com/Haochen-Wang409/U2PL/blob/main/u2pl/models/resnet.py#L193-L207
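The stride arithmetic in this answer can be sketched as follows. This assumes the torchvision-style convention, where the stem reduces resolution by 4, `layer1` keeps it, and each of `layer2` to `layer4` either halves the resolution or, when its flag is True, doubles the dilation instead. `output_stride` is a hypothetical helper, not a function in the repository.

```python
def output_stride(replace_stride_with_dilation, stem_stride=4):
    """Return (overall output stride, dilation reached after layer2..layer4)."""
    stride, dilation, dilations = stem_stride, 1, []
    for replace in replace_stride_with_dilation:
        if replace:
            dilation *= 2  # keep resolution, enlarge the receptive field
        else:
            stride *= 2    # ordinary strided downsampling
        dilations.append(dilation)
    return stride, dilations


print(output_stride([False, False, False]))  # (32, [1, 1, 1])
print(output_stride([False, True, True]))    # (8, [1, 2, 4])
```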

yjq767579182 commented 1 year ago

Hello, I could not find an auto_resume option in my config.yaml file.

Haochen-Wang409 commented 1 year ago

Just add auto_resume: True under saver.

gufan-d commented 1 year ago

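"Actually, I already tried setting all of them to False and got the same error. I'll look into how to modify this resnet."

Hello, did you manage to get the modification working?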

Haochen-Wang409 / U2PL

how to resume the program from the checkpoint #108
