PPPW / deep-learning-random-explore


unet_learner giving error with seresnext101 #8

Closed saurabh502 closed 5 years ago

saurabh502 commented 5 years ago

Hi,

When I try to use fastai's unet_learner with se_resnext101_32x4d, I get the error below:

```
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
 in 
      1 # Fit one cycle of 6 epochs with max lr of 1e-3
----> 2 learn.fit_one_cycle(6)

/opt/conda/lib/python3.6/site-packages/fastai/train.py in fit_one_cycle(learn, cyc_len, max_lr, moms, div_factor, pct_start, final_div, wd, callbacks, tot_epochs, start_epoch)
     20     callbacks.append(OneCycleScheduler(learn, max_lr, moms=moms, div_factor=div_factor, pct_start=pct_start,
     21                                        final_div=final_div, tot_epochs=tot_epochs, start_epoch=start_epoch))
---> 22     learn.fit(cyc_len, max_lr, wd=wd, callbacks=callbacks)
     23 
     24 def lr_find(learn:Learner, start_lr:Floats=1e-7, end_lr:Floats=10, num_it:int=100, stop_div:bool=True, wd:float=None):

/opt/conda/lib/python3.6/site-packages/fastai/basic_train.py in fit(self, epochs, lr, wd, callbacks)
    198         callbacks = [cb(self) for cb in self.callback_fns + listify(defaults.extra_callback_fns)] + listify(callbacks)
    199         if defaults.extra_callbacks is not None: callbacks += defaults.extra_callbacks
--> 200         fit(epochs, self, metrics=self.metrics, callbacks=self.callbacks+callbacks)
    201 
    202     def create_opt(self, lr:Floats, wd:Floats=0.)->None:

/opt/conda/lib/python3.6/site-packages/fastai/basic_train.py in fit(epochs, learn, callbacks, metrics)
     99             for xb,yb in progress_bar(learn.data.train_dl, parent=pbar):
    100                 xb, yb = cb_handler.on_batch_begin(xb, yb)
--> 101                 loss = loss_batch(learn.model, xb, yb, learn.loss_func, learn.opt, cb_handler)
    102                 if cb_handler.on_batch_end(loss): break
    103 

/opt/conda/lib/python3.6/site-packages/fastai/basic_train.py in loss_batch(model, xb, yb, loss_func, opt, cb_handler)
     24     if not is_listy(xb): xb = [xb]
     25     if not is_listy(yb): yb = [yb]
---> 26     out = model(*xb)
     27     out = cb_handler.on_loss_begin(out)
     28 

/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    491             result = self._slow_forward(*input, **kwargs)
    492         else:
--> 493             result = self.forward(*input, **kwargs)
    494         for hook in self._forward_hooks.values():
    495             hook_result = hook(self, input, result)

/opt/conda/lib/python3.6/site-packages/fastai/layers.py in forward(self, x)
    134         for l in self.layers:
    135             res.orig = x
--> 136             nres = l(res)
    137             # We have to remove res.orig to avoid hanging refs and therefore memory leaks
    138             res.orig = None

/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    491             result = self._slow_forward(*input, **kwargs)
    492         else:
--> 493             result = self.forward(*input, **kwargs)
    494         for hook in self._forward_hooks.values():
    495             hook_result = hook(self, input, result)

/opt/conda/lib/python3.6/site-packages/fastai/layers.py in forward(self, x)
    148     "Merge a shortcut with the result of the module by adding them or concatenating thme if `dense=True`."
    149     def __init__(self, dense:bool=False): self.dense=dense
--> 150     def forward(self, x): return torch.cat([x,x.orig], dim=1) if self.dense else (x+x.orig)
    151 
    152 def res_block(nf, dense:bool=False, norm_type:Optional[NormType]=NormType.Batch, bottle:bool=False, **conv_kwargs):

RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 256 and 128 in dimension 2 at /opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THC/generic/THCTensorMath.cu:71
```

Below is the code I am using:

```python
SZ = 256
BS = 16
MODEL_PATH = "/kaggle/working/"

data = (SegmentationItemList.from_folder(path=path/'train')
        .split_by_rand_pct(0.2)
        .label_from_func(lambda x: str(x).replace('train', 'masks'), classes=[0, 1])
        .add_test((path/'test').ls(), label=None)
        .transform(get_transforms(), size=SZ, tfm_y=True)
        .databunch(path=Path('.'), bs=BS)
        .normalize(imagenet_stats))

def se_resnext101(pretrained=True):
    pretrained = 'imagenet' if pretrained else None
    model = pretrainedmodels.__dict__['se_resnext101_32x4d'](pretrained=None)
    model.load_state_dict(torch.load("../input/preptrainedmode-weight/se_resnext101_32x4d-3b2fe3d8.pth"))
    return model

learn = unet_learner(data, se_resnext101, pretrained=True, path=MODEL_PATH,
                     cut=-2, split_on=lambda m: (m[0][4], m[1]), metrics=[dice])
learn.fit_one_cycle(6)  # this line is giving the above error
```

Thanks in advance for your help!
PPPW commented 5 years ago

Hi @saurabh502, fastai's DynamicUnet has some hardcoded parts; you could ask the fastai team to make it handle more general cases.

In DynamicUnet, the base arch downsamples and the UnetBlocks upsample. For se_resnext101_32x4d with an input size of 256, the encoder output is downsampled to 8; if each step halved the size, there would be four intermediate sizes (128, 64, 32, 16) in between. The UnetBlocks then upsample back, so in this case we would need four UnetBlocks, but fastai only adds three.
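(To connect this to the numbers in the traceback, here is a quick back-of-the-envelope check. It is my own sketch in plain Python, not part of the original thread, and the claim that DynamicUnet adds one extra 2x upsample before the final merge is my reading of the fastai v1 source, so treat that as an assumption.)

```python
# Illustrative arithmetic only (not fastai code): with three UnetBlocks, each
# doubling the spatial size, the 8x8 encoder output of se_resnext101_32x4d
# only gets back to 64; even with one extra 2x upsample that is 128, not the
# 256 of the original input -- matching "Got 256 and 128" in the traceback.
size = 8
for _ in range(3):      # the three UnetBlocks fastai creates for this encoder
    size *= 2
print(size)             # 64
print(size * 2)         # 128: still one doubling short of 256
```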

Why? Each UnetBlock upsamples by a factor of 2, and the current fastai logic assumes that each downsampling step in the base arch also halves the size. That's not the case for Cadene's se_resnext101_32x4d, whose layer0 reduces the input size by a factor of 4.
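To see that factor-4 stem for yourself, a shape check along these lines should work (my own sketch, not from the thread; it only assumes Cadene's pretrainedmodels package, and no pretrained weights are needed just to inspect shapes):

```python
import torch
import pretrainedmodels

# Build the Cadene model without downloading weights; we only care about shapes.
m = pretrainedmodels.__dict__['se_resnext101_32x4d'](pretrained=None).eval()

x = torch.randn(1, 3, 256, 256)
with torch.no_grad():
    out = m.layer0(x)    # stride-2 7x7 conv followed by a stride-2 max pool
print(out.shape)         # expected roughly: torch.Size([1, 64, 64, 64]), i.e. 256 -> 64
```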

You can see this by looking at sfs_szs (the second line in the __init__ of DynamicUnet). For se_resnext101_32x4d, sfs_szs is [64, 64, 32, 16, 8], while for resnet18 it's [128, 128, 128, 64, 64, 32, 32, 16, 8]. For resnet18 with an input size of 256, the size is halved every time. sfs_szs is used later to determine how many UnetBlocks to include: _get_sfs_idxs decides that only three UnetBlocks are needed for se_resnext101_32x4d, so a size mismatch occurs at the MergeLayer. Whatever you downsample you need to upsample back, otherwise the result can't be merged with the original input.
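If you want to reproduce those sfs_szs values, something like the following sketch should do it (mine, not from the thread; it assumes fastai v1's create_body and model_sizes helpers and reuses the se_resnext101 function from the issue; model_sizes returns full [batch, channels, h, w] shapes, and the spatial sizes are what matter here):

```python
from fastai.vision import create_body, models
from fastai.callbacks.hooks import model_sizes

# Encoder cut the same way as in the issue (cut=-2 drops the pooling/classifier head).
body = create_body(se_resnext101, pretrained=True, cut=-2)
print(model_sizes(body, size=(256, 256)))
# spatial sizes should end up as 64, 64, 32, 16, 8 (layer0 already divides by 4)

body_r18 = create_body(models.resnet18, pretrained=False)
print(model_sizes(body_r18, size=(256, 256)))
# spatial sizes: 128, 128, 128, 64, 64, 32, 32, 16, 8 (halved at every stage)
```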

You can also patch the DynamicUnet code yourself as a temporary workaround, but I think you may want to report this to fastai.
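One concrete, user-side alternative to patching DynamicUnet (my own untested idea, not something suggested in this thread, and it does change the architecture slightly): drop the stride-2 max pool from layer0 so the stem only downsamples by 2, which makes the encoder's size progression look like the resnets DynamicUnet expects. The pool has no weights, so the pretrained state dict still loads.

```python
import torch.nn as nn

def se_resnext101_unet(pretrained=True):
    # Reuses the se_resnext101 function from the issue above (assumed to be
    # defined in the same notebook/session).
    model = se_resnext101(pretrained=pretrained)
    # Replace the stride-2 max pool in the stem with a no-op, so layer0 only
    # downsamples 256 -> 128 instead of 256 -> 64. nn.Identity needs
    # PyTorch >= 1.1; an empty nn.Sequential() is an equivalent no-op otherwise.
    model.layer0.pool = nn.Identity()
    return model

# Then build the learner exactly as in the issue, just with the wrapped arch:
# learn = unet_learner(data, se_resnext101_unet, pretrained=True, path=MODEL_PATH,
#                      cut=-2, split_on=lambda m: (m[0][4], m[1]), metrics=[dice])
```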

saurabh502 commented 5 years ago

Thank you for your suggestions!