PPPW / deep-learning-random-explore


AssertionError when using nasnetalarge + fastai #2

Closed (saurabh502 closed this 5 years ago)

saurabh502 commented 5 years ago

Hi,

When I try to use nasnetalarge I get an AssertionError. Below are the code and error details.

Code:

```python
def nasnetalarge(pretrained=True):
    pretrained = 'imagenet' if pretrained else None
    model = pretrainedmodels.__dict__['nasnetalarge'](pretrained=pretrained)
    return model
```

This call triggers the error:

```python
learn = create_cnn(data, nasnetalarge, pretrained=True, path=MODEL_PATH,
                   cut=-2, split_on=lambda m: (m[0][4], m[1]))
```

Error:

```
AssertionError                            Traceback (most recent call last)
<ipython-input> in <module>()
      1 learn = create_cnn(data, nasnetalarge, pretrained=True, path=MODEL_PATH,
----> 2                    cut=-2, split_on=lambda m: (m[0][4], m[1]))

/opt/conda/lib/python3.6/site-packages/fastai/vision/learner.py in create_cnn(data, arch, cut, pretrained, lin_ftrs, ps, custom_head, split_on, bn_final, **kwargs)
     53     "Build convnet style learners."
     54     meta = cnn_config(arch)
---> 55     body = create_body(arch, pretrained, cut)
     56     nf = num_features_model(body) * 2
     57     head = custom_head or create_head(nf, data.c, lin_ftrs, ps=ps, bn_final=bn_final)

/opt/conda/lib/python3.6/site-packages/fastai/vision/learner.py in create_body(arch, pretrained, cut, body_fn)
     30 def create_body(arch:Callable, pretrained:bool=True, cut:Optional[int]=None, body_fn:Callable[[nn.Module],nn.Module]=None):
     31     "Cut off the body of a typically pretrained `model` at `cut` or as specified by `body_fn`."
---> 32     model = arch(pretrained)
     33     if not cut and not body_fn: cut = cnn_config(arch)['cut']
     34     return (nn.Sequential(*list(model.children())[:cut]) if cut

<ipython-input> in nasnetalarge(pretrained)
      1 def nasnetalarge(pretrained=True):
      2     pretrained = 'imagenet' if pretrained else None
----> 3     model = pretrainedmodels.__dict__['nasnetalarge'](pretrained=pretrained)
      4     return model

/opt/conda/lib/python3.6/site-packages/pretrainedmodels/models/nasnet.py in nasnetalarge(num_classes, pretrained)
    613     settings = pretrained_settings['nasnetalarge'][pretrained]
    614     assert num_classes == settings['num_classes'], \
--> 615         "num_classes should be {}, but is {}".format(settings['num_classes'], num_classes)
    616 
    617     # both 'imagenet' & 'imagenet+background' are loaded from same parameters

AssertionError: num_classes should be 1000, but is 1001
```

On a separate note, from where can I call `arch_summary`? Running `arch_summary(resnet34)` gives:

```
NameError                                 Traceback (most recent call last)
<ipython-input> in <module>()
----> 1 arch_summary(resnet34)

NameError: name 'arch_summary' is not defined
```
PPPW commented 5 years ago

Hi @saurabh502, in `nasnet.py` of the Cadene pretrainedmodels package, `num_classes` defaults to 1001 in `def nasnetalarge`, but in `pretrained_settings` for `'imagenet'` it's set to 1000 😂.... So if you use `pretrained = 'imagenet+background'` instead of `pretrained = 'imagenet'` you won't hit this error, but it looks like they have a bug.
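For reference, a minimal sketch of that workaround, reusing the wrapper from the question above (untested; whether the 1001-class 'imagenet+background' weights suit your task is a separate question):

```python
import pretrainedmodels

def nasnetalarge(pretrained=True):
    # the 'imagenet+background' settings expect 1001 classes, which matches
    # the function's default num_classes, so the assert passes
    pretrained = 'imagenet+background' if pretrained else None
    return pretrainedmodels.__dict__['nasnetalarge'](pretrained=pretrained)
```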

However, I didn't include NASNet here because we'd need some additional work to make it play well with fastai. The Cadene implementation has submodules, like CellStem1, whose forward takes more than one argument. fastai's create_body takes the original model's children and converts them to a Sequential, which won't work with CellStem1. So we need to encapsulate the modules in the Cadene implementation into blocks whose forward takes only one argument; then we can make it work with fastai.
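To illustrate the idea (a hypothetical sketch, not the exact code in this repo): a two-input cell can be wrapped so that it consumes and produces a single value, here a pair of activations, which `nn.Sequential` can then thread through unchanged:

```python
import torch.nn as nn

class CellBlock(nn.Module):
    "Wrap a two-input NASNet-style cell so forward() takes one argument."
    def __init__(self, cell):
        super().__init__()
        self.cell = cell

    def forward(self, xs):
        # xs is a (current, previous) pair produced by the preceding block;
        # the argument order passed to the wrapped cell must match its definition.
        # Note: the first block in the chain needs an adapter that turns the
        # stem's single tensor output into such a pair.
        cur, prev = xs
        return self.cell(cur, prev), cur
```

Each block hands the next one everything it needs in a single object, so chaining them inside `nn.Sequential` works.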

For arch_summary, it's in utils.py; did you import it?
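Something like this should work, assuming you run from a clone of this repo so that its `utils.py` is on the path:

```python
from utils import arch_summary          # utils.py from this repo
from torchvision.models import resnet34

arch_summary(resnet34)
```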

PPPW commented 5 years ago

I have added an example for NASNet for reference.
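The gist of the approach, as a rough sketch (the notebook has the actual code; the details here are illustrative): replace the classifier with an identity and wrap the model in a single-child `nn.Sequential`, which is what makes the `m[0][0]` indexing in the calls below work.

```python
import torch.nn as nn
import pretrainedmodels

def identity(x): return x

def pnasnet5large(pretrained=False):
    pretrained = 'imagenet' if pretrained else None
    model = pretrainedmodels.pnasnet5large(pretrained=pretrained, num_classes=1000)
    model.logits = identity          # drop the classifier, keep the feature map
    return nn.Sequential(model)      # single child, so m[0][0] is the model
```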

saurabh502 commented 5 years ago

When using the provided code I get the error below:

Code:

```python
learn = create_cnn(data, pnasnet5large, pretrained=True, path=MODEL_PATH,
                   cut=None, split=lambda m: (list(m[0][0].children())[8], m[1]))
```

Error:

```
Exception                                 Traceback (most recent call last)
<ipython-input> in <module>()
      1 learn = create_cnn(data, pnasnet5large, pretrained=True, path=MODEL_PATH,
----> 2                    cut=None, split=lambda m: (list(m[0][0].children())[8], m[1]))

/opt/conda/lib/python3.6/site-packages/fastai/vision/learner.py in create_cnn(data, arch, cut, pretrained, lin_ftrs, ps, custom_head, split_on, bn_final, **learn_kwargs)
     72     meta = cnn_config(arch)
     73     body = create_body(arch, pretrained, cut)
---> 74     nf = num_features_model(body) * 2
     75     head = custom_head or create_head(nf, data.c, lin_ftrs, ps=ps, bn_final=bn_final)
     76     model = nn.Sequential(body, head)

/opt/conda/lib/python3.6/site-packages/fastai/callbacks/hooks.py in num_features_model(m)
    117     sz = 64
    118     while True:
--> 119         try: return model_sizes(m, size=(sz,sz))[-1][1]
    120         except Exception as e:
    121             sz *= 2

/opt/conda/lib/python3.6/site-packages/fastai/callbacks/hooks.py in model_sizes(m, size)
    110     "Pass a dummy input through the model `m` to get the various sizes of activations."
    111     with hook_outputs(m) as hooks:
--> 112         x = dummy_eval(m, size)
    113         return [o.stored.shape for o in hooks]
    114 

/opt/conda/lib/python3.6/site-packages/fastai/callbacks/hooks.py in dummy_eval(m, size)
    105 def dummy_eval(m:nn.Module, size:tuple=(64,64)):
    106     "Pass a `dummy_batch` in evaluation mode in `m` with `size`."
--> 107     return m.eval()(dummy_batch(m, size))
    108 
    109 def model_sizes(m:nn.Module, size:tuple=(64,64))->Tuple[Sizes,Tensor,Hooks]:

/opt/conda/lib/python3.6/site-packages/fastai/callbacks/hooks.py in dummy_batch(m, size)
    100 def dummy_batch(m: nn.Module, size:tuple=(64,64))->Tensor:
    101     "Create a dummy batch to go through `m` with `size`."
--> 102     ch_in = in_channels(m)
    103     return one_param(m).new(1, ch_in, *size).requires_grad_(False).uniform_(-1.,1.)
    104 

/opt/conda/lib/python3.6/site-packages/fastai/torch_core.py in in_channels(m)
    248     for l in flatten_model(m):
    249         if hasattr(l, 'weight'): return l.weight.shape[1]
--> 250     raise Exception('No weight layer')
    251 
    252 def calc_loss(y_pred:Tensor, y_true:Tensor, loss_func:LossFunction):

Exception: No weight layer
```
PPPW commented 5 years ago

Hi, the trick is that we have to override `model_meta`; otherwise your cut will be overridden in `create_body`. Can you try with exactly the code in the notebook (you can set pretrained to True)? A sketch of the override is below.
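For illustration, a hedged sketch of such a registration, assuming fastai v1's `model_meta` dict in `fastai.vision.learner` (the exact entry in the notebook may differ):

```python
from fastai.vision.learner import model_meta

# Hypothetical entry; the keys mirror fastai v1's _default_meta.
# With a registered cut of None, create_body keeps the whole model instead of
# applying the default cut of -1, which would chop off the single Sequential
# child and leave no weight layers (the Exception seen above).
model_meta[pnasnet5large] = {
    'cut': None,
    'split': lambda m: (list(m[0][0].children())[8], m[1]),
}
```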

saurabh502 commented 5 years ago

Hi, now I am not getting any error on:

```python
learn = create_cnn(data, pnasnet5large, pretrained=False, path=MODEL_PATH)
```

but when I run `learn.lr_find()`

I get the error below:

```
RuntimeError                              Traceback (most recent call last)
<ipython-input> in <module>()
----> 1 learn.lr_find()

/opt/conda/lib/python3.6/site-packages/fastai/train.py in lr_find(learn, start_lr, end_lr, num_it, stop_div, wd)
     30     cb = LRFinder(learn, start_lr, end_lr, num_it, stop_div)
     31     a = int(np.ceil(num_it/len(learn.data.train_dl)))
---> 32     learn.fit(a, start_lr, callbacks=[cb], wd=wd)
     33 
     34 def to_fp16(learn:Learner, loss_scale:float=None, max_noskip:int=1000, dynamic:bool=False, clip:float=None,

/opt/conda/lib/python3.6/site-packages/fastai/basic_train.py in fit(self, epochs, lr, wd, callbacks)
    176         callbacks = [cb(self) for cb in self.callback_fns] + listify(callbacks)
    177         fit(epochs, self.model, self.loss_func, opt=self.opt, data=self.data, metrics=self.metrics,
--> 178             callbacks=self.callbacks+callbacks)
    179 
    180     def create_opt(self, lr:Floats, wd:Floats=0.)->None:

/opt/conda/lib/python3.6/site-packages/fastai/utils/mem.py in wrapper(*args, **kwargs)
     78 
     79     try:
---> 80         return func(*args, **kwargs)
     81     except Exception as e:
     82         if ("CUDA out of memory" in str(e) or

/opt/conda/lib/python3.6/site-packages/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
     88         for xb,yb in progress_bar(data.train_dl, parent=pbar):
     89             xb, yb = cb_handler.on_batch_begin(xb, yb)
---> 90             loss = loss_batch(model, xb, yb, loss_func, opt, cb_handler)
     91             if cb_handler.on_batch_end(loss): break
     92 

/opt/conda/lib/python3.6/site-packages/fastai/basic_train.py in loss_batch(model, xb, yb, loss_func, opt, cb_handler)
     18     if not is_listy(xb): xb = [xb]
     19     if not is_listy(yb): yb = [yb]
---> 20     out = model(*xb)
     21     out = cb_handler.on_loss_begin(out)
     22 

/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    487             result = self._slow_forward(*input, **kwargs)
    488         else:
--> 489             result = self.forward(*input, **kwargs)
    490         for hook in self._forward_hooks.values():
    491             hook_result = hook(self, input, result)

/opt/conda/lib/python3.6/site-packages/torch/nn/modules/container.py in forward(self, input)
     90     def forward(self, input):
     91         for module in self._modules.values():
---> 92             input = module(input)
     93         return input
     94 

/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    487             result = self._slow_forward(*input, **kwargs)
    488         else:
--> 489             result = self.forward(*input, **kwargs)
    490         for hook in self._forward_hooks.values():
    491             hook_result = hook(self, input, result)

/opt/conda/lib/python3.6/site-packages/torch/nn/modules/container.py in forward(self, input)
     90     def forward(self, input):
     91         for module in self._modules.values():
---> 92             input = module(input)
     93         return input
     94 

/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    487             result = self._slow_forward(*input, **kwargs)
    488         else:
--> 489             result = self.forward(*input, **kwargs)
    490         for hook in self._forward_hooks.values():
    491             hook_result = hook(self, input, result)

/opt/conda/lib/python3.6/site-packages/pretrainedmodels/models/pnasnet.py in forward(self, input)
    365 
    366     def forward(self, input):
--> 367         x = self.features(input)
    368         x = self.logits(x)
    369         return x

/opt/conda/lib/python3.6/site-packages/pretrainedmodels/models/pnasnet.py in features(self, x)
    346         x_cell_2 = self.cell_2(x_cell_0, x_cell_1)
    347         x_cell_3 = self.cell_3(x_cell_1, x_cell_2)
--> 348         x_cell_4 = self.cell_4(x_cell_2, x_cell_3)
    349         x_cell_5 = self.cell_5(x_cell_3, x_cell_4)
    350         x_cell_6 = self.cell_6(x_cell_4, x_cell_5)

/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    487             result = self._slow_forward(*input, **kwargs)
    488         else:
--> 489             result = self.forward(*input, **kwargs)
    490         for hook in self._forward_hooks.values():
    491             hook_result = hook(self, input, result)

/opt/conda/lib/python3.6/site-packages/pretrainedmodels/models/pnasnet.py in forward(self, x_left, x_right)
    285         x_left = self.conv_prev_1x1(x_left)
    286         x_right = self.conv_1x1(x_right)
--> 287         x_out = self.cell_forward(x_left, x_right)
    288         return x_out
    289 

/opt/conda/lib/python3.6/site-packages/pretrainedmodels/models/pnasnet.py in cell_forward(self, x_left, x_right)
    171         else:
    172             x_comb_iter_4_right = x_right
--> 173         x_comb_iter_4 = x_comb_iter_4_left + x_comb_iter_4_right
    174 
    175         x_out = torch.cat(

RuntimeError: The size of tensor a (9) must match the size of tensor b (10) at non-singleton dimension 3
```

Is there something I am missing? Thanks!
PPPW commented 5 years ago

Do you have a small-image dataset? It seems the model doesn't work with images smaller than 64x64. The original model was designed for 299x299 images; although we added an adaptive pooling layer to make it work for images of different sizes, it looks like some layers in the body still don't. If this is the case, maybe you can resize your images and try again? (A quick way to probe sizes is sketched below.)
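To see which input sizes actually make it through the body, one could probe it with dummy batches; a quick diagnostic sketch, assuming `learn` was built as above:

```python
import torch

body = learn.model[0].eval()                 # body of the fastai model
device = next(body.parameters()).device      # keep the dummy on the same device

# try a few candidate input sizes and report which survive the forward pass
for sz in (64, 128, 150, 224, 299):
    try:
        with torch.no_grad():
            body(torch.randn(1, 3, sz, sz, device=device))
        print(f'{sz}x{sz}: ok')
    except RuntimeError as e:
        print(f'{sz}x{sz}: fails ({e})')
```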

saurabh502 commented 5 years ago

Hi, I was using 150x150, but when I changed the size to 299x299 I was able to run the code without error. However, training time has increased because of this. Can we look into how to work with smaller image sizes?

PPPW commented 5 years ago

Are you getting this error if you run a single image through the model? Something like:

```python
learn.model(torch.randn(1, 3, 150, 150))
```

> Can we look into how to work with smaller image sizes?

Apparently this model implementation wasn't designed for varying input image sizes, and the size mismatch happens in quite early layers (rather than in the last few). So it probably won't be easy to fix, but we can take a look.