deepgram / kur

Descriptive Deep Learning
Apache License 2.0

pytorch: must set `border: valid` for demo to work? what if users want `same`? #71

Closed EmbraceLife closed 7 years ago

EmbraceLife commented 7 years ago

With the cifar.yml demo inside kur/, after changing the backend to pytorch as below:

backend: 
  name: pytorch

I got the following error message:

(dlnd-tf-lab)  ->kur train cifar.yml
Traceback (most recent call last):
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/bin/kur", line 11, in <module>
    load_entry_point('kur', 'console_scripts', 'kur')()
  File "/Users/Natsume/Downloads/kur/kur/__main__.py", line 491, in main
    sys.exit(args.func(args) or 0)
  File "/Users/Natsume/Downloads/kur/kur/__main__.py", line 63, in train
    func = spec.get_training_function()
  File "/Users/Natsume/Downloads/kur/kur/kurfile.py", line 385, in get_training_function
    model = self.get_model(provider)
  File "/Users/Natsume/Downloads/kur/kur/kurfile.py", line 176, in get_model
    self.model.build()
  File "/Users/Natsume/Downloads/kur/kur/model/model.py", line 286, in build
    self.build_graph(input_nodes, output_nodes, network)
  File "/Users/Natsume/Downloads/kur/kur/model/model.py", line 337, in build_graph
    for layer in node.container.build(self):
  File "/Users/Natsume/Downloads/kur/kur/containers/container.py", line 306, in build
    self._built = list(self._build(model))
  File "/Users/Natsume/Downloads/kur/kur/containers/layers/convolution.py", line 223, in _build
    raise ValueError('PyTorch convolutions cannot use "same" '
ValueError: PyTorch convolutions cannot use "same" border mode when the receptive field "size" is even.

After setting border to valid instead of the default, it works fine.

My question: if a user intends to use border = "same", what should be done to make PyTorch work?

Maybe the receptive field just must not be even, so I tried setting it to an odd size:

  cnn:
    kernels: [64, 32]
    size: [3, 3]
    strides: [1, 1]

But I got a new error this time:

(dlnd-tf-lab)  ->kur train cifar.yml
Traceback (most recent call last):
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/bin/kur", line 11, in <module>
    load_entry_point('kur', 'console_scripts', 'kur')()
  File "/Users/Natsume/Downloads/kur/kur/__main__.py", line 491, in main
    sys.exit(args.func(args) or 0)
  File "/Users/Natsume/Downloads/kur/kur/__main__.py", line 64, in train
    func(step=args.step)
  File "/Users/Natsume/Downloads/kur/kur/kurfile.py", line 393, in func
    model.restore(initial_weights)
  File "/Users/Natsume/Downloads/kur/kur/model/model.py", line 234, in restore
    self.backend.restore(self, filename)
  File "/Users/Natsume/Downloads/kur/kur/backend/pytorch_backend.py", line 176, in restore
    model.data.model.load_state_dict(state)
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/torch/nn/modules/module.py", line 316, in load_state_dict
    own_state[name].copy_(param)
RuntimeError: inconsistent tensor size at /Users/soumith/code/pytorch-builder/wheel/pytorch-src/torch/lib/TH/generic/THTensorCopy.c:51

ajsyp commented 7 years ago

PyTorch natively supports only the border: valid mode. We can emulate same mode in Kur, of course. Emulating same is trivial when size is odd (because of the way PyTorch does its padding), but I haven't given even sizes much thought. Until I or another contributor gets around to it, PyTorch users should use either valid (with any receptive field) or same with an odd receptive field.
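
The arithmetic behind the odd/even distinction can be sketched in a few lines of plain Python. This is only an illustration of the standard convolution output-length formula with equal zero padding on both sides; the function names `conv_output_length` and `symmetric_same_padding` are made up for this example and are not part of Kur or PyTorch.

```python
def conv_output_length(n, kernel, stride=1, padding=0):
    """Output length of a 1-D convolution with `padding` zeros on each side."""
    return (n + 2 * padding - kernel) // stride + 1


def symmetric_same_padding(kernel):
    """Per-side padding that preserves the length at stride 1.

    Preserving the length needs kernel - 1 zeros in total. That total splits
    evenly across both sides only when the kernel size is odd, so None is
    returned for even kernel sizes.
    """
    total = kernel - 1
    if total % 2 != 0:
        return None
    return total // 2


if __name__ == "__main__":
    n = 32  # CIFAR images are 32 pixels per side
    for kernel in (2, 3, 4, 5):
        pad = symmetric_same_padding(kernel)
        if pad is None:
            # Even kernel: the two nearest symmetric paddings under- and overshoot.
            low = conv_output_length(n, kernel, padding=(kernel - 1) // 2)
            high = conv_output_length(n, kernel, padding=kernel // 2)
            print(kernel, "no symmetric padding works; nearest outputs:", low, high)
        else:
            print(kernel, "padding", pad, "gives", conv_output_length(n, kernel, padding=pad))
```

With an even size, the required padding is lopsided (one more zero on one side than the other), which is why a symmetric padding argument alone cannot reproduce same mode.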

You say you did this but got an error. The traceback shows that the problem occurred in File "/Users/Natsume/Downloads/kur/kur/backend/pytorch_backend.py", line 176, in restore, which almost certainly means you had old weights sitting on disk that were the wrong size (hence "inconsistent tensor size"). Changing size changes the shape of the convolution kernels, so weights saved by the earlier model no longer fit. Delete the old weights and try again.
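
Clearing the old weights can be done by hand, or with a small helper like the sketch below. It assumes only that weights live in files or directories whose paths you list yourself; `clear_stale_weights` is a hypothetical helper, not a Kur command, and the path in the usage comment is a placeholder for whatever the weights section of your Kurfile actually names.

```python
import os
import shutil


def clear_stale_weights(paths):
    """Delete each existing weight file or directory; return the paths removed."""
    removed = []
    for path in paths:
        if os.path.isdir(path):
            shutil.rmtree(path)  # weights saved as a directory of files
            removed.append(path)
        elif os.path.isfile(path):
            os.remove(path)  # weights saved as a single file
            removed.append(path)
        # Paths that do not exist are skipped silently.
    return removed


# Example (placeholder path, substitute the ones from your own Kurfile):
# clear_stale_weights(["path/to/old/weights"])
```

Run it before `kur train` whenever the architecture changes, so that restore does not find weights saved from the previous model.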

EmbraceLife commented 7 years ago

Thanks! I'll remember to clear out the old weight folders every time I change the model, even slightly.