deepgram / kur

Descriptive Deep Learning

Error only occurs in the Theano backend when running `kur train cifar.yml` #78

Open · EmbraceLife opened this issue 7 years ago

EmbraceLife commented 7 years ago

I added some code to my own copy of kur. It still runs fine on deepgram/kur/examples/cifar.yml, but when it runs on my own cifar.yml below:

---

settings:

  # Where to get the data
  cifar: &cifar
    url: "https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"
    checksum: "6d958be074577803d12ecdefd02955f39262c83c16fe9348329d7fe0b5c001ce"
    path: "/Users/Natsume/Downloads/data_for_all/cifar"

  # Backend to use
  backend:
    name: keras
    backend: tensorflow
    # name: pytorch
    # there is a problem with even receptive-field sizes and the 'same' border mode for pytorch convolutions

  # Hyperparameters
  cnn:
    kernels: [64, 32]
    size: [2, 2]
    strides: [1, 1]

# The model itself.
# This is parsed immediately after the "parameters" block.
model:
  - input: images
    sink: yes # sink makes this input layer accessible as a model output
  - convolution:
      kernels: 64
      size: [2,2]
      strides: [1,1]
      border: valid
  - activation:
      type: leakyrelu # alternatives include relu
      alpha: 0.01 # if alpha is absent or None, the default value is 0.3
    # make this activation layer accessible
    sink: yes
    name: conv_layer1
  - convolution:
      kernels: 32
      size: [2,2]
      strides: [1,1]
      border: valid
  - activation:
      # to trace how activations are set up, follow the activation object;
      # see container.parse() and _parse_core() for how `sink` and `name` are handled
      type: leakyrelu # alternatives include relu
      alpha: 0.01 # if alpha is absent or None, the default value is 0.3
    sink: yes
    name: conv_layer2
  - flatten:
  - dense: 10
    sink: yes
    name: dense1
  - activation: softmax
    #   name: softmax
    name: labels # this names the model output, rather than the labels of the inputs?

train:
  data:
    - cifar:
        <<: *cifar
        parts: [1, 2, 3, 4]
  provider:
    batch_size: 32
    num_batches: 1
  log: /Users/Natsume/Downloads/temp_folders/demo_cifar/cifar-log
  epochs:
    number: 2
    mode: additional
  stop_when:
    epochs: 1 # set to null or infinite to train forever
    elapsed:
      minutes: 10
      hours: 0
      days: 0
      clock: all # time spent on everything; one of all | train | validate | batch
    mode: additional # additional | total; with total, the elapsed limits above apply to the whole training history

  hooks:
    - plot_weights:
        # plot and save the weights of these layers
        layer_names: [images, conv_layer1, conv_layer2, dense1] # works on both so far
        plot_every_n_epochs: 1
        plot_directory: /Users/Natsume/Downloads/temp_folders/demo_cifar/cifar_plot_weights
        weight_file: /Users/Natsume/Downloads/temp_folders/demo_cifar/cifar.best.valid.w
        with_weights:
          - ["convolution", "kernel"]
          - ["convolution", "weight"]
          - ["dense", "kernel"]
          - ["dense", "weight"]
        # animate only specified layers and weights
        # layer names are sometimes convolution_0 and sometimes convolution.0
        animate_layers: [images, convolution.0, conv_layer1, convolution.1, conv_layer2, dense1]
    - plot: # the folder must be created first
        loss_per_batch: /Users/Natsume/Downloads/temp_folders/demo_cifar/plot1.png
        loss_per_time: /Users/Natsume/Downloads/temp_folders/demo_cifar/plot2.png
        throughput_per_time: /Users/Natsume/Downloads/temp_folders/demo_cifar/plot3.png
  weights: # are the folders below created automatically?
    initial: /Users/Natsume/Downloads/temp_folders/demo_cifar/cifar.best.valid.w
    best: /Users/Natsume/Downloads/temp_folders/demo_cifar/cifar.best.train.w
    last: /Users/Natsume/Downloads/temp_folders/demo_cifar/cifar.last.w
  checkpoint:
    path: /Users/Natsume/Downloads/temp_folders/demo_cifar/cifar-checkpoint
    batches: 500 # batches, samples, epochs, and minutes, if present, must each be an integer (not a string, null, or None)
    samples: 1000
    epochs: 1
    minutes: 1000
    validation: no
  optimizer:
    name: adam
    learning_rate: 0.001

validate:
  data:
    - cifar:
       <<: *cifar
       parts: 5
  provider:
    num_batches: 1
  weights: /Users/Natsume/Downloads/temp_folders/demo_cifar/cifar.best.valid.w
  hooks:
    - output: # the folder and file must be created first
        path: /Users/Natsume/Downloads/temp_folders/demo_cifar/output.pkl
        format: pickle

test: &test
  data:
    - cifar:
       <<: *cifar
       parts: test
  weights: /Users/Natsume/Downloads/temp_folders/demo_cifar/cifar.best.valid.w
  provider:
    num_batches: 10

evaluate:
  <<: *test
  destination: /Users/Natsume/Downloads/temp_folders/demo_cifar/cifar.results.pkl

loss:
  - target: labels
    name: categorical_crossentropy
...

I got the following error, which only occurs with the Keras Theano backend, i.e. with the backend block in the settings above switched as shown below. Could you give me some hints on how to solve it? Thanks!
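
The failing run switches only the backend selection:

backend:
  name: keras
  backend: theano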

[ERROR 2017-05-17 00:00:28,337 kur.model.executor:295] Exception raised during training.
Traceback (most recent call last):
  File "/Users/Natsume/Downloads/kur/kur/model/executor.py", line 292, in train
    **kwargs
  File "/Users/Natsume/Downloads/kur/kur/model/executor.py", line 729, in wrapped_train
    self.compile('train', with_provider=provider)
  File "/Users/Natsume/Downloads/kur/kur/model/executor.py", line 116, in compile
    **kwargs
  File "/Users/Natsume/Downloads/kur/kur/backend/keras_backend.py", line 654, in compile
    compiled.trainable_weights, total_loss
  File "/Users/Natsume/Downloads/kur/kur/optimizer/optimizer.py", line 47, in optimize
    return keras_optimizer.get_updates(weights, [], loss)
  File "/Users/Natsume/Downloads/keras/keras/optimizers.py", line 381, in get_updates
    grads = self.get_gradients(loss, params)
  File "/Users/Natsume/Downloads/keras/keras/optimizers.py", line 47, in get_gradients
    grads = K.gradients(loss, params)
  File "/Users/Natsume/Downloads/keras/keras/backend/theano_backend.py", line 1180, in gradients
    return T.grad(loss, variables)
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 561, in grad
    grad_dict, wrt, cost_name)
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 1324, in _populate_grad_dict
    rval = [access_grad_cache(elem) for elem in wrt]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 1324, in <listcomp>
    rval = [access_grad_cache(elem) for elem in wrt]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 1279, in access_grad_cache
    term = access_term_cache(node)[idx]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 973, in access_term_cache
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 973, in <listcomp>
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 1279, in access_grad_cache
    term = access_term_cache(node)[idx]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 973, in access_term_cache
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 973, in <listcomp>
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 1279, in access_grad_cache
    term = access_term_cache(node)[idx]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 973, in access_term_cache
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 973, in <listcomp>
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 1279, in access_grad_cache
    term = access_term_cache(node)[idx]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 973, in access_term_cache
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 973, in <listcomp>
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 1279, in access_grad_cache
    term = access_term_cache(node)[idx]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 973, in access_term_cache
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 973, in <listcomp>
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 1279, in access_grad_cache
    term = access_term_cache(node)[idx]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 973, in access_term_cache
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 973, in <listcomp>
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 1279, in access_grad_cache
    term = access_term_cache(node)[idx]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 973, in access_term_cache
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 973, in <listcomp>
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 1279, in access_grad_cache
    term = access_term_cache(node)[idx]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 1113, in access_term_cache
    input_grads = node.op.grad(inputs, new_output_grads)
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/tensor/nnet/abstract_conv.py", line 828, in grad
    d_bottom = bottom.type.filter_variable(d_bottom)
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/tensor/type.py", line 233, in filter_variable
    self=self))
TypeError: Cannot convert Type TensorType(float32, 4D) (of Variable AbstractConv2d_gradInputs{border_mode='valid', subsample=(1, 1), filter_flip=True, imshp=(None, 64, 31, 31), kshp=(32, 64, 2, 2)}.0) into Type TensorType(float64, 4D). You can try to manually convert AbstractConv2d_gradInputs{border_mode='valid', subsample=(1, 1), filter_flip=True, imshp=(None, 64, 31, 31), kshp=(32, 64, 2, 2)}.0 into a TensorType(float64, 4D).
Traceback (most recent call last):
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/bin/kur", line 11, in <module>
    load_entry_point('kur', 'console_scripts', 'kur')()
  File "/Users/Natsume/Downloads/kur/kur/__main__.py", line 612, in main
    sys.exit(args.func(args) or 0)
  File "/Users/Natsume/Downloads/kur/kur/__main__.py", line 75, in train
    func(step=args.step)
  File "/Users/Natsume/Downloads/kur/kur/kurfile.py", line 432, in func
    return trainer.train(**defaults)
  File "/Users/Natsume/Downloads/kur/kur/model/executor.py", line 292, in train
    **kwargs
  File "/Users/Natsume/Downloads/kur/kur/model/executor.py", line 729, in wrapped_train
    self.compile('train', with_provider=provider)
  File "/Users/Natsume/Downloads/kur/kur/model/executor.py", line 116, in compile
    **kwargs
  File "/Users/Natsume/Downloads/kur/kur/backend/keras_backend.py", line 654, in compile
    compiled.trainable_weights, total_loss
  File "/Users/Natsume/Downloads/kur/kur/optimizer/optimizer.py", line 47, in optimize
    return keras_optimizer.get_updates(weights, [], loss)
  File "/Users/Natsume/Downloads/keras/keras/optimizers.py", line 381, in get_updates
    grads = self.get_gradients(loss, params)
  File "/Users/Natsume/Downloads/keras/keras/optimizers.py", line 47, in get_gradients
    grads = K.gradients(loss, params)
  File "/Users/Natsume/Downloads/keras/keras/backend/theano_backend.py", line 1180, in gradients
    return T.grad(loss, variables)
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 561, in grad
    grad_dict, wrt, cost_name)
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 1324, in _populate_grad_dict
    rval = [access_grad_cache(elem) for elem in wrt]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 1324, in <listcomp>
    rval = [access_grad_cache(elem) for elem in wrt]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 1279, in access_grad_cache
    term = access_term_cache(node)[idx]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 973, in access_term_cache
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 973, in <listcomp>
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 1279, in access_grad_cache
    term = access_term_cache(node)[idx]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 973, in access_term_cache
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 973, in <listcomp>
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 1279, in access_grad_cache
    term = access_term_cache(node)[idx]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 973, in access_term_cache
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 973, in <listcomp>
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 1279, in access_grad_cache
    term = access_term_cache(node)[idx]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 973, in access_term_cache
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 973, in <listcomp>
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 1279, in access_grad_cache
    term = access_term_cache(node)[idx]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 973, in access_term_cache
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 973, in <listcomp>
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 1279, in access_grad_cache
    term = access_term_cache(node)[idx]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 973, in access_term_cache
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 973, in <listcomp>
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 1279, in access_grad_cache
    term = access_term_cache(node)[idx]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 973, in access_term_cache
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 973, in <listcomp>
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 1279, in access_grad_cache
    term = access_term_cache(node)[idx]
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/gradient.py", line 1113, in access_term_cache
    input_grads = node.op.grad(inputs, new_output_grads)
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/tensor/nnet/abstract_conv.py", line 828, in grad
    d_bottom = bottom.type.filter_variable(d_bottom)
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/theano/tensor/type.py", line 233, in filter_variable
    self=self))
TypeError: Cannot convert Type TensorType(float32, 4D) (of Variable AbstractConv2d_gradInputs{border_mode='valid', subsample=(1, 1), filter_flip=True, imshp=(None, 64, 31, 31), kshp=(32, 64, 2, 2)}.0) into Type TensorType(float64, 4D). You can try to manually convert AbstractConv2d_gradInputs{border_mode='valid', subsample=(1, 1), filter_flip=True, imshp=(None, 64, 31, 31), kshp=(32, 64, 2, 2)}.0 into a TensorType(float64, 4D).
ajsyp commented 7 years ago

I cannot reproduce this in my environment. I took your Kurfile, removed the plot_weights hook and the leaky ReLU references, set the backend to Theano, and ran it. It worked fine. Maybe something changed in Theano? Can you post a `pip freeze`?
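
One hint in the meantime: that TypeError (a float32 graph being asked to produce float64 gradients) is the classic symptom of Theano's floatX being configured as float64. A quick sanity check, assuming a standard Theano/Keras install (none of this is kur-specific):

import theano
print(theano.config.floatX)  # the graph in your traceback is float32

import keras.backend as K
print(K.floatx())            # Keras' default float type; should match

If either prints float64, try forcing float32 for a single run:

THEANO_FLAGS='floatX=float32' kur train cifar.yml

or set it permanently in ~/.theanorc:

[global]
floatX = float32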