IntelLabs / SkimCaffe

Caffe for Sparse Convolutional Neural Network

Winograd Layer in Skim Caffe #15

Open imniksha opened 6 years ago

imniksha commented 6 years ago

Issue summary

Hello, I am new to Caffe and deep learning in general, and am hoping to find some answers here :)

I have installed SkimCaffe on my Ubuntu VM and am able to run classification models using LeNet. Now I want to switch the convolution layers to Winograd convolution layers and perform a comparative study between the two types of convolution.

I have tried to add it as shown below; however, this has not been successful. The Winograd layer just zeros all entries in the output and gives wrong classifications (refer below). I believe I must be doing something wrong here. I would greatly appreciate it if someone could guide me to the solution.

Basically, I want to add a Winograd layer (Winograd convolution) to LeNet, using winograd_layer.cpp.

Thank you for the help! Also, please let me know where I could ask this question, if this is not the right platform for it :)

LENET:

layer { name: "train-data" type: "Data" top: "data" top: "label" include { phase: TRAIN } transform_param { mean_file: "/home/x/DIGITS/digits/jobs/20180302-235120-dbc4/mean.binaryproto" } data_param { source: "/home/x/DIGITS/digits/jobs/20180302-235120-dbc4/train_db" batch_size: 64 backend: LMDB } }
layer { name: "val-data" type: "Data" top: "data" top: "label" include { phase: TEST } transform_param { mean_file: "/home/x/DIGITS/digits/jobs/20180302-235120-dbc4/mean.binaryproto" } data_param { source: "/home/x/DIGITS/digits/jobs/20180302-235120-dbc4/val_db" batch_size: 32 backend: LMDB } }
layer { name: "scale" type: "Power" bottom: "data" top: "scaled" power_param { scale: 0.0125000001863 } }
layer { name: "win1" type: "Winograd" bottom: "scaled" top: "win1" param { lr_mult: 1.0 } param { lr_mult: 2.0 } convolution_param { num_output: 20 kernel_size: 5 stride: 1 } }
layer { name: "pool1" type: "Pooling" bottom: "win1" top: "pool1" pooling_param { pool: MAX kernel_size: 2 stride: 2 } }
layer { name: "win2" type: "Winograd" bottom: "pool1" top: "win2" param { lr_mult: 1.0 } param { lr_mult: 2.0 } convolution_param { num_output: 50 kernel_size: 5 stride: 1 } }
layer { name: "pool2" type: "Pooling" bottom: "win2" top: "pool2" pooling_param { pool: MAX kernel_size: 2 stride: 2 } }
layer { name: "ip1" type: "InnerProduct" bottom: "pool2" top: "ip1" param { lr_mult: 1.0 } param { lr_mult: 2.0 } inner_product_param { num_output: 500 weight_filler { type: "xavier" } bias_filler { type: "constant" } } }
layer { name: "relu1" type: "ReLU" bottom: "ip1" top: "ip1" }
layer { name: "ip2" type: "InnerProduct" bottom: "ip1" top: "ip2" param { lr_mult: 1.0 } param { lr_mult: 2.0 } inner_product_param { num_output: 10 weight_filler { type: "xavier" } bias_filler { type: "constant" } } }
layer { name: "accuracy" type: "Accuracy" bottom: "ip2" bottom: "label" top: "accuracy" include { phase: TEST } }
layer { name: "loss" type: "SoftmaxWithLoss" bottom: "ip2" bottom: "label" top: "loss" }

[screenshot]

jspark1105 commented 6 years ago

Sorry about the late reply. You're not doing anything wrong. Training directly in the Winograd domain is challenging, as described in our paper (https://arxiv.org/pdf/1702.08597.pdf; see Section 6.1 for the comment on the 200x smaller learning rate). I also didn't have enough time to put together all the information needed to reproduce the paper's results before I left Intel. Anyway, please try it with a much smaller learning rate.
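
For context, "training directly in the Winograd domain" means learning the transformed filters themselves rather than the spatial ones. In the standard formulation, e.g. $F(2 \times 2, 3 \times 3)$ from Lavin and Gray, a 4x4 input tile $d$ and a 3x3 filter $g$ are convolved as

$$Y = A^\top \left[ (G\,g\,G^\top) \odot (B^\top d\,B) \right] A,$$

where $G$, $B$, and $A$ are small fixed transform matrices and $\odot$ is elementwise multiplication. Winograd-domain training treats the 4x4 transformed filter $U = G\,g\,G^\top$ as the learnable parameter, which is what lets sparsity be imposed on $U$ directly; the fixed transforms also rescale the gradients, which is one plausible reason such a small learning rate is needed. This is only a sketch of the general technique; the tile sizes used for the 5x5 LeNet kernels differ.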

jspark1105 commented 6 years ago

BTW, if I understand your goal, I may be able to help you better. Do you want to see how much sparsity you can get in the Winograd domain for the model you're interested in?

imniksha commented 6 years ago

Hi, thank you so much for your reply! Alright, I will try it with a much smaller learning rate and let you know the results.

My goal is as follows: do a comparative study between direct convolution and Winograd convolution for classification problems in CNNs. Yes, I do want to look at how much sparsity Winograd convolution gives. Specifically, I want to compare the number of operations, model size, accuracy, and training time for the two types of convolution (if this is possible). I want to train on the same dataset with both convolutions (keeping all other parameters the same) and see how the results compare.
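
For a back-of-the-envelope sense of the operation-count difference, the standard textbook case $F(2 \times 2, 3 \times 3)$ works out as follows: direct convolution spends $2 \cdot 2 \cdot 3 \cdot 3 = 36$ multiplications per 2x2 output tile (per input/output channel pair), while the Winograd version spends $4 \times 4 = 16$ elementwise multiplications on the transformed tile, a $36/16 = 2.25\times$ reduction before counting transform overhead. The 5x5 kernels in LeNet use different tile sizes, so the exact savings there will differ.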

Any suggestions on how to do the above are welcome!

I am a student and feel really glad to get a reply and appreciate your help :)

imniksha commented 6 years ago

Just a side note, I see that your paper already has results for this kind of comparative study, but I would like to get some of my own results and learn to study them.

jspark1105 commented 6 years ago

Our work on sparse Winograd was by no means complete, especially on the training side (we needed a very low learning rate, and so on), so any improvements upon it would be really interesting. BTW, we didn't spend much time on speeding up training (we mostly focused on speeding up inference once you have sparsity in the Winograd domain), so training in the Winograd domain will be slow, especially if you're comparing against cuDNN, which is extensively optimized.

imniksha commented 6 years ago

I tried setting the learning rate to what you mentioned in your paper (refer below); however, I am still unsuccessful in getting results for Winograd convolutions. What else should I change to at least get some results, instead of zeros for all elements after the Winograd layer?

[screenshot]

[screenshot]

imniksha commented 6 years ago

Hello, could you please tell me how to get data out of the Winograd layer (see the comment above)? What parameters should I set differently to get some output? (I will try to optimize later; for right now, I just want to make sure I am able to get some readable output.) Thank you!
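
One quick way to see where the activations go to zero, assuming the pycaffe bindings are built (the prototxt path below is a placeholder for your own file, and a trained .caffemodel can optionally be passed as a second argument):

import numpy as np
import caffe

caffe.set_mode_cpu()
# Placeholder path: point this at your own network definition.
net = caffe.Net('lenet_train_test.prototxt', caffe.TEST)
net.forward()  # one forward pass over a test batch

# Print the largest activation magnitude per blob; a Winograd layer whose
# output is identically zero shows up as 0.0 from that point onward.
for name, blob in net.blobs.items():
    print(name, blob.data.shape, np.abs(blob.data).max())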

jspark1105 commented 6 years ago

Sorry about the late reply. Can you tell me the exact command you used, and share or point me to all the necessary files, like /home/x/DIGITS/digits/jobs/20180302-235120-dbc4/train_db, so that I can reproduce this?

imniksha commented 6 years ago

Good morning! Thanks for the reply. I am using DIGITS as the UI to train and build my network. Sorry for the LeNet code provided in the first comment; please ignore that and use the version below instead. /home/x/DIGITS/digits/jobs/20180302-235120-dbc4/train_db is something that DIGITS puts into the net when I run the train command (it is internal to DIGITS); I am not providing it as input.

Below is the net that I provide to the system:

LENET:

name: "LeNet"
layer { name: "train-data" type: "Data" top: "data" top: "label" include { stage: "train" } data_param { batch_size: 64 } }
layer { name: "val-data" type: "Data" top: "data" top: "label" include { stage: "val" } data_param { batch_size: 32 } }
layer { name: "scale" type: "Power" bottom: "data" top: "scaled" power_param { scale: 0.0125000001863 } }
layer { name: "win1" type: "Winograd" bottom: "scaled" top: "win1" param { lr_mult: 1.0 } param { lr_mult: 2.0 } convolution_param { num_output: 20 kernel_size: 5 stride: 1 } }
layer { name: "pool1" type: "Pooling" bottom: "win1" top: "pool1" pooling_param { pool: MAX kernel_size: 2 stride: 2 } }
layer { name: "win2" type: "Winograd" bottom: "pool1" top: "win2" convolution_param { num_output: 20 kernel_size: 5 stride: 1 } }
layer { name: "pool2" type: "Pooling" bottom: "win2" top: "pool2" pooling_param { pool: MAX kernel_size: 2 stride: 2 } }
layer { name: "ip1" type: "InnerProduct" bottom: "pool2" top: "ip1" param { lr_mult: 1.0 } param { lr_mult: 2.0 } inner_product_param { num_output: 500 weight_filler { type: "xavier" } bias_filler { type: "constant" } } }
layer { name: "relu1" type: "ReLU" bottom: "ip1" top: "ip1" }
layer { name: "ip2" type: "InnerProduct" bottom: "ip1" top: "ip2" param { lr_mult: 1.0 } param { lr_mult: 2.0 } inner_product_param { weight_filler { type: "xavier" } bias_filler { type: "constant" } } }
layer { name: "accuracy" type: "Accuracy" bottom: "ip2" bottom: "label" top: "accuracy" include { stage: "val" } }
layer { name: "loss" type: "SoftmaxWithLoss" bottom: "ip2" bottom: "label" top: "loss" exclude { stage: "deploy" } }
layer { name: "softmax" type: "Softmax" bottom: "ip2" top: "softmax" include { stage: "deploy" } }

Below are the learning rate parameters I have tested with:

[screenshot]

[screenshot]

jspark1105 commented 6 years ago

I'm not familiar with DIGITS and don't have access to it. Is there a way to reproduce your result without DIGITS?

imniksha commented 6 years ago

Hmm, I will look into another way of reproducing my results and get back to you. DIGITS is basically a UI set up for training networks with Caffe; it just provides easy-to-use software for new Caffe users. In the meantime, do you have any documentation or steps on how you got the Winograd experimental results presented in your paper? I would appreciate it if you could guide me through them; this may also help me solve my issue. I will try to get results the same way you did, and then try with DIGITS.

jspark1105 commented 6 years ago

I know what DIGITS is; I just haven't used it before and don't have access to it. I just need to know what input data is used for training and validation. I'm sorry that I don't have much documentation on the Winograd experiments, because I didn't have much time to wrap up before I left Intel, and the experiments (especially the training part) were not entirely successful.

imniksha commented 6 years ago

I am using the MNIST handwritten digit dataset for training and validation. Below is the link to it: http://yann.lecun.com/exdb/mnist/. Is that what you need? I am sorry if I misunderstood your question.
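
For anyone reproducing this: the Caffe main branch ships helper scripts, data/mnist/get_mnist.sh and examples/mnist/create_mnist.sh, that download this dataset and convert it into the examples/mnist/mnist_train_lmdb and examples/mnist/mnist_test_lmdb databases referenced by the prototxt later in this thread (assuming SkimCaffe, as a Caffe fork, keeps these scripts).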

jspark1105 commented 6 years ago

OK, I'll take a look this weekend. Sorry about the delay again.

imniksha commented 6 years ago

Sure, thank you!

jspark1105 commented 6 years ago

I was not able to get good accuracy with your prototxt even after changing Winograd back to Convolution (because I really don't know how to obtain this -> mean_file: "/home/x/DIGITS/digits/jobs/20180302-235120-dbc4/mean.binaryproto"). So I just tried the LeNet MNIST example from the Caffe main branch and changed Convolution to Winograd (see the prototxt below). I'm able to train to 90+% accuracy. Note that I reduced base_lr a lot (see the note after the solver file). I'm sorry that I'm not able to help much, and unfortunately I won't have much time to help in the future either.

lenet_solver.prototxt

# The train/test net protocol buffer definition
net: "examples/mnist/lenet_train_test.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 100
# Carry out testing every 500 training iterations.
test_interval: 500
# The base learning rate, momentum and the weight decay of the network.
#base_lr: 0.01
base_lr: 0.000001
momentum: 0.9
#weight_decay: 0.0005
weight_decay: 0.00005

# The learning rate policy
lr_policy: "inv"
gamma: 0.0001
power: 0.75
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 10000
# snapshot intermediate results
snapshot: 5000
#snapshot_prefix: "examples/mnist/lenet"
# solver mode: CPU or GPU
solver_mode: GPU
#solver_mode: CPU
snapshot_prefix: "examples/mnist/mlp_500_300"
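
Note the scale of the change above: base_lr went from the example's default 0.01 to 0.000001, a 10,000x reduction, even more aggressive than the 200x figure mentioned earlier in the thread.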

lenet_train_test.prototxt

name: "LeNet"
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "examples/mnist/mnist_train_lmdb"
    batch_size: 64
    backend: LMDB
  }
}
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "examples/mnist/mnist_test_lmdb"
    batch_size: 100
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Winograd"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Winograd"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
    #decay_mult: 1.0
    #kernel_shape_decay_mult: 0.0
    #breadth_decay_mult: 0.0
  }
  convolution_param {
    num_output: 50
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool2"
  top: "ip1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
    #decay_mult: 1.0
    #kernel_shape_decay_mult: 0.0
    #breadth_decay_mult: 0.0
  }
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}
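
To run this outside DIGITS, the stock Caffe command line is ./build/tools/caffe train --solver=examples/mnist/lenet_solver.prototxt. A minimal pycaffe sketch along the same lines, which additionally reports how sparse each layer's learned weights end up (the 1e-4 threshold is an arbitrary illustration, not a value from the paper):

import numpy as np
import caffe

caffe.set_mode_gpu()  # matches solver_mode: GPU above; use set_mode_cpu() otherwise
solver = caffe.SGDSolver('examples/mnist/lenet_solver.prototxt')
solver.solve()  # runs the full max_iter iterations from the solver file

# Fraction of near-zero weights per learnable layer. For the Winograd
# layers these are the Winograd-domain filters discussed above.
for name, params in solver.net.params.items():
    w = params[0].data
    print('%s: %.1f%% of weights below 1e-4' % (name, 100.0 * np.mean(np.abs(w) < 1e-4)))
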
imniksha commented 6 years ago

Hello, thank you for your time! I used your comment above to try different parameters on my setup, and I am now able to get results with Winograd! Looks like the issue is solved :)

Appreciate all your help :)

[screenshot]