apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

[R] Transfer Learning using VGG-16 #7968

Closed lichen11 closed 6 years ago

lichen11 commented 7 years ago

I downloaded the pre-trained VGG-16 model from data.dmlc.ml/mxnet/models/imagenet/vgg/. I would like to retrain the last three layers. I followed the transfer-learning tutorial for Inception-BN here: https://statist-bhfz.github.io/cats_dogs_finetune.

What I did was:

pretrain_model <- mx.model.load("vgg16", 
                              iteration = 0)

symbol <- pretrain_model$symbol

internals <- symbol$get.internals()
outputs <- internals$outputs

flatten <- internals$get.output(which(outputs == "flatten_0_output")) 

Then I have a problem adding the rest of the layers. The remaining entries in outputs are: fc6_weight fc6_bias fc6_output relu6_output drop6_output fc7_weight fc7_bias fc7_output relu7_output drop7_output fc8_weight fc8_bias fc8_output prob_label prob_output

Which mx functions can I use to establish the symbol for the last three layers? Thank you very much!

szha commented 7 years ago

@thirdwing

jeremiedb commented 7 years ago

There are different ways to approach the fine-tuning, depending on whether you change only the weight parameters or also the model architecture (symbol). The general idea is to pick whatever you like from the pre-trained symbol and weights, then decide which weights should be kept fixed. Note that by default, all weights are retrained.

If what you want is to keep the same VGG structure and only modify the last layer so that its number of hidden units equals your number of categories (that is, two 4096-hidden-unit FC layers followed by an FC layer whose hidden units match your number of categories), then you would probably want to start the fine-tuning right after the last dropout, with something like: drop7 <- internals$get.output(which(outputs == "drop7_output"))

Then you can continue composing the symbol just as for any model designed from scratch. Here, the only two remaining elements, as can be seen in the Model Zoo, are the final FC layer and the SoftmaxOutput:

fc_final <- mx.symbol.FullyConnected(data = drop7, num.hidden = num_classes)  # num_classes = your number of categories
softmax <- mx.symbol.SoftmaxOutput(data = fc_final)

You can now train using mx.model.FeedForward.create, feeding it softmax as the symbol. mx.model.FeedForward.create has parameters to specify the initial weights of the model (arg.params) and whether some parameters must be kept fixed (fixed.param). If you want all parameters prior to fc6 to remain fixed, you will need to pass the vector of these parameter names to the fixed.param argument so that their weights are ignored during the update. To reuse the VGG-16 weights, you will also need to supply the arg.params argument. This will be the list of weights obtained from the pre-trained model, except for the final FC layer, for which you would apply a random initialization as illustrated in the cat-dog example.
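
For illustration, here is a minimal sketch of that call (not taken from the Model Zoo or the tutorial). It assumes the softmax symbol defined above, train/val iterators as in the cats-and-dogs tutorial, and an arg_params_new list built from the pre-trained weights with a random initialization swapped in for the new final FC layer; the grep pattern is just one way of selecting the layers you want frozen.

# Freeze everything before fc6: in VGG-16 these are the convolution weights and biases.
fixed_params <- grep("^conv", names(pretrain_model$arg.params), value = TRUE)

model <- mx.model.FeedForward.create(
  symbol             = softmax,
  X                  = train,
  eval.data          = val,
  ctx                = mx.gpu(0),
  num.round          = 1,
  eval.metric        = mx.metric.accuracy,
  array.batch.size   = 8,
  arg.params         = arg_params_new,    # pre-trained weights plus random init for the new FC layer
  fixed.param        = fixed_params,      # these parameters are not updated during training
  allow.extra.params = TRUE               # tolerate pre-trained weights with no matching layer in the new symbol
)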

lichen11 commented 7 years ago

Hi, I believe I have set up the transfer learning correctly, retraining the last fully connected layer. I also made sure the names match the pre-trained VGG model. However, R always crashes. When I use Inception-BN or Inception-v3, it works fine.

Is there another source from which I can download the mxnet VGG weights?

Below is my code for VGG transfer learning. It uses the data from the cat/dog classification problem: https://statist-bhfz.github.io/cats_dogs_finetune.

vgg <- mx.model.load("vgg19", iteration = 0)
symbol <- vgg$symbol
internals <- symbol$get.internals()
outputs <- internals$outputs

drop7 <- internals$get.output(which(outputs == "drop7_output"))
fc_final <- mx.symbol.FullyConnected(data = drop7, num.hidden = 2, name = 'fc8')
new_soft <- mx.symbol.SoftmaxOutput(data = fc_final, name = 'prob')

arg_params_new <- mxnet:::mx.model.init.params(
  symbol = new_soft, 
  input.shape = c(224, 224, 3, 8), 
  output.shape = (8),
  initializer = mxnet:::mx.init.uniform(0.1), 
  ctx = mx.gpu(0)
)$arg.params

fc8_weights_new <- arg_params_new[["fc8_weight"]]
fc8_bias_new <- arg_params_new[["fc8_bias"]]

arg_params_new <- vgg$arg.params
arg_params_new[["fc8_weight"]] <- fc8_weights_new 
arg_params_new[["fc8_bias"]] <- fc8_bias_new 

model <- mx.model.FeedForward.create(
  symbol             = new_soft,
  X                  = train,
  eval.data          = val,
  ctx                = mx.gpu(0),
  eval.metric        = mx.metric.accuracy,
  num.round          = 1,
  learning.rate      = 0.05,
  momentum           = 0.9,
  wd                 = 0.00001,
  kvstore            = "local",
  array.batch.size   = 128,
  epoch.end.callback = mx.callback.save.checkpoint("vgg"), 
  batch.end.callback = mx.callback.log.train.metric(150),
  initializer        = mx.init.Xavier(factor_type = "in", magnitude = 2.34),
  optimizer          = "sgd",
  arg.params         = arg_params_new,
  aux.params         = vgg$aux.params
)
thirdwing commented 7 years ago

Can you try the code in our documentation?

https://github.com/apache/incubator-mxnet/blob/master/R-package/vignettes/CatsDogsFinetune.Rmd#load-pretrained-model

thirdwing commented 7 years ago

Can you tell us where you got the VGG model files?

lichen11 commented 7 years ago

I got the files from http://data.dmlc.ml/mxnet/models/imagenet/vgg/.

lichen11 commented 7 years ago

I can run the code in https://github.com/apache/incubator-mxnet/blob/master/R-package/vignettes/CatsDogsFinetune.Rmd#load-pretrained-model. I think it is the same as in https://statist-bhfz.github.io/cats_dogs_finetune.

jeremiedb commented 7 years ago

Input/output shapes need to be passed as named lists in mx.model.init.params:

arg_params_new <- mx.model.init.params(
  symbol = new_soft, 
  input.shape = list(data = c(224, 224, 3, 8)),
  output.shape = NULL,
  initializer = mxnet:::mx.init.uniform(0.1), 
  ctx = mx.cpu()
)$arg.params
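
As a quick, optional sanity check (assuming dim() on the returned mx.ndarray objects), you can inspect the shapes inferred for the randomly initialized parameters before swapping them into the pre-trained weight list:

# List the dimensions of every initialized parameter; the new fc8 entries
# should connect the 4096 fc7 units to the 2 output classes.
lapply(arg_params_new, dim)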
lichen11 commented 7 years ago

I changed the line for input.shape to

  input.shape = list(data = c(224, 224, 3, 8)),

then I get the error

Error in symbol$infer.shape(list(...)) : 
  Not compatible with requested type: [type=list; target=integer].

I also tried

input.shape = list("data" = c(224, 224, 3, 8)),

I still get the same error. I didn't encounter an error using

input.shape =  c(224, 224, 3, 8),

I am using mxnet_0.10.1.

jeremiedb commented 7 years ago

Windows or Linux? I ran the code from the pre-compiled library for Windows, version 0.10.1:

cran <- getOption("repos")
cran["dmlc"] <- "https://s3-us-west-2.amazonaws.com/apache-mxnet/R/CRAN/"
options(repos = cran)
install.packages("mxnet")
jeremiedb commented 7 years ago

Did you also switch output.shape to NULL?

lichen11 commented 6 years ago

I am running Linux: R version 3.3.3 (2017-03-06), platform x86_64-redhat-linux-gnu (64-bit), running under CentOS Linux 7 (Core). Yes, I switched to output.shape = NULL. Does it train fine on your machine when you apply transfer learning to VGG?

lichen11 commented 6 years ago

Hi, I recently attempted transfer learning on ResNet101. I am only retraining the last fully connected layer.

resnet101 <- mx.model.load("Model/ResNet/resnet-101", iteration = 0)
symbol <- resnet101$symbol
internals <- symbol$get.internals()
outputs <- internals$outputs
flatten <- internals$get.output(which(outputs == "flatten0_output"))
new_fc <- mx.symbol.FullyConnected(data = flatten, num_hidden = 2, name = "fc1")
new_soft <- mx.symbol.SoftmaxOutput(data = new_fc, name = 'softmax')
arg_params_new <- mxnet:::mx.model.init.params(
  symbol = new_soft,
  input.shape = list(data = c(224, 224, 3, 32)),
  output.shape = NULL,
  initializer = mxnet:::mx.init.uniform(0.1),
  ctx = mx.gpu(0)
)$arg.params

fc1_weights_new <- arg_params_new[["fc1_weight"]]
fc1_bias_new <- arg_params_new[["fc1_bias"]]
arg_params_new <- resnet101$arg.params
arg_params_new[["fc1_weight"]] <- fc1_weights_new
arg_params_new[["fc1_bias"]] <- fc1_bias_new

However, when I start training, R crashes. It first gives the following message:

Start training with 1 devices
[19:35:16] src/operator/nn/./cudnn/./cudnn_algoreg-inl.h:107: Running performance tests to find the         best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)

Then it crashes. Following suggestions I found online, I set MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable autotuning and updated my mxnet to 1.0.0, since some reports say this version resolves the MXNET_CUDNN_AUTOTUNE issue. However, after updating, R still crashes when using ResNet or VGG, while transfer learning using Inception does not crash. I am wondering whether an internal bug in the mxnet R package is causing this issue.

jeremiedb commented 6 years ago

Just made a few tests with different ResNet models and I also experienced crashes.

The issue appears tied to memory that isn't released during training. No problem with ResNet 34 or 50, but it got problematic with 101. Have you looked at the GPU usage immediately after launching the training (nvidia-smi) to confirm you have the same issue?

I also noticed an apparent memory leak when running large embeddings. A quick workaround is to add a gc() within the training loop every couple of batches (it isn't necessary to add a gc() within the eval-data loop). You can do it either in mx.model.FeedForward.create or mx.model.buckets (I only used the latter, but it should work for the usual training function as well). The good news is that it doesn't noticeably slow down training; fine-tuning ResNet101 no longer crashed, and GPU memory remained below 4 GB on 8 samples.

@thirdwing Any idea whether this memory issue could be handled better than with gc()? If performance isn't affected, I wonder whether a quick PR adding the gc() would be worthwhile.

lichen11 commented 6 years ago

I added gc() before mx.model.FeedForward.create and also within mx.model.FeedForward.create. Then I immediately ran nvidia-smi. At one point utilization was 29%, then GPU memory usage reached about 12.7 GB for a few seconds, and then R crashed. I even set the batch size to 5 to see if that solved the problem, but it still crashes.

jeremiedb commented 6 years ago

Could you provide details on how you added gc() within mx.model.FeedForward.create? The gc() modification did solve the memory-consumption issue on my end.

lichen11 commented 6 years ago

I added gc() quite a few times before training and three times within training. Please see below:

gc()
model <- mx.model.FeedForward.create(
  gc(),
  symbol             = new_soft,
  X                  = train,
  eval.data          = val,
  ctx                = mx.gpu(0),
  eval.metric        = mx.metric.accuracy,
  num.round          = 1,
  learning.rate      = 0.05,
  momentum           = 0.9,
  wd                 = 0.00001,
  kvstore            = "local",
  array.batch.size   = 5,
  gc(),
  epoch.end.callback = mx.callback.save.checkpoint("resnet"), #save the params value.
  batch.end.callback = mx.callback.log.train.metric(150),
  initializer        = mx.init.Xavier(rnd_type='gaussian', factor_type = "in", magnitude = 2),
  optimizer          = "sgd",
  arg.params         = arg_params_new,
  allow.extra.params = T,
  aux.params         = resnet101$aux.params,
  gc()

)

jeremiedb commented 6 years ago

Adding gc() to the function arguments has no effect during training. gc() needs to be added within the function definition, i.e.:

x <- function(a, b) {
  y <- a * b
  gc()
  return(y)
}

Calling x(a = 1, b = 2, gc()) just won't work, since you're only passing gc() as a function argument. You would need to add the gc() here: https://github.com/apache/incubator-mxnet/blob/master/R-package/R/model.R#L221 To make this effective, you'll need to source these functions, and note that some of the dependent functions are not exported from the package, so they must be called through mxnet:::.
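
Schematically, the idea is the following (a hypothetical, stripped-down loop for illustration only, not the actual model.R code):

# Hypothetical training-loop skeleton: the point is that gc() sits inside
# the per-batch loop of the training function itself, not in its argument list.
train_one_epoch <- function(train.data) {
  train.data$reset()
  while (train.data$iter.next()) {
    # ... forward pass, backward pass and parameter update for one batch ...
    gc()  # free finished R-side handles so GPU memory can be reclaimed
  }
}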

As a quick alternative, you can train using the following function, which should work straight away: https://github.com/jeremiedb/mxnet_R_bucketing/blob/master/model.rnn.R#L127 Add gc() at line 127, then train the model with mx.model.buckets.

ankkhedia commented 6 years ago

@jeremiedb VGG19 transfer learning failed for me, and the GPU memory footprint at the time of failure was around 15 GB while running on Windows. Could you please elaborate on the gc() solution you proposed? Also, are you sure it is not a hardware constraint?

@lichen11 Were you able to get past this issue?

jeremiedb commented 6 years ago

@ankkhedia I haven't specifically used VGG19, but I just made a run with Inception-BN, and it ran with around 3 GB on my laptop GPU (1060) with a batch size of 32 on 224x224 images.

Which version did you use, specifically? Those RAM explosions should have been fixed with the optimizer refactor. I can try VGG19 a bit later, but given the Inception-BN consumption, I think you might be using an mxnet version prior to the optimizer fix.

ankkhedia commented 6 years ago

@jeremiedb Inception-BN did work for me, but Inception-BN is a very small model compared to VGG19. I have tried the latest MXNet R GPU build for Windows (cu90).

jeremiedb commented 6 years ago

Right, VGG16 and 19 crashed even with very small batch sizes. However, the process also works fine with ResNet 34, 50 and 101 (on a 6 GB 1060). Do you know what to expect in terms of memory consumption for VGG? Given that many other models work fine, I'm not sure it's an R issue.

ankkhedia commented 6 years ago

@jeremiedb VGG16 would require more than 12 GB of GPU memory and VGG19 more than 15 GB with the official MXNet R distribution; these numbers are the GPU memory footprint at the time of the crash. However, I tried the gc() fix you mentioned above, and transfer learning does work fine, with the GPU memory footprint constant at 6 GB at a batch size of 150. I think it makes sense to add the gc() fix in model.R to avoid these crashes. Do you have any better suggestions?

jeremiedb commented 6 years ago

@ankkhedia I'm still having some difficulty wrapping my head around this. The memory footprint remains constant on my runs, even for large models (ResNet-50/101).

A footprint of 6 GB on VGG19 with a batch size of 150 seems quite low. Have you fixed the parameters of all but the last layer? I can fine-tune VGG19 with the parameters fixed and a batch size of 8. Memory rises to 6 GB when the model is created, then drops to 3 GB and remains constant, so I cannot reproduce how gc() in the training loop would help.

I don't have a better solution for now, but given that adding a gc() impairs training speed, I would be reluctant to add it at this point, since the issue seems very specific to VGG.

ankkhedia commented 6 years ago

Hi @jeremiedb, that makes sense. Fine-tuning larger networks can be achieved with a smaller batch size; it worked for me with a batch size of 32 for VGG19 without using gc(). Since adding gc() might affect training performance and slow it down, it makes sense to reduce the batch size according to the hardware if training crashes.

@lichen11 Please try with a reduced batch size and it should fix your issue.

@sandeep-krishnamurthy Could you please close this issue, as it has been answered and resolved? @lichen11 Please feel free to reopen if closed in error.