dmlc / MXNet.jl

MXNet Julia Package - flexible and efficient deep learning in Julia

Custom output such as exponential function? #167

Open · JockLawrie opened this issue 7 years ago

JockLawrie commented 7 years ago

Hi there,

I am trying to define the activation of the last layer as the exponential function. If x is the input vector to a node in the last layer, the output of the node would be exp(w*x + b). Is this possible?

Thanks! Jock

pluskid commented 7 years ago

I think it should be possible with basic operations like matrix multiplication, plus, and exp.
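
For illustration, a minimal sketch of that idea with the symbolic API (the layer names :fc and :exp_out are just placeholders): FullyConnected already computes w*x + b, so applying the symbolic exp to its output gives exp(w*x + b).

using MXNet

data = mx.Variable(:data)
fc   = mx.FullyConnected(data, name = :fc, num_hidden = 4)  # computes w*x + b
out  = mx.exp(fc, name = :exp_out)                          # element-wise exp(w*x + b)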

JockLawrie commented 7 years ago

I'm not sure how to get this working. As an example, suppose I have a net with 1 hidden layer of 10 nodes and an output layer of 4 nodes. Suppose that the activation of the output layer should be :softrelu. The code below doesn't work, partly because I haven't specified where mx.Variable(:label) comes into the net. What am I missing? And how might I replace softrelu with the exponential function?

Thanks in advance.

net = @mx.chain mx.Variable(:data) =>
    mx.FullyConnected(name = :fc1_in, num_hidden = 10) =>
    mx.Activation(name = :fc1_out, act_type = :relu) =>
    mx.FullyConnected(name = :fc2_in, num_hidden = 4) =>
    mx.Activation(name = :fc2_out, act_type = :softrelu)

Arkoniak commented 7 years ago

You can use symbolic calculations to get what you need.

using MXNet

# net without activation layer
net = @mx.chain mx.Variable(:data) =>
    mx.FullyConnected(name = :fc1_in, num_hidden = 10) =>
    mx.Activation(name = :fc1_out, act_type = :relu) =>
    mx.FullyConnected(name = :fc2_in, num_hidden = 4)

# Same net with exponential activation
net_out = mx.exp(net, name = :fc2_out)

# Here outputs of two nets are joined together, for easier comparison
net = mx.Group(net, net_out)

println(mx.list_arguments(net))   # arguments = input data + hidden layer weights
println(mx.list_outputs(net))     # outputs; since we grouped the nets, there are two: before and after activation

# some random data for the forward pass; since we are not going to train the model, no labels are needed
x = rand(Float32, 10, 2)
data = mx.ArrayDataProvider(:data => x, batch_size=2)

model = mx.FeedForward(net)

# usually you do not call this function directly; it is called internally by mx.fit during training
mx.init_model(model, mx.UniformInitializer(), data=(10, 2))

# This is forward pass with some random weights. We get two arrays, before and after exponential activation 
res = mx.predict(model, data)

# And we can check that everything is fine
@assert all(abs.(exp.(res[1]) .- res[2]) .< 1e-6)

But for training the model a loss layer is needed as usual, of course.
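
For instance, a minimal sketch only: LinearRegressionOutput is the standard squared-error loss operator from libmxnet (the custom loss discussed further down needs a different approach).

# sketch: attach a predefined squared-error loss to the exponential output
net_loss = mx.LinearRegressionOutput(net_out, name = :output)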

JockLawrie commented 7 years ago

Thanks, that works. I am still having trouble training the model, though. I tried using MakeLoss together with a custom eval_metric that computes a nonlinear combination of the 4 output values, but it's not working so far. Any ideas?

Arkoniak commented 7 years ago

It's hard to tell without the source code and error messages. Can you give a link and tell me what exactly is not working?

JockLawrie commented 7 years ago

Sure, code below, together with the resulting error. One obvious problem is that I don't know where mx.Variable(:label) goes. I'm sure there are other issues with this code. Thoughts? Thanks again for your help.

using MXNet

# Custom eval metric
import MXNet.mx: get, reset!, _update_single_output

type CustomMetric <: mx.AbstractEvalMetric
    loss::Float64
    n::Int

    CustomMetric() = new(0.0, 0)
end

function mx.reset!(metric::CustomMetric)
    metric.loss = 0.0
    metric.n = 0
end

function mx.get(metric::CustomMetric)
    [(:CustomMetric, metric.loss / metric.n)]
end

function mx._update_single_output(metric::CustomMetric, label::mx.NDArray, pred::mx.NDArray)
    label = mx.copy(label)
    pred  = mx.copy(pred)
    n = size(label, 1)
    metric.n += n
    for i = 1:n
        z = 0.0
        for j = 1:4
            z += j * pred[j, i]
        end
        loss = sqrt(abs(z - label[i]))
        metric.loss += loss
    end
end

# Base net
net = @mx.chain mx.Variable(:data) =>
      mx.FullyConnected(name = :fc1_in, num_hidden = 10) =>
      mx.Activation(name = :fc1_out, act_type = :softrelu) =>
      mx.FullyConnected(name = :fc2_in, num_hidden = 4)

netout = mx.exp(net, name = :fc2_out)

# data
x = rand(Float32, 1, 8)    # 8 observations of 1 variable
y = exp(x) + 2.0 * exp(0.5 * x) + 3.0 * exp(0.3 * x) + 4.0 * exp(0.25  * x)

# Connect net, data and hyperparameters
batch_size = 4
train_prov = mx.ArrayDataProvider(x, y; batch_size = batch_size)
eval_prov  = mx.ArrayDataProvider(x, y; batch_size = batch_size)

# predictions from model with random parameters
model = mx.FeedForward(netout)
mx.init_model(model, mx.UniformInitializer(), data = (1, 8))
res = mx.predict(model, eval_prov)

# train
netout = mx.MakeLoss(netout)
mdl = mx.FeedForward(netout, context = mx.cpu())
opt = mx.SGD(lr = 0.1, momentum = 0.9, weight_decay = 0.00001)    # Optimizing algorithm
mx.fit(mdl, opt, train_prov, n_epoch = 2, eval_data = eval_prov, eval_metric = CustomMetric())

And the resulting error:

ERROR: MXNet.mx.MXError("[16:46:47] src/symbol/symbol.cc:155: Symbol.InferShapeKeyword argument name softmax_label not found.\nCandidate arguments:\n\t[0]data\n\t[1]fc1_in_weight\n\t[2]fc1_in_bias\n\t[3]fc2_in_weight\n\t[4]fc2_in_bias\n")
 in macro expansion at /home/jock/.julia/v0.5/MXNet/src/base.jl:58 [inlined]
 in _infer_shape(::MXNet.mx.SymbolicNode, ::Array{AbstractString,1}, ::Array{UInt32,1}, ::Array{UInt32,1}) at /home/jock/.julia/v0.5/MXNet/src/symbolic-node.jl:276
 in #infer_shape#214(::Array{Any,1}, ::Function, ::MXNet.mx.SymbolicNode) at /home/jock/.julia/v0.5/MXNet/src/symbolic-node.jl:319
 in (::MXNet.mx.#kw##infer_shape)(::Array{Any,1}, ::MXNet.mx.#infer_shape, ::MXNet.mx.SymbolicNode) at ./<missing>:0
 in #init_model#931(::Bool, ::Array{Any,1}, ::Function, ::MXNet.mx.FeedForward, ::MXNet.mx.UniformInitializer) at /home/jock/.julia/v0.5/MXNet/src/model.jl:90
 in (::MXNet.mx.#kw##init_model)(::Array{Any,1}, ::MXNet.mx.#init_model, ::MXNet.mx.FeedForward, ::MXNet.mx.UniformInitializer) at ./<missing>:0
 in _init_model(::MXNet.mx.FeedForward, ::MXNet.mx.ArrayDataProvider, ::MXNet.mx.UniformInitializer, ::Bool) at /home/jock/.julia/v0.5/MXNet/src/model.jl:258
 in #fit#954(::Array{Any,1}, ::Function, ::MXNet.mx.FeedForward, ::MXNet.mx.SGD, ::MXNet.mx.ArrayDataProvider) at /home/jock/.julia/v0.5/MXNet/src/model.jl:355
 in (::MXNet.mx.#kw##fit)(::Array{Any,1}, ::MXNet.mx.#fit, ::MXNet.mx.FeedForward, ::MXNet.mx.SGD, ::MXNet.mx.ArrayDataProvider) at ./<missing>:0

Arkoniak commented 7 years ago

There are a few things that should be considered.

  1. AbstractEvalMetric is an evaluation metric, i.e. it is not used during backward propagation. It merely reports statistics of the current epoch.
  2. What you need is a custom operator with which to calculate the loss function. Unfortunately, that is not implemented yet, see for example https://github.com/dmlc/MXNet.jl/issues/166
  3. The error about softmax_label not being found is due to how the DataProvider is constructed. By default, labels automatically get the name softmax_label; if you want to set your own name, you should pass it like this, for example: ArrayDataProvider(:my_data_input_name => x, :my_label_name => y; batch_size=batch_size). You can see it here: https://github.com/dmlc/MXNet.jl/blob/master/src/io.jl#L276
  4. If you add a custom loss layer, then the output of the network equals the loss value, so you need an identity eval metric to evaluate the losses (a sketch follows this list). I do not know how one could get the net output and separately calculate the losses in such a case.
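
A minimal sketch of such an identity eval metric, in the spirit of the gist (the type name IdentityMetric is just illustrative): it follows the same mx.AbstractEvalMetric pattern as the CustomMetric above and simply averages the loss values that the network outputs.

import MXNet.mx: get, reset!, _update_single_output   # as in the CustomMetric above

type IdentityMetric <: mx.AbstractEvalMetric
    loss::Float64
    n::Int

    IdentityMetric() = new(0.0, 0)
end

function mx.reset!(metric::IdentityMetric)
    metric.loss = 0.0
    metric.n = 0
end

function mx.get(metric::IdentityMetric)
    [(:IdentityMetric, metric.loss / metric.n)]
end

function mx._update_single_output(metric::IdentityMetric, label::mx.NDArray, pred::mx.NDArray)
    # pred already holds the per-sample loss values, so just accumulate them
    pred = mx.copy(pred)
    metric.loss += sum(pred)
    metric.n    += length(pred)
end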

Even though you cannot add a custom loss operator yet, for this particular task you can use the following trick: the summation z += j * pred[j, i] is equivalent to multiplying the net output by a weight matrix equal to [1 2 3 4]. And matrix multiplication is just another FullyConnected symbol, so you can add it, but you have to freeze those weights so they do not change during backpropagation.

You can see this gist for the details of the implementation: https://gist.github.com/Arkoniak/5402ddf4d272d2c32cc74343d5ce1793. There, CustomMetric is just an identity metric used for evaluation, and CustomInitializer initializes the weights of the net uniformly, except for the last layer, where it sets 1:4 as the weights.

Then again, maybe I am overcomplicating the problem and a simpler solution exists :-)

JockLawrie commented 7 years ago

Thanks again, much appreciated. In response to your points above:

  1. If the solver isn't using the eval metric, then what is being optimized?
  2. I'll wait for this feature to be implemented.
  3. OK thanks.
  4. Makes sense, but I'm trying to achieve exactly what you suggest can't be done, namely taking the output of the last layer, feeding it to a custom loss function, and having the solver minimise this. In particular, the labels only appear in the loss function; they do not appear in the net.
Arkoniak commented 7 years ago

Well, for the most part the answers are in the gist from the previous comment.

If the solver isn't using the eval metric, then what is being optimized?

The network output itself is what gets optimized. The main idea is the following: you build a network with this structure:

Net with loss output = Input -> Calculations -> Result -> Loss calculation -> Loss output.

All of these are symbolic calculations and may include more than one step, of course. For example, in your case the loss output consists of the following steps: exponential activation, multiplication by the weight matrix, subtraction of the labels, abs, and square root. In the gist:

netexp = @mx.chain net =>
  mx.exp(name = :fc2_out) =>
  mx.FullyConnected(name = :output1, num_hidden = 1, attrs = Dict(:grad => "freeze"))

label   = mx.Variable(:label)               # the label enters only through the loss calculation
netloss = mx.sqrt(mx.abs(netexp - label))
netloss = mx.MakeLoss(netloss)

But from the solver's point of view this is unimportant; it just sees one big network with lots of internal steps.

After training you obtain a net that takes your input and produces the loss output, i.e. something that is close to zero. In the second step you construct a new net:

Output Net = Input -> Calculations -> Result

which is exactly the "Net with loss output" without the loss calculation. Then you transfer the weights from the full network, possibly deleting the weights that are no longer needed. In the gist it is done as:

model_coeff = mx.FeedForward(net)
model_coeff.arg_params = Dict(k => v for (k, v) in mdl.arg_params)
delete!(model_coeff.arg_params, :output1_weight)   # these were used in loss calculation and not needed for predictions.
delete!(model_coeff.arg_params, :output1_bias)
model_coeff.aux_params = mdl.aux_params

The output of this new net is then exactly what you needed in the first place.
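
For example, reusing the eval_prov provider name from your earlier snippet (just as an illustration):

res = mx.predict(model_coeff, eval_prov)   # outputs of the stripped net, without the loss part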

In the case of predefined loss outputs, like SoftmaxOutput, the same things happen, with the following differences:

  1. From the user's perspective, all steps of the loss calculation are combined into one single symbolic operation.
  2. It may create the label variable internally, so you do not have to declare it explicitly.
  3. Internally the loss function is used for backpropagation, but the output of this symbol is its input, so you do not need to strip this layer to get predictions. I.e. you can effectively use the same network for training and prediction (see the sketch below).
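
For comparison, a minimal sketch of that predefined case (assuming a classification net; the layer names are placeholders). SoftmaxOutput creates the softmax_label variable implicitly, and the same model object is used for both mx.fit and mx.predict:

data = mx.Variable(:data)
fc   = mx.FullyConnected(data, name = :fc, num_hidden = 10)
net  = mx.SoftmaxOutput(fc, name = :softmax)        # label :softmax_label is created implicitly

model = mx.FeedForward(net, context = mx.cpu())
# mx.fit(model, opt, train_prov, ...) for training, mx.predict(model, eval_prov) for class probabilities
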
JockLawrie commented 7 years ago

Ah OK. To summarize: my previous understanding was that the eval_metric takes in the network's output and that the solver seeks to minimize the eval_metric (the loss). It turns out the loss is defined as the output of the network's output layer, and the solver minimizes that. During training the eval_metric is calculated at the end of each epoch solely for monitoring purposes. That's most helpful, thanks again.

Arkoniak commented 7 years ago

Yes, exactly. I think it's quite possible to hide all of this from the user, so you would be able to proceed exactly as in your initial understanding, i.e. specify the loss layer and the network separately, with the construction of the full network, the stripping away of the loss outputs, etc. done internally. I presume it'll be implemented in a high-level module API.