JockLawrie opened this issue 7 years ago.
I think it should be possible with basic operations like matrix multiplication, plus, and exp.
I'm not sure how to get this working. As an example, suppose I have a net with 1 hidden layer of 10 nodes and an output layer of 4 nodes, and suppose that the activation of the output layer should be :softrelu. The code below doesn't work, partly because I haven't specified where mx.Variable(:label) comes into the net. What am I missing? And how might I replace softrelu with the exponential function?
Thanks in advance.
net = @mx.chain mx.Variable(:data) =>
mx.FullyConnected(name = :fc1_in, num_hidden = 10) =>
mx.Activation(name = :fc1_out, act_type = :relu) =>
mx.FullyConnected(name = :fc2_in, num_hidden = 4) =>
mx.Activation(name = :fc2_out, act_type = :softrelu)
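For reference, here is what the two candidate output activations compute, as a plain-Julia sketch rather than MXNet code. It assumes a single input variable (as in the later example in this thread) and uses random stand-in weights W1, b1, W2, b2 in place of the fc1_in/fc2_in parameters; softrelu is MXNet's name for log(1 + exp(x)), and swapping it for exp gives exactly the exp(w*x + b) output asked about in the original post.

```julia
# Plain-Julia sketch (no MXNet): forward pass of a 1 -> 10 -> 4 net,
# comparing a softrelu output layer with an exponential one.
using Random

Random.seed!(42)

relu(x) = max(x, zero(x))
softrelu(x) = log1p(exp(x))      # log(1 + e^x), MXNet's act_type = :softrelu

# Hypothetical stand-in weights (MXNet would hold these as fc1_in_* / fc2_in_*)
W1, b1 = randn(10, 1), randn(10)
W2, b2 = randn(4, 10), randn(4)

x = rand(1, 5)                   # 5 observations of 1 variable (columns = samples)
h = relu.(W1 * x .+ b1)          # hidden layer output, 10×5
a = W2 * h .+ b2                 # output-layer pre-activation w*x + b, 4×5

out_softrelu = softrelu.(a)      # softrelu activation
out_exp = exp.(a)                # exponential activation: exp(w*x + b)
```

Because log(1 + e^a) ≤ e^a, the exponential output is never smaller than the softrelu output for the same pre-activation; both are strictly positive.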
You can use symbolic calculations to get what you need.
using MXNet
# net without activation layer
net = @mx.chain mx.Variable(:data) =>
mx.FullyConnected(name = :fc1_in, num_hidden = 10) =>
mx.Activation(name = :fc1_out, act_type = :relu) =>
mx.FullyConnected(name = :fc2_in, num_hidden = 4)
# Same net with exponential activation
net_out = mx.exp(net, name = :fc2_out)
# Here outputs of two nets are joined together, for easier comparison
net = mx.Group(net, net_out)
println(mx.list_arguments(net)) # arguments = input data + weights and biases of the FullyConnected layers
println(mx.list_outputs(net)) # since the net is grouped, there are two outputs: before and after activation
# some random data for forward propagation. Since we are not going to train the model, no labels are needed
x = rand(Float32, 10, 2)
data = mx.ArrayDataProvider(:data => x, batch_size=2)
model = mx.FeedForward(net)
# usually you do not call this function directly; mx.fit calls it internally
mx.init_model(model, mx.UniformInitializer(), data=(10, 2))
# This is forward pass with some random weights. We get two arrays, before and after exponential activation
res = mx.predict(model, data)
# Check that the second output is the exponential of the first (abs so that negative differences cannot pass trivially)
@assert all(abs.(exp.(res[1]) .- res[2]) .< 1e-6)
To train the model, of course, a loss layer is still needed, as usual.
Thanks, that works.
I am still having trouble training the model. I tried using MakeLoss together with a custom eval_metric that takes a nonlinear combination of the 4 output values, but it hasn't worked so far. Any ideas?
It's hard to tell without the source code and error messages. Can you share a link and say exactly what is not working?
Sure, the code is below, together with the resulting error. One obvious problem is that I don't know where mx.Variable(:label) goes. I'm sure there are other issues with this code. Thoughts? Thanks again for your help.
using MXNet
# Custom eval metric
import MXNet.mx: get, reset!, _update_single_output
type CustomMetric <: mx.AbstractEvalMetric
loss::Float64
n::Int
CustomMetric() = new(0.0, 0)
end
function mx.reset!(metric::CustomMetric)
metric.loss = 0.0
metric.n = 0
end
function mx.get(metric::CustomMetric)
[(:CustomMetric, metric.loss / metric.n)]
end
function mx._update_single_output(metric::CustomMetric, label::mx.NDArray, pred::mx.NDArray)
label = mx.copy(label)
pred = mx.copy(pred)
n = size(label, 1)
metric.n += n
for i = 1:n
z = 0.0
for j = 1:4
z += j * pred[j, i]
end
loss = sqrt(abs(z - label[i]))
metric.loss += loss
end
end
# Base net
net = @mx.chain mx.Variable(:data) =>
mx.FullyConnected(name = :fc1_in, num_hidden = 10) =>
mx.Activation(name = :fc1_out, act_type = :softrelu) =>
mx.FullyConnected(name = :fc2_in, num_hidden = 4)
netout = mx.exp(net, name = :fc2_out)
# data
x = rand(Float32, 1, 8) # 8 observations of 1 variable
y = exp(x) + 2.0 * exp(0.5 * x) + 3.0 * exp(0.3 * x) + 4.0 * exp(0.25 * x)
# Connect net, data and hyperparameters
batch_size = 4
train_prov = mx.ArrayDataProvider(x, y; batch_size = batch_size)
eval_prov = mx.ArrayDataProvider(x, y; batch_size = batch_size)
# predictions from model with random parameters
model = mx.FeedForward(netout)
mx.init_model(model, mx.UniformInitializer(), data = (1, 8))
res = mx.predict(model, eval_prov)
# train
netout = mx.MakeLoss(netout)
mdl = mx.FeedForward(netout, context = mx.cpu())
opt = mx.SGD(lr = 0.1, momentum = 0.9, weight_decay = 0.00001) # Optimizing algorithm
mx.fit(mdl, opt, train_prov, n_epoch = 2, eval_data = eval_prov, eval_metric = CustomMetric())
And the resulting error:
ERROR: MXNet.mx.MXError("[16:46:47] src/symbol/symbol.cc:155: Symbol.InferShapeKeyword argument name softmax_label not found.\nCandidate arguments:\n\t[0]data\n\t[1]fc1_in_weight\n\t[2]fc1_in_bias\n\t[3]fc2_in_weight\n\t[4]fc2_in_bias\n")
in macro expansion at /home/jock/.julia/v0.5/MXNet/src/base.jl:58 [inlined]
in _infer_shape(::MXNet.mx.SymbolicNode, ::Array{AbstractString,1}, ::Array{UInt32,1}, ::Array{UInt32,1}) at /home/jock/.julia/v0.5/MXNet/src/symbolic-node.jl:276
in #infer_shape#214(::Array{Any,1}, ::Function, ::MXNet.mx.SymbolicNode) at /home/jock/.julia/v0.5/MXNet/src/symbolic-node.jl:319
in (::MXNet.mx.#kw##infer_shape)(::Array{Any,1}, ::MXNet.mx.#infer_shape, ::MXNet.mx.SymbolicNode) at ./<missing>:0
in #init_model#931(::Bool, ::Array{Any,1}, ::Function, ::MXNet.mx.FeedForward, ::MXNet.mx.UniformInitializer) at /home/jock/.julia/v0.5/MXNet/src/model.jl:90
in (::MXNet.mx.#kw##init_model)(::Array{Any,1}, ::MXNet.mx.#init_model, ::MXNet.mx.FeedForward, ::MXNet.mx.UniformInitializer) at ./<missing>:0
in _init_model(::MXNet.mx.FeedForward, ::MXNet.mx.ArrayDataProvider, ::MXNet.mx.UniformInitializer, ::Bool) at /home/jock/.julia/v0.5/MXNet/src/model.jl:258
in #fit#954(::Array{Any,1}, ::Function, ::MXNet.mx.FeedForward, ::MXNet.mx.SGD, ::MXNet.mx.ArrayDataProvider) at /home/jock/.julia/v0.5/MXNet/src/model.jl:355
in (::MXNet.mx.#kw##fit)(::Array{Any,1}, ::MXNet.mx.#fit, ::MXNet.mx.FeedForward, ::MXNet.mx.SGD, ::MXNet.mx.ArrayDataProvider) at ./<missing>:0
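One way to see what the network in this example is meant to learn: the target y is exactly the weighted sum 1·p1 + 2·p2 + 3·p3 + 4·p4 computed in CustomMetric, if the four outputs were p_j = exp(a_j * x) with a = (1, 0.5, 0.3, 0.25). The plain-Julia check below (Float64, modern dot-broadcasting syntax, no MXNet) confirms this; the names a and pred_ideal are illustrative only.

```julia
# Plain-Julia check: the thread's target equals the CustomMetric weighted sum
# of four "ideal" outputs exp(a_j * x).
x = rand(1, 8)                                   # 8 observations of 1 variable
y = exp.(x) .+ 2.0 .* exp.(0.5 .* x) .+ 3.0 .* exp.(0.3 .* x) .+ 4.0 .* exp.(0.25 .* x)

a = [1.0, 0.5, 0.3, 0.25]                        # exponents in the target formula
pred_ideal = exp.(a .* x)                        # 4×8: row j holds exp(a_j * x)
z = vec([1.0 2.0 3.0 4.0] * pred_ideal)          # sum over j of j * pred[j, i]
```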
There are a few things to consider.
AbstractEvalMetric is an evaluation metric, i.e. it is not used during backpropagation. It merely computes statistics for the current epoch.
The "softmax_label not found" error is due to how the DataProvider is constructed. By default, labels automatically get the name softmax_label, but your network has no argument with that name (see the candidate list in the error). If you want to use your own name, pass it explicitly, for example: ArrayDataProvider(:my_data_input_name => x, :my_label_name => y; batch_size=batch_size). You can see it here: https://github.com/dmlc/MXNet.jl/blob/master/src/io.jl#L276
Although you are unable to add a custom loss operator, in this exact task you can use the following trick: the summation z += j * pred[j, i] is equivalent to multiplying the net output by a weight matrix equal to [1 2 3 4]. Matrix multiplication is just another FullyConnected symbol, so you can add it, but you have to freeze these weights so that they do not change during backpropagation.
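The equivalence between the metric's loop and a matrix product can be checked in plain Julia. This is a sketch with a random stand-in for the network output; the matrix version is what a dense layer with one output unit, weights [1 2 3 4] and (by assumption here) a zero bias would compute.

```julia
# Loop from _update_single_output vs. a single matrix product.
pred = rand(4, 6)                 # hypothetical network output: 4 values × 6 samples

# Loop version: z[i] = sum over j of j * pred[j, i]
z_loop = zeros(size(pred, 2))
for i in 1:size(pred, 2)
    for j in 1:4
        z_loop[i] += j * pred[j, i]
    end
end

# Matrix version: 1×4 weights times 4×n outputs gives a 1×n row
w = [1.0 2.0 3.0 4.0]
z_mat = vec(w * pred)
```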
See this gist for implementation details: https://gist.github.com/Arkoniak/5402ddf4d272d2c32cc74343d5ce1793. There, CustomMetric is just the identity, used for evaluation, and CustomInitializer initializes the network weights uniformly, except for the last layer, whose weights it sets to 1:4.
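To illustrate what freezing means during training, here is a toy gradient-descent loop in plain Julia, not the gist's code: the trainable weights W are updated while the output weights v (initialized to 1:4, like CustomInitializer's last layer) are simply never touched, which is the effect attrs = Dict(:grad => "freeze") is meant to have in MXNet. As a simplifying assumption, the model is linear (no exp, no biases) and uses a squared-error loss instead of sqrt(abs(...)).

```julia
# Toy model (assumption): z = v * (W * x), loss = mean((z - y).^2) / 2.
# W is trainable; v plays the role of the frozen [1 2 3 4] layer.
x = reshape(collect(range(0.1, 1.0, length = 8)), 1, 8)  # 1 variable, 8 samples
y = 3.0 .* x                         # hypothetical targets

W = fill(0.5, 4, 1)                  # trainable 4×1 weights
v = [1.0 2.0 3.0 4.0]                # frozen 1×4 weights
v_init = copy(v)
lr = 0.01

for step in 1:500
    z = v * (W * x)                  # 1×8 predictions
    r = z .- y                       # residuals
    gradW = v' * (r * x') ./ size(x, 2)  # gradient of the loss w.r.t. W
    W .-= lr .* gradW                # only the trainable weights are updated
end
```

After training, v is unchanged and the combined coefficient v * W has moved to the value that fits the data (3.0 here), so all the learning happened in W.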
That said, maybe I am overcomplicating the problem and a simpler solution exists :-)
Thanks again, much appreciated. In response to your points above:
For the most part, the answers are in the gist from my previous comment.
If the solver isn't using the eval metric, then what is being optimized?
The network output is what gets optimized. The main idea is as follows: you build a network with this structure:
Net with loss output = Input -> Calculations -> Result -> Loss calculation -> Loss output.
All of these are symbolic calculations and may, of course, include more than one step. For example, in your case the loss output consists of these steps: exponential activation, multiplication by the weight matrix, subtraction of the labels, abs, and square root. In the gist:
netexp = @mx.chain net =>
mx.exp(name = :fc2_out) =>
mx.FullyConnected(name = :output1, num_hidden = 1, attrs = Dict(:grad => "freeze"))
netloss = mx.sqrt(mx.abs(netexp - label))
netloss = mx.MakeLoss(netloss)
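The same pipeline can be mirrored with ordinary Julia functions to make the "output net plus loss head" structure concrete. This is a sketch, not MXNet code: output_net, loss_head, full_net and W_FROZEN are hypothetical names, the input is taken to be the 4×n pre-activation of the last layer, and the bias of :output1 is ignored for simplicity.

```julia
# Plain-Julia mirror of the symbolic loss pipeline:
# exp activation -> frozen weighted sum -> subtract label -> abs -> sqrt.
const W_FROZEN = [1.0 2.0 3.0 4.0]     # fixed weights of the hypothetical :output1 layer

# Output net: 4×n pre-activation -> exp -> the four values per sample
output_net(a) = exp.(a)

# Loss head: what would be fed to MakeLoss
loss_head(coeffs, label) = sqrt.(abs.(vec(W_FROZEN * coeffs) .- label))

# Full net used for training = output net followed by the loss head
full_net(a, label) = loss_head(output_net(a), label)

full_net(zeros(4, 3), fill(10.0, 3))   # → [0.0, 0.0, 0.0]
```

Keeping output_net separate from loss_head is what makes the later step possible: after training, the loss head is dropped and output_net alone produces the predictions.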
From the solver's point of view, though, none of this matters: it just sees one big network with lots of internal steps.
After training, you obtain a net that takes your input and produces the loss output, i.e. something close to zero. In the second step, you construct a new net:
Output Net = Input -> Calculations -> Result
which is exactly the "Net with loss output" without the loss calculations. Then you transfer the weights from the full network, deleting any weights that are no longer needed. In the gist, this is done as:
model_coeff = mx.FeedForward(net)
model_coeff.arg_params = Dict(k => v for (k, v) in mdl.arg_params)
delete!(model_coeff.arg_params, :output1_weight) # used only in the loss calculation, not needed for prediction
delete!(model_coeff.arg_params, :output1_bias)
model_coeff.aux_params = mdl.aux_params
The output of this new net is then exactly what you needed in the first place.
With predefined loss outputs, such as SoftmaxOutput, the same thing is done, with the following differences
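The parameter transfer above builds a new Dict rather than reusing mdl.arg_params, so the delete! calls do not remove anything from the trained model. A plain-Julia sketch with hypothetical stand-in parameters (Symbol keys, array values) shows the effect; note that the copy is shallow, so the weight arrays themselves are shared, not duplicated.

```julia
# Hypothetical trained parameters of the full (loss) network, keyed by Symbol.
full_params = Dict(
    :fc1_in_weight => randn(10, 1), :fc1_in_bias => randn(10),
    :fc2_in_weight => randn(4, 10), :fc2_in_bias => randn(4),
    :output1_weight => [1.0 2.0 3.0 4.0], :output1_bias => [0.0],
)

# New Dict with the same entries: deleting keys from it leaves full_params intact.
pred_params = Dict(k => v for (k, v) in full_params)
delete!(pred_params, :output1_weight)   # loss-only parameters are dropped
delete!(pred_params, :output1_bias)
```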
Ah OK. To summarize: my previous understanding was that the eval_metric takes in the network's output and the solver seeks to minimize the eval_metric (loss). It turns out the loss is defined as the output of the network's output layer, and the solver minimizes that. During training, the eval_metric is calculated at the end of each epoch solely for monitoring purposes. That's most helpful, thanks again.
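The monitoring-only role of the metric can be sketched in plain Julia with the same accumulate-then-average shape as the thread's CustomMetric. RunningLoss, reset_metric!, update_metric! and get_metric are hypothetical stand-ins, not MXNet.jl's reset!/get/_update_single_output, and the sketch assumes, as with the thread's 1×8 data, that samples are columns, so it counts length(label) samples per batch.

```julia
# Monitoring-only metric: accumulates a statistic over batches and reports
# the average. Nothing computed here feeds back into any gradient.
mutable struct RunningLoss
    loss::Float64
    n::Int
end
RunningLoss() = RunningLoss(0.0, 0)

reset_metric!(m::RunningLoss) = (m.loss = 0.0; m.n = 0; m)

# pred: 4×batch network output, label: length-batch vector
function update_metric!(m::RunningLoss, label::AbstractVector, pred::AbstractMatrix)
    z = vec([1.0 2.0 3.0 4.0] * pred)        # weighted sum per sample
    m.loss += sum(sqrt.(abs.(z .- label)))   # same per-sample loss as CustomMetric
    m.n += length(label)                     # count samples, not rows
    return m
end

get_metric(m::RunningLoss) = m.loss / m.n
```

In a training loop this would be reset at the start of each epoch, updated once per batch, and read at the end of the epoch, purely for reporting.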
Yes, exactly. I think it's quite possible to hide all of this from the user, so that you could proceed exactly as in your initial understanding: specify the loss layer and the network separately, while building the full network, stripping away the loss results, etc. is done internally. I presume this will be implemented in the high-level module API.
Hi there,
I am trying to define the activation of the last layer as the exponential function. If x is the input vector to a node in the last layer, the output of the node would be exp(w*x + b). Is this possible?
Thanks! Jock