Open kasiabozek opened 8 years ago
Hi @kasiabozek can you provide more details (e.g. a short snippet that we can run to reproduce this error)?
Here is the code that produces the error:
net = data
interim = mx.SymbolicNode[]
for i in 1:nlayers
  conv1 = create_conv(net, nfilters)
  conv2 = create_conv(conv1, nfilters)
  pool = mx.Pooling(data=conv2, kernel=pool_kernel, stride=pool_stride, pool_type=pool_type)
  net = pool
  nfilters *= 2
  push!(interim, conv2)
end
net = create_conv(net, nfilters)
net = create_conv(net, nfilters)
for i in 1:nlayers
  nfilters = div(nfilters, 2);
  upsampling = mx.UpSampling(net, scale=2, num_filter=nfilters,
                             sample_type="bilinear",
                             workspace=WORKSPACE)
  conv1 = create_conv(mx.Concat(interim[end-i+1], upsampling), nfilters)
  conv2 = create_conv(conv1, nfilters)
  net = conv2
end
As you can see, the number of filters that I pass to the upsampling layer is not equal to the number of filters in this layer's input. If I do the division
nfilters = div(nfilters, 2);
after the upsampling layer in the loop, then the network can be trained with no error.
Is that the correct behavior? If it is, I suggest adding a note to the layer description about the required number of filters.
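To make the mismatch concrete, here is a minimal bookkeeping sketch in plain Julia that only tracks channel counts through the loops above. It makes no MXNet calls. It assumes that create_conv(x, n) always emits n channels and that the decoder convolutions after Concat emit nfilters channels; the function name decoder_channels and the values 3 and 16 are hypothetical, chosen only for illustration.

```julia
# Channel bookkeeping only; no MXNet calls are made here.
# Assumption: create_conv(x, n) always emits n channels.

"""
Return, for each decoder step, the pair
(channels entering UpSampling, num_filter passed to UpSampling).
`halve_first = true` mirrors the failing loop, `false` the working one.
"""
function decoder_channels(nlayers::Int, nfilters::Int; halve_first::Bool)
    # Encoder: nfilters doubles after every pooling stage.
    for _ in 1:nlayers
        nfilters *= 2
    end
    net_channels = nfilters          # the two bottleneck convs emit nfilters channels
    steps = Tuple{Int,Int}[]
    for _ in 1:nlayers
        if halve_first
            nfilters = div(nfilters, 2)
            push!(steps, (net_channels, nfilters))
        else
            push!(steps, (net_channels, nfilters))
            nfilters = div(nfilters, 2)
        end
        net_channels = nfilters      # conv1 and conv2 after Concat emit nfilters channels
    end
    return steps
end

failing = decoder_channels(3, 16; halve_first=true)
working = decoder_channels(3, 16; halve_first=false)
println("halve before UpSampling: ", failing)
println("halve after UpSampling:  ", working)
```

In the first ordering every step passes a num_filter that is half the number of channels entering the upsampling layer; in the second ordering the two numbers agree at every step, which matches the observation that only the second variant trains.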
@antinucleon Looking at the code for upsampling, I think the doc here should be updated to say that num_filter is only used by bilinear upsampling, not by nearest neighbor upsampling.
Also, in the code for bilinear upsampling, this line looks a bit strange. Why is the number of groups set to the number of filters? Is that intended? Or is that what causes the issue that @kasiabozek showed above? (The same code is in the CUDA version.)
@kasiabozek Can you take a look if https://github.com/oist/mxnet/tree/vc/upsampling fixes your problem?
It produces the same behavior. My thinking was: why shouldn't num_filter be inferred from in_data for the upsampling operation, since it needs to have a matching size anyway? I'm guessing that this way upsampling is a bit different in function from deconvolution?
@kasiabozek At least with deconvolution you can control the upsampling scale along each axis. You can also get a larger field of view by using a large kernel.
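As a rough illustration of the per-axis control mentioned above, the sketch below applies the usual transposed-convolution size formula, out = (in - 1) * stride - 2 * pad + kernel, independently to each axis. It assumes no output adjustment and a dilation of 1. The helper deconv_out and the kernel, stride, and pad values are hypothetical and are not taken from the MXNet source.

```julia
# Output length along one axis of a transposed convolution
# (assumes no output adjustment and dilation 1).
deconv_out(n::Int, kernel::Int, stride::Int, pad::Int) =
    (n - 1) * stride - 2 * pad + kernel

# Hypothetical per-axis settings: scale the first axis by 2 and the second by 3.
in_size = (16, 16)
kernel  = (4, 5)
stride  = (2, 3)
pad     = (1, 1)

# Apply the formula axis by axis.
out_size = map(deconv_out, in_size, kernel, stride, pad)
println(out_size)   # (32, 48)
```

A larger kernel with correspondingly larger padding keeps the same output size while letting each output pixel draw on a wider input neighborhood.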
According to the description, linear upsampling is not supposed to depend on the number of filters. However, the default value results in a segfault, and setting a number not equal to the number of filters of the input produces a cryptic error: