@Yangqing when you do the data_layers re-design in #407 and #244, keep this in mind.
Regarding that paper, I believe they will be releasing their source code (and models) soon.
The paper that #548 wants to implement [1] proposes a very natural and general way to extract convolutional features from images of any size and then pool the feature maps into fixed-length vectors with spatial pyramids. The spatial pyramid pooling (SPP) idea is not new, but until now most people have only done pooling with sliding windows in CNNs. On the other hand, SPP-net only experimented with max pooling in each spatial bin, while sliding-window pooling has also used other aggregation methods.
[1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. The 13th European Conference on Computer Vision (ECCV), 2014.
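For anyone skimming the paper, here is a minimal NumPy sketch of the SPP idea from [1]: max pooling over a spatial pyramid turns a feature map of any H x W into a fixed-length vector. The pyramid levels and the bin-edge computation are illustrative assumptions, not the exact SPP-net implementation.

```python
import numpy as np

def spp_max_pool(feature_map, pyramid=(1, 2, 4)):
    """feature_map: array of shape (C, H, W); returns a (C * sum(n*n),) vector.
    Assumes H and W are at least max(pyramid) so no bin is empty."""
    C, H, W = feature_map.shape
    pooled = []
    for n in pyramid:
        # Bin boundaries that cover the whole map for any H, W.
        hs = np.linspace(0, H, n + 1).astype(int)
        ws = np.linspace(0, W, n + 1).astype(int)
        for i in range(n):
            for j in range(n):
                bin_ = feature_map[:, hs[i]:hs[i + 1], ws[j]:ws[j + 1]]
                pooled.append(bin_.max(axis=(1, 2)))  # max over each spatial bin
    return np.concatenate(pooled)

# Feature maps of different sizes pool to vectors of the same fixed length.
assert spp_max_pool(np.random.rand(256, 13, 13)).shape == \
       spp_max_pool(np.random.rand(256, 9, 11)).shape  # (256 * 21,)
```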
This is complementary to #505.
@sguada, do you think #355 is a prerequisite for this issue?
The idea is to allow images of different sizes as inputs but keep them at a fixed size after cropping, so the rest of the network works as usual; see the sketch below.
Therefore #355 is not needed for now, although it could be combined with this later on.
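A minimal sketch of that preprocessing, assuming NumPy arrays of shape (C, H, W): inputs may have different sizes, but every example has the same crop_size after cropping. Subtracting a per-channel mean_value (instead of a full mean_file) is what makes this possible, since a mean image would pin H and W. The function name and parameters are illustrative, not Caffe's actual API.

```python
import numpy as np

def crop_and_mirror(image, crop_size, mean_value, train=True):
    """image: (C, H, W) with H, W >= crop_size; returns a (C, crop_size, crop_size) array."""
    C, H, W = image.shape
    if train:
        # Random crop for training.
        h0 = np.random.randint(H - crop_size + 1)
        w0 = np.random.randint(W - crop_size + 1)
    else:
        # Center crop for testing.
        h0, w0 = (H - crop_size) // 2, (W - crop_size) // 2
    crop = image[:, h0:h0 + crop_size, w0:w0 + crop_size].astype(np.float32)
    # Per-channel mean works for any input size, unlike a mean image.
    crop -= np.asarray(mean_value, dtype=np.float32).reshape(C, 1, 1)
    if train and np.random.rand() < 0.5:
        crop = crop[:, :, ::-1]  # horizontal mirror
    return crop
```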
Got your idea. The ImageDataLayer resizes the images before cropping and mirroring them, and the convert_imageset tool ensures that the images stored in the LevelDB are all the same size, so there is basically no requirement on the original image sizes. Only LMDB needs to be enhanced.
The mean_value is just a simplification of the mean_file and doesn't have to replace the latter.
I use the ImageDataLayer and I don't understand what you mean by "replace mean_file with a mean_value". How is the mean_value computed? Does Caffe have a tool to compute the mean_file from the input images? I want to confirm before I write such a tool myself. Thanks!
@Dcocoa it turns out the spatial mean, i.e. the mean over images with dimensions K x H x W, is almost constant across height and width, so averaging over the spatial dimensions into a channel mean with dimensions K x 1 x 1 achieves virtually the same network performance while making preprocessing simpler and more flexible.
compute_image_mean computes the mean.
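A short NumPy sketch of the reduction described above: collapse a K x H x W spatial mean (e.g. the output of compute_image_mean, loaded however is convenient) into a K x 1 x 1 per-channel mean. The file name is hypothetical.

```python
import numpy as np

spatial_mean = np.load("mean.npy")  # shape (K, H, W); hypothetical file
# Average over height and width; keepdims gives the K x 1 x 1 channel mean.
channel_mean = spatial_mean.mean(axis=(1, 2), keepdims=True)
print(channel_mean.ravel())  # e.g. roughly [104, 117, 123] for ImageNet BGR means
```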
Please, I want to use SPP-net and it works well. However, when I change the number of layers it gives an error. Do I need to recompile Caffe, or can I just use the provided caffe.mex?
Closing as we now have a per-channel mean, so this should work, and it should be doable with gradient accumulation. (If it's broken for batch size > 1, you're welcome to open a new issue for that.)
Based on recent experiments, cropping from images whose smallest side is 256 performs better: http://arxiv.org/pdf/1405.3531v2.pdf
The idea is to allow images to have different sizes before cropping but be the same size after cropping. This would require removing the mean_file and replacing it with a mean_value.
LevelDB, LMDB, and the ImageDataLayer should not assume that the images are all the same size.