@Yangqing when you do the data_layers re-design in #407 and #244, keep this in mind.
Regarding that paper, I believe they will be releasing their source code (and models) soon.
The paper that #548 wants to implement [1] proposes a very natural and general way to extract convolutional features from images of any size and then pool the feature maps into fixed-length vectors with spatial pyramids. The spatial pyramid pooling (SPP) idea is not new, but until now most people have only done pooling with sliding windows in CNNs. On the other hand, SPP-net only experimented with max pooling in each spatial bin, while sliding-window pooling has also used other aggregation methods.
[1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. The 13th European Conference on Computer Vision (ECCV), 2014.
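For anyone skimming the paper, here is a minimal NumPy sketch of the SPP idea from [1]: max pooling over a spatial pyramid turns a feature map of any H x W into a fixed-length vector. The pyramid levels and the bin-edge computation are illustrative assumptions, not the exact SPP-net implementation.

```python
import numpy as np

def spp_max_pool(feature_map, pyramid=(1, 2, 4)):
    """feature_map: array of shape (C, H, W); returns a (C * sum(n*n),) vector.
    Assumes H and W are at least max(pyramid) so no bin is empty."""
    C, H, W = feature_map.shape
    pooled = []
    for n in pyramid:
        # Bin boundaries that cover the whole map for any H, W.
        hs = np.linspace(0, H, n + 1).astype(int)
        ws = np.linspace(0, W, n + 1).astype(int)
        for i in range(n):
            for j in range(n):
                bin_ = feature_map[:, hs[i]:hs[i + 1], ws[j]:ws[j + 1]]
                pooled.append(bin_.max(axis=(1, 2)))  # max over each spatial bin
    return np.concatenate(pooled)

# Feature maps of different sizes pool to vectors of the same fixed length.
assert spp_max_pool(np.random.rand(256, 13, 13)).shape == \
       spp_max_pool(np.random.rand(256, 9, 11)).shape  # (256 * 21,)
```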
This is complementary to #505.
@sguada, do you think #355 is a prerequisite for this issue?
The idea is to allow images of different sizes as inputs but keep them at a fixed size after cropping, so the rest of the network works as usual; see the sketch below.
Therefore #355 is not needed for now, although it could be combined with this later on.
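A minimal sketch of that preprocessing, assuming NumPy arrays of shape (C, H, W): inputs may have different sizes, but every example has the same crop_size after cropping. Subtracting a per-channel mean_value (instead of a full mean_file) is what makes this possible, since a mean image would pin H and W. The function name and parameters are illustrative, not Caffe's actual API.

```python
import numpy as np

def crop_and_mirror(image, crop_size, mean_value, train=True):
    """image: (C, H, W) with H, W >= crop_size; returns a (C, crop_size, crop_size) array."""
    C, H, W = image.shape
    if train:
        # Random crop for training.
        h0 = np.random.randint(H - crop_size + 1)
        w0 = np.random.randint(W - crop_size + 1)
    else:
        # Center crop for testing.
        h0, w0 = (H - crop_size) // 2, (W - crop_size) // 2
    crop = image[:, h0:h0 + crop_size, w0:w0 + crop_size].astype(np.float32)
    # Per-channel mean works for any input size, unlike a mean image.
    crop -= np.asarray(mean_value, dtype=np.float32).reshape(C, 1, 1)
    if train and np.random.rand() < 0.5:
        crop = crop[:, :, ::-1]  # horizontal mirror
    return crop
```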
Got your idea. The ImageDataLayer resizes the images before cropping and mirroring them, and the convert_imageset tool ensures that the images stored in the LevelDB are all the same size, so there is basically no requirement on the original image sizes. Only LMDB needs to be enhanced.
The mean_value is just a simplification of the mean_file and doesn't have to replace the latter.
I use the ImageDataLayer and I don't understand what you mean by "replace mean_file with a mean_value". How is the mean_value computed? Does Caffe have a tool to compute the mean_file from the input images? I want to confirm before I write such a tool myself. Thanks!
@Dcocoa it turns out the spatial mean, i.e. the mean over images with dimensions K x H x W, is almost constant across height and width, so averaging over the spatial dimensions into a channel mean with dimensions K x 1 x 1 achieves virtually the same network performance while making preprocessing simpler and more flexible.
compute_image_mean computes the mean.
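A short NumPy sketch of the reduction described above: collapse a K x H x W spatial mean (e.g. the output of compute_image_mean, loaded however is convenient) into a K x 1 x 1 per-channel mean. The file name is hypothetical.

```python
import numpy as np

spatial_mean = np.load("mean.npy")  # shape (K, H, W); hypothetical file
# Average over height and width; keepdims gives the K x 1 x 1 channel mean.
channel_mean = spatial_mean.mean(axis=(1, 2), keepdims=True)
print(channel_mean.ravel())  # e.g. roughly [104, 117, 123] for ImageNet BGR means
```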
Please, I want to use SPP-net and it works well. However, when I change the number of layers it gives an error. Do I need to recompile Caffe, or can I just use the provided caffe.mex?
Closing as we now have a per-channel mean, so this should work, and it should be doable with gradient accumulation. (If it's broken for batch size > 1, you're welcome to open a new issue for that.)
Based on recent experiments, cropping from images whose smallest side is 256 performs better: http://arxiv.org/pdf/1405.3531v2.pdf
The idea is to allow images to have different sizes before cropping but be the same size after cropping. This would require removing the mean_file and replacing it with a mean_value.
LevelDB, LMDB, and the ImageDataLayer should not assume that the images are all the same size.