How to train a model with matrix (instead of scalar) ground truth?

BVLC / caffe

Caffe: a fast open framework for deep learning.

http://caffe.berkeleyvision.org/

How to train a model with matrix (instead of scalar) ground truth? #1241

Closed llyydd007 closed 9 years ago

llyydd007 commented 9 years ago

I want to reimplement the great work in [1]. They use image pairs to train a deep network for the image super-resolution task. The problem is that the labels become a matrix rather than a single number as in classification tasks. I am not skilled at C++ programming, and I don't know how to make the modification. As far as I can see, I would have to change the data structure to modify the data type of the label and call ReadImageToDatum twice, once for the data and once for the labels, and it gets totally messy. Could somebody give me an outline of how to make the modification? Thanks very much!

[1] Dong C, Loy C C, He K, et al. Learning a deep convolutional network for image super-resolution[M]//Computer Vision–ECCV 2014. Springer International Publishing, 2014: 184-199.

ssafar commented 9 years ago

Here is an idea: add the labels (it's another image, right?) as another data source instead, and ignore the original, integer "label" field for both of your data layers (if you put a SILENCE layer on the top of them, they won't be printed all the time). You build your neural net on the top of the low-res input (the first output of the first data layer), and when you've got the restored image, you compare it with the high-res image (coming from the first output of the second data layer) using an EUCLIDEAN_LOSS. (This still doesn't solve the question of how to write out the resulting images to files on disk. For that, you could use the Python wrapper once you have the trained net.)
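To make the comparison step concrete, here is a minimal pure-Python sketch of what a Euclidean loss computes when the ground truth is an image rather than a scalar. It assumes the convention of summing squared differences over the batch and dividing by twice the batch size. The function name `euclidean_loss` and the toy 2x2 "images" are made up for illustration and are not part of Caffe.

```python
def euclidean_loss(predictions, targets):
    """Sum of squared differences over the batch, divided by 2 * batch size.

    Both arguments are batches of single-channel images stored as nested
    lists: batch -> rows -> pixel values. Shapes must match exactly.
    """
    if len(predictions) != len(targets):
        raise ValueError("batch sizes differ")
    total = 0.0
    for pred_img, target_img in zip(predictions, targets):
        if len(pred_img) != len(target_img):
            raise ValueError("image heights differ")
        for pred_row, target_row in zip(pred_img, target_img):
            if len(pred_row) != len(target_row):
                raise ValueError("image widths differ")
            for p, t in zip(pred_row, target_row):
                total += (p - t) ** 2
    return total / (2.0 * len(predictions))


# Toy batch of one 2x2 image: the network output vs. the high-res target.
restored = [[[0.0, 1.0], [2.0, 3.0]]]
high_res = [[[0.0, 1.0], [2.0, 5.0]]]

loss = euclidean_loss(restored, high_res)
print(loss)
```

The point of the sketch is that the "label" is simply a second array with the same shape as the network output, so the restored image and the high-res image must have identical dimensions before they reach the loss.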

llyydd007 commented 9 years ago

I really appreciate your idea and I will try that. Thanks a lot.

shelhamer commented 9 years ago

Defining a model with two data layers is the right approach for these types of problems with rich, i.e. non-scalar, ground truth.

> if you put a SILENCE layer on the top of them, they won't be printed all the time

No need for that -- just do not define the second top (the label) of the data layer. It will then be configured to yield only data and no label.
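As a rough sketch of that layout, the short Python script below prints a net definition with two data layers, each declaring a single top, feeding a Euclidean loss. It assumes the older `layers { ... }` syntax with enum layer types (DATA, EUCLIDEAN_LOSS) used in this thread; newer Caffe versions spell these differently. The blob names, database paths, batch size, and helper functions are all hypothetical, and the layers between the input and the loss are left out.

```python
def data_layer(name, source, batch_size):
    """Old-style DATA layer with one top only, so no label blob is produced."""
    return (
        'layers {\n'
        '  name: "%s"\n'
        '  type: DATA\n'
        '  top: "%s"\n'
        '  data_param {\n'
        '    source: "%s"\n'
        '    batch_size: %d\n'
        '  }\n'
        '}\n' % (name, name, source, batch_size)
    )


def euclidean_loss_layer(prediction, target):
    """Loss layer comparing the network output blob with the target blob."""
    return (
        'layers {\n'
        '  name: "loss"\n'
        '  type: EUCLIDEAN_LOSS\n'
        '  bottom: "%s"\n'
        '  bottom: "%s"\n'
        '  top: "loss"\n'
        '}\n' % (prediction, target)
    )


# Hypothetical database paths; both databases must list the image pairs
# in the same order so that inputs and targets stay aligned.
net = (
    data_layer("low_res", "train_low_res_db", 64)
    + data_layer("high_res", "train_high_res_db", 64)
    + '# ... layers mapping "low_res" to a blob named "restored" go here ...\n'
    + euclidean_loss_layer("restored", "high_res")
)
print(net)
```

Both data layers need the same batch size, and the omitted layers must produce a "restored" blob with the same shape as "high_res" for the loss to accept the pair.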

Please continue the discussion on the caffe-users mailing list. Issues are for development discussion. Thanks!