NVIDIA / DIGITS

Deep Learning GPU Training System
https://developer.nvidia.com/digits
BSD 3-Clause "New" or "Revised" License

Add support for multiple and/or floating point labels #97

Closed: dniku closed this issue 9 years ago

dniku commented 9 years ago

Currently DIGITS supports only a single integer label per image. For many applications, such as regression or multi-label classification, this is not enough. I would like to propose adding support for both multiple labels and floating-point labels per image.

There are several problems with this.

  1. The interface for feeding in a new dataset. Currently, DIGITS supports parsing a folder of subfolders, each of which contains images from a single class, and also loading from a pregenerated text file. While the former approach cannot easily be extended to either of the proposed enhancements, the latter can, although not quite straightforwardly. Each line in such a text file is currently matched against (.+)\s+(\d+)\s*$ (path/to/image 123). This could be replaced with (.+?)((?:\s+\d+(?:\.\d*)?)+)\s*$ to accept a list of ints or floats (path/to/image 123 4. 5.67); the path group is made non-greedy and the fractional part optional so that plain integers are captured as labels instead of being swallowed into the path. A small parsing sketch follows this list.
  2. The format for storing a dataset. Currently, that's an LMDB (I think it's always an LMDB and never LevelDB, although the code seems to support both; correct me if I'm wrong) which stores Caffe's Datum structs. The problem is that a Datum has a field for a label, and that's a single int (proof). Currently DIGITS dumps the class label into that field. There are at least three solutions to this, but none seem particularly easy:
    • Split all databases DIGITS creates into two: one for images and one for labels. Breaks compatibility with previous versions, and there is no reason to do this for single-label classification, which is what most people use.
    • Add support for two kinds of databases to DIGITS: the old-style consolidated ones and the new-style split ones. Hard to implement and maintain.
    • Patch Caffe, adding something like float_label or int_labels or float_labels to Datum. Increases memory usage (not much) and changes a widely-used structure (very bad).
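
To make the parsing change in item 1 concrete, here is a minimal sketch in Python of how the proposed pattern could be applied (my own illustration, not DIGITS code; the parse_line helper is hypothetical):

```python
import re

# Proposed label-line pattern: a non-greedy path followed by one or more
# numeric labels, each an int or a float.
LINE_RE = re.compile(r'(.+?)((?:\s+\d+(?:\.\d*)?)+)\s*$')

def parse_line(line):
    """Return (path, [labels]) for a matching line, or None otherwise."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    path = match.group(1)
    labels = [float(token) for token in match.group(2).split()]
    return path, labels

print(parse_line('path/to/image 123'))          # ('path/to/image', [123.0])
print(parse_line('path/to/image 123 4. 5.67'))  # ('path/to/image', [123.0, 4.0, 5.67])
```

Note that a path which itself ends in whitespace-separated numbers is inherently ambiguous in this format, whatever regex is used.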

I am currently working on patching create_db.py to support split databases, although I'm not sure that this is the best approach. Comments would be very much appreciated.
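
For concreteness, here is a rough sketch of what a split labels database could look like, using the lmdb Python bindings and Caffe's Datum with its float_data field. This is only an illustration of the idea, not the actual create_db.py patch; the function name, key format and map size are assumptions.

```python
import lmdb
import numpy as np
from caffe.proto import caffe_pb2

def write_label_db(db_path, label_vectors):
    """Write one Datum per image into a separate labels LMDB.

    label_vectors: list of 1-D float arrays, e.g. [xmin, ymin, xmax, ymax]
    for a bounding box, or any other fixed-length regression target.
    """
    env = lmdb.open(db_path, map_size=1 << 30)  # 1 GiB map; adjust as needed
    with env.begin(write=True) as txn:
        for index, vector in enumerate(label_vectors):
            datum = caffe_pb2.Datum()
            datum.channels = len(vector)  # store the vector along the channel axis
            datum.height = 1
            datum.width = 1
            datum.float_data.extend(float(v) for v in vector)
            # Keys must sort in the same order as the keys of the image LMDB
            # so that the data layers reading the two databases stay in sync.
            txn.put('{:08d}'.format(index).encode('ascii'), datum.SerializeToString())
    env.close()

write_label_db('labels_lmdb', [np.array([12, 34, 200, 180], dtype=np.float32)])
```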

jmozah commented 9 years ago

+1 very needed feature

lukeyeager commented 9 years ago

I'm starting on this now. Do you guys see a need for arbitrary numbers of databases per phase (train/val/test)? I could hard-code it to two databases per phase, images and labels, but I'm thinking people might have more complicated use cases. Maybe they would want some mix of LMDBs including images, coarse_classifications, fine_classifications, segmentations, etc.

Are there any use cases you know of that would make this a worthwhile feature?

dniku commented 9 years ago

Bounding box estimation, for example. 4 integers to encode one box.

lukeyeager commented 9 years ago

Thanks for the response! After reading back through these comments, maybe I should explain how I'm implementing this:

These are the types of problems which may require more than two databases:

My question is about whether anyone is using models which perform these more complex types of tasks.

Saneesh commented 9 years ago

Hello Luke, are you discussing features like "round collar black t-shirt" or "full sleeve black and white formal shirt"?

How can I train for this? Is it possible now with Caffe and DIGITS?

Regards, Saneesh.

lukeyeager commented 9 years ago

After reading through BVLC/caffe#523, BVLC/caffe#1414 and BVLC/caffe#1698, I think this quote best sums up what is possible with Data layers.

Caffe is perfectly happy with models that make matrix outputs and learn from matrix ground truths for problems where the output and truth have spatial dimensions e.g. reconstruction / de-noising, pixelwise semantic segmentation, sliding window detection, and so forth. The forward and backward passes for these models follow directly from the definitions and Caffe has always been capable of computing these. https://github.com/BVLC/caffe/issues/1698#issue-53768814

So I'll revise my previous statement and say that the labels database contains 1-, 2- or 3-dimensional labels, not "N-dimensional". That's because LMDBs have to go through Datum, which is restricted to 3 dimensions (channels x height x width). It may be easier to work with arbitrary blobs using HDF5Data layers.
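
As a rough illustration of a label that uses the spatial dimensions (my own sketch, not DIGITS code), a per-pixel class map for segmentation can be packed into a 1 x H x W Datum via the uint8 data field:

```python
import numpy as np
from caffe.proto import caffe_pb2

def segmentation_mask_to_datum(mask):
    """Pack a 2-D uint8 class map (H x W) into a 1 x H x W Datum."""
    datum = caffe_pb2.Datum()
    datum.channels = 1
    datum.height, datum.width = mask.shape
    datum.data = mask.astype(np.uint8).tobytes()  # one class index per pixel
    return datum

mask = np.zeros((256, 256), dtype=np.uint8)  # e.g. an all-background label map
print(len(segmentation_mask_to_datum(mask).SerializeToString()))
```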

It has come to my attention that HDF5DataLayer can now produce multiple blob outputs https://github.com/BVLC/caffe/pull/1414#issuecomment-69285965
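
To make the HDF5 route concrete, here is a minimal sketch of writing an HDF5 file with one dataset per top blob, which an HDF5Data layer whose tops are named data and bbox could read via its source list file. The file names, dataset names and shapes are assumptions for illustration, not anything DIGITS produces today.

```python
import h5py
import numpy as np

# Hypothetical example: 10 RGB 32x32 images with a 4-element regression
# target per image. HDF5Data exposes one top blob per matching dataset name.
num = 10
images = np.random.rand(num, 3, 32, 32).astype(np.float32)  # N x C x H x W
bboxes = np.random.rand(num, 4).astype(np.float32)          # N x 4 labels

with h5py.File('train.h5', 'w') as f:
    f.create_dataset('data', data=images)
    f.create_dataset('bbox', data=bboxes)

# The HDF5Data layer's source parameter points to a text file listing
# the .h5 files to read, one path per line.
with open('train_h5_list.txt', 'w') as f:
    f.write('train.h5\n')
```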

For the current iteration, I'm just going to assume that you've created your LMDBs manually and want to use DIGITS for running caffe. In the next iteration, I'll tackle picking a standard input data format so that DIGITS can create your LMDBs or HDF5 files or whatever.

lukeyeager commented 9 years ago

@Saneesh

Are you discussing features like "round collar black t-shirt" or "full sleeve black and white formal shirt"?

I'm discussing a much more general solution which solves many more problems. It will be a bit of overkill for the simple multi-label example you gave, but it will definitely be sufficient.

How can I train for this? Is it possible now with Caffe and DIGITS?

It's possible with Caffe, but see BVLC/caffe#1698 (it's hard). I'm currently working on adding it to DIGITS.

Saneesh commented 9 years ago

@lukeyeager Thank you very much! Can we expect this feature in DIGITS 3? How long will it take to finish? And how will DIGITS users be informed?

Regards, Saneesh

barbolo commented 9 years ago

Hi, @lukeyeager. How is this issue evolving? Is the work in progress in a specific branch?

lukeyeager commented 9 years ago

I should have something pushed to master within a week or two.

My in-progress branch is at lukeyeager/generic-inference. You can take a look at it if you want but beware - it could do bad things like corrupt your data from previous jobs.

lukeyeager commented 9 years ago

#189 has been merged. If you can create your own LMDBs, you can now run your Caffe models in DIGITS. Please see #197 for a discussion about data formats.

cicero19 commented 8 years ago

Has the multi-label classification feature discussed here been implemented? What about the bounding box feature? These would be very useful.