Supervised Latent Dirichlet Allocation for Classification

This is a C++ implementation of supervised latent Dirichlet allocation (sLDA) for classification.

This code depends on the GNU Scientific Library (GSL).

Compiling

git clone https://github.com/chbrown/slda
cd slda
make

You may need to install GSL first. For example, on macOS with Homebrew:

brew install gsl
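
If make fails, it may help to confirm that GSL is actually visible on the PATH. The sketch below assumes the gsl-config helper script was installed alongside the library, which is normally the case; the have_tool function name is hypothetical and not part of this repository.

```shell
# have_tool NAME: succeed if NAME is an executable on PATH (hypothetical helper).
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool gsl-config; then
    # gsl-config ships with GSL and reports the installed version and build flags.
    echo "GSL version:   $(gsl-config --version)"
    echo "compile flags: $(gsl-config --cflags)"
    echo "link flags:    $(gsl-config --libs)"
else
    echo "gsl-config not found; install GSL before running make" >&2
fi
```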

Estimation

Estimate the model by executing:

slda est <data path> <label path> <settings path> <alpha> <k> <initialization> <output directory>

Example usage:

./slda est test/images/train-data.dat test/images/train-label.dat \
    settings.txt 1.0 10 random tmp/
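
To make the mapping between the positional arguments and the example explicit, the same call can be written with named shell variables. This is only a sketch: the variable names and the slda_est_cmd function are illustrative, and the values are copied from the example above.

```shell
# Named variables for the positional arguments of `slda est`.
# Names are illustrative; values come from the example usage above.
DATA=test/images/train-data.dat     # <data path>
LABELS=test/images/train-label.dat  # <label path>
SETTINGS=settings.txt               # <settings path>
ALPHA=1.0                           # <alpha>
K=10                                # <k>
INIT=random                         # <initialization>
OUT=tmp/                            # <output directory>

# Print the full command so it can be inspected before running it.
slda_est_cmd() {
    printf './slda est %s %s %s %s %s %s %s\n' \
        "$DATA" "$LABELS" "$SETTINGS" "$ALPHA" "$K" "$INIT" "$OUT"
}

slda_est_cmd

# To run it for real (assumes ./slda has been built with make):
# mkdir -p "$OUT"
# ./slda est "$DATA" "$LABELS" "$SETTINGS" "$ALPHA" "$K" "$INIT" "$OUT"
```

Creating the output directory with mkdir -p beforehand is harmless whether or not slda creates it on its own.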

Inference

To perform inference on a different set of data (in the same format as for estimation), execute:

slda inf <data path> <label path> <settings path> <model path> <output directory>

Example usage:

./slda inf test/images/test-data.dat test/images/test-label.dat \
    settings.txt tmp/final.model tmp/

Inference also prints a final line of output reporting the average accuracy against the labels given in the <label path> argument:

average accuracy: 0.679
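
The reported figure can be cross-checked by comparing the true labels with the predicted ones directly. The following is a minimal sketch, assuming both files hold one label per line in the same document order; the accuracy function is hypothetical, and the name of the predicted-label file that slda writes is an assumption to verify against the contents of the output directory.

```shell
# accuracy TRUE_FILE PRED_FILE: print the fraction of matching labels.
# Assumes one label per line in each file, aligned by line number.
accuracy() {
    paste "$1" "$2" | awk '
        # Only count lines where both files supplied a label.
        NF == 2 { total++; if ($1 == $2) correct++ }
        END {
            if (total > 0) printf "%.3f\n", correct / total
            else print "no labels"
        }
    '
}

# Example call; tmp/inf-labels.dat is an assumed file name, check tmp/ first:
# accuracy test/images/test-label.dat tmp/inf-labels.dat
```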

Sample data

The sample data in test/images was downloaded from http://www.cs.cmu.edu/~chongw/data/images.tgz on July 12, 2013.

Description of data from original site:

A preprocessed 8-class image dataset from Labelme.

UIUC Sports annotation files: annotations and meta information. The source image files can be found here. (Note: there might be some discrepancies and I don't seem to know why...)

License

Copyright © 2009, Chong Wang, David Blei and Li Fei-Fei

Licensed under both the GPL v2 and GPL v3, as well as any future version of the GNU General Public License.