Allow users to specify additional data to be returned with the feature vectors (i.e., field values from each original document).
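A minimal sketch of what one returned record might look like, written in Python for brevity. `FeatureVectorResult` and `attach_fields` are hypothetical names, not part of any existing API. The document's field values are modeled as a plain dict from field name to a list of strings, because a Lucene field can hold several values.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FeatureVectorResult:
    """A sparse feature vector plus user-requested field values for one document."""
    doc_id: int
    indices: List[int]   # 0-based term indices, ascending
    values: List[float]  # weight of each term in `indices`
    fields: Dict[str, List[str]] = field(default_factory=dict)  # extra data per field name

    def __post_init__(self) -> None:
        # A sparse vector needs exactly one weight per index.
        if len(self.indices) != len(self.values):
            raise ValueError("indices and values must have the same length")


def attach_fields(doc_id: int, indices: List[int], values: List[float],
                  stored: Dict[str, List[str]], requested: List[str]) -> FeatureVectorResult:
    """Copy only the requested fields; a field the document lacks maps to an empty list."""
    extra = {name: list(stored.get(name, [])) for name in requested}
    return FeatureVectorResult(doc_id, list(indices), list(values), extra)


if __name__ == "__main__":
    stored = {"title": ["Spark and Lucene"], "category": ["tech"], "body": ["long text"]}
    result = attach_fields(7, [0, 4], [1.0, 2.5], stored, ["title", "category"])
    print(result.fields)  # {'title': ['Spark and Lucene'], 'category': ['tech']}
```

Returning only the requested fields keeps each record small when documents carry large text fields that the user does not need.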
The LIBSVM (labeled data point) format requires a label (an integer value) for each data point. Allow users to specify a mapping from Lucene field values to labels, or generate the mapping implicitly if none is given. When the mapping is generated implicitly, write it to a file so that it can be reused.
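The implicit label mapping and the LIBSVM output can be sketched as follows. This assumes each document arrives as a (field value, sparse features) pair with 0-based term indices; `build_label_mapping`, `to_libsvm_line`, `export_libsvm`, and the JSON mapping file are hypothetical choices for illustration only. Each output line has the form `label index:value ...` with 1-based, ascending indices, which is the text format that Spark MLlib's `MLUtils.loadLibSVMFile` reads.

```python
import json
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical input shape: (field value used as the class, {0-based term index: weight}).
Doc = Tuple[str, Dict[int, float]]


def build_label_mapping(values: Iterable[str]) -> Dict[str, int]:
    """Assign labels 0, 1, 2, ... to distinct field values in first-seen order."""
    mapping: Dict[str, int] = {}
    for value in values:
        mapping.setdefault(value, len(mapping))
    return mapping


def to_libsvm_line(label: int, features: Dict[int, float]) -> str:
    """Format one point as 'label i:v ...' with 1-based, ascending indices."""
    pairs = [f"{i + 1}:{float(features[i])}" for i in sorted(features)]
    return " ".join([str(label)] + pairs)


def save_label_mapping(mapping: Dict[str, int], path: str) -> None:
    """Write the mapping as JSON so later exports can reuse the same labels."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(mapping, f, ensure_ascii=False, indent=2)


def load_label_mapping(path: str) -> Dict[str, int]:
    """Read a mapping previously written by save_label_mapping."""
    with open(path, encoding="utf-8") as f:
        return {k: int(v) for k, v in json.load(f).items()}


def export_libsvm(docs: List[Doc],
                  mapping: Optional[Dict[str, int]] = None,
                  mapping_path: Optional[str] = None) -> Tuple[List[str], Dict[str, int]]:
    """Convert documents to LIBSVM lines.

    A user-supplied mapping is used as-is (a value missing from it raises KeyError).
    Otherwise a mapping is generated and, if mapping_path is given, saved for reuse.
    """
    if mapping is None:
        mapping = build_label_mapping(value for value, _ in docs)
        if mapping_path is not None:
            save_label_mapping(mapping, mapping_path)
    lines = [to_libsvm_line(mapping[value], feats) for value, feats in docs]
    return lines, mapping


if __name__ == "__main__":
    docs = [("sports", {0: 1.0, 3: 2.0}), ("politics", {1: 0.5}), ("sports", {2: 1.0})]
    lines, mapping = export_libsvm(docs)
    print("\n".join(lines))  # three lines: "0 1:1.0 4:2.0", "1 2:0.5", "0 3:1.0"
    print(mapping)           # {'sports': 0, 'politics': 1}
```

Because implicit labels depend on the order in which field values are first seen, saving the generated mapping (and passing it back in via `load_label_mapping`) is what keeps labels stable across exports.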
http://spark.apache.org/docs/1.2.1/mllib-data-types.html
Currently, only the "vector" format is supported. Support for the other MLlib data types (e.g., labeled point, local matrix, distributed matrix) is needed.