amplab / keystone

Simplifying robust end-to-end machine learning on Apache Spark.
http://keystone-ml.org/
Apache License 2.0

Fix partitioning in Image Dataloaders #155

Open etrain opened 9 years ago

etrain commented 9 years ago

Right now, Image DataLoaders partition on the file names (which works well for ImageNet-style collections). While it's possible to repartition manually after the images have been loaded, the API indicates that passing Some(numPartitions) will yield an RDD with that many partitions, and that is not what happens today.

The fix should be a small change to ImageLoaderUtils.getFilePathsRDD or to the upstream code that calls it.
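
For illustration only, here is a minimal sketch of the kind of change this could be, assuming the loader ends up with an RDD and an Option[Int] partition count. The object `PartitionSketch` and the helper `withRequestedPartitions` are hypothetical names, not part of KeystoneML, and the real signature of ImageLoaderUtils.getFilePathsRDD may differ. Only standard Spark calls (`RDD.repartition`, `RDD.partitions`) are used.

```scala
import org.apache.spark.rdd.RDD

object PartitionSketch {
  // Hypothetical helper (not KeystoneML API): honor the caller's requested
  // partition count instead of keeping whatever the file listing produced.
  def withRequestedPartitions[T](rdd: RDD[T], numPartitions: Option[Int]): RDD[T] =
    numPartitions match {
      // Shuffle only when a positive count was requested and it differs
      // from the current number of partitions.
      case Some(n) if n > 0 && rdd.partitions.length != n => rdd.repartition(n)
      // None (or an unusable count): leave the existing partitioning alone.
      case _ => rdd
    }
}
```

Applying something like this to the file-path RDD, before the images are decoded, would keep the shuffle cheap because only path strings move between partitions. Whether that is the right place for it depends on how the loaders consume the paths.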