dotnet / machinelearning


Image Classification Benchmark #2379

Open justinormont opened 5 years ago

justinormont commented 5 years ago

@Anipik: We could make a good benchmark for the image processing pipeline. I'd recommend using the Dog Breeds vs. Fruits dataset which we used in NimbusML for its image examples. We currently host this dataset in our CDN for NimbusML.

In Python, the dataset / image loader looks like:

import pandas as pd

# Load the image summary data from the CDN
url = "https://express-tlcresources.azureedge.net/datasets/DogBreedsVsFruits/DogFruitWiki.SHUF.117KB.735-rows.tsv"
df_train = pd.read_csv(url, sep="\t", nrows=100)
# Build the full URL of each image from its relative path
df_train['ImagePath_full'] = "https://express-tlcresources.azureedge.net/datasets/DogBreedsVsFruits/" + \
                         df_train['ImagePath']
# ... load images
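
The image-loading step is elided in the snippet above. As a rough sketch only (not the NimbusML loader itself), the URL-building part can be done with the standard library alone, followed by a plain download helper. Only the `ImagePath` column comes from the snippet; the helper names `image_urls_from_tsv` and `fetch_bytes`, the `Label` column, and the sample file names are made up for illustration.

```python
import csv
import io
from urllib.request import urlopen

BASE_URL = "https://express-tlcresources.azureedge.net/datasets/DogBreedsVsFruits/"


def image_urls_from_tsv(tsv_text, base_url=BASE_URL, path_column="ImagePath", limit=None):
    """Return the full image URL for each row of a tab-separated summary file."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    urls = []
    for i, row in enumerate(reader):
        if limit is not None and i >= limit:
            break
        # Same idea as the pandas snippet: base URL + relative image path
        urls.append(base_url + row[path_column])
    return urls


def fetch_bytes(url, timeout=30):
    """Download one file and return its raw bytes (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


# Tiny in-memory sample; the "Label" column and file names are hypothetical
SAMPLE_TSV = "Label\tImagePath\ndog\tdog1.jpg\nfruit\tapple1.jpg\n"
sample_urls = image_urls_from_tsv(SAMPLE_TSV)
```

With the real TSV, the text passed in would be the downloaded summary file (for example `fetch_bytes(url).decode("utf-8")`, assuming it is UTF-8 encoded), and `fetch_bytes` could then be called on each resulting URL to get the image data.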

The dataset is intended for example code and includes ~775 images of dogs and fruit.

(copied from PR -- https://github.com/dotnet/machinelearning/pull/2372#pullrequestreview-199284335)

Anipik commented 5 years ago

Yeah, it would be nice to have this. I can add a benchmark for it after #2372 gets merged.