vineel96 opened 1 year ago

Hello, can I get information on the datasets used to produce the benchmark results (speedup values) for the different scikit-learn algorithms shown in the figure under the Acceleration subsection at https://github.com/intel/scikit-learn-intelex? The image is also attached here:
Datasets are specified in this config: https://github.com/IntelPython/scikit-learn_bench/blob/master/configs/skl_config.json
Data generation/loading functions are defined here: https://github.com/IntelPython/scikit-learn_bench/tree/master/datasets
Hi, thank you for the links. So, all experiments in the figure are done with synthetic datasets generated from sklearn's make_blobs (except for SVC and RF, where the dataset is mentioned) using this script: https://github.com/IntelPython/scikit-learn_bench/blob/master/datasets/make_datasets.py, right?
Yes, that's right.
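For anyone reading along, here is a minimal sketch of what that synthetic data generation looks like with sklearn's make_blobs. The sizes and cluster count below are illustrative only; the actual parameters are set per algorithm in the benchmark config, not shown here:

```python
import numpy as np
from sklearn.datasets import make_blobs

# Illustrative sizes only; skl_config.json defines the real
# n_samples / n_features per benchmark case.
n_samples, n_features, n_clusters = 100_000, 50, 10

X, y = make_blobs(
    n_samples=n_samples,
    n_features=n_features,
    centers=n_clusters,
    random_state=777,
)

# The benchmark scripts store generated data as .npy files, roughly like this:
np.save("data_X.npy", X)
np.save("data_y.npy", y)
```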
Thanks for the information
Hi @Alexsandruss, For the inference,
1-4. If the 'testing' field is not provided, then the same data is used for training and inference. The train/test split is defined in the data loaders for named datasets.
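A schematic sketch of that fallback (the names and structure here are illustrative; this is not the actual scikit-learn_bench code):

```python
# Sketch of the "no 'testing' field" behavior described above.
def resolve_inference_data(dataset_config: dict) -> dict:
    training = dataset_config["training"]
    # If no separate testing data is configured, reuse the training data
    # for the inference benchmark.
    testing = dataset_config.get("testing", training)
    return {"train": training, "test": testing}

# Example: only a 'training' block is given, so inference reuses the same files.
cfg = {"training": {"x": "train_x.npy", "y": "train_y.npy"}}
print(resolve_inference_data(cfg))
```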
Hi @Alexsandruss, 1-4. Generally we use different data for training and inference, right? Is it OK to use the same training data for inference as well? For named datasets, for example higgs_one_m for random forest, the speedup graph above shows the data size as 1M for both the inference and the training graphs. But loader_classification.py (in the datasets folder) shows a different split: (1000000, 28) for training and (500000, 28) for inference. So which split is actually used for the inference speedup graph? (The same question applies to all named datasets.)
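For context, this is the kind of split being asked about. The shapes come from the comment above; the code itself is only a sketch with stand-in random data, not loader_classification.py (which downloads the real HIGGS CSV):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in random data with the HIGGS feature count (28).
# Allocates a few hundred MB; shrink the row count to experiment cheaply.
X = np.random.rand(1_500_000, 28)
y = np.random.randint(0, 2, size=1_500_000)

# Split sizes matching the shapes mentioned above:
# (1000000, 28) for training and (500000, 28) for testing/inference.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=1_000_000, test_size=500_000, random_state=77
)
print(X_train.shape, X_test.shape)  # (1000000, 28) (500000, 28)
```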