byu-dml / metalearn

BYU's Python library of usable tools for metalearning
MIT License

Find out which metafeatures OpenML computes which we do not #89

bjschoenfeld opened this issue 6 years ago (status: open)

emrysshevek commented 6 years ago

OpenML seems to have many metafeatures it can compute, but as far as I can tell, these are the only ones it actually uses: "AutoCorrelation", "CardinalityAtFour", "CardinalityAtThree", "CardinalityAtTwo", "CfsSubsetEval_DecisionStumpAUC", "CfsSubsetEval_DecisionStumpErrRate", "CfsSubsetEval_DecisionStumpKappa", "CfsSubsetEval_NaiveBayesAUC", "CfsSubsetEval_NaiveBayesErrRate", "CfsSubsetEval_NaiveBayesKappa", "CfsSubsetEval_kNN1NAUC", "CfsSubsetEval_kNN1NErrRate", "CfsSubsetEval_kNN1NKappa", "DecisionStumpAUC", "DefaultAccuracy", "J48.00001.AUC", "J48.00001.ErrRate", "J48.00001.Kappa", "J48.0001.AUC", "J48.0001.ErrRate", "J48.0001.Kappa", "J48.001.AUC", "J48.001.ErrRate", "J48.001.Kappa", "MaxNominalAttDistinctValues", "MeanNominalAttDistinctValues", "MinNominalAttDistinctValues", "NaiveBayesAUC", "NumberOfBinaryFeatures", "PercentageOfBinaryFeatures", "REPTreeDepth1AUC", "REPTreeDepth1ErrRate", "REPTreeDepth1Kappa", "REPTreeDepth2AUC", "REPTreeDepth2ErrRate", "REPTreeDepth2Kappa", "REPTreeDepth3AUC", "REPTreeDepth3ErrRate", "REPTreeDepth3Kappa", "RandomTreeDepth1AUC", "RandomTreeDepth2AUC", "RandomTreeDepth3AUC", "StdvNominalAttDistinctValues", "kNN1NAUC",
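A list like the one above is essentially a set difference between OpenML's quality names and the metafeature names we compute. The sketch below assumes both sides use identical names; if our names differ from OpenML's, a name mapping would be needed first. `OUR_METAFEATURES` is a hypothetical placeholder, not metalearn's actual list, and `OPENML_QUALITIES` is only a small sample of names from this thread.

```python
# Small sample of OpenML quality names from this thread (not the full list).
OPENML_QUALITIES = {
    "AutoCorrelation",
    "ClassEntropy",
    "DecisionStumpAUC",
    "NumberOfInstances",
    "NumberOfFeatures",
    "kNN1NAUC",
}

# Hypothetical placeholder: replace with the names our library actually computes.
OUR_METAFEATURES = {
    "ClassEntropy",
    "NumberOfInstances",
    "NumberOfFeatures",
}


def missing_metafeatures(openml_names, our_names):
    """Return OpenML names absent from our set, sorted case-insensitively."""
    return sorted(set(openml_names) - set(our_names), key=str.lower)


if __name__ == "__main__":
    for name in missing_metafeatures(OPENML_QUALITIES, OUR_METAFEATURES):
        print(name)
```

Exact string matching keeps the comparison simple; it will report a metafeature as missing whenever the two projects merely spell its name differently.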

joaquinvanschoren commented 6 years ago

Here's a list of all OpenML meta-features and the number of datasets for which we computed them. An important factor is whether the dataset is a classification or a regression task.

Most meta-features are computed for all 1184 classification datasets. Sometimes a landmarker is missing because it took far too long to compute on that dataset (we have datasets with millions of instances and hundreds of thousands of features). There are also a few that we don't compute ourselves but that other users computed and submitted, such as the cardinality-based meta-features.

"AutoCorrelation","1468"
"CardinalityAtFour","161"
"CardinalityAtThree","161"
"CardinalityAtTwo","161"
"CfsSubsetEval_DecisionStumpAUC","1176"
"CfsSubsetEval_DecisionStumpErrRate","1176"
"CfsSubsetEval_DecisionStumpKappa","1176"
"CfsSubsetEval_NaiveBayesAUC","1176"
"CfsSubsetEval_NaiveBayesErrRate","1176"
"CfsSubsetEval_NaiveBayesKappa","1176"
"CfsSubsetEval_kNN1NAUC","1176"
"CfsSubsetEval_kNN1NErrRate","1176"
"CfsSubsetEval_kNN1NKappa","1176"
"ClassEntropy","1184"
"DecisionStumpAUC","1180"
"DecisionStumpErrRate","1180"
"DecisionStumpKappa","1180"
"DefaultAccuracy","19489"
"Dimensionality","19906"
"EquivalentNumberOfAtts","1184"
"J48.00001.AUC","1180"
"J48.00001.ErrRate","1180"
"J48.00001.Kappa","1180"
"J48.0001.AUC","1180"
"J48.0001.ErrRate","1180"
"J48.0001.Kappa","1180"
"J48.001.AUC","1180"
"J48.001.ErrRate","1180"
"J48.001.Kappa","1180"
"MajorityClassPercentage","19906"
"MajorityClassSize","19906"
"MaxAttributeEntropy","1184"
"MaxCardinalityOfNominalAttributes","161"
"MaxCardinalityOfNumericAttributes","161"
"MaxKurtosisOfNumericAtts","1184"
"MaxMeansOfNumericAtts","1184"
"MaxMutualInformation","1184"
"MaxNominalAttDistinctValues","1184"
"MaxSkewnessOfNumericAtts","1184"
"MaxStdDevOfNumericAtts","1184"
"MeanAttributeEntropy","1184"
"MeanCardinalityOfNominalAttributes","161"
"MeanCardinalityOfNumericAttributes","161"
"MeanKurtosisOfNumericAtts","1184"
"MeanMeansOfNumericAtts","1184"
"MeanMutualInformation","1184"
"MeanNoiseToSignalRatio","1184"
"MeanNominalAttDistinctValues","1184"
"MeanSkewnessOfNumericAtts","1184"
"MeanStdDevOfNumericAtts","1184"
"MinAttributeEntropy","1184"
"MinCardinalityOfNominalAttributes","161"
"MinCardinalityOfNumericAttributes","161"
"MinKurtosisOfNumericAtts","1184"
"MinMeansOfNumericAtts","1184"
"MinMutualInformation","1184"
"MinNominalAttDistinctValues","1184"
"MinSkewnessOfNumericAtts","1184"
"MinStdDevOfNumericAtts","1184"
"MinorityClassPercentage","1463"
"MinorityClassSize","19906"
"NaiveBayesAUC","1180"
"NaiveBayesErrRate","1180"
"NaiveBayesKappa","1180"
"NumberOfBinaryFeatures","19906"
"NumberOfClasses","19906"
"NumberOfFeatures","19906"
"NumberOfInstances","19906"
"NumberOfInstancesWithMissingValues","19906"
"NumberOfMissingValues","19906"
"NumberOfNumericFeatures","19906"
"NumberOfSymbolicFeatures","19906"
"PercentageOfBinaryFeatures","19906"
"PercentageOfInstancesWithMissingValues","19906"
"PercentageOfMissingValues","19906"
"PercentageOfNumericFeatures","19906"
"PercentageOfSymbolicFeatures","19906"
"Quartile1AttributeEntropy","1184"
"Quartile1KurtosisOfNumericAtts","1184"
"Quartile1MeansOfNumericAtts","1184"
"Quartile1MutualInformation","1184"
"Quartile1SkewnessOfNumericAtts","1184"
"Quartile1StdDevOfNumericAtts","1184"
"Quartile2AttributeEntropy","1184"
"Quartile2KurtosisOfNumericAtts","1184"
"Quartile2MeansOfNumericAtts","1184"
"Quartile2MutualInformation","1184"
"Quartile2SkewnessOfNumericAtts","1184"
"Quartile2StdDevOfNumericAtts","1184"
"Quartile3AttributeEntropy","1184"
"Quartile3KurtosisOfNumericAtts","1184"
"Quartile3MeansOfNumericAtts","1184"
"Quartile3MutualInformation","1184"
"Quartile3SkewnessOfNumericAtts","1184"
"Quartile3StdDevOfNumericAtts","1184"
"REPTreeDepth1AUC","1180"
"REPTreeDepth1ErrRate","1180"
"REPTreeDepth1Kappa","1180"
"REPTreeDepth2AUC","1180"
"REPTreeDepth2ErrRate","1180"
"REPTreeDepth2Kappa","1180"
"REPTreeDepth3AUC","1180"
"REPTreeDepth3ErrRate","1180"
"REPTreeDepth3Kappa","1180"
"RandomTreeDepth1AUC","1180"
"RandomTreeDepth1ErrRate","1180"
"RandomTreeDepth1Kappa","1180"
"RandomTreeDepth2AUC","1180"
"RandomTreeDepth2ErrRate","1180"
"RandomTreeDepth2Kappa","1180"
"RandomTreeDepth3AUC","1180"
"RandomTreeDepth3ErrRate","1180"
"RandomTreeDepth3Kappa","1180"
"StdevCardinalityOfNominalAttributes","161"
"StdevCardinalityOfNumericAttributes","161"
"StdvNominalAttDistinctValues","1184"
"kNN1NAUC","1176"
"kNN1NErrRate","1176"
"kNN1NKappa","1176"
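To see which qualities fall short of full coverage, the "name","count" pairs above can be parsed with Python's standard `csv` module and compared against the 1184 classification datasets mentioned in the thread. This is a minimal sketch on a short excerpt of the list; `below_classification_coverage` is a hypothetical helper, and the single threshold is a simplification, since qualities computed for every dataset (classification and regression) have counts far above 1184.

```python
import csv
import io

# Short excerpt of the "name","count" pairs posted in this thread.
RAW = '''"AutoCorrelation","1468"
"CardinalityAtTwo","161"
"ClassEntropy","1184"
"DecisionStumpAUC","1180"
"kNN1NAUC","1176"
"NumberOfInstances","19906"
'''

CLASSIFICATION_DATASETS = 1184  # figure quoted in the thread


def parse_counts(text):
    """Parse quoted name,count rows into a {name: count} dict."""
    counts = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines
            continue
        name, count = row
        counts[name] = int(count)
    return counts


def below_classification_coverage(counts, total=CLASSIFICATION_DATASETS):
    """Names computed for fewer than `total` datasets, lowest count first."""
    short = [name for name, count in counts.items() if count < total]
    return sorted(short, key=lambda name: (counts[name], name))


if __name__ == "__main__":
    counts = parse_counts(RAW)
    for name in below_classification_coverage(counts):
        print(name, counts[name])
```

On the full list, this separates the user-submitted cardinality qualities (161 datasets) from landmarkers that are missing on a handful of datasets (1176 or 1180).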

emrysshevek commented 6 years ago

Sorry, I meant to say those were the ones OpenML actually uses that we don't. Glad to see that the full list matches the one I got from looking at a bunch of datasets, though.