byu-dml / metalearn

BYU's python library of useable tools for metalearning
MIT License

D3M metafeatures to implement #61

Open jaromrex opened 6 years ago

jaromrex commented 6 years ago

The following metafeatures are listed in the d3m metadata object, but not computed by metalearn:

Text Token Mfs:
- number_of_tokens
- number_of_distinct_tokens
- number_of_tokens_containing_numeric_char
- ratio_of_distinct_tokens
- ratio_of_tokens_containing_numeric_char
- number_of_tokens_split_by_punctuation
- number_of_tokens_split_by_punctuation_containing_numeric_char
- number_of_distinct_tokens_split_by_punctuation
- ratio_of_distinct_tokens_split_by_punctuation
- ratio_of_tokens_split_by_punctuation_containing_numeric_char
- token_count_in_string_values
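For illustration only, here is a minimal sketch of how a few of the token metafeatures could be computed for one text column. It assumes plain whitespace tokenization and that "numeric char" means any digit; the actual D3M definitions may differ, and `token_metafeatures` is a hypothetical helper, not part of metalearn.

```python
def token_metafeatures(values):
    """Sketch of a few token-level metafeatures for one column of strings.

    Assumption: tokens are produced by splitting on whitespace, and a
    "numeric char" is any character for which str.isdigit() is true.
    """
    tokens = [tok for value in values for tok in value.split()]
    distinct = set(tokens)
    with_digit = [tok for tok in tokens if any(ch.isdigit() for ch in tok)]
    n = len(tokens)
    return {
        "number_of_tokens": n,
        "number_of_distinct_tokens": len(distinct),
        "number_of_tokens_containing_numeric_char": len(with_digit),
        # Ratios fall back to 0.0 when the column has no tokens at all.
        "ratio_of_distinct_tokens": len(distinct) / n if n else 0.0,
        "ratio_of_tokens_containing_numeric_char": len(with_digit) / n if n else 0.0,
    }


column = ["red apple 42", "green apple", "item7 red"]
mfs = token_metafeatures(column)
```

The `split_by_punctuation` variants would need a different tokenizer, which is left out here because the thread does not say how punctuation should be handled.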

MFs For Values:
- length_of_string_values
- number_of_distinct_values
- ratio_of_distinct_values
- ratio_of_negative_numeric_values
- ratio_of_numeric_values
- ratio_of_numeric_values_equal_-1
- ratio_of_numeric_values_equal_0
- ratio_of_numeric_values_equal_1
- ratio_of_positive_numeric_values
- ratio_of_values_containing_numeric_char
- number_of_negative_numeric_values
- number_of_numeric_values
- number_of_numeric_values_equal_-1
- number_of_numeric_values_equal_0
- number_of_numeric_values_equal_1
- number_of_outlier_numeric_values
- number_of_positive_numeric_values
- number_of_values_containing_numeric_char
- ratio_of_values_with_leading_spaces
- ratio_of_values_with_trailing_spaces
- number_of_values_with_leading_spaces
- number_of_values_with_trailing_spaces
- target_values
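As a rough sketch of the value metafeatures, the counts and ratios could be derived together from one pass over a column of raw string values. The assumptions here are mine: a value is "numeric" if `float()` can parse it, and every ratio is taken over the total number of values. `value_metafeatures` and `_to_float` are hypothetical names, not metalearn functions.

```python
def _to_float(value):
    """Return the value as a float, or None if it does not parse."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def value_metafeatures(values):
    """Sketch of a few value-level metafeatures for one column of strings.

    Assumption: float() decides what is numeric (so "nan" and "inf" would
    count as numeric), and ratios are relative to the number of values.
    """
    n = len(values)
    numeric = [x for x in (_to_float(v) for v in values) if x is not None]
    counts = {
        "number_of_distinct_values": len(set(values)),
        "number_of_numeric_values": len(numeric),
        "number_of_negative_numeric_values": sum(x < 0 for x in numeric),
        "number_of_positive_numeric_values": sum(x > 0 for x in numeric),
        "number_of_numeric_values_equal_0": sum(x == 0 for x in numeric),
        "number_of_values_with_leading_spaces": sum(v != v.lstrip(" ") for v in values),
        "number_of_values_with_trailing_spaces": sum(v != v.rstrip(" ") for v in values),
    }
    # Each ratio_of_* metafeature mirrors its number_of_* counterpart.
    ratios = {
        name.replace("number_of_", "ratio_of_", 1): (count / n if n else 0.0)
        for name, count in counts.items()
    }
    return {**counts, **ratios}


column = ["3", "-1", "0", " abc", "abc ", "3"]
mfs = value_metafeatures(column)
```

Deriving the ratios from the counts keeps the two families consistent, though the denominator for metafeatures such as ratio_of_negative_numeric_values would need to be checked against the d3m metadata definitions.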

Uncertain of classification:
- numeric_char_density
- natural_language_of_feature
- majority_class_ratio
- minority_class_ratio
- number_of_binary_features
- ratio_of_binary_features
- pearson_correlation_of_features
- spearman_correlation_of_features

Also, 'profile_distribution' in 'common_operations' could compute kurtosis and skewness (#71).
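To make the kurtosis and skewness suggestion concrete, here is a minimal standard-library sketch. It uses the biased (population) moment definitions and reports excess kurtosis, which is an assumption on my part; `skewness_and_kurtosis` is a hypothetical helper and says nothing about how 'profile_distribution' is actually structured.

```python
def skewness_and_kurtosis(xs):
    """Return (skewness, excess kurtosis) from biased central moments.

    Assumption: population (biased) estimators, with 3 subtracted from the
    kurtosis so that a normal distribution scores 0.
    """
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # variance
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    if m2 == 0:
        # Both statistics are undefined for a constant column.
        return float("nan"), float("nan")
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


skew, kurt = skewness_and_kurtosis([1, 2, 3, 4, 5])
```

For a symmetric sample like the one above, the skewness is zero and the excess kurtosis is negative, since the values are spread more evenly than a normal distribution would be.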

Not able to compute yet:
- most_common_alphanumeric_tokens
- most_common_numeric_tokens
- most_common_punctuations
- most_common_raw_values
- most_common_tokens
- most_common_tokens_split_by_punctuation

Bold denotes that a metafeature has been implemented.

emrysshevek commented 6 years ago

Also, there's a whole category of "model-based" metafeatures that we could implement. These are things like fitting a decision tree and counting its nodes, leaves, etc.
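A minimal sketch of what such model-based metafeatures might look like, assuming scikit-learn is available. The metafeature names and the `tree_metafeatures` helper are hypothetical; only the scikit-learn calls themselves are real.

```python
from sklearn.tree import DecisionTreeClassifier


def tree_metafeatures(X, y, random_state=0):
    """Fit an unpruned decision tree and describe its shape.

    Assumption: default DecisionTreeClassifier settings; the metafeature
    names below are illustrative, not taken from metalearn or D3M.
    """
    tree = DecisionTreeClassifier(random_state=random_state).fit(X, y)
    return {
        "decision_tree_node_count": tree.tree_.node_count,
        "decision_tree_leaf_count": tree.get_n_leaves(),
        "decision_tree_depth": tree.get_depth(),
    }


# Toy data: one feature that separates the two classes with a single split.
X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
mfs = tree_metafeatures(X, y)
```

Fixing `random_state` keeps the metafeatures reproducible across runs, which matters if they are compared between datasets.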

bjschoenfeld commented 6 years ago

@macetheace96 The model-based metafeatures would be good too. If you have a list of these, could you open a separate issue for them? This will help us keep each issue scoped.

@jaromrex I will make a separate issue for the skewness and kurtosis in the profile_distribution function, for the same reason I just mentioned.

MichaelMMeskhi commented 6 years ago

@macetheace96 There are whole other categories of meta-features that I am slowly working my way up to. Most of them are split across two programs: the original R script 'mfe' and a separate 'mf-extractor' project on GitHub.

sethcoast commented 5 years ago

Implemented length_of_string_features