byu-dml / metalearn

BYU's python library of useable tools for metalearning
MIT License
22 stars 6 forks

New Metafeatures #188

Open emrysshevek opened 5 years ago

emrysshevek commented 5 years ago

This is a list of metafeatures to be implemented for metafeature experiments and other use. This list will be updated with more specific metafeatures as more research is performed.

Metafeatures

Summarization

MichaelMMeskhi commented 5 years ago

Regarding DCoL, from what I have experienced, having it wrapped in Python would be best. Topological calculations are slow, so keeping them in C++ is probably the way to go. I did some DCoL wrapping last year for personal purposes and might give it another try for this project if I find time.

emrysshevek commented 5 years ago

That would be great if you have time! Which parts have you done so far?

MichaelMMeskhi commented 5 years ago

I've personally wrapped computeNonLinearityLCDistance just for something specific I was trying to do. But I will look back at DCoL and see what to do with it. I was thinking of maybe even using Cython? Not sure how that will affect performance yet. Doing higher level math in C++ is way faster.

emrysshevek commented 5 years ago

I'm not really sure how the DCoL package works. Is there a quick/easy way to just wrap a call to get all of the metrics for a given dataset?

MichaelMMeskhi commented 5 years ago

There is a way, but if I remember correctly, it writes the metrics to a file. Some modifications must be made in C++ before wrapping.
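
For illustration only, here is a rough sketch of a different route than modifying the C++: run the tool as-is and read back the file it writes. Everything named here is an assumption: `run_and_read_metrics` and `make_command` are hypothetical helpers, the `<name> <value>` line format is a guess, and the real DCoL command-line flags are deliberately not filled in. A tiny Python script stands in for the C++ binary so the example runs on its own.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_and_read_metrics(make_command):
    """Run an external tool that writes metrics to a file, then parse that file.

    make_command is a hypothetical callback: given the output path, it returns
    the argv list to execute. The real DCoL flags are not guessed here.
    """
    with tempfile.TemporaryDirectory() as tmp:
        out_path = Path(tmp) / "metrics.txt"
        # check=True raises CalledProcessError if the tool exits with an error.
        subprocess.run(make_command(out_path), check=True)
        metrics = {}
        for line in out_path.read_text().splitlines():
            # Assumed format: "<name> <value>" per line; adapt to the real output.
            parts = line.split()
            if len(parts) == 2:
                metrics[parts[0]] = float(parts[1])
        return metrics


def fake_tool(out_path):
    """Stand-in for the C++ binary: a one-line script that writes two metrics."""
    script = "import sys; open(sys.argv[1], 'w').write('F1 0.5\\nN1 0.25\\n')"
    return [sys.executable, "-c", script, str(out_path)]


metrics = run_and_read_metrics(fake_tool)
print(metrics)
```

This avoids touching the C++ at all, at the cost of a process launch and a file round trip per dataset. A proper binding would still be the better option if the metrics are computed often.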

MichaelMMeskhi commented 5 years ago

Metafeatures

  • [ ] Radius Neighbors Graph based:

    • [x] Number of Nodes (not really necessary since it's just the number of instances)
    • [ ] Number of Edges
    • [ ] (un)weighted diameter (longest shortest path in the graph)
    • [ ] (un)weighted shortest paths (over all pairs of nodes)
    • [ ] clustering (fraction of a vertex's neighbors that are also neighbors of each other)
    • [ ] degree (number of vertices adjacent to a vertex)
    • [ ] class change ratio (number of edges connecting vertices of different classes)
  • [ ] More Model-based:

    • [ ] KNN

    • [ ] Perceptron

    • [ ] Perceptron sum of weights on full dataset, 1/10 dataset, 1/2 dataset, sqrt dataset

    • [ ] Clustering

    • [ ] Bayesian Network

  • [ ] Regression

  • [ ] Timing

  • [ ] DCoL

    • [ ] Measures of overlaps in the feature values from different classes.
      • [ ] The maximum Fisher's discriminant ratio (F1).
      • [ ] The directional-vector maximum Fisher's discriminant ratio (F1v).
      • [ ] The overlap of the per-class bounding boxes (F2).
      • [ ] The maximum (individual) feature efficiency (F3).
      • [ ] The collective feature efficiency (F4).
    • [ ] Measures of class separability.
      • [ ] The leave-one-out error rate of the one-nearest neighbor classifier (L1).
      • [ ] The minimized sum of the error distance of a linear classifier (L2).
      • [ ] The fraction of points on the class boundary (N1).
      • [ ] The ratio of average intra/inter class nearest neighbor distance (N2).
      • [ ] The training error of a linear classifier (N3).
    • [ ] Measures of geometry, topology, and density of manifolds.
      • [ ] The nonlinearity of a linear classifier (L3).
      • [ ] The fraction of maximum covering spheres (T1).
      • [ ] The average number of points per dimension (T2).

  • [ ] Time-series
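
As a rough illustration of the radius neighbors graph items in the list above, here is a minimal standard-library sketch. It is not metalearn's implementation. All function names are hypothetical. The sketch assumes Euclidean distance, treats the "class change ratio" as a fraction of all edges (the list describes it as a count), reports mean degree and mean clustering as one possible summarization, and takes the diameter over reachable pairs only when the graph is disconnected.

```python
from collections import deque
from itertools import combinations
from math import dist


def radius_graph(points, radius):
    """Adjacency sets: i and j are linked when their Euclidean distance <= radius."""
    adj = {i: set() for i in range(len(points))}
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= radius:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def local_clustering(adj, v):
    """Fraction of pairs of v's neighbors that are themselves adjacent."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)


def hop_distances(adj, source):
    """Unweighted shortest-path lengths from source via BFS (reachable nodes only)."""
    seen = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen[w] = seen[u] + 1
                queue.append(w)
    return seen


def graph_metafeatures(points, labels, radius):
    adj = radius_graph(points, radius)
    edges = [(i, j) for i in adj for j in adj[i] if i < j]
    degrees = [len(adj[v]) for v in adj]
    # Longest shortest path, counted in hops, over reachable pairs only.
    diameter = max(max(hop_distances(adj, v).values()) for v in adj)
    changes = sum(1 for i, j in edges if labels[i] != labels[j])
    return {
        "n_nodes": len(adj),
        "n_edges": len(edges),
        "mean_degree": sum(degrees) / len(degrees),
        "mean_clustering": sum(local_clustering(adj, v) for v in adj) / len(adj),
        "unweighted_diameter": diameter,
        "class_change_ratio": changes / len(edges) if edges else 0.0,
    }


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 0.0)]
labels = ["a", "a", "b", "b"]
features = graph_metafeatures(points, labels, radius=1.5)
print(features)
```

The pairwise loop is quadratic in the number of instances, so this is only meant to pin down the definitions. A real implementation would more likely build the graph with a spatial index and hand it to a graph library.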

Summarization

  • [ ] Meta-metafeatures
  • [ ] sum

MichaelMMeskhi commented 5 years ago

Suggestion to add a new meta-feature: Concept Variation (Task complexity) as defined here.