
byu-dml / metalearn

BYU's python library of useable tools for metalearning

MIT License

22 stars 6 forks

New Metafeatures #188

Open emrysshevek opened 5 years ago

emrysshevek commented 5 years ago

This is a list of metafeatures to be implemented for metafeature experiments and other use. This list will be updated with more specific metafeatures as more research is performed.

Metafeatures

[ ] Radius Neighbors Graph based:
- [x] Number of Nodes (not really necessary since it's just the number of instances)
- [ ] Number of Edges
- [ ] (un)weighted diameter (longest shortest path in the graph)
- [ ] (un)weighted shortest paths (over all pairs of nodes)
- [ ] clustering (fraction of a vertex's neighbors that are also neighbors of each other)
- [ ] degree (number of vertices adjacent to a vertex)
- [ ] class change ratio (number of edges connecting vertices of different classes)
[ ] More Model-based:
- [x] ~~KNN~~ (no longer going to implement)
- [ ] Perceptron (on full dataset, 1/10 dataset, 1/2 dataset, sqrt dataset)
- [ ] sum of weights
- [ ] distribution of weights
- [ ] distribution of biases
- [ ] number of iterations to convergence
- [ ] Clustering
- [ ] Bayesian Network
[ ] Timing
[ ] DCoL
- [ ] Regression
- [ ] Classification
[ ] Time-series
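
A rough sketch of the Radius Neighbors Graph items listed above, in plain Python so it runs without dependencies. A real implementation would more likely build on scikit-learn's `radius_neighbors_graph`. All function names here are hypothetical, and two points are assumptions rather than anything decided in this issue: the class change ratio is normalised by the total number of edges, and the diameter only considers pairs of nodes that are reachable from each other.

```python
import math
from collections import deque


def radius_graph(X, radius):
    """Adjacency sets: i and j are linked when their Euclidean distance <= radius."""
    n = len(X)
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(X[i], X[j]) <= radius:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def number_of_edges(adj):
    # Every undirected edge appears in two adjacency sets.
    return sum(len(nbrs) for nbrs in adj) // 2


def degrees(adj):
    return [len(nbrs) for nbrs in adj]


def local_clustering(adj, v):
    """Fraction of pairs of v's neighbors that are themselves adjacent."""
    nbrs = sorted(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1 for a in range(k) for b in range(a + 1, k) if nbrs[b] in adj[nbrs[a]]
    )
    return 2.0 * links / (k * (k - 1))


def class_change_ratio(adj, y):
    """Share of edges whose endpoints carry different class labels (assumed normalisation)."""
    edges = [(i, j) for i in range(len(adj)) for j in adj[i] if i < j]
    if not edges:
        return 0.0
    return sum(1 for i, j in edges if y[i] != y[j]) / len(edges)


def unweighted_diameter(adj):
    """Longest BFS shortest path, taken over reachable pairs only."""
    best = 0
    for src in range(len(adj)):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        best = max(best, max(dist.values()))
    return best


if __name__ == "__main__":
    X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
    y = [0, 0, 1, 1]
    adj = radius_graph(X, radius=1.5)
    print(number_of_edges(adj), degrees(adj), class_change_ratio(adj, y))
```

The per-vertex values (degree, clustering) would still need to be summarised into fixed-length metafeatures, for example by mean and standard deviation.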

Summarization

[ ] Meta-metafeatures
[ ] sum
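
For the Perceptron entries under "More Model-based" above, here is a minimal sketch of what could be collected. It assumes labels in {-1, +1}, the classic perceptron update rule, and random subsampling for the 1/2, 1/10 and sqrt subsets. None of this is settled in the issue, and the function names are made up for illustration.

```python
import math
import random


def train_perceptron(X, y, max_epochs=100):
    """Classic perceptron for labels in {-1, +1}.

    Returns (weights, bias, epochs), where epochs is the number of passes
    made until one full pass produced no update (max_epochs if none did).
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for xi, yi in zip(X, y):
            activation = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * activation <= 0:  # misclassified or on the boundary
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                b += yi
                mistakes += 1
        if mistakes == 0:
            return w, b, epoch
    return w, b, max_epochs


def perceptron_metafeatures(X, y, seed=0):
    """Train on the full data and on random subsets, recording simple summaries."""
    rng = random.Random(seed)
    n = len(X)
    sizes = {
        "full": n,
        "half": max(1, n // 2),
        "tenth": max(1, n // 10),
        "sqrt": max(1, math.isqrt(n)),
    }
    result = {}
    for name, size in sizes.items():
        idx = rng.sample(range(n), size)
        w, b, epochs = train_perceptron([X[i] for i in idx], [y[i] for i in idx])
        result[name] = {"sum_of_weights": sum(w), "bias": b, "epochs": epochs}
    return result


if __name__ == "__main__":
    X = [(2.0, 1.0), (1.0, 2.0), (-1.0, -2.0), (-2.0, -1.0)]
    y = [1, 1, -1, -1]
    print(perceptron_metafeatures(X, y))
```

The distribution of weights and biases mentioned in the list would need several runs or a multi-class model, so this sketch only records the sum of weights, the bias and the number of epochs.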

MichaelMMeskhi commented 5 years ago

Regarding DCoL, from what I have experienced, wrapping it in Python would be best. Topological calculations are slow, so keeping them in C++ is probably the way to go. I did some DCoL wrapping last year for personal purposes and might give it another try for this project if I find time.

emrysshevek commented 5 years ago

That would be great if you have time! Which parts have you done so far?

MichaelMMeskhi commented 5 years ago

I've personally wrapped computeNonLinearityLCDistance just for something specific I was trying to do. But I will look back at DCoL and see what to do with it. I was thinking of maybe even using Cython? Not sure how that will affect performance yet. Doing higher level math in C++ is way faster.

emrysshevek commented 5 years ago

I'm not really sure how the DCoL package works. Is there a quick/easy way to just wrap a call to get all of the metrics for a given dataset?

MichaelMMeskhi commented 5 years ago

There is a way, but if I remember correctly it writes the metrics to a file. Some modifications must be made in C++ before wrapping.
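
As a stopgap that avoids touching the C++ at all, one option might be to run the executable as a subprocess and parse whatever file it writes. This is only a sketch: the command line is supplied entirely by the caller (no DCoL flags are assumed here), and the "name value" line format is hypothetical, so the parser would need adapting to the real output.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_and_read_metrics(command, output_path):
    """Run an external tool, then parse the metrics file it wrote.

    `command` is the full argument list chosen by the caller.
    The "name value" line format parsed below is an assumption.
    """
    subprocess.run(command, check=True)
    metrics = {}
    for line in Path(output_path).read_text().splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank lines and anything that is not "name value"
        name, value = parts
        try:
            metrics[name] = float(value)
        except ValueError:
            continue  # skip lines whose second field is not a number
    return metrics


if __name__ == "__main__":
    # Stand-in for the real executable: a tiny Python script that writes a file.
    script = "import sys; open(sys.argv[1], 'w').write('F1 0.5' + chr(10))"
    with tempfile.TemporaryDirectory() as tmp:
        out = str(Path(tmp) / "metrics.txt")
        print(run_and_read_metrics([sys.executable, "-c", script, out], out))
```

This pays the cost of a process launch and file I/O per dataset, so a proper binding would still be preferable if the metrics are computed often.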

MichaelMMeskhi commented 5 years ago

This is a list of metafeatures to be implemented for metafeature experiments and other use. This list will be updated with more specific metafeatures as more research is performed.

Metafeatures

[ ] Radius Neighbors Graph based:

[x] Number of Nodes (not really necessary since it's just the number of instances)

[ ] Number of Edges

[ ] (un)weighted diameter (longest shortest path in the graph)

[ ] (un)weighted shortest paths (over all pairs of nodes)

[ ] clustering (fraction of a vertex's neighbors that are also neighbors of each other)

[ ] degree (number of vertices adjacent to a vertex)

[ ] class change ratio (number of edges connecting vertices of different classes)

[ ] More Model-based:

[ ] KNN

[ ] Perceptron

[ ] Perceptron sum of weights on full dataset, 1/10 dataset, 1/2 dataset, sqrt dataset

[ ] Clustering

[ ] Bayesian Network

[ ] Regression

[ ] Timing

[ ] DCoL

[ ] Measures of overlaps in the feature values from different classes.

[ ] The maximum Fisher's discriminant ratio (F1).

[ ] The directional-vector maximum Fisher's discriminant ratio (F1v).

[ ] The overlap of the per-class bounding boxes (F2).

[ ] The maximum (individual) feature efficiency (F3).

[ ] The collective feature efficiency (F4).

[ ] Measures of class separability.

[ ] The leave-one-out error rate of the one-nearest neighbor classifier (N3).

[ ] The minimized sum of the error distance of a linear classifier (L1).

[ ] The fraction of points on the class boundary (N1).

[ ] The ratio of average intra/inter class nearest neighbor distance (N2).

[ ] The training error of a linear classifier (L2).

[ ] Measures of geometry, topology, and density of manifolds.

[ ] The nonlinearity of a linear classifier (L3).

[ ] The fraction of maximum covering spheres (T1).

[ ] The average number of points per dimension (T2).

[ ] Time-series

Summarization

[ ] Meta-metafeatures

[ ] sum
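
To give a feel for how small some of the DCoL measures listed above are, here is a sketch of the maximum Fisher's discriminant ratio (F1) in plain Python. It assumes exactly two classes and uses the population variance. DCoL's own choices (variance estimator, multi-class handling) may differ, so this should not be taken as a reference implementation, and `fisher_f1` is a made-up name.

```python
from statistics import mean, pvariance


def fisher_f1(X, y):
    """Maximum over features of (mu_a - mu_b)^2 / (var_a + var_b) for two classes."""
    labels = sorted(set(y))
    if len(labels) != 2:
        raise ValueError("this sketch handles exactly two classes")
    a, b = labels
    best = 0.0
    for j in range(len(X[0])):
        col_a = [row[j] for row, lab in zip(X, y) if lab == a]
        col_b = [row[j] for row, lab in zip(X, y) if lab == b]
        spread = pvariance(col_a) + pvariance(col_b)
        gap = (mean(col_a) - mean(col_b)) ** 2
        if spread == 0:
            # Constant feature within both classes: perfectly separating or useless.
            ratio = float("inf") if gap > 0 else 0.0
        else:
            ratio = gap / spread
        best = max(best, ratio)
    return best


if __name__ == "__main__":
    X = [(0.0, 0.0), (2.0, 1.0), (10.0, 0.0), (12.0, 1.0)]
    y = [0, 0, 1, 1]
    print(fisher_f1(X, y))
```

Measures like this are cheap enough to keep in Python. The topological ones are where a C++ backend would matter.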

MichaelMMeskhi commented 5 years ago

Suggestion to add a new meta-feature: Concept Variation (Task complexity) as defined here.