Open emrysshevek opened 5 years ago
Regarding DCoL, from my experience, wrapping it in Python would be best. Topological calculations are slow, so keeping them in C++ makes sense. I did some of the DCoL wrapping last year for personal purposes and might give it another try for this project if I find time.
That would be great if you have time! Which parts have you done so far?
I've personally wrapped computeNonLinearityLCDistance, just for something specific I was trying to do. But I will look back at DCoL and see what to do with it. I was also thinking of maybe using Cython; I'm not sure how that will affect performance yet, since doing higher-level math in C++ is much faster.
I'm not really sure how the DCoL package works. Is there a quick/easy way to wrap a single call that returns all of the metrics for a given dataset?
There is a way, but if I remember correctly it writes the metrics to a file, so some modifications must be made on the C++ side before wrapping.
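Until the C++ side is modified, one pragmatic interim approach is to invoke the DCoL binary as a subprocess and parse the results file it writes. A minimal sketch; the `name: value` output format, the `dcol` binary name, and the CLI flags here are all assumptions, not DCoL's real interface:

```python
import subprocess
import tempfile

def parse_dcol_output(text):
    """Parse 'name: value' metric lines from a DCoL-style results file.

    The exact file format is an assumption; adjust once the real
    output written by the DCoL binary is known.
    """
    metrics = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue
        try:
            metrics[name.strip()] = float(value)
        except ValueError:
            continue  # skip non-numeric or malformed lines
    return metrics

def run_dcol(dataset_path, dcol_bin="dcol"):
    """Run the (hypothetical) DCoL executable and collect its metrics."""
    with tempfile.NamedTemporaryFile(suffix=".txt") as out:
        # Flag names below are placeholders, not DCoL's real CLI.
        subprocess.run([dcol_bin, "-i", dataset_path, "-o", out.name],
                       check=True)
        return parse_dcol_output(out.read().decode())
```

This keeps the C++ untouched for now; a ctypes/Cython wrapper can replace the subprocess call later without changing callers.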
This is a list of metafeatures to be implemented for metafeature experiments and other use. This list will be updated with more specific metafeatures as more research is performed.
Metafeatures
- [ ] Radius Neighbors Graph based:
  - [x] Number of Nodes (not really necessary since it's just the number of instances)
  - [ ] Number of Edges
  - [ ] (Un)weighted diameter (longest shortest path in the graph)
  - [ ] (Un)weighted shortest paths (over all pairs of nodes)
  - [ ] Clustering (fraction of a vertex's neighbors that are also neighbors of each other)
  - [ ] Degree (number of vertices adjacent to a vertex)
  - [ ] Class change ratio (fraction of edges connecting vertices of different classes)
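Several of the graph-based items above reduce to simple operations on the adjacency matrix of the radius-neighbors graph. A rough sketch using brute-force pairwise distances (function and key names are made up for illustration; a real implementation would likely use `sklearn.neighbors.radius_neighbors_graph`):

```python
import numpy as np

def radius_graph_metafeatures(X, y, radius):
    """Sketch: node/edge/degree/class-change metafeatures from a
    radius-neighbors graph built by brute-force Euclidean distance."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise Euclidean distances, then threshold to get the adjacency
    # matrix (no self-loops).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    adj = (d <= radius) & ~np.eye(len(X), dtype=bool)
    degrees = adj.sum(axis=1)
    n_edges = int(adj.sum() // 2)  # adj is symmetric; halve the count
    # Edges whose endpoints carry different class labels.
    diff = adj & (y[:, None] != y[None, :])
    n_diff_edges = int(diff.sum() // 2)
    return {
        "n_nodes": len(X),
        "n_edges": n_edges,
        "mean_degree": float(degrees.mean()),
        "class_change_ratio": n_diff_edges / n_edges if n_edges else 0.0,
    }
```

Diameter and all-pairs shortest paths would need an actual graph traversal (e.g. BFS per node), so they are omitted here.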
- [ ] More Model-based:
  - [ ] KNN
  - [ ] Perceptron
    - [ ] Sum of weights on the full dataset, 1/10 of the dataset, 1/2 of the dataset, and a sqrt-sized sample
  - [ ] Clustering
  - [ ] Bayesian Network
  - [ ] Regression
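The perceptron sum-of-weights item could look something like the sketch below. The from-scratch perceptron here just stands in for whatever library implementation gets used; the function name and the fraction set are illustrative assumptions:

```python
import numpy as np

def perceptron_weight_sums(X, y, fractions=(1.0, 0.5, 0.1),
                           epochs=10, seed=0):
    """Sketch: train a simple perceptron on random subsets of the data
    and record the sum of absolute weights as a metafeature."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.where(np.asarray(y) > 0, 1, -1)  # map labels to {-1, +1}
    sums = {}
    for frac in fractions:
        n = max(1, int(round(frac * len(X))))
        idx = rng.choice(len(X), size=n, replace=False)
        Xs, ys = X[idx], y[idx]
        w = np.zeros(X.shape[1])
        b = 0.0
        for _ in range(epochs):
            for xi, yi in zip(Xs, ys):
                if yi * (xi @ w + b) <= 0:  # misclassified: update
                    w += yi * xi
                    b += yi
        sums[frac] = float(np.abs(w).sum())
    return sums
```

The same subset loop generalizes to the timing metafeatures: wrap the fit in a timer and record elapsed time per fraction instead of the weight sum.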
- [ ] Timing
- [ ] DCoL:
  - [ ] Measures of overlaps in the feature values from different classes:
    - [ ] The maximum Fisher's discriminant ratio (F1)
    - [ ] The directional-vector maximum Fisher's discriminant ratio (F1v)
    - [ ] The overlap of the per-class bounding boxes (F2)
    - [ ] The maximum (individual) feature efficiency (F3)
    - [ ] The collective feature efficiency (F4)
  - [ ] Measures of class separability:
    - [ ] The minimized sum of the error distance of a linear classifier (L1)
    - [ ] The training error of a linear classifier (L2)
    - [ ] The fraction of points on the class boundary (N1)
    - [ ] The ratio of average intra/inter class nearest neighbor distance (N2)
    - [ ] The leave-one-out error rate of the one-nearest neighbor classifier (N3)
  - [ ] Measures of geometry, topology, and density of manifolds:
    - [ ] The nonlinearity of a linear classifier (L3)
    - [ ] The fraction of maximum covering spheres (T1)
    - [ ] The average number of points per dimension (T2)
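As a sanity check while wrapping DCoL, the simplest measures can be reimplemented directly in Python. A sketch of two-class F1 and of T2, following the classical Ho & Basu definitions (per-feature Fisher ratio maximized over features, and points per dimension; function names are made up here):

```python
import numpy as np

def f1_max_fisher_ratio(X, y):
    """Maximum Fisher's discriminant ratio (F1) for a two-class problem:
    per feature, (mu1 - mu2)^2 / (var1 + var2), then take the maximum."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    c1, c2 = np.unique(y)[:2]
    A, B = X[y == c1], X[y == c2]
    num = (A.mean(axis=0) - B.mean(axis=0)) ** 2
    den = A.var(axis=0) + B.var(axis=0)
    # Guard against zero-variance features.
    return float(np.max(num / np.where(den == 0, np.inf, den)))

def t2_points_per_dimension(X):
    """Average number of points per dimension (T2): n / d."""
    X = np.asarray(X)
    return X.shape[0] / X.shape[1]
```

Comparing outputs like these against the C++ results would help validate whatever wrapping approach ends up being used.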
- [ ] Time-series

Summarization

- [ ] Meta-metafeatures
  - [ ] sum
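Summarization here means collapsing a vector-valued metafeature (e.g. the per-node degrees above) into scalars. A minimal sketch; `sum` matches the item above, and the other aggregators are illustrative additions:

```python
import statistics

def summarize(values, aggregators=None):
    """Sketch of meta-metafeature summarization: apply a dictionary of
    aggregation functions to a vector-valued metafeature."""
    if aggregators is None:
        aggregators = {
            "sum": sum,
            "mean": statistics.mean,
            "min": min,
            "max": max,
        }
    return {name: fn(values) for name, fn in aggregators.items()}
```

Keeping the aggregators in a dictionary makes it easy to add more (stdev, skew, histograms) as the metafeature experiments dictate.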
Suggestion to add a new meta-feature: Concept Variation (Task complexity) as defined here.
KNN (no longer going to implement)