topepo opened 7 years ago
This is the first time I have heard about this; it sounds quite interesting! I gave it a quick read, so please correct me if I am doing something wrong.
In the ddalpha package I need a training sample with already known classes to train a classifier, so it is not unsupervised.
Did you think of something like this?
library(ddalpha)
library(rgl)

# example 1: depth space of iris, using the known class sizes
ds <- depth.space.Mahalanobis(as.matrix(iris[1:4]), c(50, 50, 50))
plot3d(ds, col = as.numeric(iris[[5]]))

# example 2: same, but with the rows permuted, so the class sizes no
# longer correspond to contiguous blocks of rows
perm <- sample(150)
ds2 <- depth.space.Mahalanobis(as.matrix(iris[perm, 1:4]), c(50, 50, 50))
plot3d(ds2, col = as.numeric(iris[[5]][perm]))

# example 3: derive the groups from k-means instead of the true labels
clusters <- kmeans(scale(iris[1:4]), 3)
c.ord <- order(clusters$cluster)
ds3 <- depth.space.Mahalanobis(as.matrix(iris[c.ord, 1:4]),
                               as.vector(table(clusters$cluster)))
plot3d(ds3, col = as.numeric(iris[[5]][c.ord]))
The first one is really cool, the second one not so much. One would have to supply a class vector as a parameter, or use some unsupervised method like k-means, as in the third example.
What do you think @topepo? Is there an entirely unsupervised version of this?
caret has a function that computes the distances of a new sample to the class centroids. I was thinking of something along the same lines, although you could certainly just have an interface to generate the depths for all the data.
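If I remember the caret API correctly, the function in question is classDist(); a minimal sketch (the log-scaling of the distances is my recollection of the default, so treat the exact output as approximate):

```r
library(caret)

# classDist() stores per-class centroids and covariances; predict() then
# returns the distance of each new sample to each class centroid
# (Mahalanobis distances, log-scaled by default if memory serves).
cd <- classDist(iris[, 1:4], iris$Species)
head(predict(cd, iris[, 1:4]))  # one column of distances per class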
dimRed has a nice interface to other dimension reduction methods, and these metrics (supervised or not) would be great to include. ddalpha is pretty good, but I find the API more complex than I think it should be.
I think something like
embed(data, "DataDepth", classes = cl, ...)
where classes can either be a vector of classes, or a function that returns a vector of classes from the data, should be possible. It could also accept a character value like "kmeans" that takes the number of classes from ndim and does some standard clustering.
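A hypothetical sketch of how such a classes argument could be resolved (none of this exists in dimRed; the names are made up for illustration):

```r
# Hypothetical helper for a "DataDepth" embed() method: turn the
# `classes` argument into a plain label vector.
resolve_classes <- function(data, classes, ndim) {
  if (is.function(classes))                              # user-supplied labeller
    return(classes(data))
  if (identical(classes, "kmeans"))                      # standard clustering,
    return(kmeans(scale(data), centers = ndim)$cluster)  # k taken from ndim
  classes                                                # assume a label vector
}
```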
I like the idea, but it will probably take me a while to get to it (after v0.1.0) because I am busy with other things at the moment. If you want it in sooner, I would accept a pull request.
There should probably also be a predict function, but I am not sure what it should look like; it will probably have to accept some additional arguments.
No problem on the timing.
For predict, you'll just have to save the original data (as you do in the other methods) and pass it as an argument to depth.X.
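That predict step could be sketched roughly like this, assuming the stored training data is split by class and each new sample's depth is computed with respect to each class (which is, as far as I understand, what depth.space.* does internally for the training set itself):

```r
library(ddalpha)

# Sketch of a predict() for the depth space: keep the training data per
# class and compute the depth of new samples w.r.t. each class.
train  <- iris[, 1:4]
labels <- iris$Species
new_x  <- as.matrix(iris[c(1, 51, 101), 1:4])  # pretend these are new samples

depths <- sapply(split(train, labels),
                 function(cls) depth.Mahalanobis(new_x, as.matrix(cls)))
depths  # one row per new sample, one column of depths per class
```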
Also, I'll send you an invite to a repo that I'll be making public soon, in case you are interested in what I've been doing in regards to my previous requests. I have some of the depth parts worked out already, but your interface will be better than my do.call approach.
The recipes are quite a nice idea. Why not simply make a dimRed recipe? That would be interesting, because I did not really consider data preprocessing in my package.
One of the methods you might want to add is t-SNE; it is very good for visualizing complex data structures. Also, the R package Rtsne is based on a very efficient implementation that can handle relatively large data, which is not the case for Isomap and kPCA.
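For reference, a minimal Rtsne call might look like this (note that Rtsne errors on duplicate rows by default, hence the deduplication; iris has one duplicated row):

```r
library(Rtsne)

# t-SNE embedding of iris; Rtsne() refuses duplicate rows by default.
X <- unique(iris[, 1:4])
set.seed(1)
tsne <- Rtsne(as.matrix(X), dims = 2, perplexity = 30)
plot(tsne$Y, col = iris$Species[!duplicated(iris[, 1:4])])
```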
I've used t-SNE a lot (back when I used to actually analyze data for a living) and like it. However, I'm constrained to using methods where the projection can be applied to new data sets (based on estimates from the old/training data).
I didn't think to make a general dimRed step, but I did something similar for the depth methods. I'll put that on the list.
t-SNE works by gradient descent, and in theory one can hold the old points fixed and apply it to the new points only, but as far as I know no one has implemented it. Here is a cool package for different SNE variants: https://github.com/jlmelville/sneer (I think it is not on CRAN).
You might consider adding some of Tukey's data depth methods. R has a few packages that you could wrap, including ddalpha (this paper gives a pretty good description of it).