JuliaStats / MultivariateStats.jl

A Julia package for multivariate statistics and data analysis (e.g. dimension reduction)

RFC: Unified API #109

Open wildart opened 5 years ago

wildart commented 5 years ago

Following #95, I looked at the multivariate models and methods implemented in this package to work out a suitable type hierarchy and the corresponding method interfaces.

Here is a table of the models and the function names each of them currently implements.

Function \ Model CCA WHT ICA LDA FA PPCA PCA KPCA MDS
fit x x x x x x x x x
transform x x x x x x x x x
predict x
indim x x x x x x x x
outdim x x x x x x x x x
mean x x x x x x x ?
var x x ? ? ?
cov x ?
cor x
projection x x x x x x
reconstruct x x x x
loadings ? ? x x ? ? ?
eigvals ? ? ? ? x
eigvecs ? ? ? ? ?
length
size

I put ? where a plausible implementation is either missing or goes by a different name.

So, I propose the following type hierarchy.

@nalimilan @ararslan Thoughts?

ararslan commented 5 years ago

That makes sense to me. Might be nice to have an abstract dimensionality reduction type in there that linear, nonlinear, and latent variable types can subtype.

wildart commented 5 years ago

> Might be nice to have an abstract dimensionality reduction type in there that linear, nonlinear, and latent variable types can subtype.

That would be AbstractDimensionalityReduction
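
For readers following along, the type names mentioned in this thread could be arranged roughly as in the sketch below. This is only an illustration under stated assumptions, not the package's actual definitions: `LatentVariableModel` and `ToyPCA` are hypothetical names, and the nesting is a guess based on the comments here. The function names (`projection`, `indim`, `outdim`, `transform`) are taken from the rows of the table above.

```julia
# Hypothetical arrangement of the abstract types named in this thread.
abstract type AbstractDimensionalityReduction end
abstract type LinearDimensionalityReduction <: AbstractDimensionalityReduction end
abstract type NonlinearDimensionalityReduction <: AbstractDimensionalityReduction end
# Assumed name: the thread only speaks of "latent variable types".
abstract type LatentVariableModel <: AbstractDimensionalityReduction end

# Toy model for illustration (not the package's PCA type):
# it stores a mean vector and an indim × outdim projection matrix.
struct ToyPCA <: LinearDimensionalityReduction
    mean::Vector{Float64}
    proj::Matrix{Float64}
end

# A concrete model only has to expose its projection matrix ...
projection(M::ToyPCA) = M.proj

# ... and the dimension queries can then be written once for every linear model.
indim(M::LinearDimensionalityReduction) = size(projection(M), 1)
outdim(M::LinearDimensionalityReduction) = size(projection(M), 2)

# Center an observation and project it onto the reduced space.
transform(M::ToyPCA, x::AbstractVector) = projection(M)' * (x .- M.mean)

M = ToyPCA([1.0, 2.0, 3.0], [1.0 0.0; 0.0 1.0; 0.0 0.0])
@assert M isa AbstractDimensionalityReduction
@assert (indim(M), outdim(M)) == (3, 2)
@assert transform(M, [2.0, 4.0, 9.0]) == [1.0, 2.0]
```

The point of the abstract layer in this sketch is that shared methods such as `indim` and `outdim` are defined once on the abstract type, while each model supplies only what is specific to it.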

ararslan commented 5 years ago

Whoops, don't know how I missed that...

kescobo commented 5 years ago

This seems great to me.

As my primary interest in this is plotting, one thing I'd like to know is whether there's a common method for obtaining the vectors that would be used in a plot. I'm not super knowledgeable about the terminology, but I think different things are commonly plotted for different dimensionality reductions. For MDS and PCA (I think), one is supposed to plot the eigenvectors scaled by the square root of the corresponding eigenvalue.

But finding information on this has been a bit challenging for me, not knowing all of the jargon.

wildart commented 5 years ago

Loadings are eigenvectors scaled by the square roots of their eigenvalues. It will be easy to add them to every eigendecomposition-based method.
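
As a rough illustration of what "scaled eigenvectors" means, here is a minimal sketch using only the standard library. It assumes one common convention, in which each eigenvector of the covariance matrix is multiplied by the square root of its eigenvalue; the data and variable names are made up for the example, and terminology varies between fields (some texts call the unscaled eigenvectors loadings).

```julia
using LinearAlgebra, Statistics

# Toy data: 3 variables (rows) observed 5 times (columns),
# i.e. one observation per column.
X = [2.0 4.0 6.0 8.0 10.0;
     1.0 3.0 2.0 5.0 4.0;
     0.5 0.1 0.4 0.3 0.2]

C = cov(X, dims=2)                      # 3×3 sample covariance of the variables
E = eigen(Symmetric(C))                 # eigenvalues are returned in ascending order
order = sortperm(E.values, rev=true)    # reorder so the largest eigenvalue comes first
λ = E.values[order]
V = E.vectors[:, order]

# Loadings under the assumed convention: column j is V[:, j] * sqrt(λ[j]).
# Clamp at zero to guard against tiny negative eigenvalues from rounding.
L = V * Diagonal(sqrt.(max.(λ, 0.0)))

# With this scaling the loadings reproduce the covariance matrix.
@assert L * L' ≈ C
```

A useful property of this scaling is that `L * L'` recovers the covariance matrix, so the length of each loading column reflects how much variance that component carries.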

nalimilan commented 5 years ago

Sounds like a good idea. Is the LinearDimensionalityReduction vs. NonlinearDimensionalityReduction distinction useful? I guess it doesn't hurt, but in your plan it doesn't really make a difference AFAICT.

Also, shouldn't PCA implement loadings?

kescobo commented 5 years ago

Fantastic. What about things like LDA and CCA? I've definitely seen those plotted, but your schema above doesn't have loadings for those, cf.

I know this is somewhat orthogonal; I can open a separate issue if that would be useful. In any case, having unified APIs for this stuff will be fantastic.