Replace `transpose` and `permutedims` with adjoints to avoid copying data every time

ablaom commented 2 years ago

Users worried about optimal performance will not be able to get it with a column-oriented tabular format in any event, as MultivariateStats expects observations as columns (and, by definition, observations in tables are rows). Assuming the user starts with a matrix `X` whose columns are observations, they can always avoid a copy under the current proposal by wrapping it as `Tables.table(X')`. If rows are observations, there is no way to get the same efficiency (even when using MultivariateStats directly); the user can instead manually permute the dimensions and then wrap the adjoint of the result for use by the MLJ-wrapped model. (If memory is not an issue, I presume this is generally faster than just wrapping `X` directly.)
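To illustrate the copy-versus-wrapper distinction the comment relies on, here is a minimal sketch that uses only Base and the `LinearAlgebra` standard library. The matrix `X` and its dimensions are hypothetical, and the `Tables.table` wrapping step is deliberately left out so that the snippet runs without extra packages. It does not show what the MLJ interface does internally, only how adjoints behave in Julia.

```julia
using LinearAlgebra

# Hypothetical data: 3 features × 5 observations (observations as columns).
X = rand(3, 5)

# Lazy adjoint: a thin wrapper around X, no data is copied.
Xt = X'
@show Xt isa Adjoint
@show parent(Xt) === X      # the wrapper shares memory with X

# Taking the adjoint of an adjoint unwraps it and returns the original array.
@show Xt' === X

# permutedims is eager: it allocates a new dense matrix.
Xp = permutedims(X)
@show Xp isa Matrix{Float64}

# A write to X is visible through the adjoint but not through the copy.
X[1, 2] = -1.0
@show Xt[2, 1] == -1.0
@show Xp[2, 1] == -1.0
```

Because the adjoint of an adjoint returns the parent array, a matrix passed in as `X'` can be turned back into the observations-as-columns layout without allocating, which is presumably why wrapping `Tables.table(X')` avoids a copy under the proposal.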

See also this note in the manual.

TODO:

- [x] LDA and Bayesian LDA ~~(needs https://github.com/JuliaStats/MultivariateStats.jl/issues/192)~~
- [x] All other models with `transpose`

ablaom commented 2 years ago

Unfortunately, I noticed that MultivariateStats' `MulticlassLDA` does not support adjoints, but I've made a request.

ablaom commented 2 years ago

Resolved by #51

JuliaAI / MLJMultivariateStatsInterface.jl, issue #27