JuliaManifolds / ManifoldML.jl

Other
10 stars 1 forks source link

MLJ Tangent space transformer #5

Open mateuszbaran opened 4 years ago

mateuszbaran commented 4 years ago

At the moment it's just an initial sketch. I'm going to use it to try out the new MLJ <-> Manifolds integration: https://github.com/alan-turing-institute/MLJScientificTypes.jl/issues/46 .

codecov[bot] commented 4 years ago

Codecov Report

Merging #5 (18d1953) into master (029bba9) will decrease coverage by 76.92%. The diff coverage is 0.00%.

Impacted file tree graph

@@             Coverage Diff              @@
##            master       #5       +/-   ##
============================================
- Coverage   100.00%   23.07%   -76.93%     
============================================
  Files            3        4        +1     
  Lines           21       91       +70     
============================================
  Hits            21       21               
- Misses           0       70       +70     
Impacted Files Coverage Δ
src/ManifoldML.jl 100.00% <ø> (ø)
src/tangent_transformer.jl 0.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 029bba9...ec8eb6d. Read the comment docs.

mateuszbaran commented 3 years ago

@ablaom could you take a look at this PR? This should be a reasonable first step for manifold support in MLJ. In particular TangentSpaceTransformer can be used to transform to tangent space and back and works, at least on my computer. I'm not sure how important UnivariateTangentSpaceTransformer would be? I started making it similar to Standardizer/UnivariateStandardizer transformers.

Does it go in a more or less right direction?

ablaom commented 3 years ago

Cool. Exciting to see convergence of differential geometry with MLJ!

I've had a quick look at this PR and I think understand the basic objective here.

mateuszbaran commented 3 years ago
  • You may want to consider restricting to the univariate case, ie, the case where your input X is an abstract vector of scientific type AbstractVector{<:ManifoldPoint}. The general case should be sorted by alan-turing-institute/MLJModels.jl#288, which is about wrapping univariate transformers so that they are applied for each specified feature in a table (actually, given all the work you're now perfectly poised to sort out that issue!). I'm sorry we didn't anticipate this better in initial design.

Yes, right, it would certainly make sense to centralize that functionality. That looks like a lot of work though and the current approach also works so I'd prefer to leave the multivariate functionality here for now and refactor it later.

  • since the ouput type is a tangent space, we are going to need a scientific type for tangent spaces, if this transformer is to be made available to all MLJ users. That's your plan, yes? Every registered transformer needs to have a output scientific type.

I think we have two issues here. First, I actually thought that it might make more sense to split the current functionality of TangentSpaceTransformer into transformation into a tangent space and splitting coordinates into columns. It's certainly my goal to provide both functionalities to all MLJ users. For the first part, the output scientific type already exists (Manifolds.jl has a manifold that represents a tangent space so ManifoldPoint scientific type can also be used for this). How can I do the splitting into multiple columns though? Can a registered transformer have multiple output types?

  • I think "TangentSpaceTransformer" is a perfectly fine name, but maybe "ManifoldToTangentSpaceTransformer" or "InverseTangentSpaceRetraction" are slightly more informative??

Right, ManifoldToTangentSpaceTransformer sounds better. I'll change the name.

ablaom commented 3 years ago

... so I'd prefer to leave the multivariate functionality here for now and refactor it later. 👍

... into transformation into a tangent space and splitting coordinates into columns.

Okay, I misunderstood. I thought your "tangent space" was a custom object. Rather, you identify elements of a tangent space with points in (a subspace of some) Euclidean space and you don't care to remember the base point or what subspace this is?

And you are then splitting the components of this representation into columns of a table?

And, just to be clear, the input of your Transformer can be either:

Correct?

Assuming the answer to these questions is yes, it would make sense to have specific input and output side types for a separate "univariate" transformer, well the generic transformer would have rather loose types like Table.

mateuszbaran commented 3 years ago

Okay, I misunderstood. I thought your "tangent space" was a custom object. Rather, you identify elements of a tangent space with points in (a subspace of some) Euclidean space and you don't care to remember the base point or what subspace this is?

Well, not quite. In Manifolds.jl there are no strict requirements for the representation of tangent vectors. Usually, an isometric embedding into the Euclidean space is used to represent points and a similar representation is used for tangent vectors (with a notable exception of low-rank matrices). We also have another representation for tangent vectors, that is their coefficients in a certain basis (in this transformer it's stored here: https://github.com/JuliaManifolds/ManifoldML.jl/pull/5/files#diff-dd238691731421f4d569f6b8d8496c5a1eecfd8203b6a3e98fcb077218659909R30). Also, I wasn't quite sure here what to do with storing the point at which the vectors are tangent so currently it's stored in fitresult. When I extract the "transforming into tangent space" part that point will be stored in the manifold object.

From a more high-level point of view, this transformer is supposed to be useful as a part of for example principal geodesic analysis. In general, when a standard statistical method is generalized to manifolds, one can either use the minimization-based formulation and rewrite it using geometric concepts (which is the path that, with few exceptions, requires Manopt.jl) or take a more approximate route of selecting a decent linearization of data and applying standard methods to coefficients in a basis (which is what this tangent space transformer is useful for).

And you are then splitting the components of this representation into columns of a table?

Essentially, yes. That's how I can for example use the standard PCA code to do principal geodesic analysis.

And, just to be clear, the input of your Transformer can be either:

  • a single vector of manifold points, or
  • any table with one or more columns being such vectors

Correct?

Assuming the answer to these questions is yes, it would make sense to have specific input and output side types for a separate "univariate" transformer, well the generic transformer would have rather loose types like Table.

Yes, OK, I think I know now how to improve this.

mateuszbaran commented 3 years ago

A small update: I'm planning to generalize it to chart transformers (with tangent space transformer as a special case) after https://github.com/JuliaManifolds/Manifolds.jl/pull/325 is finished. After that I'll see if there are any updates to MLJ that would let me implement it a bit nicer.

kellertuer commented 3 years ago

Cool, sorry that I did not have time to come back to ManifoldML.