JuliaManifolds / Manopt.jl

🏔️Manopt. jl – Optimization on Manifolds in Julia
http://manoptjl.org
Other
314 stars 40 forks source link

Initialise states based on point type already and not requiring an actual point #400

Closed kellertuer closed 3 days ago

kellertuer commented 1 month ago

Several states and similar require a point to initialise points. Itmight be a good idea to requite the constructors to also accept just the type (and default the point's type to fill that by default).

Examples

mateuszbaran commented 1 month ago

Yes, manifold + type should be enough for allocation. IIRC we don't currently have a function for that though?

kellertuer commented 1 month ago

No, it just came up today in a discussion, so I took a note. We seem to have some states here (will check the details later) that really just need the type not the actual allocated point.

I will check what best to do, my idea would be

We should then check all places, where this scheme can be applied. I will check this later unless you feel adventurous to propose something, that would of course also be great.

edit: and maybe we also get more feedback here – I think it was Dimitri who told me about that but I do not know his GitHub handle.

kellertuer commented 1 month ago

Just to ping him here, it was @bvdmitri who reported this – besides many really nice remarks on how nice Manopt is – thanks again also for the nice discussions!

bvdmitri commented 1 month ago

@kellertuer Thanks for all the great work you're doing!

kellertuer commented 1 month ago

@bvdmitri thanks for your kind words. I now checked all of Manopt and only found one place where we allocate a p but do not use that allocated point afterwards (but only its type).

Could you maybe elaborate in which other cases you encountered that memory was allocated but the actual “data” not used?

bvdmitri commented 1 month ago

I myself don't recall many places :) I think my main frustration was with DirectionUpdateRule and the way it needs to be instantiated. So currently, I need to do something like this:

result = gradient_descent(M, f, g, p, direction = AverageGradient(M, p))

While I would expect to write something like:

result = gradient_descent(M, f, g, p, direction = AverageGradient())

I understand why I need to pass M and p into the constructor of AverageGradient from a programming point of view—it sort of initializes the state. But from a user point of view passing M and p again seems completely unnecessary. This information should be available inside gradient_descent anyway, and gradient_descent could prepare all the necessary state for AverageGradient without asking the user to do that. Now the user needs to pass the state directly to the constructor. Moreover, passing different manifolds M to gradient_descent and AverageGradient is certainly a mistake, so its a potential footgun.

I think during our conversation, I was trying to say that sometimes this p passed to a DirectionUpdateRule is not even used, but in all honesty, right now, I also cannot find where I encountered that. Sorry if I wasn't clear enough during our conversation. But my initial attention was caught by the constructor and I was browsing the source code to understand why I need to do that explicitly and I got the impression that actually its not really necessary and gradient_descent could do it for me instead. I think this explains the issue better rather then my vague "p is not required".

I think you can use the factory pattern to implement AverageGradient without explicit M and p. In a few words, you can create a "factory" that would create an actual instance of the object by passing M and p later inside gradient_descent, instead of asking the user to create an actual instance themselves. I use this quite often, e.g., here. The idea is to only ask the user to specify minimal information and pass all the other arguments later when they are available in some context.

kellertuer commented 1 month ago

Now the user needs to pass the state directly to the constructor. Moreover, passing different manifolds M to gradient_descent and AverageGradient is certainly a mistake, so its a potential footgun.

Yes, but I have to initialise the internal memory somehow so that is what I need p for (and vor thoroughly copying M). For now I do not know how to circumvent this but I am always open to ideas.

Sure in principle passing a different manifold is a cause of error here and is a good reason to avoid this if we find something.

kellertuer commented 1 month ago

A main problem might be to get that non-breaking, but we could maybe run both ideas in parallel for a while before committing the older. I think I do see the advantage in the factory mode (avoid apscifying M twice) but I personally would have to think a bit how to realise that. If you have the capacities to think about hat and propose something feel free to do so. The factory pattern sounds like the right way to go here; kind of like an interims step, so that one could maybe even accept AverageGradient()(M,p) as the materialisation which is then what the old form would match to – when we try to keep it nonbreaking.

bvdmitri commented 1 month ago

I was thinking something about those lines

# Placeholder which are not important for this example
struct SomeManifold end
struct SomeOtherManifold end
function f end
function g end

struct AverageGradient{M, P, F}
    manifold::M
    point::P
    some_internal_field::F
end

# A constructor that requires `M` and `p`
AverageGradient(M, p; some_internal_field = 0) = AverageGradient(M, p, some_internal_field)

# In case the direction has already been instantiated by a user
# only double check that the manifold matches
function prepare_direction(M, p, direction::AverageGradient)
    if !isequal(M, direction.manifold)
        error("Manifolds do not match!")
    end
    # simply return the instance created by the user
    return direction
end

struct AverageGradientFactory{F}
    some_internal_field::F
end

# A constructor that does not require `M` and `p` creates a _factory_
AverageGradient(; some_internal_field = 0) = AverageGradientFactory(some_internal_field)

# In case the direction has *not* been created yet we should create it
function prepare_direction(M, p, factory::AverageGradientFactory)
    return AverageGradient(M, p; some_internal_field = factory.some_internal_field)
end

function gradient_descent(M, f, g, p; direction)
    direction = prepare_direction(M, p, direction)
    # ...
    @show direction
    return nothing
end

And then I can do

M = SomeManifold()
p = [ 0 ]

gradient_descent(SomeManifold(), f, g, p; direction = AverageGradient())
gradient_descent(SomeManifold(), f, g, p; direction = AverageGradient(M, p))

which yields identical results and also non-breaking

direction = AverageGradient{SomeManifold, Vector{Int64}, Int64}(SomeManifold(), [0], 0)
direction = AverageGradient{SomeManifold, Vector{Int64}, Int64}(SomeManifold(), [0], 0)

moreover, it double checks that manifolds are equal

gradient_descent(SomeManifold(), f, g, p; direction = AverageGradient(SomeOtherManifold(), p))
Manifolds do not match!

Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] prepare_direction(M::SomeManifold, p::Vector{Int64}, direction::AverageGradient{SomeOtherManifold, Vector{Int64}, Int64})
   @ Main ./In[7]:14
 [3] gradient_descent(M::SomeManifold, f::Function, g::Function, p::Vector{Int64}; direction::AverageGradient{SomeOtherManifold, Vector{Int64}, Int64})
   @ Main ./In[15]:2
 [4] top-level scope
   @ In[21]:1
kellertuer commented 1 month ago

Very neat idea – a bit of rework for all updates, but something we should definetly do. Thanks! Might take a while until I find time for it (still travelling).

bvdmitri commented 1 month ago

No worries. There’s no need to rush this. The current setup works all fine (though yeah it requires users to input M and p again, but well it works!). My proposal is aimed at enhancing user experience, but it doesn’t introduce any critical features nor fixing any bugs.

kellertuer commented 1 month ago

Yes and it is a great idea to follow up on because it does improve user experience (something that I value a lot).

edit: one small comment – usually none of these stores the manifold, it is only needed for obtaining a point (or its type) to initialise vectors (of points) or other point storages. But then one could still error if the points types do not match. The approach is still a very good one. I will try to change directions to that and check where-else we can use the factory pattern for improved usability.

mateuszbaran commented 1 month ago

I agree that a factory pattern would improve usability. One small point though: I don't like constructors that don't return an object of the type they name. I would propose making the AverageGradient type a factory and then prepare_direction would return a new, different type, for example AverageGradientState. I think this could be considered an internal, mostly non-breaking change.

kellertuer commented 1 month ago

Yes, that one I also dislike a bit, it might still be an okay-interims solution before we do a breaking change.

But you are right, having AverageGradient as the factory (considered nonbreaking though we change internals) and using a state for the materialised variant is ok I think. I will keep that in mind once I start reworking this (and take a look whether other areas might benefit from the factory as well).