ACEsuit / Polynomials4ML.jl

Polynomials for ML: fast evaluation, batching, differentiation
MIT License

Dense/Linear Layer #53

Closed CheukHinHoJerry closed 1 year ago

CheukHinHoJerry commented 1 year ago

This branch is for the implementation of a dense layer. For now I just copied the code from ACEpsi.jl; no testing is done yet. I will add tests later.

I also want to try using this package to test the rrule. It seems better, cleaner, and easier to use than a hand-written rrule test:

https://juliadiff.org/ChainRulesTestUtils.jl/dev/

CC @dhan-02

This should also wait until https://github.com/ACEsuit/Polynomials4ML.jl/pull/49#issue-1798017245 is done. We can then merge from there into this branch, clean up, and finally merge.

cortner commented 1 year ago

I saw that package but haven't had time to look at it. How does it perform the tests?

cortner commented 1 year ago

I'm just a bit sceptical about fully automated finite-difference (FD) tests - but I would be happy to switch if we find it's robust

CheukHinHoJerry commented 1 year ago

I saw it has a keyword argument fdm::FiniteDifferenceMethod (the finite-differencing method to use), so I guess we can at least choose to use finite differences for testing. But I need some more time to see what it actually does. For now I will still implement the hand-coded tests; the ChainRulesTestUtils-based ones are just a trial.
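
For reference, a minimal sketch of such a test, assuming ChainRulesTestUtils together with FiniteDifferences (the toy function mysquare is made up for illustration; test_rrule and its fdm keyword are the documented API):

using ChainRulesCore, ChainRulesTestUtils, FiniteDifferences

# a toy function with a hand-written rrule, checked against finite differences
mysquare(x) = x^2
function ChainRulesCore.rrule(::typeof(mysquare), x)
   mysquare_pullback(ȳ) = (NoTangent(), 2x * ȳ)
   return mysquare(x), mysquare_pullback
end

# `fdm` selects the finite-difference method used for the comparison
test_rrule(mysquare, 0.7; fdm = central_fdm(5, 1))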

CheukHinHoJerry commented 1 year ago

This is now ready if we don't care about the ChainRulesTestUtils test. I suggest we wait until the second PR is merged, and then we can build a simple model to test whether this works fine.

Something to discuss is how we want to initialize the parameters of our LinearLayer by default. Usually people use Xavier/Glorot uniform initialization when they have no better prior?
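
For concreteness, a minimal sketch of the standard Glorot/Xavier uniform recipe (the function name glorot_uniform here is hypothetical, not an API of this package):

using Random: AbstractRNG

# Glorot/Xavier uniform: draw W from U(-s, s) with s = sqrt(6 / (fan_in + fan_out))
function glorot_uniform(rng::AbstractRNG, out_dim::Int, in_dim::Int)
   s = sqrt(6 / (in_dim + out_dim))
   return (2 .* rand(rng, out_dim, in_dim) .- 1) .* s
end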

CheukHinHoJerry commented 1 year ago

To take care of the confusing feature/batch-index problem, maybe we can do something like:

# sketch -- `luxlayer` stands for the LuxCore abstract layer type
struct LinearLayer{T} <: luxlayer
   in_dim::Int
   out_dim::Int
   feature_first::T
end

LinearLayer(in_dim::Int, out_dim::Int) = LinearLayer(in_dim, out_dim, false)

LuxCore.initialparameters(rng::AbstractRNG, l::LinearLayer) = ( W = randn(rng, l.out_dim, l.in_dim), )

function (l::LinearLayer)(x::AbstractMatrix, ps, st)
   if l.feature_first
      return ps.W * parent(x), st
   else
      return parent(x) * transpose(ps.W), st
   end
end
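
A hypothetical usage of this sketch with the default batch-first layout (the sizes and variable names are illustrative only):

using LuxCore, Random

l  = LinearLayer(4, 3)                  # default: feature_first = false
ps = LuxCore.initialparameters(Random.default_rng(), l)
X  = randn(10, 4)                       # (batch, features)
Y, st = l(X, ps, NamedTuple())          # Y has size (10, 3)
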
cortner commented 1 year ago

This looks good in principle: use feature_first to decide whether to store W or its transpose. But I don't like that, depending on this flag, the layer does something different. Instead I would have hoped that it just uses W or transpose(W) for the operation. It still feels like we have a problem to resolve in distinguishing matrix features from batches of vector features.

Maybe we need a quick fix here, but let's record this as an issue to be addressed asap?

Finally, I would probably make feature_first a type parameter.

CheukHinHoJerry commented 1 year ago

> Finally, I would probably make feature_first a type parameter.

Just to make sure: do you mean something like the comment above, after it was edited? If yes, I will make the corresponding changes and open an issue to remind us that we have to address this.

cortner commented 1 year ago

not quite ...

struct LinearLayer{FEATFIRST} <: luxlayer
   in_dim::Int
   out_dim::Int
end

LinearLayer(in_dim::Int, out_dim::Int) = LinearLayer{false}(in_dim, out_dim)

LuxCore.initialparameters(rng::AbstractRNG, l::LinearLayer{true}) = ( W = randn(rng, l.out_dim, l.in_dim), )
LuxCore.initialparameters(rng::AbstractRNG, l::LinearLayer{false}) = ( W = randn(rng, l.in_dim, l.out_dim), )

(l::LinearLayer{true})(x::AbstractMatrix, ps, st) = ps.W * parent(x), st
(l::LinearLayer{false})(x::AbstractMatrix, ps, st) = parent(x) * ps.W, st

that's what I meant by making it a type parameter. The other changes you made are - for me at least - just as confusing as the original version.

cortner commented 1 year ago

I need to think about this for a bit. Please don't change anything yet.

cortner commented 1 year ago

Ok, I think what I would prefer for the current version is to implement it the way you suggest above, with a modification for the type parameter:

struct LinearLayer{FEATFIRST} <: luxlayer
   in_dim::Int
   out_dim::Int
end

LinearLayer(in_dim::Int, out_dim::Int; feature_first = false) = LinearLayer{feature_first}(in_dim, out_dim)

LuxCore.initialparameters(rng::AbstractRNG, l::LinearLayer) = ( W = randn(rng, l.out_dim, l.in_dim), )

(l::LinearLayer)(x::AbstractVector, ps, st) = ps.W * parent(x), st
(l::LinearLayer{true})(x::AbstractMatrix, ps, st) = ps.W * parent(x), st
(l::LinearLayer{false})(x::AbstractMatrix, ps, st) = parent(x) * transpose(ps.W), st

And this must be very clearly documented and tested, please. Can you please add a test_linear.jl to do this? Thank you.
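
As a starting point, a minimal sketch of what test_linear.jl could check, assuming the LinearLayer definition above (with `luxlayer` replaced by the actual LuxCore abstract layer type; sizes and names are illustrative):

using LuxCore, Random, Test

rng = Random.MersenneTwister(1234)
in_dim, out_dim, nbatch = 4, 3, 7

for featfirst in (true, false)
   l  = LinearLayer(in_dim, out_dim; feature_first = featfirst)
   ps = LuxCore.initialparameters(rng, l)
   # features-first input is (in_dim, nbatch); batch-first is (nbatch, in_dim)
   X  = featfirst ? randn(rng, in_dim, nbatch) : randn(rng, nbatch, in_dim)
   Y, _ = l(X, ps, NamedTuple())
   # both layouts must realize the same linear map W * x on each sample
   for i in 1:nbatch
      x = featfirst ? X[:, i] : X[i, :]
      y = featfirst ? Y[:, i] : Y[i, :]
      @test y ≈ ps.W * x
   end
end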

CheukHinHoJerry commented 1 year ago

No problem - will do this later today.

cortner commented 1 year ago

ready to review and merge?

CheukHinHoJerry commented 1 year ago

Yes. I think this is ready. Thank you.

cortner commented 1 year ago

reviewing now ...

cortner commented 1 year ago

@CheukHinHoJerry -- I've made a few small changes. Do you mind taking a final look?

cortner commented 1 year ago

I'll then merge into main. We can add docs in the other PR.

(sorry for the many mini-commits)

CheukHinHoJerry commented 1 year ago

Looks good. I will open an issue to note that we have to take care of CachedArrays for the linear layer later. For now I just want to make sure it works so the summer students can use it.