I saw that package but haven't had time to look at it. How does it perform the tests?
I'm just a bit sceptical about fully automated FD tests - but would be happy to switch if we find it's robust
I saw it has a keyword argument `fdm::FiniteDifferenceMethod` (the finite differencing method to use), so I guess we can at least choose to use finite differences for testing. But I need some more time to see what it actually does. Anyhow, for now I will still implement both the hand-coded one and the ChainRulesCore one, just for trial.
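
For reference, a minimal sketch of what such a test could look like (the toy function `two_x` and its `rrule` are made up purely to illustrate the API; `test_rrule` and the `fdm` keyword are from ChainRulesTestUtils):

```julia
using ChainRulesCore, ChainRulesTestUtils, FiniteDifferences

# toy function with a hand-written rrule, only to illustrate the API
two_x(x) = 2x
function ChainRulesCore.rrule(::typeof(two_x), x)
    two_x_pullback(ȳ) = (NoTangent(), 2ȳ)   # d(2x)/dx = 2
    return two_x(x), two_x_pullback
end

# test_rrule checks the hand-written rule against finite differences;
# the fdm keyword selects the finite-differencing method
test_rrule(two_x, 3.0; fdm = central_fdm(5, 1))
```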
This is now ready if we don't care about the ChainRulesCore test. I suggest we wait until the 2nd PR is merged, and then we can build a simple model to test whether this works fine.
Something to be discussed is how we want to initialize the parameters of our LinearLayer by default. Usually people use Xavier/Glorot uniform if they have no better idea?
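
For context, a minimal sketch of Glorot/Xavier uniform initialisation (the name `glorot_uniform` here is just illustrative, not a reference to any particular package):

```julia
using Random

# Glorot/Xavier uniform: sample W from [-lim, lim] with
# lim = sqrt(6 / (fan_in + fan_out)), which keeps the variance of the
# activations roughly constant across layers
function glorot_uniform(rng::AbstractRNG, out_dim::Int, in_dim::Int)
    lim = sqrt(6 / (in_dim + out_dim))
    return (2 .* rand(rng, out_dim, in_dim) .- 1) .* lim
end
```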
To take care of the confusing feature/batch-index problem, maybe we can do something like:
```julia
using LuxCore, Random

# NB: the supertype is an assumption here (LuxCore's abstract layer type)
struct LinearLayer{T} <: LuxCore.AbstractExplicitLayer
    in_dim::Int
    out_dim::Int
    feature_first::T
end

LinearLayer(in_dim::Int, out_dim::Int) = LinearLayer(in_dim, out_dim, false)

LuxCore.initialparameters(rng::AbstractRNG, l::LinearLayer) =
    ( W = randn(rng, l.out_dim, l.in_dim), )

function (l::LinearLayer)(x::AbstractMatrix, ps, st)
    if l.feature_first
        # x is (in_dim, batch)
        return ps.W * parent(x), st
    else
        # x is (batch, in_dim)
        return parent(x) * transpose(ps.W), st
    end
end
```
This looks good in principle: use `feature_first` to decide whether to store `W` or its transpose. But I don't like that, depending on this flag, the layer does something different. Instead I would have hoped that it just uses `W` or `transpose(W)` for the operation. It still feels like we have a problem to resolve in distinguishing matrix features from batches of vector features. Maybe we need a quick fix here, but record this as an issue to be addressed asap?

Finally, I would probably make `feature_first` a type parameter.
> Finally, I would probably make `feature_first` a type parameter.

May I check that what you mean is something like the above comment after my edit? If yes, I will make the corresponding changes and open an issue to remind us that we have to address this.
not quite ...
```julia
struct LinearLayer{FEATFIRST} <: LuxCore.AbstractExplicitLayer
    in_dim::Int
    out_dim::Int
end

# default to batch-first, i.e. FEATFIRST = false
LinearLayer(in_dim::Int, out_dim::Int) = LinearLayer{false}(in_dim, out_dim)

LuxCore.initialparameters(rng::AbstractRNG, l::LinearLayer{true}) =
    ( W = randn(rng, l.out_dim, l.in_dim), )
LuxCore.initialparameters(rng::AbstractRNG, l::LinearLayer{false}) =
    ( W = randn(rng, l.in_dim, l.out_dim), )

(l::LinearLayer{true})(x::AbstractMatrix, ps, st) = ps.W * parent(x), st    # x is (in_dim, batch)
(l::LinearLayer{false})(x::AbstractMatrix, ps, st) = parent(x) * ps.W, st   # x is (batch, in_dim)
```
that's what I meant by making it a type parameter. The other changes you made are - for me at least - just as confusing as the original version.
I need to think about this for a bit. Please don't change anything yet.
Ok, I think what I would prefer for the current version is to implement it the way you suggest above, with a modification for the type-parameter:
```julia
struct LinearLayer{FEATFIRST} <: LuxCore.AbstractExplicitLayer
    in_dim::Int
    out_dim::Int
end

LinearLayer(in_dim::Int, out_dim::Int; feature_first = false) =
    LinearLayer{feature_first}(in_dim, out_dim)

# W is always stored as (out_dim, in_dim), independent of the flag
LuxCore.initialparameters(rng::AbstractRNG, l::LinearLayer) =
    ( W = randn(rng, l.out_dim, l.in_dim), )

# a vector is always a single feature vector
(l::LinearLayer)(x::AbstractVector, ps, st) = ps.W * parent(x), st

# feature-first: x is (in_dim, batch)
(l::LinearLayer{true})(x::AbstractMatrix, ps, st) = ps.W * parent(x), st
# batch-first: x is (batch, in_dim)
(l::LinearLayer{false})(x::AbstractMatrix, ps, st) = parent(x) * transpose(ps.W), st
```
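
For illustration, a hypothetical usage sketch of this version (the empty named tuple `(;)` stands in for the layer state):

```julia
using Random, LuxCore

rng = Random.MersenneTwister(0)

l  = LinearLayer(4, 3)                        # default: batch index first
ps = LuxCore.initialparameters(rng, l)
X  = randn(rng, 10, 4)                        # (batch, in_dim)
Y, st = l(X, ps, (;))                         # Y has size (10, 3)

lf = LinearLayer(4, 3; feature_first = true)  # feature index first
Xf = randn(rng, 4, 10)                        # (in_dim, batch)
Yf, st = lf(Xf, ps, (;))                      # Yf has size (3, 10)
```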
And this must be very clearly documented and tested, please. Can you add a `test_linear.jl` to do this? Thank you.
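
A minimal sketch of what `test_linear.jl` could contain (shapes follow the conventions above; this is just a suggestion, not the actual test file):

```julia
using Test, Random, LuxCore

@testset "LinearLayer" begin
    rng = MersenneTwister(0)
    l  = LinearLayer(4, 3)
    ps = LuxCore.initialparameters(rng, l)

    # batch-first convention: (batch, in_dim) -> (batch, out_dim)
    X = randn(rng, 10, 4)
    Y, _ = l(X, ps, (;))
    @test size(Y) == (10, 3)
    @test Y ≈ X * transpose(ps.W)

    # feature-first convention: (in_dim, batch) -> (out_dim, batch)
    lf = LinearLayer(4, 3; feature_first = true)
    Xf = randn(rng, 4, 10)
    Yf, _ = lf(Xf, ps, (;))
    @test size(Yf) == (3, 10)
    @test Yf ≈ ps.W * Xf
end
```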
No problem - will do this later today.
ready to review and merge?
Yes. I think this is ready. Thank you.
reviewing now ...
@CheukHinHoJerry -- I've made a few small changes. Do you mind taking a final look?
I'll then merge into main. We can add docs in the other PR.
(sorry for the many mini-commits)
Looks good. I will open an issue to note that we have to take care of CachedArrays for the linear layer later. Just want to make sure it works for now so that the summer students can use it.
This branch is for the implementation of a dense layer. I just copied the code from ACEpsi.jl for now and no testing is done yet; I will add tests later. I also want to try to use this package to test the `rrule`. It seems better, cleaner, and easier to use than a hand-written `rrule` test: https://juliadiff.org/ChainRulesTestUtils.jl/dev/
CC @dhan-02
This should also wait until https://github.com/ACEsuit/Polynomials4ML.jl/pull/49#issue-1798017245 is done. Then we can merge from there into this branch, with some cleanup, and finally merge.