Closed KirillZubov closed 2 months ago
With linear
do you expect linear(y)
where y
is the output of the dot product, aka dropdims(sum(...))
or you want it applied on sum(...)
without collapsing the first dim?
We need sum
only for embedding dims - inner feature representation, which is the dot product branch
and trunk
for DeepONet.
If we use the linear at last layer, we don't need dropping dim with embedding because linear transformation do it.
@avik-pal
embbeding_size = 10
branch = Lux.Chain(
Lux.Dense(3, 20, Lux.tanh_fast),
Lux.Dense(20, 20, Lux.tanh_fast),
Lux.Dense(20, embbeding_size))
a = rand(3, 50,40, 1)
θ, st = Lux.setup(Random.default_rng(), branch)
b_out, st = branch(a, θ, st)
trunk = Lux.Chain(
Lux.Dense(2, 20, Lux.tanh_fast),
Lux.Dense(20, 20, Lux.tanh_fast),
Lux.Dense(20, embbeding_size, Lux.tanh_fast))
a = rand(2, 1, 1, 30)
θ, st = Lux.setup(Random.default_rng(), trunk)
t_out, st = branch(a, θ, st)
out_ = b_out .* t_out;
# julia > size(out_)
# (embbeding_size, 50, 40, 30)
# here we sum embbeding inner representation
out = sum(out_ , dims = 1)
# julia > size(out)
# (1, 50, 40, 30)
output_size = 3
linear = Lux.Dense(10, output_size)
θ, st = Lux.setup(Random.default_rng(), linear)
l_out, st = linear(out_, θ, st)
#julia > size(l_out)
#(output_size, 50, 40, 30)
ps Option for the layer at the end can be used for many other purposes (for example Fourier feature embeddings an many other) not only for the multi output that I described here as issue. So in generally I think it will be good thing for interface.
Hi @KirillZubov, I'm not sure if the above implementation of DeepONet is completely correct. The branch net takes the input $u$ at different locations, let's say $\vec{x}$, so $u(\vec{x})$ will be defined at multiple location points. Let's assume for now that we have only single output, i.e., $u(\vec{x})$ is a scalar.
In case of multiple outputs, $U = u_1, u_2, u_3,..., u_O$, we can learn the approximation for each of the functions independently from each other, much like what happens in NeuralPDE.jl
where cases of multiple functions are learned independently. We pass a vector of chain
s instead of one.
The input of the branch net is then a vector. For a batch size of nb
and m
sampled points, the input would be m x nb
. For $x \in \R^d$, the input to trunk net would be d x N x nb
if we have N
points to evaluate the function, the points at which we put sensors.
The output of branch net would then be embedding_size x nb
and that of trunk net would be embedding_size x N x nb
. We'd then want to do a dot product along the first axis or can approach the same using a matmul operation. The DeepONet would then output the function evaluations at the sensor points, that is, the output would be of the size embedding_size x nb
.
This is how I understand the DeepONets work. If this is correct, the out
or out_
in the above code are not correct.
I've quite fast written this script above just only for show how to use linear layer. My point here is not about how to calculate the vector inside. It is about support multiple output for DeepOnet. If you think that isn't need here - ok.
Yes we can use batch of NNs for each output independently but also can use one NN for all output. Here example https://github.com/SciML/NeuralPDE.jl/blob/a57b966df8e172f7e0eebf9ec0c2d94b6eaf596e/test/NNODE_tests.jl#L137-L153
DeepOnet is not about what size vectors. It about how to learn operator from sensor and locations information with branch-trunk architecture, other is flexible. It would be better to strive to make an implementation that could work with the widest possible range of extensions and not be strictly limited. https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=deeponet&btnG=
I can implement this feature myself by the way.
I see. The linear
layer wasn't there in the vanilla DeepONet
so I wasn't much aware about it. Vector sizes were just a quick way to check but I got your point.
embbeding_size = 10
batch_size = 8
branch = Lux.Chain(
Lux.Dense(3, 20, Lux.tanh_fast),
Lux.Dense(20, 20, Lux.tanh_fast),
Lux.Dense(20, embbeding_size))
trunk = Lux.Chain(
Lux.Dense(6, 20, Lux.tanh_fast),
Lux.Dense(20, 20, Lux.tanh_fast),
Lux.Dense(20, embbeding_size, Lux.tanh_fast))
u_dims = 4 # 4-dimensional function points
u = rand(3, u_dims, batch_size) # rand(3, 50,40, 1)
θ, st = Lux.setup(Random.default_rng(), branch)
b_out, st = branch(u, θ, st)
size(b_out)
# (10, 4, 8)
# embedding_size x input_dims x batch_size
N = 16 # sensor points
x = rand(6, N, batch_size) # 6-dimensional points
θ, st = Lux.setup(Random.default_rng(), trunk)
t_out, st = branch(x, θ, st)
size(t_out)
# (10, 16, 8)
# embedding_size x sensor_dims x batch_size
out_ = batched_mul(permutedims(b_out, [2,1,3]), t_out)
size(out_)
# (4, 16, 8)
# input_dims x sensor_dims x batch_size
We can probably use the linear
layer after this, but if it's not given by the user, we can just do dropdims
along the dimensions 1 or 2 if the sizes are 1.
Why do you want use permutedims(b_out, [2,1,3])
it is limited to 3dim output only. It is already fine implement by @avik-pal with https://github.com/LuxDL/LuxNeuralOperators.jl/blob/44c39bb6d39443fb4f1a80bcbd8a28e643b63397/src/deeponet.jl#L130
But all another is fine, additional last layer
(better is call it like additional cause it can be any layer not only linear ) as optional. I already implemented it in my tiny DeepONet implementation.
https://github.com/SciML/NeuralPDE.jl/commit/30a5134e20862a7159f87e55010bbea73d0ed061#diff-3623c72624d6f36cc808e510588bfe6ed4872162fdce6e12b69408856c038c5d
Hi, Maybe I'm missing something out but I don't completely agree with the current implementation (though my above implementation wasn't correct either). For an eg. case, the following errors
u = rand(Float32, 64, 2, 5) # m x u_dims x nb
y = rand(Float32, 6, 40, 5) # ndims x N x nb
branch = Lux.Chain(
Lux.Dense(64, 32, Lux.tanh_fast),
Lux.Dense(32, 32, Lux.tanh_fast),
Lux.Dense(32, embbeding_size))
trunk = Lux.Chain(
Lux.Dense(6, 20, Lux.tanh_fast),
Lux.Dense(20, 20, Lux.tanh_fast),
Lux.Dense(20, embbeding_size, Lux.tanh_fast))
model = DeepONet(branch, trunk)
ps, st = Lux.setup(Random.default_rng(), model)
first(model((u, y), ps, st))
This works when the broadcasting dimensions are same, which as we see here, won't always be true.
I think I finally got the high dimensional case. We have a function u
we want to approximate that outputs a tensor of shape u_size
given an input tensor y
of shape y_size
.
Let's have m
sensor points and N
points where we want to evaluate it. For a batch size of nb
, we have
### branch
u => b
[m x u_size... x nb] => [p x u_size x nb]
### trunk
y => t
[N x y_size... x nb] => [N x p x nb]
We leave it to the user to determine how to transform y
to a vector of length p
through some network. The final output does not depend on the size of y
.
t_ = permutedims(t, [2, 1, 3]) # p x N x nb
t_dot = reshape(t_, size(t_,1), ones(eltype(u_size), u_size), size(t_)[2:end])
# p x (1,1,1,...) x N x nb
b_dot = reshape(b, size(b,1), u_size, 1, size(b)[end])
# p x u_size x N x nb
dropdims(sum(t_dot .* p_dot; dims = 1), 1)
# u_size x N x nb
There is also a restriction on the dimensions of the outputs from the trunk, unless it implies that the user should take care of the dimensions of the trunk. While it is reasonable, because we can have any representation of the latent state where we do the dot product, but then the reshape at
https://github.com/LuxDL/LuxNeuralOperators.jl/blob/main/src/deeponet.jl#L129
should not be required, or one of the argchecks
at
https://github.com/LuxDL/LuxNeuralOperators.jl/blob/main/src/deeponet.jl#L124
should not have been there. Also, maybe me but this way did not seem much intuitive to me as well.
ok, got it
u = rand(Float32, 64, 2, 5) # m x u_dims x nb
y = rand(Float32, 6, 40, 5) # ndims x N x nb
To solve this do it on a case by case basis:
If broadcasting is possible then use broadcast and reduce, that should be more performant than doing a permutedims (it forces a copy which is expensive and doubles the memory footprint)
If not then permutedims and batched_mul.
Now the output only for one feature solution with dropping one dim (1, 10, 5) to (10, 5). It is not dim for multiple output. It need additional dim
(output_dim, 10, 5)
My suggestion and easy way is add linear layer as option, how it has done heresrc/neural_operators.jl
https://github.com/SciML/NeuralPDE.jl/pull/806/commits/30a5134e20862a7159f87e55010bbea73d0ed061#diff-3623c72624d6f36cc808e510588bfe6ed4872162fdce6e12b69408856c038c5d