DeepOnet Multiple output

KirillZubov commented 3 months ago

u = ones(Float32, 10, 10, 5)
v = ones(Float32, 1, 10, 5)
deeponet = DeepONet(; branch = (10, 10, 10), trunk = (1, 10, 10))
ps, st = Lux.setup(Random.default_rng(), deeponet)

y, st_ = deeponet((u, v), ps, st)
@test size(y) == (10, 5)

Now the output only for one feature solution with dropping one dim (1, 10, 5) to (10, 5). It is not dim for multiple output. It need additional dim (output_dim, 10, 5) My suggestion and easy way is add linear layer as option, how it has done here src/neural_operators.jl https://github.com/SciML/NeuralPDE.jl/pull/806/commits/30a5134e20862a7159f87e55010bbea73d0ed061#diff-3623c72624d6f36cc808e510588bfe6ed4872162fdce6e12b69408856c038c5d

deeponet = DeepONet(; branch = (10, 10, 10), trunk = (1, 10, 10), liner =(10, output_dim))

avik-pal commented 3 months ago

With linear do you expect linear(y) where y is the output of the dot product, aka dropdims(sum(...)) or you want it applied on sum(...) without collapsing the first dim?

KirillZubov commented 3 months ago

We need sum only for embedding dims - inner feature representation, which is the dot product branch and trunk for DeepONet. If we use the linear at last layer, we don't need dropping dim with embedding because linear transformation do it.

@avik-pal


embbeding_size = 10

branch = Lux.Chain(
    Lux.Dense(3, 20, Lux.tanh_fast),
    Lux.Dense(20, 20, Lux.tanh_fast),
    Lux.Dense(20, embbeding_size))

a = rand(3, 50,40, 1)
θ, st = Lux.setup(Random.default_rng(), branch)
b_out, st = branch(a, θ, st)

trunk = Lux.Chain(
    Lux.Dense(2, 20, Lux.tanh_fast),
    Lux.Dense(20, 20, Lux.tanh_fast),
    Lux.Dense(20, embbeding_size, Lux.tanh_fast))

a = rand(2, 1, 1, 30)
θ, st = Lux.setup(Random.default_rng(), trunk)
t_out, st = branch(a, θ, st)

out_ = b_out .* t_out;

# julia > size(out_)
# (embbeding_size, 50, 40, 30)

# here we sum embbeding inner representation 
out = sum(out_ , dims = 1)
# julia > size(out)
# (1, 50, 40, 30)

output_size = 3
linear = Lux.Dense(10, output_size)
θ, st = Lux.setup(Random.default_rng(), linear)
l_out, st = linear(out_, θ, st)

#julia > size(l_out)
#(output_size, 50, 40, 30)

ps Option for the layer at the end can be used for many other purposes (for example Fourier feature embeddings an many other) not only for the multi output that I described here as issue. So in generally I think it will be good thing for interface.

ayushinav commented 3 months ago

Hi @KirillZubov, I'm not sure if the above implementation of DeepONet is completely correct. The branch net takes the input $u$ at different locations, let's say $\vec{x}$, so $u(\vec{x})$ will be defined at multiple location points. Let's assume for now that we have only single output, i.e., $u(\vec{x})$ is a scalar.

In case of multiple outputs, $U = u_1, u_2, u_3,..., u_O$, we can learn the approximation for each of the functions independently from each other, much like what happens in NeuralPDE.jl where cases of multiple functions are learned independently. We pass a vector of chains instead of one.

The input of the branch net is then a vector. For a batch size of nb and m sampled points, the input would be m x nb. For $x \in \R^d$, the input to trunk net would be d x N x nb if we have N points to evaluate the function, the points at which we put sensors.

The output of branch net would then be embedding_size x nb and that of trunk net would be embedding_size x N x nb. We'd then want to do a dot product along the first axis or can approach the same using a matmul operation. The DeepONet would then output the function evaluations at the sensor points, that is, the output would be of the size embedding_size x nb.

This is how I understand the DeepONets work. If this is correct, the out or out_ in the above code are not correct.

KirillZubov commented 3 months ago

I've quite fast written this script above just only for show how to use linear layer. My point here is not about how to calculate the vector inside. It is about support multiple output for DeepOnet. If you think that isn't need here - ok.

Yes we can use batch of NNs for each output independently but also can use one NN for all output. Here example https://github.com/SciML/NeuralPDE.jl/blob/a57b966df8e172f7e0eebf9ec0c2d94b6eaf596e/test/NNODE_tests.jl#L137-L153

DeepOnet is not about what size vectors. It about how to learn operator from sensor and locations information with branch-trunk architecture, other is flexible. It would be better to strive to make an implementation that could work with the widest possible range of extensions and not be strictly limited. https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=deeponet&btnG=

KirillZubov commented 3 months ago

I can implement this feature myself by the way.

ayushinav commented 3 months ago

I see. The linear layer wasn't there in the vanilla DeepONet so I wasn't much aware about it. Vector sizes were just a quick way to check but I got your point.

embbeding_size = 10
batch_size = 8

branch = Lux.Chain(
    Lux.Dense(3, 20, Lux.tanh_fast),
    Lux.Dense(20, 20, Lux.tanh_fast),
    Lux.Dense(20, embbeding_size))

trunk = Lux.Chain(
    Lux.Dense(6, 20, Lux.tanh_fast),
    Lux.Dense(20, 20, Lux.tanh_fast),
    Lux.Dense(20, embbeding_size, Lux.tanh_fast))

u_dims = 4 # 4-dimensional function points
u = rand(3, u_dims, batch_size) # rand(3, 50,40, 1)
θ, st = Lux.setup(Random.default_rng(), branch)
b_out, st = branch(u, θ, st)

size(b_out)
# (10, 4, 8)
# embedding_size x input_dims x batch_size

N = 16 # sensor points
x = rand(6, N, batch_size) # 6-dimensional points
θ, st = Lux.setup(Random.default_rng(), trunk)
t_out, st = branch(x, θ, st)

size(t_out)
# (10, 16, 8)
# embedding_size x sensor_dims x batch_size

out_ = batched_mul(permutedims(b_out, [2,1,3]), t_out)
size(out_)
# (4, 16, 8)
# input_dims x sensor_dims x batch_size

We can probably use the linear layer after this, but if it's not given by the user, we can just do dropdims along the dimensions 1 or 2 if the sizes are 1.

KirillZubov commented 3 months ago

Why do you want use permutedims(b_out, [2,1,3]) it is limited to 3dim output only. It is already fine implement by @avik-pal with https://github.com/LuxDL/LuxNeuralOperators.jl/blob/44c39bb6d39443fb4f1a80bcbd8a28e643b63397/src/deeponet.jl#L130 But all another is fine, additional last layer(better is call it like additional cause it can be any layer not only linear ) as optional. I already implemented it in my tiny DeepONet implementation. https://github.com/SciML/NeuralPDE.jl/commit/30a5134e20862a7159f87e55010bbea73d0ed061#diff-3623c72624d6f36cc808e510588bfe6ed4872162fdce6e12b69408856c038c5d

ayushinav commented 3 months ago

Hi, Maybe I'm missing something out but I don't completely agree with the current implementation (though my above implementation wasn't correct either). For an eg. case, the following errors

u = rand(Float32, 64, 2, 5) # m x u_dims x nb
y = rand(Float32, 6, 40, 5) # ndims x N x nb

branch = Lux.Chain(
    Lux.Dense(64, 32, Lux.tanh_fast),
    Lux.Dense(32, 32, Lux.tanh_fast),
    Lux.Dense(32, embbeding_size))

trunk = Lux.Chain(
    Lux.Dense(6, 20, Lux.tanh_fast),
    Lux.Dense(20, 20, Lux.tanh_fast),
    Lux.Dense(20, embbeding_size, Lux.tanh_fast))

model = DeepONet(branch, trunk)
ps, st = Lux.setup(Random.default_rng(), model)

first(model((u, y), ps, st))

This works when the broadcasting dimensions are same, which as we see here, won't always be true.

I think I finally got the high dimensional case. We have a function u we want to approximate that outputs a tensor of shape u_size given an input tensor y of shape y_size.

Let's have m sensor points and N points where we want to evaluate it. For a batch size of nb, we have

### branch
u                      =>  b
[m x u_size... x nb]   =>  [p x u_size x nb]

### trunk
y                      =>  t
[N x y_size... x nb]   =>  [N x p x nb]

We leave it to the user to determine how to transform y to a vector of length p through some network. The final output does not depend on the size of y.

t_ = permutedims(t, [2, 1, 3]) # p x N x nb
t_dot = reshape(t_, size(t_,1), ones(eltype(u_size), u_size), size(t_)[2:end])
# p x (1,1,1,...) x N x nb

b_dot = reshape(b, size(b,1), u_size, 1, size(b)[end])
# p x u_size x N x nb

dropdims(sum(t_dot .* p_dot; dims = 1), 1)
# u_size x N x nb

There is also a restriction on the dimensions of the outputs from the trunk, unless it implies that the user should take care of the dimensions of the trunk. While it is reasonable, because we can have any representation of the latent state where we do the dot product, but then the reshape at https://github.com/LuxDL/LuxNeuralOperators.jl/blob/main/src/deeponet.jl#L129 should not be required, or one of the argchecks at https://github.com/LuxDL/LuxNeuralOperators.jl/blob/main/src/deeponet.jl#L124 should not have been there. Also, maybe me but this way did not seem much intuitive to me as well.

KirillZubov commented 3 months ago

ok, got it

avik-pal commented 3 months ago

u = rand(Float32, 64, 2, 5) # m x u_dims x nb
y = rand(Float32, 6, 40, 5) # ndims x N x nb

To solve this do it on a case by case basis:

If broadcasting is possible then use broadcast and reduce, that should be more performant than doing a permutedims (it forces a copy which is expensive and doubles the memory footprint)
If not then permutedims and batched_mul.

SciML / NeuralOperators.jl

DeepOnet Multiple output #9