SciML / OperatorLearning.jl

No need to train, he's a smooth operator
https://operatorlearning.sciml.ai/dev
MIT License

Fourier Layer Tests #35

Open ba2tro opened 2 years ago

ba2tro commented 2 years ago

I have added a couple of simple tests for the Fourier layer and the DeepONet layer. What other tests can we add for these? One thing that I wanted to add for test/deeponet.jl:

  # Assumes `a` (input function samples) and `sensors` (evaluation points)
  # are defined, e.g. a = rand(Float32, 16, 5); sensors = rand(Float32, 1, 10)
  model1 = DeepONet((16, 22, 30), (1, 16, 24, 30), σ, tanh; init_branch=Flux.glorot_normal, bias_trunk=false)
  parameters = params(model1)

  branch = Chain(Dense(16, 22, init=Flux.glorot_normal), Dense(22, 30, init=Flux.glorot_normal))
  trunk = Chain(Dense(1, 16, bias=false), Dense(16, 24, bias=false), Dense(24, 30, bias=false))
  model2 = DeepONet(branch, trunk)

  model1(a, sensors)
  model2(a, sensors)
  # forward pass
  @test model1(a, sensors) ≈ model2(a, sensors)

  m1grad = Flux.Zygote.gradient((x, p) -> sum(model1(x, p)), a, sensors)
  m2grad = Flux.Zygote.gradient((x, p) -> sum(model2(x, p)), a, sensors)

  # gradients: nonzero, and matching between the two models
  @test !iszero(m1grad[1]) && !iszero(m1grad[2])
  @test !iszero(m2grad[1]) && !iszero(m2grad[2])
  @test m1grad[1] ≈ m2grad[1] rtol=1e-12
  @test m1grad[2] ≈ m2grad[2] rtol=1e-12

but the problem is that making the parameters the same for model1 and model2 doesn't seem feasible here. Besides, how should I formulate a test for training the FNO and DeepONet?
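
One possible way to force the parameters to match could be Flux's loadparams!, assuming both models enumerate their parameters in the same order (which would need checking, since a DeepONet and a pair of Chains may not):

# Hypothetical sketch: copy model1's parameters into model2 so the
# forward-pass comparison above is meaningful. Only valid if the
# parameter ordering of the two models lines up.
Flux.loadparams!(model2, Flux.params(model1))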

ChrisRackauckas commented 2 years ago

For the training test, let's start with regression testing. The tutorial has examples using these layers to solve some equations. Do that on a PDE with a known analytical solution, take the difference against the analytical solution, and put the tolerance just above the error you get locally. The test would then trigger if training ever gets worse. Usually, taking the current loss and multiplying it by 3 or 5 gives a safe regression value.
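
A minimal sketch of that pattern, using a toy fitting problem in place of a real PDE (the model, data, and tolerance below are placeholders, not the package's actual test):

using Flux, Test

# "Analytical solution" of a toy problem: u(x) = sin(pi * x) on [0, 1]
xs = reshape(collect(Float32, range(0, 1, length=64)), 1, :)
u_exact = sin.(Float32(π) .* xs)

model = Chain(Dense(1, 16, gelu), Dense(16, 1))
loss() = Flux.Losses.mse(model(xs), u_exact)

# Full-batch training: 500 steps over the same (empty) data tuple
Flux.train!(loss, Flux.params(model), Iterators.repeated((), 500), ADAM(0.01))

# Regression tolerance: a few times the loss observed locally (value here
# is illustrative), so the test only trips if training ever gets worse.
@test loss() < 5f-3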

ba2tro commented 2 years ago

Would we need a dataset in the library for that?

ba2tro commented 2 years ago

Maybe we can add a smaller version of the Burgers' equation dataset that contains just what we need for the test, because the whole data file is about 600 MB.

ba2tro commented 2 years ago

It has data for 2048 initial conditions, at 8192 points each 😅. We could use 100-200 ICs at 1024 points (just like the tutorial); would that work?
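
Something like this could produce the trimmed file (the source file name and the stride are assumptions; MAT.jl's matread/matwrite do the I/O):

using MAT

# Hypothetical trimming script: keep 200 ICs and every 8th grid point
# (8192 / 8 = 1024), writing the small file read by the test below.
vars = matread("Burgers_R10.mat")        # full ~600 MB file; name assumed
small = Dict("a" => vars["a"][1:200, 1:8:end],
             "u" => vars["u"][1:200, 1:8:end])
matwrite("burgerset.mat", small)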

ba2tro commented 2 years ago

So I tried implementing a training test for the Fourier layer; I think there could be a bug here. I have followed the Burgers' equation example. Code:

using MAT, Flux, OperatorLearning

# Read the Burgers' data: 280 training and 20 test samples
vars = matread("burgerset.mat")
xtrain = vars["a"][1:280, :]
xtest = vars["a"][end-19:end, :]
ytrain = vars["u"][1:280, :]
ytest = vars["u"][end-19:end, :]

# Spatial grid on [0, 1], appended to the data as a second channel
grid = collect(range(0, 1, length=length(xtrain[1, :])))

xtrain = cat(reshape(xtrain, (280, 1024, 1)),
            reshape(repeat(grid, 280), (280, 1024, 1));
            dims=3)
ytrain = cat(reshape(ytrain, (280, 1024, 1)),
            reshape(repeat(grid, 280), (280, 1024, 1));
            dims=3)
xtest = cat(reshape(xtest, (20, 1024, 1)),
            reshape(repeat(grid, 20), (20, 1024, 1));
            dims=3)
ytest = cat(reshape(ytest, (20, 1024, 1)),
            reshape(repeat(grid, 20), (20, 1024, 1));
            dims=3)

# Permute to (channel, spatial, batch) ordering
xtrain, xtest = permutedims(xtrain, (3, 2, 1)), permutedims(xtest, (3, 2, 1))
ytrain, ytest = permutedims(ytrain, (3, 2, 1)), permutedims(ytest, (3, 2, 1))

train_loader = Flux.Data.DataLoader((xtrain, ytrain), batchsize=20, shuffle=true)
test_loader = Flux.Data.DataLoader((xtest, ytest), batchsize=20, shuffle=false)

# 128 -> 128 channels, 1024 grid points, 16 retained Fourier modes
layer = FourierLayer(128, 128, 1024, 16, gelu, bias_fourier=false)

# Note: the same `layer` instance is repeated, so the four Fourier layers share weights
model = Chain(Dense(2, 128; bias=false), layer, layer, layer, layer,
            Dense(128, 2; bias=false))

learning_rate = 0.001
opt = ADAM(learning_rate)

parameters = params(model)

loss(x, y) = Flux.Losses.mse(model(x), y)
evalcb() = @show(loss(xtest, ytest))
throttled_cb = Flux.throttle(evalcb, 5)

Flux.@epochs 500 Flux.train!(loss, parameters, train_loader, opt, cb=throttled_cb)

Error:

MethodError: no method matching batched_gemm(::Char, ::Char, ::Array{ComplexF64, 3}, ::Array{ComplexF32, 3})
Closest candidates are:
  batched_gemm(::AbstractChar, ::AbstractChar, ::AbstractArray{ComplexF64, 3}, !Matched::AbstractArray{ComplexF64, 3}) at C:\Users\user\.julia\packages\BatchedRoutines\4RDBA\src\blas.jl:137
  batched_gemm(::AbstractChar, ::AbstractChar, !Matched::ComplexF32, ::AbstractArray{ComplexF32, 3}, !Matched::AbstractArray{ComplexF32, 3}) at C:\Users\user\.julia\packages\BatchedRoutines\4RDBA\src\blas.jl:134
  batched_gemm(::AbstractChar, ::AbstractChar, !Matched::AbstractArray{ComplexF32, 3}, ::AbstractArray{ComplexF32, 3}) at C:\Users\user\.julia\packages\BatchedRoutines\4RDBA\src\blas.jl:137

...
in eval at base\boot.jl:373
in top-level scope at Juno\n6wyj\src\progress.jl:119
in macro expansion at Flux\qAdFM\src\optimise\train.jl:144
in train! at Flux\qAdFM\src\optimise\train.jl:105
in var"#train!#36" at Flux\qAdFM\src\optimise\train.jl:107
in macro expansion at Juno\n6wyj\src\progress.jl:119
in macro expansion at Flux\qAdFM\src\optimise\train.jl:109
in gradient at Zygote\FPUm3\src\compiler\interface.jl:75
in pullback at Zygote\FPUm3\src\compiler\interface.jl:352
in _pullback at Zygote\FPUm3\src\compiler\interface2.jl
in _pullback at Flux\qAdFM\src\optimise\train.jl:110
in _pullback at ZygoteRules\AIbCs\src\adjoint.jl:65
in adjoint at Zygote\FPUm3\src\lib\lib.jl:200
in _apply at base\boot.jl:814
in _pullback at Zygote\FPUm3\src\compiler\interface2.jl
in _pullback at fourier_tests.jl:135
in _pullback at Zygote\FPUm3\src\compiler\interface2.jl
in _pullback at Flux\qAdFM\src\layers\basic.jl:49
in _pullback at Zygote\FPUm3\src\compiler\interface2.jl
in _pullback at Flux\qAdFM\src\layers\basic.jl:47
in _pullback at Zygote\FPUm3\src\compiler\interface2.jl
in _pullback at Flux\qAdFM\src\layers\basic.jl:47
in _pullback at Zygote\FPUm3\src\compiler\interface2.jl
in _pullback at dev\OperatorLearning\src\FourierLayer.jl:115
in _pullback at Zygote\FPUm3\src\compiler\interface2.jl
in _pullback at OMEinsum\EMISk\src\interfaces.jl:204
in _pullback at Zygote\FPUm3\src\compiler\interface2.jl:9
in macro expansion at Zygote\FPUm3\src\compiler\interface2.jl
in chain_rrule at Zygote\FPUm3\src\compiler\chainrules.jl:216
in rrule at ChainRulesCore\uxrij\src\rules.jl:134
in rrule at OMEinsum\EMISk\src\autodiff.jl:33
in einsum at OMEinsum\EMISk\src\interfaces.jl:200
in einsum at OMEinsum\EMISk\src\binaryrules.jl:98
in einsum at OMEinsum\EMISk\src\binaryrules.jl:226
in _batched_gemm at OMEinsum\EMISk\src\utils.jl:119

Both xtrain and grid are Float64 here. When I make them Float32 explicitly, the error resolves and the model trains normally. I did the same to avoid it here:

https://github.com/Abhishek-1Bhatt/OperatorLearning.jl/blob/c92d3ed1eca77ea61b756864bded99e6f42dc878/test/fourierlayer.jl#L34-L35
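
For reference, the workaround is just an explicit element-type conversion before assembling the arrays (a sketch using the variable names from the script above):

# Cast the data to Float32 so it matches the ComplexF32 precision of the
# Fourier weights after the FFT, avoiding the batched_gemm MethodError.
grid = Float32.(grid)
xtrain, xtest = Float32.(xtrain), Float32.(xtest)
ytrain, ytest = Float32.(ytrain), Float32.(ytest)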

ba2tro commented 2 years ago

The test for DeepONet works fine.

ChrisRackauckas commented 2 years ago

Then split out the DeepONet tests so those can merge quicker while the other ones are investigated.

codecov[bot] commented 2 years ago

Codecov Report

Merging #35 (c92d3ed) into master (9b16e02) will increase coverage by 16.43%. The diff coverage is n/a.


@@             Coverage Diff             @@
##           master      #35       +/-   ##
===========================================
+ Coverage   41.09%   57.53%   +16.43%     
===========================================
  Files           6        6               
  Lines          73       73               
===========================================
+ Hits           30       42       +12     
+ Misses         43       31       -12     
Impacted Files        Coverage Δ
src/DeepONet.jl       60.00% <0.00%> (+20.00%) ⬆️
src/FourierLayer.jl   74.19% <0.00%> (+29.03%) ⬆️


pzimbrod commented 2 years ago

Yeah, somehow FourierLayer doesn't promote its parameters' data type to match the inputs, although it should. I haven't found the culprit for sure, but the number-one suspects are the tensor multiplications:

https://github.com/SciML/OperatorLearning.jl/blob/9b16e02b68a4bc8bb7a82098cbab8bc10e50a02d/src/FourierLayer.jl#L107

https://github.com/SciML/OperatorLearning.jl/blob/9b16e02b68a4bc8bb7a82098cbab8bc10e50a02d/src/FourierLayer.jl#L115

I'm working on switching those implementations out for more specialized code anyway in #31, but the problem might well be elsewhere; that's just my best guess so far.
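
For what it's worth, the mismatch is easy to reproduce outside the layer; a minimal sketch (the shapes here are made up, not FourierLayer's actual internals):

using FFTW

x = rand(Float64, 2, 1024, 20)       # Float64 input, like the script above
W = rand(ComplexF32, 16, 2, 128)     # weights stored as ComplexF32 (shape made up)

x̂ = rfft(x, 2)                       # rfft of Float64 data is ComplexF64
eltype(x̂)                            # ComplexF64
# Contracting the ComplexF64 x̂ with the ComplexF32 W ends up in OMEinsum's
# batched_gemm, which has no mixed-precision method -> the MethodError above.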

ba2tro commented 2 years ago

There's one last thing: for reading the data from the .mat file, would we need MAT.jl as one of the dependencies? Do I run add MAT with the OperatorLearning environment activated to add it to Project.toml?

pzimbrod commented 2 years ago

> There's one last thing: for reading the data from the .mat file, would we need MAT.jl as one of the dependencies? Do I run add MAT with the OperatorLearning environment activated to add it to Project.toml?

Yep. However, I wouldn't include MAT as a package dependency, only for testing. We could put it as a test-specific dependency in the main Project.toml, but that approach will be deprecated in the future, as described here. I would rather have a completely separate environment for tests by creating a test/Project.toml that includes MAT, as advised in the linked docs above.
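
For example, from the repository root (standard Pkg commands, nothing package-specific):

using Pkg

# Activate (and create, if absent) the test environment, then add MAT
# there so it lands in test/Project.toml rather than the main one.
Pkg.activate("test")
Pkg.add("MAT")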