I will try doing some digging. A first thing I notice (not sure that this will solve the problem) is that, looking at your script, it seems you are using a pretty outdated version of Oceananigans (something like 0.71 or lower). I would try running with an updated version and see if the problem persists.
Sure, but how can I do that? Is there something in the script to change the Oceananigans version, or did you see that through the options in the script? My apologies if this question is basic, but I am a new user.
No problem. You have a line in the script that would not work with the current Oceananigans version: the keyword arguments of `LinearEquationOfState` (`α`, `β`) were renamed to `thermal_expansion` and `haline_contraction` in Oceananigans 0.71.5.
To update Oceananigans, you can type

```julia
import Pkg
Pkg.update("Oceananigans")
```

This should give you version 0.75.3. If you want to try with the latest version (0.77.5) you can do

```julia
Pkg.add(Pkg.PackageSpec(name = "Oceananigans", version = "0.77.5"))
```

If you are using a GPU, I would remain on 0.75.3 at the moment because in 0.77.5 there are problems with `GradientBoundaryCondition`s on the GPU.
Probably it will not fix your issue, but a lot has changed internally so it is worth a try (remember to change those keyword arguments before running, otherwise you will get an error).
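For concreteness, here is a minimal sketch of the rename (the expansion and contraction values below are placeholders, not the ones from your script):

```julia
using Oceananigans

# Pre-0.71.5 keywords (no longer accepted):
# eos = LinearEquationOfState(α=2e-4, β=8e-4)

# Oceananigans 0.71.5 and later:
eos = LinearEquationOfState(thermal_expansion=2e-4, haline_contraction=8e-4)
buoyancy = SeawaterBuoyancy(equation_of_state=eos)
```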
```
[17:11]fspereira@ch-fe2[/lustre/scratch5/fspereira/OCEANANIGANS/test/case1]# julia

(Julia 1.6.7 startup banner, Official https://julialang.org/ release, 2022-07-19)

julia> import Pkg

julia> Pkg.status("Oceananigans")
      Status `~/.julia/environments/v1.6/Project.toml`
  [9e8cae18] Oceananigans v0.68.7

julia> Pkg.update("Oceananigans")
    Updating registry at `~/.julia/registries/General`
  No Changes to `~/.julia/environments/v1.6/Project.toml`
  No Changes to `~/.julia/environments/v1.6/Manifest.toml`

julia> Pkg.status("Oceananigans")
      Status `~/.julia/environments/v1.6/Project.toml`
  [9e8cae18] Oceananigans v0.68.7
```
It does not update. Do I need a different version of Julia? If I remember correctly, the website says that 1.6.7 is ok
It should be OK with Julia 1.6. Try removing it with

```julia
Pkg.rm("Oceananigans")
```

If that doesn't work, you can always force a version with

```julia
Pkg.add(Pkg.PackageSpec(name = "Oceananigans", version = "0.77.5"))
```
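Put together (in the same environment that currently holds v0.68.7), the sequence would be something like:

```julia
import Pkg

Pkg.rm("Oceananigans")   # remove the old, held-back version
Pkg.add(Pkg.PackageSpec(name = "Oceananigans", version = "0.77.5"))
Pkg.status("Oceananigans")   # check which version was actually installed
```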
Thank you @simone-silvestri. We are using LESbrary.jl for calculating turbulent statistics, as shown in the script (lines 288-300) shared by @fspereira1. Even with the latest version of Oceananigans.jl (v0.76.1) installed, adding the LESbrary package with `Pkg.add(url="https://github.com/CliMA/LESbrary.jl.git")` automatically changes the Oceananigans.jl version back to v0.68.7, which is the version compatible with the LESbrary package. To run our script or any example script with v0.68.7, we have to change `thermal_expansion` and `haline_contraction` back to `α` and `β`. For a similar reason, `filename` and `filepath` are also switched in the script.

Is there any other package for the estimation of turbulent statistics (second- and third-order statistics along with sub-grid-scale fluxes) that works with the latest version of Oceananigans.jl, instead of calculating them within the script (commented lines 260-284)?
Ah, I see. From what I can tell, everything in the `TurbulentStatistic.jl` module is compatible with new versions of Oceananigans (except maybe GPU usage). Since that is what you are using, you could maybe use it locally?

This is just a quick fix to try out the new Oceananigans. Also, to check whether the problem persists it is enough to test some simple second-order moments. The ones you have in your script should do the job:
```julia
u, v, w = model.velocities
t = model.tracers.T

U = Average(u, dims=(1, 2))
V = Average(v, dims=(1, 2))
T = Average(t, dims=(1, 2))

wu = Average(w * u, dims=(1, 2))
wv = Average(w * v, dims=(1, 2))
uu = Average(u * u, dims=(1, 2))
vv = Average(v * v, dims=(1, 2))
ww = Average(w * w, dims=(1, 2))
www = Average(w * w * w, dims=(1, 2))
wT = Average(w * t, dims=(1, 2))
uv = Average(u * v, dims=(1, 2))
uT = Average(u * t, dims=(1, 2))
vT = Average(v * t, dims=(1, 2))
TT = Average(t * t, dims=(1, 2))
```
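If it is useful, here is a hedged sketch of how these averages could be written to disk with a `JLD2OutputWriter`. The output name and the 10-minute schedule are illustrative, and the keyword for the output path has changed across Oceananigans versions (`prefix` in older releases, `filename` more recently), so check the docstring for your version:

```julia
using Oceananigans.Units: minutes

statistics = (; U, V, T, wu, wv, uu, vv, ww, www, wT, uv, uT, vT, TT)

# Illustrative output writer: adjust the path keyword (prefix vs filename) to your version.
simulation.output_writers[:statistics] =
    JLD2OutputWriter(model, statistics,
                     schedule = TimeInterval(10minutes),
                     filename = "turbulence_statistics.jld2")
```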
Hi All,
I reran the simulations using the newest version of the code:
```
[17:03]fspereira@ch-fe1[/lustre/scratch5/fspereira/OCEANANIGANS/test/case5]# julia

(Julia 1.6.7 startup banner, Official https://julialang.org/ release, 2022-07-19)

julia> import Pkg

julia> Pkg.status("Oceananigans")
      Status `~/.julia/environments/v1.6/Project.toml`
  [9e8cae18] Oceananigans v0.77.5
```
and a script based on the one available on the Oceananigans webpage (I only changed the grid size and constants, and set the random seed; I also tried without these changes):
https://github.com/CliMA/Oceananigans.jl/blob/main/examples/ocean_wind_mixing_and_convection.jl
Unfortunately, the new code/script led to the same reproducibility problem. I ran 4 simulations using the same script (attached) and obtained 4 different average ww profiles.
Any ideas or suggestions?
Thanks for going to the trouble of rerunning on the new version. Are the initial conditions exactly the same? (I guess yes, since you fix the random seed, but it's good to be certain.)
@simone-silvestri, yes, the initial conditions are exactly the same as in the script.
I suspect the issue is your use of function-based boundary conditions. If you initialize with a function, we launch a kernel that computes the function at every grid cell. However, I don't think that the ordering of the loop is guaranteed to be deterministic. If so, this non-deterministic loop ordering is a feature of KernelAbstractions. Whether or not this is a CPU vs GPU issue, I'm not sure. But I don't think that we get deterministic thread ordering on the GPU without extra effort.
To obtain deterministic initial conditions, you could try using arrays to set the initial conditions instead. Your script contains these lines:
```julia
rng = MersenneTwister(1234)
Random.seed!(rng, 1414)

Ξ(z) = randn(rng) * z / model.grid.Lz * (1 + z / model.grid.Lz)
Tᵢ(x, y, z) = 20 + dTdz * z + dTdz * model.grid.Lz * 1e-6 * Ξ(z)
uᵢ(x, y, z) = sqrt(abs(Qᵘ)) * 1e-3 * Ξ(z)

set!(model, u=uᵢ, w=uᵢ, T=Tᵢ, S=35)
```
You can try replacing these lines with something like
```julia
Random.seed!(1414)

T = model.tracers.T
u, v, w = model.velocities
x, y, z = nodes(T, reshape=true)
Lz = model.grid.Lz

shape = @. z / Lz * (1 + z / Lz)

ΞT = randn(size(T)...) *. shape
Ξu = randn(size(u)...) *. shape
Ξw = randn(size(w)...) *. shape

Tᵢ = @. 20 + dTdz * z + dTdz * Lz * 1e-6 * ΞT
uᵢ = @. sqrt(abs(Qᵘ)) * 1e-3 * Ξu
wᵢ = @. sqrt(abs(Qᵘ)) * 1e-3 * Ξw

set!(model, u=uᵢ, w=wᵢ, T=Tᵢ, S=35)
```
I'd be curious to know if this works.
Here are a few more tips and best practices for raising issues here:

- Please reduce your code to a minimum working example. The script that was linked contains a lot of extraneous code (including comments copy/pasted from an example) that is irrelevant to the issue we are discussing. Reducing your code to a minimum example is kind to the community and will help people answer your questions faster.
- If you can, avoid linking to code and instead paste your code directly into the issue.
- Use GitHub's markdown formatting (such as triple backticks, "```") to format code and Julia prompt instructions.

Finally, I strongly recommend using environments to manage the Oceananigans version (though I don't think this is intrinsic to the issue, it will help us rule out a lot of possible issues if we can focus on one Oceananigans version). We have written some tips in our wiki: https://github.com/CliMA/Oceananigans.jl/wiki/Productive-Oceananigans-workflows-and-Julia-environments which includes a link to the official Julia documentation.
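For example, a minimal per-project environment (a sketch, with the version pinned to the one mentioned above) can be set up from the directory that contains your script:

```julia
import Pkg

Pkg.activate(".")                                   # creates ./Project.toml on first use
Pkg.add(name = "Oceananigans", version = "0.77.5")  # pin the version inside this environment
Pkg.instantiate()                                   # resolve and install the dependencies
Pkg.status("Oceananigans")
```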
Sounds like LESbrary.jl needs to be updated!
Thank you for all your comments. I will try those lines. Just one question: is the syntax of `ΞT = randn(size(T)...) *. shape` correct? I am getting this error message:
```
Warning: No xauth data; using fake authentication data for X11 forwarding.
[NVBLAS] NVBLAS_CONFIG_FILE environment variable is NOT set : relying on default config filename 'nvblas.conf'
[NVBLAS] Cannot open default config file 'nvblas.conf'
[NVBLAS] Config parsed
[NVBLAS] CPU Blas library need to be provided
┌ Warning: You appear to be using MPI.jl with the default MPI binary on a cluster.
│ We recommend using the system-provided MPI, see the Configuration section of the MPI.jl docs.
└ @ MPI ~/.julia/packages/MPI/08SPr/deps/deps.jl:15
[ Info: Oceananigans will use 16 threads
ERROR: LoadError: syntax: invalid identifier name "."
Stacktrace:
 [1] top-level scope
   @ /lustre/scratch5/.mdt0/fspereira/OCEANANIGANS/test/case09/c16_128_128m.jl:197
in expression starting at /lustre/scratch5/.mdt0/fspereira/OCEANANIGANS/test/case09/c16_128_128m.jl:197
```
Line 197 corresponds to the line above. I removed the `*. shape` and the simulations are now running. Is that OK?
Oh, sorry, that was a typo (incorrect Julia syntax). It should be

```julia
ΞT = randn(size(T)...) .* shape
```

You can probably also omit the `shape` component; that just reduces the noise to zero at the top and the bottom.
Thank you. I will try it and let you know if it fixes the reproducibility problem.
There is still a problem, this time with the array sizes:

```
┌ Warning: You appear to be using MPI.jl with the default MPI binary on a cluster.
│ We recommend using the system-provided MPI, see the Configuration section of the MPI.jl docs.
└ @ MPI ~/.julia/packages/MPI/08SPr/deps/deps.jl:15
[NVBLAS] No Gpu available
[NVBLAS] NVBLAS_CONFIG_FILE environment variable is NOT set : relying on default config filename 'nvblas.conf'
[NVBLAS] Cannot open default config file 'nvblas.conf'
[NVBLAS] Config parsed
[NVBLAS] CPU Blas library need to be provided
[ Info: Oceananigans will use 16 threads
ERROR: LoadError: DimensionMismatch("arrays could not be broadcast to a common size; got a dimension with lengths 129 and 128")
Stacktrace:
 [1] _bcs1
   @ ./broadcast.jl:501 [inlined]
 [2] _bcs (repeats 3 times)
   @ ./broadcast.jl:495 [inlined]
 [3] broadcast_shape
   @ ./broadcast.jl:489 [inlined]
 [4] combine_axes
   @ ./broadcast.jl:484 [inlined]
 [5] instantiate
   @ ./broadcast.jl:266 [inlined]
 [6] materialize(bc::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{3}, Nothing, typeof(*), Tuple{Array{Float64, 3}, Array{Float64, 3}}})
   @ Base.Broadcast ./broadcast.jl:883
 [7] top-level scope
   @ /lustre/scratch5/.mdt0/fspereira/OCEANANIGANS/test/case13/c16_128_128m.jl:200
```
Also, shouldn't we set v (last line below, I added the commented v's):

```julia
Random.seed!(1414)

T = model.tracers.T
u, v, w = model.velocities
x, y, z = nodes(T, reshape=true)
Lz = model.grid.Lz

shape = @. z / Lz * (1 + z / Lz)

ΞT = randn(size(T)...) .* shape
Ξu = randn(size(u)...) .* shape
Ξw = randn(size(w)...) .* shape

Tᵢ = @. 20 + dTdz * z + dTdz * Lz * 1e-6 * ΞT
uᵢ = @. sqrt(abs(Qᵘ)) * 1e-3 * Ξu
wᵢ = @. sqrt(abs(Qᵘ)) * 1e-3 * Ξw

set!(model, u=uᵢ, w=wᵢ, T=Tᵢ, S=35)
```
It's probably with `w`; I forgot that `w` has a different size in the vertical direction. You can try changing the `w` initial condition with code like

```julia
xw, yw, zw = nodes(w, reshape=true)
wshape = @. zw / Lz * (1 + zw / Lz)
Ξw = randn(size(w)...) .* wshape
```

I'm not sure if this is where the error comes from. The stack trace / error message that you posted will tell you the specific line that is generating the error. Identifying the line that generates the error can be useful for debugging.
If you like, you can generate a minimal example, which I can then try to run to reproduce a bug (and also to debug my own code suggestions).
> Also, shouldn't we set v (last line below, I added the commented v's):

You can. This is not a question about reproducibility, though; it depends on your application. Give it a shot and see how it changes your results.

Note that there will be non-zero `v` in the initial condition anyway, because it has to be projected onto an incompressible velocity field (which is not guaranteed by the random data we are using).
Also, please format your code with triple backticks. You can also add a `julia` language annotation after the opening backticks.
Thank you very much for your suggestions. These changes helped a lot, but the problem remains. I ran 8 simulations until t = 4 h: 6 led to similar results and 2 to slightly different results. The last two led to equal results. Weird, no? Any ideas or suggestions?
```julia
Random.seed!(1414)

T = model.tracers.T
u, v, w = model.velocities
x, y, z = nodes(T, reshape=true)
Lz = model.grid.Lz

xw, yw, zw = nodes(w, reshape=true)
wshape = @. zw / Lz * (1 + zw / Lz)
shape = @. z / Lz * (1 + z / Lz)

ΞT = randn(size(T)...) .* shape
Ξu = randn(size(u)...) .* shape
Ξv = randn(size(v)...) .* shape
Ξw = randn(size(w)...) .* wshape

Tᵢ = @. 20 + dTdz * z + dTdz * Lz * 1e-6 * ΞT
uᵢ = @. sqrt(abs(Qᵘ)) * 1e-3 * Ξu
vᵢ = @. sqrt(abs(Qᵘ)) * 1e-3 * Ξv
wᵢ = @. sqrt(abs(Qᵘ)) * 1e-3 * Ξw

set!(model, u=uᵢ, w=wᵢ, T=Tᵢ, S=35)
```
Can you verify that the initial conditions are identical, before you run the simulations?
My apologies for the late reply. Yes, the initial conditions seem identical:
u = 0, v = 0, w = 0, and the T difference is 0 (attached, 6 cases).
Can you verify that the initial state of the model, after calling `set!(model, ...)` but before calling `run!(simulation)`, is identical between the realizations? (Not just by reading the code.)
Thank you for your answer. I am not sure I understood your comment. The tests I did were done after `set!(model, ...)` and before `run!`. I wrote out the T, u, v, and w fields; they were all identical. Is this what you asked me to do?
Okay, apologies. I just didn't quite understand what you meant when you said they seem identical. Typically we would just write something like `T1 == T2`, which will return `true` or `false`, or equivalently something like `all(T1 .== T2)`. Another test is to use `isapprox` (also written `≈`), as in `all(T1 .≈ T2)`.
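For instance, a check along those lines might look like the following sketch (`T_run1` and `T_run2` are placeholder names for the temperature fields from two realizations):

```julia
# Placeholder names: T_run1 / T_run2 are the temperature fields saved from two runs.
T1 = interior(T_run1)
T2 = interior(T_run2)

T1 == T2                 # exact (bitwise) equality
all(T1 .== T2)           # equivalent elementwise check
all(T1 .≈ T2)            # approximate equality (elementwise isapprox)
maximum(abs, T1 .- T2)   # magnitude of the largest pointwise difference
```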
Here's a bit more background on the reproducibility tests we currently have:
We have "regression tests" that test to ensure that output from a certain simulation remains identical across PRs, including tests that involve LES closures. These tests involve ~10 time steps. We conclude that results are "identical" when every grid point is within sqrt(eps(T))
, where T
is the floating point type (eg Float64
or Float32
), for example:
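Schematically, the criterion is something like the following (the array names are placeholders; the actual regression tests live in the Oceananigans test suite):

```julia
FT = Float64                  # the floating point type of the simulation
tolerance = sqrt(eps(FT))

# `test_field` and `reference_field` are placeholder arrays from the two runs being compared.
results_identical = all(abs.(test_field .- reference_field) .< tolerance)
```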
Many of our other tests also implicitly rely on reproducibility.
I think, therefore, that we do have reproducibility in many cases. However, it is quite possible that your case exposes some particular feature that leads to a loss of reproducibility. I think the next step in order to make progress is to code up a "minimal working example" (often called an MWE), which involves relentlessly simplifying the example until we isolate the essential complication that leads to a failure of the test. With that knowledge in hand, we can dig deeper to find the underlying cause (and hopefully fix it). Often, the process of simplifying a script in order to isolate the MWE also produces some insight about the issue (and potentially about the test).
My bad, I was not clear. Yes, `T1 - T2 = 0`.
As pointed out by @kburns, if you're interested in bit reproducibility you may also need to set the FFTW plan (by default, FFTW is not reproducible even on identical architectures). I'm not sure this is your issue (I have doubts...) but if you want to be thorough you may want to check this. To set the plan you have to build the pressure solver manually with something like:
```julia
using Oceananigans.Solvers: FFTBasedPoissonSolver
using FFTW

pressure_solver = FFTBasedPoissonSolver(grid, planner_flag=FFTW.ESTIMATE)
model = NonhydrostaticModel(; grid, pressure_solver, other_kwargs...)
```
I'd be surprised if "round-off errors" accumulate enough to cause the differences you're seeing (but it does seem at least possible, especially for very long runs at relatively coarser resolutions where slight differences in the eddy diffusivity might lead to slightly different turbulent trajectories). Hope that helps.
Thank you for the suggestion @glwagner. I installed FFTW, and I can run

```julia
using Oceananigans.Solvers: FFTBasedPoissonSolver
using FFTW
```

but when I try

```julia
pressure_solver = FFTBasedPoissonSolver(grid, planner_flag=FFTW.ESTIMATE)
```

I get
```
julia> pressure_solver = FFTBasedPoissonSolver(grid, planner_flag=FFTW.ESTIMATE)
ERROR: MethodError: no method matching FFTBasedPoissonSolver(::RectilinearGrid{Float64, Periodic, Periodic, Bounded, Float64, Float64, OffsetArrays.OffsetVector{Float64, Vector{Float64}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, OffsetArrays.OffsetVector{Float64, Vector{Float64}}, CPU}; planner_flag=0x00000040)
Closest candidates are:
  FFTBasedPoissonSolver(::Any) at /users/fspereira/.julia/packages/Oceananigans/7G5bN/src/Solvers/fft_based_poisson_solver.jl:50 got unsupported keyword argument "planner_flag"
  FFTBasedPoissonSolver(::Any, ::Any) at /users/fspereira/.julia/packages/Oceananigans/7G5bN/src/Solvers/fft_based_poisson_solver.jl:50 got unsupported keyword argument "planner_flag"
  FFTBasedPoissonSolver(::G, ::Λ, ::S, ::B, ::T) where {G, Λ, S, B, T} at /users/fspereira/.julia/packages/Oceananigans/7G5bN/src/Solvers/fft_based_poisson_solver.jl:6 got unsupported keyword argument "planner_flag"
Stacktrace:
 [1] top-level scope
   @ REPL[42]:1
 [2] top-level scope
   @ ~/.julia/packages/CUDA/DfvRa/src/initialization.jl:52
```
What should I use?
Ah, sorry. I think you should use

```julia
pressure_solver = FFTBasedPoissonSolver(grid, FFTW.ESTIMATE)
```
PS: try triple backticks rather than single backticks for formatting blocks of code / error messages.
Thanks, that worked. However, now I get this:

```
Iteration: 0010, time: 1.833 minutes, Δt: NaN years, max(|w|) = NaN ms⁻¹, wall time: 1.577 minutes
Iteration: 0020, time: NaN years, Δt: NaN years, max(|w|) = NaN ms⁻¹, wall time: 1.674 minutes
Iteration: 0030, time: NaN years, Δt: NaN years, max(|w|) = NaN ms⁻¹, wall time: 1.766 minutes
Iteration: 0040, time: NaN years, Δt: NaN years, max(|w|) = NaN ms⁻¹, wall time: 1.841 minutes
Iteration: 0050, time: NaN years, Δt: NaN years, max(|w|) = NaN ms⁻¹, wall time: 1.942 minutes
Iteration: 0060, time: NaN years, Δt: NaN years, max(|w|) = NaN ms⁻¹, wall time: 2.018 minutes
Iteration: 0070, time: NaN years, Δt: NaN years, max(|w|) = NaN ms⁻¹, wall time: 2.111 minutes
Iteration: 0080, time: NaN years, Δt: NaN years, max(|w|) = NaN ms⁻¹, wall time: 2.186 minutes
Iteration: 0090, time: NaN years, Δt: NaN years, max(|w|) = NaN ms⁻¹, wall time: 2.261 minutes
Iteration: 0100, time: NaN years, Δt: NaN years, max(|w|) = NaN ms⁻¹, wall time: 2.349 minutes
```
Oh, is your grid vertically stretched? In that case you need to use

```julia
pressure_solver = FourierTridiagonalPoissonSolver(grid, FFTW.ESTIMATE)
```
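Combined with the model constructor from the earlier snippet (where `other_kwargs...` again stands in for the rest of your setup), that would look like:

```julia
using Oceananigans.Solvers: FourierTridiagonalPoissonSolver
using FFTW

pressure_solver = FourierTridiagonalPoissonSolver(grid, FFTW.ESTIMATE)
model = NonhydrostaticModel(; grid, pressure_solver, other_kwargs...)
```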
Yes, it is. I ran ten simulations (same .jl script) setting Tmax=4h, and I got the same profiles. It seems the reproducibility problem is solved. I'll run the simulations for four days (our goal) and let you know what happens. Thank you for your help. If this works, I can share the final script.
Great! (Thanks @kburns)
Feel free to close this issue @fspereira1 if you believe it is resolved.
@glwagner and @kburns, thank you. I will confirm if the simulations running for four days work and then close it.
Since it is a different topic, I created a new issue https://github.com/CliMA/Oceananigans.jl/issues/2787, but I wonder how I could run this case in parallel (512x512x512 grid). I noticed you just replied.
It might make sense to convert this to a discussion and change the title to "Building reproducible LES setups".
The info here could be useful for future Oceananigans users that would like to build reproducible setups (thanks for your efforts in this department @fspereira1).
Of note, the lessons learned here are mostly about achieving reproducibility with Julia and FFTW (the lessons are not Oceananigans specific, and are applicable to other Julia applications).
And to summarize the important points:

- Set random initial conditions with arrays rather than with functions (for example, avoid calling `rand()` inside an initial condition function, because loop ordering is not deterministic in Oceananigans). There could be special concerns for random array initial conditions on GPU.
- Simulations that use FFT-based solvers (`NonhydrostaticModel`, and configurations of `HydrostaticFreeSurfaceModel` that use them) must use the `FFTW.ESTIMATE` planner flag (see solver documentation for how to do this in each case).

@glwagner This makes sense to me, and thank you for your help. Also, I want to give you an update: I ran ten simulations with the same .jl file with T = 4 days, and I got the same profiles (ww, www, T, etc.). I can share the .jl file; it might be useful to other users. What would be the best way?
Perhaps start a git repo and post a link to it? It's best to include the julia environment you're using with the file (otherwise it will go stale as Oceananigans evolves).
If you want to just post the file then you can use a gist, or copy/paste the code here if it's short.
Dear Oceananigans team,
We are trying to use Oceananigans to create reference LES solutions for multiple canonical flows. We use a cubical domain and random perturbations to trigger the flow. During our validation tests, we noticed that we could not reproduce the results, i.e., running the same .jl script (same initial flow conditions) leads to different averaged solutions (see attached picture). We ran more than 16 simulations and never obtained the same solution. We tried to set the seed of the random perturbations constant, but this did not solve the problem. Do you observe this problem, and could you help us run reproducible simulations so other users can obtain the same solutions? We attached the .jl file we are using to define the simulations.
Best regards,
Filipe Pereira, Luke van Roekel, Amrapalli Garanaik, Brodie Pearson
c16_128_128m(1).jl.zip