CliMA / Oceananigans.jl

🌊 Julia software for fast, friendly, flexible, ocean-flavored fluid dynamics on CPUs and GPUs
https://clima.github.io/OceananigansDocumentation/stable
MIT License
931 stars 189 forks source link

Should we use `Base.TwicePrecision` to `align_time_step`? #2321

Open glwagner opened 2 years ago

glwagner commented 2 years ago

Unfortunately I don't have an example of this on hand (and our tests don't catch it), but I've noticed occasional irregularities in output writing that are presumably due to rounding artifacts when aligning a time-step for TimeInterval. One common case is that I run a parameter sweep and one of the files is one iteration shorter than another. This is often at the end of a long run.

I think the culprit might be round-off error when computing align_time_step or schedule_aligned_Δt. Maybe using Base.TwicePrecision for some of the calculations would solve the problem?

Here's some of the code involved:

https://github.com/CliMA/Oceananigans.jl/blob/6a7ab79fc116612d8c156b069cbb60b06416bfa8/src/Simulations/run.jl#L43-L58

https://github.com/CliMA/Oceananigans.jl/blob/6a7ab79fc116612d8c156b069cbb60b06416bfa8/src/Utils/schedules.jl#L51

https://github.com/CliMA/Oceananigans.jl/blob/6a7ab79fc116612d8c156b069cbb60b06416bfa8/src/Utils/schedules.jl#L152

milankl commented 1 year ago

Maybe this helps: You could use one dt::Float32/64 for the time step in the model, and another dt_int::Int to count up the time used for output. Because with Float32

julia> maxintfloat(Float32)/3600/24
194.18074f0

after 194 days of +dt you start flipping only the last mantissa bit, e.g. dt=300f0 (5min)

julia> bitstring(300f0*3600*24*194)
"01001111100101011101110001000010"

julia> bitstring(300f0*3600*24*194 + 300f0)
"01001111100101011101110001000011"

So after about one year +dt can be rounded back

julia> bitstring(300f0*3600*24*366)
"01010000000011010101110011110111"

julia> bitstring(300f0*3600*24*366 + 300f0)
"01010000000011010101110011110111"

but because you can factor out the dt and floats being logarithmic, this is actually independent of the time step dt.

julia> bitstring(30f0*3600*24*366)
"01001110011000100010111001011000"

julia> bitstring(30f0*3600*24*366 + 30f0)
"01001110011000100010111001011000"
glwagner commented 1 year ago

I need to digest this suggestion! But in the meantime: does this work for changing dt? For idealized / strongly-time-dependent problems we often find it useful to continuously adapt the time-step according to a CFL criteria.

milankl commented 1 year ago

Whatever Δt (e.g. in seconds) you choose, in SpeedyWeather we round it to seconds, by doing something like

Δt = round(Δt)
Δt_int = convert(Int,Δt)

And have a time::DateTime object that counts up

 time += Dates.Second(Δt_int)

which essentially keeps time at infinite precision (because it uses integers) regardless the number format used. Because output requires "seconds since" we then do

Dates.value(Dates.Second(time-startdate))

which is done with integer arithmetic

julia> t0 = Dates.DateTime(2000,1,1)
2000-01-01T00:00:00

julia> t1 = Dates.DateTime(2001,1,1)
2001-01-01T00:00:00

julia> GFlops.@count_ops t1-t0
Flop Counter: 0 flop

To answer your question, yes, as long as you choose time steps rounded to seconds this works. You could also do it to milliseconds precise obviously, but I believe that's an overkill.

glwagner commented 1 year ago

We can't round the time step universally because we are unit-agnostic, so we can have non-dimensional time. For example one might set a "diffusion time-scale to t=1, then the time-step would be 1e-2 or 5e-2 for stability.

Even when we are "on Earth", we definitely do need to take millisecond time-steps; for example for large eddy simulations which could have 0.1 m spacing and fast flows. Or for direct numerical simulation, I've done some work on the laboratory scale with micron grid spacing, and very small time-steps.

But maybe there's a way to generalize the idea by choosing a "basis"? Ie we can specify the number of digits for the time-step in whatever unit system the user is employing.

What do you think about the idea of using a higher precision representation for the time-step only? We only have to do a bit of arithmetic with it (when inserted into any hot loops, we'd convert back to lower precision)...

glwagner commented 1 year ago

Maybe we can design a custom Clock object that opts-in to this kind of behavior, when warranted. The default could be our original clock, but if you know what basis to use and you'd like to avoid rounding issues for very long running simulations, you can opt-in to ExactSteppingClock or something.