dss operations within coupler pass an unused `t` argument.

akshaysridhar commented 1 year ago

Following the CTS interface, the weighted_dss_slab! function definition requires a third t (time) argument which is unused within the ClimaCore.Spaces.weighted_dss! function. We may want to remove this from our general design pattern.

LenkaNovak commented 1 year ago

@akshaysridhar @sriharshakandala @kmdeck

As @gdecker1 found earlier, Float32 runs now break in ClimaLSM due to t being specified as an FT type here. For some reason t is being returned as Float64 even if all arguments to the ODEProblem are passed as FT32. @dennisYatunin maybe you'll be familiar with the internals of this, do you think this could be an issue in ClimaTimeSteppers?

For some reason our FT32 Buildkite was set with FT64, which is why we didn't catch this earlier! Error is reproducible if the driver on this branch is run interactively.

(Note, FT32 didn't cause problems in these interactive Coupler runs before the ClimaLSM.dss! PR. )

Unless we're doing something silly here, are you all happy to remove the type specification (like we did in the coupler #387) in ClimaLSM, before this issue gets resolved in ClimaTimeSteppers? As it stands, we can't test AMIP for FT32.

akshaysridhar commented 1 year ago

Adding notes from Slack here: (Callbacks in the timestepping methods also require a (u,p,t) arg list)

dennisYatunin commented 1 year ago

You should probably just drop the unnecessary FT type restriction. The value of t is passed here from the integrator for the sake of debugging (e.g., printing @info "Value at $t: ..."); its type does not matter because it should not be used for anything else. For now, t always needs to be stored as a Float64 because Float32 does not have enough bits to accurately track time without roundoff error.

kmdeck commented 1 year ago

hi all - catching up here. is this correct: ClimaLSM needs a PR which drops the type restriction on t in dss! but which keeps it as an argument.

We may need to do this in multiple functions because we had assumed that t would be the same type as that underlying the state vectors. In the past, I thought this was a desirable feature, that we wanted the type restriction to ensure the simulation took place entirely at float32 or at float64. we can first make the above change, and see if we run into further issues?

LenkaNovak commented 1 year ago

Hi @kmdeck , thanks for following up on this! 🚀 That's right! Looks like removing the type specification would be the quick (and probably only) fix. Thanks @dennisYatunin for the explanation. I agree that it would be better to have consistent types, but I also get the float precision issue. Do you guys have tests that run with Float32?

kmdeck commented 1 year ago

Hi @kmdeck , thanks for following up on this! 🚀 That's right! Looks like removing the type specification would be the quick (and probably only) fix. Thanks @dennisYatunin for the explanation. I agree that it would be better to have consistent types, but I also get the float precision issue. Do you guys have tests that run with Float32?

We have unit tests that run with Float32 but all of our integrations (in experiments, etc) use Float64, I think. Ill make sure we can run with Float32 before merging anything!

kmdeck commented 1 year ago

Hi @kmdeck , thanks for following up on this! 🚀 That's right! Looks like removing the type specification would be the quick (and probably only) fix. Thanks @dennisYatunin for the explanation. I agree that it would be better to have consistent types, but I also get the float precision issue. Do you guys have tests that run with Float32?

We have unit tests that run with Float32 but all of our integrations (in experiments, etc) use Float64, I think. Ill make sure we can run with Float32 before merging anything!

@LenkaNovak it turns out that we have implicit requirements that t be the same float type as the state in a lot of places. so it will be a bigger change for us to get this to run. For example, whenever we prescribe time varying BC (reanalysis data, etc), the BC ends up being the same type as t, or a combination of quantities that are of type (t) and of the state FT).

what is the priority on this?

LenkaNovak commented 1 year ago

Hmm... I thought you guys don't use the t variable in the locally defined dss! function like here. All we should need is for the t::FT to be changed to just t on that line. Or is that not always the case? 🤔

kmdeck commented 1 year ago

Hmm... I thought you guys don't use the t variable in the locally defined dss! function like here. All we should need is for the t::FT to be changed to just t on that line. Or is that not always the case? 🤔

The issue is actually that we assume t is the same type as the state in many places (not just here - also in time varying boundary conditions, using renalysis data, etc). We didnt adequately test running with Float32; I think since switching to ClimaTimesteppers we lost this capability/can only run with Float64.

LenkaNovak commented 1 year ago

I see, thank you for looking into this, @kmdeck . I think Atmos has similar problems, but it was on their radar to re-enable single precision soon. As for the question on priorities, I'm not totally sure. @cmbengue @tapios How much should we prioritize Float32 runs right now?

tapios commented 1 year ago

I see, thank you for looking into this, @kmdeck . I think Atmos has similar problems, but it was on their radar to re-enable single precision soon. As for the question on priorities, I'm not totally sure. @cmbengue @tapios How much should we prioritize Float32 runs right now?

It depends on what right now means. It's not crucial, I think, for the 4-6 weeks or so. But beyond that, we'd like to be able to do coupled runs (on GPUs) in Float32.

dennisYatunin commented 1 year ago

It's also worth noting that we will probably continue to avoid using Float32 to represent t; it just leads to too much roundoff error for long simulations. The most viable options now are Float64 and maybe also DateTime (though using the latter would require changing a lot of code where we assume that t is a number). Perhaps the fastest solution for ClimaLSM would be to wrap t in FT(t) whenever it is passed from the integrator to a tendency function.

CliMA / ClimaCoupler.jl

dss operations within coupler pass an unused `t` argument. #396