MESMER-group / mesmer

spatially-resolved ESM-specific multi-scenario initial-condition ensemble emulator
https://mesmer-emulator.readthedocs.io/en/latest/
GNU General Public License v3.0
23 stars 18 forks source link

incorrect global mean for hfds? #205

Open mathause opened 2 years ago

mathause commented 2 years ago

The following

https://github.com/MESMER-group/mesmer/blob/bdec598daa0911b2f19f85f8894605dd10ed9dde/mesmer/io/_load_cmipng.py#L499

should probably be

-np.average(...)
+np.ma.average(...)

As it currently stands Na values are ignored (assumed 0) but not skipped.

Repro:

import numpy as np

data = np.ma.masked_invalid([1, 2, np.NaN])
weights = np.array([1, 1, 10])

a = np.average(data, weights=weights)
b = np.ma.average(data, weights=weights)

print(f"np.average:    {a}")
print(f"np.ma.average: {b}")

Returns

np.average:    0.25
np.ma.average: 1.5

the equivalent "problem" in xarray:

import numpy as np
import xarray as xr
da = xr.DataArray([1, 2, np.NaN])
weights = xr.DataArray([1, 1, 10])

a = da.weighted(weights).mean()
b = da.fillna(0).weighted(weights).mean()

print(f"average:    {a}")
print(f"fillna : {b}")

I am not sure if this is actually a problem - if it is applied the same for training and emulation? You could argue that da.fillna(0).weighted(weights).mean(("lat", "lon")) leads to the global mean HFDS while da.weighted(weights).mean(("lat", "lon")) is the ocean mean HFDS.

@leabeusch

mathause commented 2 years ago

The trend seems to differ, though:

hfds

leabeusch commented 1 year ago

Hello 😊 after a long, long time in addition to our in-person chat here finally an extended written answer as well: oh yes, that looks like I’ve been computing global HFDS, not global-ocean HFDS. I’ve definitely been doing it consistently within training & emulation, so at least no issues from that side. I feel like, the magnitude of the HFDS quantity was a topic with the MAGICC people at some point, but I cannot recall at all whether it was a conscious choice to go with truly global HFDS instead of global-ocean HFDS or an unintended consequence. I think I also faintly remember some issues with discrepancies between the global HFDS magnitude I got out of MAGICC vs the one I got from CMIP6 models. But I can’t say anymore whether it was just that the CMIP6 models showed a wide range or whether it was a consistent offset. In case MAGICC’s global HFDS is consistently larger than my CMIP6 ones’, that’d be great news from an overall consistency perspective between MAGICC’s output and CMIP6’s output. Not so great news for the implementation of MESMER I used for our GMD paper though. :S At least we concluded for global HFDS in combination with squared forced global temperature as additional predictors to not be sufficiently useful for justifying adding them to our annual mean temperature change application of MESMER. Thus, we didn’t actually combine the MESMER configuration which uses global HFDS as a predictor with MAGICC output in the paper (& hence, no MAGICC-MESMER output based on potentially differing global HFDS definitions was published).

But everything I say here should clearly be taken with a grain of salt. It’s been ~1.5 years since I last thought about these topics. ^^’ But maybe @znicholls miraculously remembers something more than me? Or is able to give a concrete statement on whether MAGICC provides global HFDS or global-ocean HFDS?

& I’m tagging @jschwaab here as well, just because I think he experimented with using global HFDS in some of his emulation work too.

mathause commented 1 year ago

I’ve definitely been doing it consistently within training & emulation

This is an important point - if it's done consistently it does not introduce an error.

znicholls commented 1 year ago

It's mainly a convention and not confusing ourselves thing. Ocean heat uptake is generally reported as global hfds because that makes it easier to compare to energy imbalances at the top of the atmosphere (which are global). So I think the way it is currently done ends up with the right answer, but we could make it much easier for ourselves to understand what is going on and documenting it.

When I write down my calculations, I usually write it like this

ocean_heat_uptake_global_sum = da.weighted(weights).sum(("lat", "lon"))  # doesn't matter if you fillna or not 
ocean_heat_uptake = ocean_heat_uptake_global_sum / global_area  # global area can either just be assumed or calculated based on the model's areacella etc.