Ouranosinc / xscen

A climate change scenario-building analysis framework.
https://xscen.readthedocs.io/
Apache License 2.0
15 stars 2 forks source link

new conversions #200

Closed juliettelavoie closed 1 year ago

juliettelavoie commented 1 year ago

Pull Request Checklist:

What kind of change does this PR introduce?

Does this PR introduce a breaking change?

I don't think it does, but I changed the name of the conversion tasmax/min_from_dtr to make more a difference between the 4 functions.

Other information:

This is necessary because the EMDNA dataset only gives dtr and tas.

RondeauG commented 1 year ago

Have you tested it with both use cases? Can you get tasmax with both tasmin/DTR and tas/DTR?

I have the vague memory that the same output variable cannot be there twice in conversions.yml. As in, I had to make a choice between relative_humidity_from_dewpoint and relative_humidity, because both wouldn't work at the same time.

juliettelavoie commented 1 year ago

Oh! I did not think about this! So, if you want tasmin and a dataset has both tas, dtr and tasmin. search_data_catalogue gives you 2 lines with tas and dtr as it is listed last in the yml. (It gives tasmax if we switch the order). Which one should we prioritize ? I think tasmax as it doesn't assume a symmetrical distribution.

RondeauG commented 1 year ago

I think that DTR & tas would be the most common combination, although I see that this isn't what we had previously.

My suggestion would be to remove tasmax_from_dtr_and_tasmin and tasmax_from_dtr_and_tasmax from conversions.yml, but keep everything in conversions.py. A user that has 'tasmin' and 'dtr' would have to provide his own yaml, but could call upon the functions in conversions.py.

juliettelavoie commented 1 year ago

tasmin_from_dtr_and_tasmax is what we use for ESPO and info-crue-cmip6 to get tasmin back after adjusting dtr, so it is more common for now. I think this is probably our new norm, so it would be a bit annoying that xscen couldn't handle it without an extra file...

Just to be clear, it never fails (whether you have tas, tasmax or both). And if you have both, it would give the same answer regardless of the function used, so we don't need to get rid of one.

EDIT: They don't necesseraly give the same thing all the time, if tas != 0.5 *(tasmax+tasmin) then tasmin_from_dtr_and_tas doesn't work and in that case we should use tasmin_from_dtr_and_tasmax (the default now). But in general, we accept the assumption tas = 0.5 *(tasmax+tasmin), especially if tas is the only data available.

juliettelavoie commented 1 year ago

Update: Gabriel a raison. It doesn't work. the first function declared can't be used.

RondeauG commented 1 year ago

For reference, see the conversation in #88 .

juliettelavoie commented 1 year ago

So, derived variables are the keys to the Registry, which means we can't have 2 definitions of 1 variable.

Conclusion: workflow that need tasmin_from_tasmax in clean_up will need to call their own yaml. It shouldn't be needed in "normal" use of search_data_catalogs.

À moins que @aulemahal ait une idée de génie ?

aulemahal commented 1 year ago

Malheureusement, je n'ai pas d'idée de génie. Ça demanderait une PR majeure dans intake-esm pour changer le registre de variables dérivées en autre chose qu'un simple dictionnaire.

juliettelavoie commented 1 year ago

Le problème va être régler en créant tasmax et tasmin dans miranda. Cette PR deviendra inutile et on pourra garder seulement tasmax_from_dtr_and_tasmin dans xscen.

RondeauG commented 1 year ago

Ça reste potentiellement un enjeu, car ça serait bien si on pouvait couvrir quelques combinaisons possibles au lieu de juste 1 seule. Je pourrais voir une PR compliquée où on modifie search_data_catalogs pour faire:

  1. Recherche avec les DerivedVariable par défaut (i.e. hurs_from_dewpoint, tasmin_from_dtr_and_tasmax)
  2. Recherches 2 à X avec les combinaisons supplémentaires (hurs, tasmin_from_dtr_and_tas) --> Répéter autant de fois qu'il existe de combinaisons possibles.
  3. Je ne crois pas qu'on puisse faire un pd.concat() des résultats, car il faut que les DerivedVariable soient gardés en mémoire dans le catalogue intake. À voir aussi comment on gèrerait une variable qui a ce qu'il faut pour être calculée de plusieurs manières.
  4. extract_dataset devrait aussi probablement être modifié.
RondeauG commented 1 year ago

Bref plutôt que de fermer ce PR et oublier la discussion, j'ouvrirais peut-être un Issue pour qu'on regarde nos options... un jour.

juliettelavoie commented 1 year ago

Discussion transférée dans le issue #204 pour plus tard.