Closed juliettelavoie closed 1 year ago
Have you tested it with both use cases? Can you get tasmax with both tasmin/DTR
and tas/DTR
?
I have the vague memory that the same output variable cannot be there twice in conversions.yml
. As in, I had to make a choice between relative_humidity_from_dewpoint
and relative_humidity
, because both wouldn't work at the same time.
Oh! I did not think about this!
So, if you want tasmin
and a dataset has both tas
, dtr
and tasmin
. search_data_catalogue
gives you 2 lines with tas
and dtr
as it is listed last in the yml. (It gives tasmax
if we switch the order). Which one should we prioritize ? I think tasmax
as it doesn't assume a symmetrical distribution.
I think that DTR & tas
would be the most common combination, although I see that this isn't what we had previously.
My suggestion would be to remove tasmax_from_dtr_and_tasmin
and tasmax_from_dtr_and_tasmax
from conversions.yml
, but keep everything in conversions.py
. A user that has 'tasmin' and 'dtr' would have to provide his own yaml, but could call upon the functions in conversions.py
.
tasmin_from_dtr_and_tasmax
is what we use for ESPO and info-crue-cmip6 to get tasmin
back after adjusting dtr
, so it is more common for now. I think this is probably our new norm, so it would be a bit annoying that xscen couldn't handle it without an extra file...
Just to be clear, it never fails (whether you have tas
, tasmax
or both). And if you have both, it would give the same answer regardless of the function used, so we don't need to get rid of one.
EDIT: They don't necesseraly give the same thing all the time, if tas != 0.5 *(tasmax+tasmin)
then tasmin_from_dtr_and_tas
doesn't work and in that case we should use tasmin_from_dtr_and_tasmax
(the default now). But in general, we accept the assumption tas = 0.5 *(tasmax+tasmin)
, especially if tas
is the only data available.
Update: Gabriel a raison. It doesn't work. the first function declared can't be used.
For reference, see the conversation in #88 .
So, derived variables are the keys to the Registry, which means we can't have 2 definitions of 1 variable.
Conclusion: workflow that need tasmin_from_tasmax
in clean_up
will need to call their own yaml. It shouldn't be needed in "normal" use of search_data_catalogs
.
À moins que @aulemahal ait une idée de génie ?
Malheureusement, je n'ai pas d'idée de génie. Ça demanderait une PR majeure dans intake-esm
pour changer le registre de variables dérivées en autre chose qu'un simple dictionnaire.
Le problème va être régler en créant tasmax
et tasmin
dans miranda. Cette PR deviendra inutile et on pourra garder seulement tasmax_from_dtr_and_tasmin
dans xscen.
Ça reste potentiellement un enjeu, car ça serait bien si on pouvait couvrir quelques combinaisons possibles au lieu de juste 1 seule. Je pourrais voir une PR compliquée où on modifie search_data_catalogs
pour faire:
DerivedVariable
par défaut (i.e. hurs_from_dewpoint
, tasmin_from_dtr_and_tasmax
)hurs
, tasmin_from_dtr_and_tas
) --> Répéter autant de fois qu'il existe de combinaisons possibles.pd.concat()
des résultats, car il faut que les DerivedVariable
soient gardés en mémoire dans le catalogue intake. À voir aussi comment on gèrerait une variable qui a ce qu'il faut pour être calculée de plusieurs manières.extract_dataset
devrait aussi probablement être modifié.Bref plutôt que de fermer ce PR et oublier la discussion, j'ouvrirais peut-être un Issue pour qu'on regarde nos options... un jour.
Discussion transférée dans le issue #204 pour plus tard.
Pull Request Checklist:
number
) and pull request (:pull:number
) has been addedWhat kind of change does this PR introduce?
Does this PR introduce a breaking change?
I don't think it does, but I changed the name of the conversion tasmax/min_from_dtr to make more a difference between the 4 functions.
Other information:
This is necessary because the EMDNA dataset only gives dtr and tas.