Open dougiesquire opened 1 year ago
Thanks for catching this @dougiesquire.
I'm not sure this is a cookbook issue. I think it's more an issue with the data itself. I don't think it's a good idea to have output defined on different regions using the same file name. I'd suggest that a good way to deal with this issue is to rename the ocean_daily_3d_u_%.nc
files in output196-output279
to something like ocean_daily_3d_u_southern_ocean_%.nc
. They can then be separated using the nc_file
argument to cc.querying.getvar
.
But I guess even then, it would still be useful to flag it so that the user knows they have to use nc_file
.
Thanks @rmholmes. I wasn't meaning to suggest that the issue is with the cookbook, but having join="exact"
as default would've saved me a bunch of time yesterday. I.e. it could be useful for helping to find/flag issues with the data.
My guess is that most uses of the cookbook are to query/load datasets that should have consistent indexes. So having join="exact"
as default could make sense - users could always override the kwarg if they want to merge inconsistent data.
But, I'm probably just not across the full range of cookbook use cases.
But yes, for fixing the specific issue with 01deg_jra55v13_ryf9091
, changing the name of the nc files sounds sensible to me. Who would be in charge of doing that?
Sorry @dougiesquire, I didn't completely take in your comment here as I'd copied my response across from the cosima-recipes issue you'd put up. I'd support a move to the stricter requirements.
I think @AndyHoggANU ran that simulation.
I agree that join="exact"
seems like a sensible default, it's a bit crazy that xarray tries to concatenate datasets like that in the first place!
Is there any time penalty with join="exact"
? ISTR folks complaining about xarray
doing time consuming checks on coordinates under some circumstances, but it may be me misremembering, or the issue may no longer be a problem.
I think the checks by compat
are more expensive than those for join
(which just tries to align dimension sizes)? Probably one of those things where it's best to just benchmark it.
Agreed. I wouldn't expect any difference in speed for data that can be joined.
As @angus-g mentioned, there are other kwargs
that can be changed to improve performance, but they require making some assumptions about the data being loaded. I don't know whether these are justified for the COSIMA data?
EDIT: see the Note here: https://docs.xarray.dev/en/stable/user-guide/io.html#reading-multi-file-datasets
Motivating example here: https://github.com/COSIMA/cosima-recipes/issues/229. Files with the same naming in a single experiment are on different domains depending on the
output*
directory. Should the cookbook check whether indexes are the same for data being merged/concatenated?E.g. passing
join="exact"
to theopen_mfdataset()
call within the cosima-cookbook will then return an error when indexes to be aligned are not equal. This could currently be passed through kwargs, but should it be the default?