Inputs arguments that have no cf standard names

rcaneill commented 2 years ago

The following inputs don't have a standard name. This is a problem for the accessor / the auto-detection. So we either need to open a request to add a standard name to the CF conventions (let's call it solution 0) (I'll be happy to do it, but I have no clue how), or find another solution. Among the other possible solutions, I see:

Use a cf-xarray custom criteria: https://cf-xarray.readthedocs.io/en/latest/custom-criteria.html

Let the user decide the name of the variables corresponding to the needed inputs, e.g.

gsw.set_non_cf_name({'entropy':'entropy_name_in_ds', 'SA_seaice':'salinity_of_sea_ice_name_in_ds'})
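A minimal sketch of how such a user-supplied mapping could take priority over standard-name detection. It is plain Python with a dict standing in for a dataset; `resolve_inputs` and the variable names `salt` and `eta` are hypothetical and not part of gsw-xarray.

```python
# Hypothetical sketch: user-supplied names win over standard-name lookup.
# A dataset is mimicked by a plain dict {variable_name: attrs}.

def resolve_inputs(variables, standard_names, user_names=None):
    """Return {gsw_argument: variable_name} for every argument that can be found.

    variables      -- {variable_name: attrs dict}, stand-in for a dataset
    standard_names -- {gsw_argument: CF standard name}
    user_names     -- {gsw_argument: variable_name}, supplied by the user
    """
    user_names = user_names or {}
    resolved = {}
    # 1. Names given explicitly by the user must exist in the dataset.
    for arg, var in user_names.items():
        if var not in variables:
            raise KeyError(f"{var!r} (given for {arg!r}) is not in the dataset")
        resolved[arg] = var
    # 2. Fall back to the standard_name attribute, only if it is unambiguous.
    for arg, std in standard_names.items():
        if arg in resolved:
            continue  # the user's choice wins
        matches = [name for name, attrs in variables.items()
                   if attrs.get("standard_name") == std]
        if len(matches) == 1:
            resolved[arg] = matches[0]
    return resolved


variables = {
    "salt": {"standard_name": "sea_water_absolute_salinity"},
    "eta": {},  # no standard name available for entropy
}
found = resolve_inputs(
    variables,
    standard_names={"SA": "sea_water_absolute_salinity"},
    user_names={"entropy": "eta"},
)
print(found)
```

Arguments that can be neither detected nor found in the user mapping are simply left out of the result, so the caller can decide whether that is an error.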

And here is the list:

Rt
SA_bulk
SA_seaice
SK
SR
Sstar
axis
entropy
geo_strf
geo_strf_dyn_height
h
h_bulk
h_pot_bulk
interp_method
max_dp
p_deep
p_ref
p_shallow
pot_enthalpy_ice
pt0
pt0_ice
saturation_fraction
sea_surface_geopotential
t68
t_Ih
t_seaice
w_Ih
w_seaice

To compare, here is the list of arguments that have a standard name:

C
CT
SA
SP
lat
lon
p
pt
rho
t
z

DocOtak commented 2 years ago

Let's take another look at the CF standard name table; there are names for a few of these:

SK -> sea_water_knudsen_salinity
SR -> sea_water_reference_salinity
Sstar -> sea_water_preformed_salinity
h -> sea_water_specific_potential_enthalpy (?)
p_ref -> reference_pressure (?)

Some of these won't have standard names, and names for them probably wouldn't be accepted if proposed:

t68 (CF isn't this specific)
axis (this is related to numpy machinery I think)
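The suggestions above could be recorded as a small lookup table. This is only a sketch: the names `SUGGESTED`, `TENTATIVE` and `standard_name_for` are hypothetical, and the two entries marked "(?)" above are kept separate because they are unconfirmed.

```python
# Standard names suggested in the comment above.
SUGGESTED = {
    "SK": "sea_water_knudsen_salinity",
    "SR": "sea_water_reference_salinity",
    "Sstar": "sea_water_preformed_salinity",
}
# Entries that were marked "(?)": guesses that still need checking.
TENTATIVE = {
    "h": "sea_water_specific_potential_enthalpy",
    "p_ref": "reference_pressure",
}


def standard_name_for(arg, include_tentative=False):
    """Return the suggested standard name for a gsw argument, or None."""
    if arg in SUGGESTED:
        return SUGGESTED[arg]
    if include_tentative and arg in TENTATIVE:
        return TENTATIVE[arg]
    return None  # e.g. t68 or axis, which are not expected to get a name


print(standard_name_for("SR"))
print(standard_name_for("h"))
print(standard_name_for("h", include_tentative=True))
```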

rcaneill commented 1 year ago

Thanks for the double check, I'll add the standard names.

I should have been more careful: some of the arguments of gsw functions are plain function arguments (usually optional), not physical quantities.

Physical quantities without a standard name (or at least, that I haven't implemented/seen):

| Argument name | Used in | Default value | Description / notes |
| --- | --- | --- | --- |
| Rt | `gsw.SP_salinometer(Rt, t)` | | "C(SP,t_68,0)/C(SP=35,t_68,0), unitless" |
| SA_bulk | frazil properties | | "bulk Absolute Salinity of the seawater and ice mixture, g/kg" |
| SA_seaice | ice properties | | "Absolute Salinity of sea ice: the mass fraction of salt in sea ice, expressed in g of salt per kg of sea ice." |
| entropy | `gsw.CT_from_entropy(SA, entropy)`, `gsw.pt_from_entropy(SA, entropy)` | | "Specific entropy, J/(kg*K)" |
| geo_strf | `gsw.geostrophic_velocity(geo_strf, lon, lat, p=0, axis=0)` | | "geostrophic streamfunction" |
| geo_strf_dyn_height | `gsw.p_from_z(z, lat, geo_strf_dyn_height=0, sea_surface_geopotential=0)`, `gsw.z_from_p(p, lat, geo_strf_dyn_height=0, sea_surface_geopotential=0)` | 0 | |
| h | `gsw.CT_from_enthalpy(SA, h, p)`, `gsw.CT_from_enthalpy_exact(SA, h, p)` | | This is the specific enthalpy, not the specific potential enthalpy |
| h_bulk | `gsw.frazil_properties(SA_bulk, h_bulk, p)` | | "bulk enthalpy of the seawater and ice mixture, J/kg" |
| h_pot_bulk | `gsw.frazil_properties_potential(SA_bulk, h_pot_bulk, p)`, `gsw.frazil_properties_potential_poly(SA_bulk, h_pot_bulk, p)` | | "bulk enthalpy of the seawater and ice mixture, J/kg" Typo in gsw doc? Should it be potential bulk enthalpy? (Matlab doc says potential) |
| pot_enthalpy_ice | `gsw.pt_from_pot_enthalpy_ice(pot_enthalpy_ice)`, `gsw.pt_from_pot_enthalpy_ice_poly(pot_enthalpy_ice)` | | "Potential enthalpy of ice, J/kg" |
| pt0 | `gsw.gibbs_ice_pt0(pt0)`, `gsw.gibbs_ice_pt0_pt0(pt0)` | | "Potential temperature with reference pressure of 0 dbar, degrees C" Could maybe be found with potential temperature and reference pressure |
| pt0_ice | `gsw.pot_enthalpy_from_pt_ice(pt0_ice)`, `gsw.pot_enthalpy_from_pt_ice_poly(pt0_ice)`, `gsw.t_from_pt0_ice(pt0_ice, p)` | | "Potential temperature of ice (ITS-90), degrees C" whose reference sea pressure is zero dbar |
| saturation_fraction | freezing properties | | "Saturation fraction of dissolved air in seawater. (0..1)" |
| sea_surface_geopotential | `gsw.p_from_z(z, lat, geo_strf_dyn_height=0, sea_surface_geopotential=0)`, `gsw.z_from_p(p, lat, geo_strf_dyn_height=0, sea_surface_geopotential=0)` | 0 | |
| t68 | `gsw.t90_from_t68(t68)` | | Is it in situ temperature with reference scale IPTS-68? This is not CF, but we could rely on this |
| t_Ih | ice properties | | "In-situ temperature of ice (ITS-90), degrees C" |
| t_seaice | ice melting properties | | "In-situ temperature of the sea ice at pressure p (ITS-90), degrees C" |
| w_Ih | frazil and melting ice properties | | "mass fraction of ice: the mass of ice divided by the sum of the masses of ice and seawater. 0 <= wIh <= 1. unitless." |
| w_seaice | `gsw.melting_seaice_into_seawater(SA, CT, p, w_seaice, SA_seaice, t_seaice)` | | "mass fraction of ice: the mass of sea-ice divided by the sum of the masses of sea-ice and seawater. 0 <= wIh <= 1. unitless." |

We have a small issue: t is used both for ice temperature (e.g. in gsw.ice.kappa_ice(t, p)) and for sea water temperature. So far I had assumed that each argument name was linked to exactly one physical quantity. We'll have to deal with that (e.g. with a simple check to see whether a given function needs the ice or the sea water temperature).
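One possible shape for that check is a per-function override on top of the default table. This is a sketch under assumptions: `DEFAULT_STANDARD_NAMES`, `OVERRIDES` and `standard_name` are hypothetical names, and whether `sea_ice_temperature` is the right match for the `t` of `kappa_ice` is itself an open question.

```python
# Default: one argument name maps to one standard name.
DEFAULT_STANDARD_NAMES = {
    "t": "sea_water_temperature",
    "p": "sea_water_pressure",
}
# (function name, argument name) -> standard name, for functions acting on ice.
# The kappa_ice entry is an assumption made for illustration.
OVERRIDES = {
    ("kappa_ice", "t"): "sea_ice_temperature",
}


def standard_name(func_name, arg):
    """Look up the standard name for `arg` as used by `func_name`."""
    return OVERRIDES.get((func_name, arg), DEFAULT_STANDARD_NAMES.get(arg))


print(standard_name("kappa_ice", "t"))
print(standard_name("CT_from_t", "t"))
```

Keeping the overrides in a separate table means the common case stays a single lookup, and only the ice functions need to be listed explicitly.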

I think that w_seaice and w_Ih are the same. And that t_Ih and t_seaice are the same.

Arguments:

p_deep and p_shallow, used in gsw.enthalpy_diff(SA, CT, p_shallow, p_deep)
p_ref=0 when optional, sometimes mandatory
axis=0 (almost everywhere) or axis=-1 (only for gsw.distance(lon, lat, p=0, axis=-1))
interp_method='pchip'
max_dp=1.0

rcaneill commented 1 year ago

I don't know if we should consider geo_strf_dyn_height and sea_surface_geopotential as arguments or as potential variables in a dataset that we need to parse.

rcaneill commented 1 year ago

For the list of arguments, I think that it makes sense to force the user to provide them (if mandatory) / to use the default values if the user does not specify them. It means that it could possibly break the auto-detection in certain very specific cases, but if we raise an error explaining that some arguments are missing and that the user needs to do some steps manually, I guess it is a good option. (It should be checked whether this is likely to happen, or whether there will always be another path to produce the result.)
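A rough sketch of that behaviour using `inspect.signature`. The function `z_from_p_stub` is a stand-in that only copies the signature of `gsw.z_from_p` quoted earlier, and `collect_arguments` is a hypothetical helper, not existing gsw-xarray code.

```python
import inspect


def z_from_p_stub(p, lat, geo_strf_dyn_height=0, sea_surface_geopotential=0):
    """Stand-in with the same signature as gsw.z_from_p; returns its inputs."""
    return p, lat, geo_strf_dyn_height, sea_surface_geopotential


def collect_arguments(func, available, **user_kwargs):
    """Build the keyword arguments for `func`.

    Priority: explicit user kwargs, then detected values in `available`,
    then the function's own defaults. Mandatory arguments that cannot be
    found raise an error that names them.
    """
    kwargs, missing = {}, []
    for name, param in inspect.signature(func).parameters.items():
        if name in user_kwargs:
            kwargs[name] = user_kwargs[name]
        elif name in available:
            kwargs[name] = available[name]
        elif param.default is inspect.Parameter.empty:
            missing.append(name)
        # otherwise: leave it out so the function's default applies
    if missing:
        raise TypeError(
            f"{func.__name__}: could not auto-detect {missing}; "
            "please pass them explicitly"
        )
    return kwargs


detected = {"p": 1000.0, "lat": 45.0}  # values found by auto-detection
kwargs = collect_arguments(z_from_p_stub, detected, sea_surface_geopotential=2.0)
print(z_from_p_stub(**kwargs))
```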

DocOtak commented 1 year ago

That is a fantastic table, thanks for all this hard work.

Some thoughts:

Rt: we use this in my group because we have a salinometer, I'd be OK with this not being auto detected because it is unlikely to appear in actual oceanographic datasets that people are using, but it will appear in the raw data.
SA_bulk and SA_seaice: there is a standard name sea_ice_salinity, so we could probably request more specific names for these if we wanted.
entropy: I don't expect this to show up in oceanographic datasets, and it feels like it is here for TEOS-10 first principles.
geo_strf: geostrophic velocity is something many people want, I'd like to make this work (request a standard name if we need to)
geo_strf_dyn_height and sea_surface_geopotential: the two functions these go into are also very often used, though the default of 0 is probably fine for most people.
The various enthalpy params (h, h_bulk, h_pot_bulk, pot_enthalpy_ice): these also feel like TEOS-10 first-principles machinery and wouldn't be used by "normal oceanographers"
pt0, pt0_ice: I'm not a sea ice person so don't know how much these will be used.
saturation_fraction: this sounds important, might be worth a standard name request.
t68: there is no standard way of knowing if your temperature is t68 or t90. The closest thing we have is that "reference scale" attribute that OceanSITES uses and we have adopted. Many people use the date of the measurement, but in practice, oceanographic datasets are way more messy.
t_Ih and t_seaice: These feel... basically the same to me; one is specific, the other is ambiguous. We could probably use the standard name sea_ice_temperature for both of these.
w_Ih and w_seaice, are these also the same? CF has sea_ice_area_fraction, it would not be hard to ask for a sea_ice_mass_fraction name I think
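For the t68 point in the list above, a sketch of what relying on a reference scale attribute could look like. The attribute key `reference_scale` and the values `"IPTS-68"` / `"ITS-90"` are assumptions about how a dataset records the scale, and `to_its90` is a hypothetical helper; the factor 1.00024 is the linear conversion used by `gsw.t90_from_t68`.

```python
def to_its90(values, attrs):
    """Return temperatures on ITS-90, converting from IPTS-68 when flagged.

    `attrs` mimics a variable's attributes. The key "reference_scale" and
    its accepted values are assumptions made for this sketch.
    """
    scale = attrs.get("reference_scale")
    if scale is None:
        # Without the attribute there is no reliable way to tell t68 from t90.
        raise ValueError("no reference scale given; cannot tell t68 from t90")
    if scale == "IPTS-68":
        return [t / 1.00024 for t in values]
    if scale == "ITS-90":
        return list(values)
    raise ValueError(f"unknown reference scale: {scale!r}")


t90 = to_its90([10.0024, 20.0048], {"reference_scale": "IPTS-68"})
print(t90)
```

Raising when the attribute is missing, rather than guessing from the measurement date, keeps the decision with the user.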

rcaneill commented 1 year ago

I'll try to summarize the work that needs to be done:

1. Ask for new CF standard names:
   - [ ] SA_bulk and SA_seaice
   - [ ] geo_strf?
   - [ ] saturation_fraction
   - [ ] w_Ih and w_seaice (sea_ice_mass_fraction)
2. geo_strf_dyn_height and sea_surface_geopotential: keep the defaults, and force the user to provide a value if they want to use another one
3. t68 and t90: force the user to set the reference scale if they want to use autodetection of arguments
4. Deal with the t name problem (used both for sea water and sea ice)
5. Leave some variables undetected (Rt, entropy, enthalpy stuff), or let the user provide a dict mapping argument names in gsw to variable names in their dataset

Point 1 is not urgent, as it can be bypassed by point 5.

I'll try to fix this issue first (= make PR #53 fully functional), and see later how to make everything work with the autodetection (PR #30)

rcaneill commented 1 year ago

Damn it, I just realized that p is used either for sea water pressure or for pressure in sea ice... And in both cases it should be the pressure minus atmospheric pressure (no standard name exists for this...)
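For reference, TEOS-10 defines sea pressure as absolute pressure minus 10.1325 dbar (one standard atmosphere). A tiny sketch of that conversion, with `sea_pressure` as a hypothetical helper name:

```python
STANDARD_ATMOSPHERE_DBAR = 10.1325  # 101325 Pa expressed in dbar


def sea_pressure(absolute_pressure_dbar):
    """Sea pressure (dbar): absolute pressure (dbar) minus one standard atmosphere."""
    return absolute_pressure_dbar - STANDARD_ATMOSPHERE_DBAR


print(sea_pressure(110.1325))
```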

DocOtak commented 1 year ago

Sorry for my lack of response, I have been at sea since... basically November. Only 10 days left in the cruise though!

Re the auto detection of calculable params, I don't have a good feel for how an API might look without playing around a little. To that end, do we have some target real world datasets that we should "support"? The first one that comes to my mind is Argo _prof.nc files. Also the new GO-SHIP/CCHDO format, but I have some influence over the direction of that one, so that's "cheating". I'm not a modeler and am unfamiliar with any model output datasets...

rcaneill commented 1 year ago

You are probably right, we may need to target datasets. I am not as used to observations as you are, but I have more experience with model outputs (which is good because we are complementary :) ). I'll open another issue to discuss this, see #58

DocOtak / gsw-xarray

Inputs arguments that have no cf standard names #57