dzt present sporadically through 1deg_jra55_ryf9091_gadi

aidanheerdegen commented 3 years ago

The 1deg_jra55_ryf9091_gadi dataset has dzt present only sporadically through the data, which makes it difficult to load successfully through the cookbook.

sqlite> select experiment, variables.name, ncfile, time_start, time_end, frequency 
        from experiments join ncfiles on experiment_id = experiments.id, 
                              ncvars on ncfile_id = ncfiles.id, 
                              variables on variables.id = ncvars.variable_id  
        where experiment = '1deg_jra55_ryf9091_gadi' and variables.name = 'dzt'
        order by time_start;                                                                                                      

1deg_jra55_ryf9091_gadi|dzt|output030/ocean/ocean.nc|2200-01-01 00:00:00|2210-01-01 00:00:00|1 yearly
1deg_jra55_ryf9091_gadi|dzt|output031/ocean/ocean.nc|2210-01-01 00:00:00|2220-01-01 00:00:00|1 yearly
1deg_jra55_ryf9091_gadi|dzt|output032/ocean/ocean.nc|2220-01-01 00:00:00|2230-01-01 00:00:00|1 yearly
1deg_jra55_ryf9091_gadi|dzt|output032/ocean/ocean_snap.nc|2221-01-01 00:00:00|2230-01-01 00:00:00|1 yearly
1deg_jra55_ryf9091_gadi|dzt|output051/ocean/ocean.nc|2410-01-01 00:00:00|2420-01-01 00:00:00|1 yearly
1deg_jra55_ryf9091_gadi|dzt|output051/ocean/ocean_snap.nc|2411-01-01 00:00:00|2420-01-01 00:00:00|1 yearly

Any ideas for a remedy @rmholmes? Are they all needed? If so, maybe add something to the metadata notes field to let people know.

rmholmes commented 3 years ago

Thanks Aidan.

The dzt's in ocean_snap are in there because I use them for my diathermal heat budget calculations. But they're unlikely to be useful to others I guess...

Would it fix the issue if I just removed the ocean_snap.nc files in those outputs?

aidanheerdegen commented 3 years ago

Well, that would fix it, but at the cost of that data becoming inaccessible.

I think adding something to the notes might be a decent first step. I've only had one query about it, so I don't know how often it is tripping anyone up. I also think there has to be some facility to have slightly non-standard data, and the best way to handle that is probably a note to that effect, and to get people used to looking at the notes.

There is a slight issue with st_ocean also being in the ocean_heat.nc files, though the data explorer loads them OK. If any of these other variables are present in other files, however, they might cause an issue:

temp_tendency          :: (10, 50, 300, 360) :: time tendency for tracer Conservative temperature                                     
temp_advection         :: (10, 50, 300, 360) :: cp*rho*dzt*advection tendency                                                         
temp_submeso           :: (10, 50, 300, 360) :: rho*dzt*cp*submesoscale tendency (heating)                                            
temp_vdiffuse_diff_cbt :: (10, 50, 300, 360) :: vert diffusion of heat due to diff_cbt                                                
temp_nonlocal_KPP      :: (10, 50, 300, 360) :: cp*rho*dzt*nonlocal tendency from KPP                                                 
sw_heat                :: (10, 50, 300, 360) :: penetrative shortwave heating                                                         
temp_vdiffuse_sbc      :: (10, 50, 300, 360) :: vert diffusion of heat due to surface flux                                            
sfc_hflux_pme          :: (10, 300, 360)     :: heat flux (relative to 0C) from pme transfer of water across ocean surface            
frazil_3d              :: (10, 50, 300, 360) :: ocn frazil heat flux over time step                                                   
temp_eta_smooth        :: (10, 300, 360)     :: surface smoother for temp                                                             
temp_rivermix          :: (10, 50, 300, 360) :: cp*rivermix*rho_dzt*temp                                                              
neutral_diffusion_temp :: (10, 50, 300, 360) :: rho*dzt*cp*explicit neutral diffusion tendency (heating)                              
neutral_gm_temp        :: (10, 50, 300, 360) :: rho*dzt*cp*GM stirring (heating)                                                      
temp_vdiffuse_k33      :: (10, 50, 300, 360) :: vert diffusion of heat due to K33 from neutral diffusion                              
mixdownslope_temp      :: (10, 50, 300, 360) :: cp*mixdownslope*rho*dzt*temp                                                          
temp_sigma_diff        :: (10, 50, 300, 360) :: thk wghtd sigma-diffusion heating

rmholmes commented 3 years ago

Ok. I think a good compromise is to remove the ocean_snap and ocean_wmass data in output032 and retain it in output051. I have put a note in metadata.yaml about the heat budget data in output051.

Those heat budget variables will only be present in ocean_heat.nc.

It makes sense that st_ocean is in ocean.nc and ocean_heat.nc. Both those files have variables defined on the st_ocean grid, and so they are automatically added to those files.

Note that this issue of there being different types of a variable with the same name may come up again. E.g. in my OMIP runs I have temp_global_ave variables in both monthly-average (required for OMIP submission) and scalar snapshot (part of the standard ACCESS-OM2 output now) files. I've been differentiating them in the cookbook simply by adding the ncfile=....nc argument to the querying.getvar function.
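As a rough way to spot such name clashes ahead of time, the sketch below lists variables that appear at the same frequency under more than one file name. It is an illustration only: the table and column names are assumed from the SQL query at the top of this issue, the in-memory database and the helper names (ambiguous_variables, demo_connection) are hypothetical, and the real cookbook database may differ.

```python
import posixpath
import sqlite3
from collections import defaultdict

# Assumed minimal schema, inferred from the query at the top of this issue.
SCHEMA = """
CREATE TABLE experiments (id INTEGER PRIMARY KEY, experiment TEXT);
CREATE TABLE ncfiles (id INTEGER PRIMARY KEY, experiment_id INTEGER,
                      ncfile TEXT, time_start TEXT, time_end TEXT, frequency TEXT);
CREATE TABLE variables (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE ncvars (ncfile_id INTEGER, variable_id INTEGER);
"""


def ambiguous_variables(conn, experiment):
    """Map (variable, frequency) to file names when more than one file name holds it."""
    rows = conn.execute(
        """
        SELECT variables.name, ncfiles.frequency, ncfiles.ncfile
        FROM experiments
        JOIN ncfiles ON ncfiles.experiment_id = experiments.id
        JOIN ncvars ON ncvars.ncfile_id = ncfiles.id
        JOIN variables ON variables.id = ncvars.variable_id
        WHERE experiments.experiment = ?
        """,
        (experiment,),
    ).fetchall()
    names = defaultdict(set)
    for variable, frequency, ncfile in rows:
        # Compare file names only, ignoring the outputNNN directory.
        names[(variable, frequency)].add(posixpath.basename(ncfile))
    return {key: sorted(files) for key, files in names.items() if len(files) > 1}


def demo_connection():
    """Build a tiny in-memory database holding two dzt rows from this issue."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO experiments VALUES (1, '1deg_jra55_ryf9091_gadi')")
    conn.execute("INSERT INTO variables VALUES (1, 'dzt')")
    files = [
        (1, "output051/ocean/ocean.nc", "2410-01-01 00:00:00", "2420-01-01 00:00:00"),
        (2, "output051/ocean/ocean_snap.nc", "2411-01-01 00:00:00", "2420-01-01 00:00:00"),
    ]
    for file_id, path, start, end in files:
        conn.execute(
            "INSERT INTO ncfiles VALUES (?, 1, ?, ?, ?, '1 yearly')",
            (file_id, path, start, end),
        )
        conn.execute("INSERT INTO ncvars VALUES (?, 1)", (file_id,))
    return conn


if __name__ == "__main__":
    print(ambiguous_variables(demo_connection(), "1deg_jra55_ryf9091_gadi"))
```

Any variable reported this way is a candidate for passing ncfile explicitly when loading it.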

rmholmes commented 3 years ago

Also note that in the OMIP runs the dzt variable is renamed dht (following the standard ACCESS-CM2 CMIP6 output). Perhaps this is not such a good idea - it may need to be changed when the data is properly archived.

aidanheerdegen commented 3 years ago

Similarly named variables aren't an issue if they have different frequencies. As it turns out, the st_ocean variables load OK but have no time dimension, so they aren't really much of an issue.

The problems tend to occur when the same variable is saved in more than one file with the same frequency, when a variable is saved with gaps in time, or a combination of the two.
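To make the gap case concrete, here is a minimal sketch that scans the time_start and time_end values of a variable's files and reports the periods with no coverage. The rows are the ocean.nc entries for dzt from the listing at the top of this issue; the function name find_gaps is just illustrative and is not part of the cookbook.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# (ncfile, time_start, time_end) for dzt in ocean.nc, copied from the listing above.
ROWS = [
    ("output030/ocean/ocean.nc", "2200-01-01 00:00:00", "2210-01-01 00:00:00"),
    ("output031/ocean/ocean.nc", "2210-01-01 00:00:00", "2220-01-01 00:00:00"),
    ("output032/ocean/ocean.nc", "2220-01-01 00:00:00", "2230-01-01 00:00:00"),
    ("output051/ocean/ocean.nc", "2410-01-01 00:00:00", "2420-01-01 00:00:00"),
]


def find_gaps(rows):
    """Return (gap_start, gap_end) pairs where coverage stops before the next file begins."""
    spans = sorted(
        (datetime.strptime(start, FMT), datetime.strptime(end, FMT))
        for _, start, end in rows
    )
    if not spans:
        return []
    gaps = []
    covered_until = spans[0][1]
    for start, end in spans[1:]:
        if start > covered_until:
            gaps.append((covered_until, start))
        covered_until = max(covered_until, end)
    return gaps


if __name__ == "__main__":
    for gap_start, gap_end in find_gaps(ROWS):
        print(f"no dzt between {gap_start:%Y-%m-%d} and {gap_end:%Y-%m-%d}")
```

For these rows the only gap runs from the end of output032 to the start of output051.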

I don't think there is a big issue if dzt has been renamed to dht as long as it is consistent within the dataset. There can always be something added to the notes to that effect.

As for deleting data (or moving it elsewhere), that is up to you. I have wondered in the past if there was some value in adding the functionality to mark data files as not to be indexed, so they could be kept co-located but not interfere with the database. This would be a good example of a use case.
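Purely as a sketch of what such a facility might look like (nothing like this exists in the cookbook as far as this thread shows), an indexer could skip any path matching a list of glob patterns. The function name files_to_index and the idea of supplying ignore patterns are assumptions for illustration.

```python
from fnmatch import fnmatchcase


def files_to_index(paths, ignore_patterns):
    """Keep only the paths that match none of the (hypothetical) ignore patterns."""
    return [
        path
        for path in paths
        if not any(fnmatchcase(path, pattern) for pattern in ignore_patterns)
    ]


if __name__ == "__main__":
    paths = [
        "output032/ocean/ocean.nc",
        "output032/ocean/ocean_snap.nc",
        "output051/ocean/ocean.nc",
        "output051/ocean/ocean_snap.nc",
    ]
    # Hide the snapshot file in output032 only; the file itself stays on disk.
    print(files_to_index(paths, ["output032/*/ocean_snap.nc"]))
```

The files would stay where they are, and only the database would stop seeing them.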

rmholmes commented 3 years ago

Ok thanks Aidan.

The problem here is that the set of variables saved changes over the course of the run. But this is sensible - for an RYF run it does not make sense to save much data during the long spin-up phase. Most data (except time series of global OHC etc.) is only required for the spun-up state.

I've deleted some of the output032 data and added a note to the metadata.yaml.

aidanheerdegen commented 3 years ago

Closing this ticket, but note that the following queries work to return dzt:

cc.querying.getvar(expt='1deg_jra55_ryf9091_gadi', variable='dzt', 
                          session=session, frequency='1 yearly',
                          start_time='2200-12-31 00:00:00', 
                          end_time='2419-12-31 00:00:00', ncfile='ocean_snap.nc')

The following also works, but the time axis is discontinuous:

cc.querying.getvar(expt='1deg_jra55_ryf9091_gadi', variable='dzt', 
                          session=session, frequency='1 yearly',
                          start_time='2200-12-31 00:00:00', 
                          end_time='2419-12-31 00:00:00', ncfile='ocean.nc')

Specifying just the continuous time range:

cc.querying.getvar(expt='1deg_jra55_ryf9091_gadi', variable='dzt', 
                          session=session, frequency='1 yearly',
                          start_time='2200', 
                          end_time='2230', ncfile='ocean.nc')

COSIMA / master_index

dzt present sporadically through 1deg_jra55_ryf9091_gadi #7