aodn / imos-toolbox

Graphical tool for QC'ing and NetCDF'ing oceanographic datasets
GNU General Public License v3.0
46 stars 31 forks source link

Sbecnv followup, closes #648 #658 #668

Closed ocehugo closed 4 years ago

ocehugo commented 4 years ago

Obsoletes #648

petejan commented 4 years ago

The switch in Parser/convertSBEcnvVar.m could be moved to an external (txt or csv) file so the end-user can do the mapping from sea bird name to IMOS variable name as sea bird add new parameters.

ocehugo commented 4 years ago

@petejan, I agree but for now, this is just a small fix. To implement flexible/custom mappings we also have to implement custom functions around it (apply mappings, handle errors, convert units, apply calibration coeff, etc).

As I said in #648, IMO, the only way forward here is to unify all that sbe version mess, read the files with proper encoding (and avoid all those pesky strings) and read all params in the files in a common structure. I'm also more favourable to include a "conventioned" mfile instead of csv/txt because it's more flexible. For Example:

function sbe = sbe_mappings()
% inline functions
rescale=@(x,offset,scale)((x-offset)*scale);
self=@(x)(x)
% mappings 
% maybe follow some predefined conventions like instrument_name.version('mapped_name')
%
sbe = containers.Map();
sbe('CPHL') = {'name','CHL rescaling and renaming of first sensor','func',rescale,'args', sbedata.variables('RawCHL').data,sbedata.variables('RawCHL').offset,sbedata.variables('RawCHL').scale}};
sbe('CPHL_2') = {'name','CHL rescaling and renaming of second sensor','func',rescale,'args', sbedata.variables('Fact-CHL').data,variables('Fact-CHL').offset,sbedata.variables('Fact-CHL').scale}};
sbe('CPHL_raw') = {'name','CHL raw renaming','func',self,'args',{sbedata.variables('RawCHL').data}};
sbe('CPHL_calibrated') = {'name','CHL recalibrated','func',cphl_recalibrated,'args',{sbedata.variables('RawCHL').data,'method','xyz'}}
sbe('PSAL_teos') = {'name','PSAL computed from TEOS-10','func',gsw_SA_from_SP,'args',{sbedata.variables('salinity').data,sbedata.variables('pressure'),db.variables('station_longitude'),db.variables('station_latitude')}};
sbe('PSAL_instrument') = ...
...
end

%custom functions

function [rdata] = cphl_recalibrated(data,varargin)
...

Something along the above is declarative, flexible, and very explicit about the mappings and all the conversions done. I could foresee a lot of code reduced within the SBE area if we prioritize this kind of cleanup.

petejan commented 4 years ago

@ocehugo The sea bird data processing tools basically end with a .cnv file, the toolbox sea bird parser should be able to take these and turn them into FV00 files, and as suggested, these should be a facsimile of the cnv file. The seabird data processing tools allow users to add/substract parameters from these cnv files as needed. This is where the unified sbe parser should start IMO.

I don't think the parser should also be processing the data from a cnv file, just create a representation of it that can be processed later. CPHL_calibrated should not belong in the SBE parser, esp as CPHL can come from other sources that the sea bird parser, similarly with PSAL calculation.

The advantage of having the mapping (from sea bird name to netCDF variable name) in an external txt file is that it can be mapped later after a toolbox executable is built, like the imosParameter.txt configuration. As an example I have SBE39's deployed as air-temperature, I would prefer to have them as AIRT netCDF variable and not TEMP.

ocehugo commented 4 years ago

I don't think the parser should also be processing the data from a cnv file, just create a representation of it that can be processed later. CPHL_calibrated should not belong in the SBE parser, esp as CPHL can come from other sources that the sea bird parser, similarly with PSAL calculation.

When you said mappings I thought more of processing map (map/reduce) instead of just renames.

Yes, first loading every bit as in the cnv. The parser ends here. Later, in a processing step, we call the mappings to transform stuff to what we want (renames, recalibrated cphl, renames, etc). The user should even be allowed to select their own mappings.

The advantage of having the mapping (from sea bird name to netCDF variable name) in an external txt file is that it can be mapped later after a toolbox executable is built, like the imosParameter.txt configuration. As an example I have SBE39's deployed as air-temperature, I would prefer to have them as AIRT netCDF variable and not TEMP.

Didn't think on the binaries, good point. The solution then appears to be granular in rename and unit/processing mappings.I can see something like:

%source var, output var, processing map function/names
temperature,AIRT % assume empty map function is self/raw data
Fact-CHL,CPHL,CPHL_calibrated
salinity,PSAL,
salinity,PSAL_nobias, remove_psal_bias
salinity,PSAL_teos,salinity_teos
salinity,PSAL_teos_nobias, salinity_teos,remove_psal_bias %pipelinesque is maybe too much
...

Of course, people would need to select/change that interactively and know exactly the mappings and their meaning.

ocehugo commented 4 years ago

@petejan, also it's best if we can discuss further requiremnts in a separate issue so we can keep the discussion on

petejan commented 4 years ago

Yes good to move to new issue,

Don't the processing steps belong in the existing pre-processing infrastructure in the toolbox? This already allows for user select/change options.

ocehugo commented 4 years ago

Yes good to move to new issue,

Don't the processing steps belong in the existing pre-processing infrastructure in the toolbox? This already allows for user select/change options.

Yes, I call just processing because that is the step that changes the data (apart from the parser unit conversion).

The level of flexibility here is key because it will affect how much code is written and affected (e.g. renaming will probably trigger QC patches). Nonetheless, we will need a better rename/transform/calibrate/pre-process pipeline if we are only to read raw data from the parser anyway. I'm towards a more declarative style when/if doing it.