aodn / imos-toolbox

Graphical tool for QC'ing and NetCDF'ing oceanographic datasets
GNU General Public License v3.0
46 stars 31 forks source link

Small refactor of all cnv reading sections of SBE56/SBE37/SBE19 parsers #648

Closed sspagnol closed 4 years ago

sspagnol commented 4 years ago

CNV reading sections of SBE56/SBE37/SBE19 parsers now use common code.

All cnv files are now read entirely and then split into header/processed/data sections. Has the benefit of reducing SBE56 cnv read times (90sec to 10sec on my machine).

NOTE:

ocehugo commented 4 years ago

Thanks

After skimming I can see the PR include several things: a.logic changes, b.variable inclusion, and c. processing based on cnv context.

a. is probably handled by current tests, but b & c I would've to check - any file examples?

Given the potential large impact and current works, I don't think I will have time to test this until the next release.

sspagnol commented 4 years ago

Don't understand the 'c' comment. Aren't 'a' and 'c' all together?

Attached is a cnv file with the other variables mentioned in convertSBEcnvVar.m

NWS20010.zip

petejan commented 4 years ago

I like my implementation better,

https://github.com/petejan/imos-toolbox/blob/master/Parser/SBEcnvParse.m

along with the file,

https://github.com/petejan/imos-toolbox/blob/master/Parser/seabirdNames.xlsx

reads the calibration constants from a file also and insert them as attributes. Should work for any SeaBird CNV file.

sspagnol commented 4 years ago

Hi Peter,

Trying out your code, still having problems. I added an 't090c' variable entry, but still couldn't get it to run. If you have time can you have a look? Attached is example SBE56 cnv file.

Thanks.

SBE05608836_2020-02-18.zip

ocehugo commented 4 years ago

@sspagnol - Yes a. & c. are similar, but a is just a simple logic changes (refactorings) while c is content changing (new var/cases for example). These are not splitted commits so makes it hard for me to review and require further checks.

@petejan - I appreciate the other version and particularly the xls - helps me to keep updated with requirements and names.

In all honesty, though, I don't see any of the above is the way forward. IMO, the best thing for those files is a parser that mimics the actual file structure. In this case, this is an "xmlparser", since the files are "xmls in disguise" with some "#" annotatons line blocks. I got some early ideas on that, but it's not a priority.

Something that enable this kind of code:

cnvmeta  = loadCNVMetadata(cnvfile, SBEschemas('sbe9plus')) % read the xml and # bits, validate/type convert against schema.

ctree = cnvmeta.xmltree;
extra_opts = cnvmeta.annotations;
instrument_name = ctree.HardwareData.DeviceType; %str
instrument_serial_number = ctree.hardwareData.SerialNumber; %int

%probably a func
n_internal = countnodes(ctree,'InternalSensors')
for s=1:n_internal
     sensor = ctree.HardwareData.InternalSensors(s);     
     sensors(s).id = sensor.attr.id; % str
     sensors(s).type = sensor.type; % str
     sensors(s).SerialNumber = sensor.SerialNumber %int
     sensors(s).Calibration = ctree.CalibrationCoefficients.Calibration(s)
     assert(sensors(s).Calibration.attr.id,sensors(s).id) 
end
...
% load data
cnames,ctypes = datablock_info(ctree);
cnvdata  = loadCNVData(cnvfile,cnames,ctypes)

%process temperature sensors function
cdata = struct();
datanames = fieldnames(cnvdata);
for k=1:length(tind)
    name = sensors(i).id;
    coefs = sensors(i).Calibration;
    cdata(name) = polyval_sbe(coefs,cnvdata(name));
end
% process salinity sensors ...
...

Reasons:

  1. a parser that returns metadata and keeps the file tree structure is the most useful thing to compose the several SBE parsers.
  2. mimicking the structure helps localise the calibration coefficients ownership per instrument without much fuss, without ambiguity or complex case handling,
  3. makes coding a breeze (and without those deeply nested or abundant "if/else/cases").
  4. Intent and readability are much improved. The parser became more simple, composable, and with more explicit logic assignments.
sspagnol commented 4 years ago

Absolutely agree that is a great way forward.

In the meantime if your the person loading up 10 SBE56 set to sample at 10sec for 6 months, then waiting 10 minutes starts to get tiring.