aodn / imos-toolbox

Graphical tool for QC'ing and NetCDF'ing oceanographic datasets
GNU General Public License v3.0
45 stars 30 forks source link

ADCP refactorings #724

Open ocehugo opened 3 years ago

ocehugo commented 3 years ago

This issue is to track down future refactorings required at most ADCP parsers. It is likely the list below will change as I inspect other parsers in more detail.

...

petejan commented 3 years ago

@ocehugo This is the output from my python parser to extract all the data fields from a RAW RDI ADCP file, `

netcdf \16413000.000 {
dimensions:
    TIME = UNLIMITED ; // (726163 currently)
    CELL = UNLIMITED ; // (28 currently)
variables:
    double TIME(TIME) ;
        TIME:long_name = "time" ;
        TIME:units = "days since 1950-01-01 00:00:00 UTC" ;
        TIME:calendar = "gregorian" ;
        TIME:axis = "T" ;
    float INST_TEMP(TIME) ;
        INST_TEMP:units = "degrees_Celsius" ;
        INST_TEMP:instrument_uncertainty = 0.5 ;
    float PRES(TIME) ;
        PRES:units = "dbar" ;
        PRES:applied_offset = -10.1353 ;
    float PRES_VAR(TIME) ;
        PRES_VAR:units = "dbar" ;
    float HEADING_MAG(TIME) ;
        HEADING_MAG:units = "degrees" ;
    float PITCH(TIME) ;
        PITCH:units = "degrees" ;
    float ROLL(TIME) ;
        ROLL:units = "degrees" ;
    float TX_VOLT(TIME) ;
        TX_VOLT:units = "V" ;
    float TX_CURRENT(TIME) ;
        TX_CURRENT:units = "A" ;
    float SOUND_SPEED(TIME) ;
        SOUND_SPEED:units = "m/s" ;
    float BEAM1_VEL(TIME, CELL) ;
        BEAM1_VEL:units = "m/s" ;
        BEAM1_VEL:valid_max = 20.f ;
        BEAM1_VEL:valid_min = -20.f ;
    float BEAM2_VEL(TIME, CELL) ;
        BEAM2_VEL:units = "m/s" ;
        BEAM2_VEL:valid_max = 20.f ;
        BEAM2_VEL:valid_min = -20.f ;
    float BEAM3_VEL(TIME, CELL) ;
        BEAM3_VEL:units = "m/s" ;
        BEAM3_VEL:valid_max = 20.f ;
        BEAM3_VEL:valid_min = -20.f ;
    float BEAM4_VEL(TIME, CELL) ;
        BEAM4_VEL:units = "m/s" ;
        BEAM4_VEL:valid_max = 20.f ;
        BEAM4_VEL:valid_min = -20.f ;
    ubyte CORR_MAG1(TIME, CELL) ;
    ubyte CORR_MAG2(TIME, CELL) ;
    ubyte CORR_MAG3(TIME, CELL) ;
    ubyte CORR_MAG4(TIME, CELL) ;
    float ECHO_INT1(TIME, CELL) ;
    float ECHO_INT2(TIME, CELL) ;
    float ECHO_INT3(TIME, CELL) ;
    float ECHO_INT4(TIME, CELL) ;
    ubyte PCT_GOOD1(TIME, CELL) ;
    ubyte PCT_GOOD2(TIME, CELL) ;
    ubyte PCT_GOOD3(TIME, CELL) ;
    ubyte PCT_GOOD4(TIME, CELL) ;
    ubyte STATUS1(TIME, CELL) ;
    ubyte STATUS2(TIME, CELL) ;
    ubyte STATUS3(TIME, CELL) ;
    ubyte STATUS4(TIME, CELL) ;

// global attributes:
        :instrument = "RDI ; WorkHorse 150kHz" ;
        :instrument_model = "WorkHorse 150kHz" ;
        :instrument_serial_number = "16413" ;
        :frequency = 153.6 ;
        :data_coordinates = "Beam" ;
        :number_beams = 4 ;
        :number_cells = 28 ;
        :cell_size_cm = 800 ;
        :blank_after_tx_cm = 352 ;
        :dist_centre_bin1_cm = 1224 ;
        :xmit_pulse_length_cm = 849 ;
        :time_between_pings_sec = 72. ;
        :instrument_setup_fixed_cpuVER = 50 ;
        :instrument_setup_fixed_cpuREV = 41 ;
        :instrument_setup_fixed_sysConfig = 16841 ;
        :instrument_setup_fixed_read = 0 ;
        :instrument_setup_fixed_lag_len = 13 ;
        :instrument_setup_fixed_num_beam = 4 ;
        :instrument_setup_fixed_num_cells = 28 ;
        :instrument_setup_fixed_pings_per_ensemble = 1 ;
        :instrument_setup_fixed_cell_length = 800 ;
        :instrument_setup_fixed_blank_after_tx = 352 ;
        :instrument_setup_fixed_profile_mode = 1 ;
        :instrument_setup_fixed_low_corr = 0 ;
        :instrument_setup_fixed_code_reps = 9 ;
        :instrument_setup_fixed_gd_min = 0 ;
        :instrument_setup_fixed_error_vel_max = 0 ;
        :instrument_setup_fixed_tpp_min = 1 ;
        :instrument_setup_fixed_tpp_sec = 12 ;
        :instrument_setup_fixed_tpp_hun_sec = 0 ;
        :instrument_setup_fixed_coord_trans = 7 ;
        :instrument_setup_fixed_head_alignment = 0 ;
        :instrument_setup_fixed_head_bias = 0 ;
        :instrument_setup_fixed_sensor_source = 125 ;
        :instrument_setup_fixed_sensor_available = 61 ;
        :instrument_setup_fixed_bin1_dist = 1224 ;
        :instrument_setup_fixed_xmit_pulse_len = 849 ;
        :instrument_setup_fixed_ref_layer = 1281 ;
        :instrument_setup_fixed_false_target_thresh = 255 ;
        :instrument_setup_fixed_spare = 0 ;
        :instrument_setup_fixed_tx_lag_dist = 96 ;
        :instrument_setup_fixed_system_bandwidth = 1 ;
        :instrument_setup_fixed_system_power = 255 ;
        :instrument_setup_fixed_spare2 = 0 ;
        :instrument_setup_fixed_inst_serial = 16413 ;
        :instrument_setup_fixed_beam_angle = 20 ;
        :time_coverage_start = "2018-10-15T00:00:00Z" ;
        :time_coverage_end = "2020-06-11T03:14:24Z" ;
        :date_created = "2020-06-14T09:08:10Z" ;
        :history = "2020-06-14 created from file 16413000.000" ;
}

How many of these fields does one need, maybe some, maybe not, but why not extract them all if the FV00 file is supposed to represent the RAW input file.

One thing that was missing from the current parse was the ADCP frequency, model, firmware version, and parameters around the cell binning into ensemble information.

The ugly python parsing code is here rdi2netcdf.py

ocehugo commented 3 years ago

@petejan,

FYI, I just did a lot of refactoring, improved the performance/memory usage and extended a bit the information we read from RDIs - mostly the information required for coord transformation: see https://github.com/aodn/imos-toolbox/pull/725 and subsequent PRs that are happening at the moment.

may you please send me a comprehensive list of attributes we are not importing? I think this is good timing to solve this issue since I'm quite fresh on the details of our binary parser.

I didn't code anything to export new information to files yet, so I would appreciate a list of the missing raw attributes so I can quickly iterate through them. I'm quite sure that we are almost there in terms of parsing all the RDI information. My idea is to name them as close as possible to the instrument manuals but that indeed requires auxiliary documentation explaining each - which I imagine you may be the right person to do it!

petejan commented 3 years ago

@ocehugo

What attributes does the parser extract and put into the netCDF file? the ones around ensemble information used to be missing, not sure if this has been fixed yet.

What is the imos-toolbox approach to recording firmware versions of instruments?

This is the manual explaining what each field is, page 127 has the fixed header explanation,

WorkHorse Commands and Output Data Format_Jun18.pdf

ocehugo commented 3 years ago

@petejan,

What attributes does the parser extract and put into the netCDF file? the ones around ensemble information used to be missing, not sure if this has been fixed yet.

If you could drill down and come with a list of what we don't import from the binary parser that your python code does, it would be great. Again, we load a lot of parameters, but we don't export all of them. What goes in the netcdf is easy to say - we export everything described in the imos/cf conventions. Anything beyond that is basically not exported.

hence, as of today, most of the global attributes equivalent that you got above are not exported, but a lot of them are imported, some of them in more detail. The reason why this is the case is that we don't have a convention for exporting them. Hence, one avenue is that you could propose a convention for say, FV00 RDI adcp metadata exports (attribute names, attribute values, their type, their description if any, etc).

For example, your instrument_setup_fixed_sensor_source, is actually a hashtable/dictionary of several configurations. Every bit in that number means different things. Hence, we could create a specific attribute name for each bit. If we don't go on to describe each bit, we at least need to provide a context for interpretation of that number (which could be a wiki page for example).

What is the imos-toolbox approach to recording firmware versions of instruments?

ATM, there is no storage of firmware versions in the netcdf. For RDIs we load it but it's only stored in the meta struct and used for some dated time logic.

May you please create an actionable/specific issue with your thoughts!?

petejan commented 3 years ago

@ocehugo so the action/specific issue is that all the data from the RAW file should in replicated in the FV00 file, so every value from the fixed header should be represented in a netCDF attribute, the above list is a good place to start.

One could extent this to if the raw file was in beam coordinates, then the FV00 file should be in beam coordinates, then converted to distance/depth when put into the FV01 file.

The imos/cf conventions don't really say anything about the specific attributes apart from use what is needed to describe the data.

As for binary fields, I have converted these into a string value and used a character attribute in the past, like the data_coordinates attribute above.