HakaiInstitute / GEM-in-a-box-dataset-repository-template

Creative Commons Attribution 4.0 International

Data Integration #5

Closed timvdstap closed 2 months ago

timvdstap commented 2 months ago

Documenting the proposed data integration here for discussion.

From my understanding, there is a survey-template.csv file that will be populated with survey-specific metadata, including ids for the nutrient and chlorophyll data derived through lab analyses. Whether these ids will be stored as a single string in one column (sample_ids) or split into separate columns (e.g., nutrient_sample_id and chl_sample_id) is yet to be decided. The data collected by the miniDOT and ctd-diver, however, are continuous, and if I'm not mistaken, individual timestamps will not have a unique ID. These data are connected to the survey metadata and the associated lab-derived data by proximity of data collection.

In practical terms, this means that if the sample_time_collection associated with the nutrient_sample_id and chl_sample_id overlaps with, or is in close proximity to, the time interval recorded by the ctd-diver or miniDOT (what does this mean specifically: within 5 minutes?), we can reliably say that they are part of the same survey.
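To make the proximity rule concrete, here is a minimal sketch of the check. The function name `within_survey`, the example timestamps, and the 5-minute default are all assumptions for illustration; the actual tolerance is still the open question raised above.

```python
from datetime import datetime, timedelta


def within_survey(sample_time, logger_start, logger_end,
                  tolerance=timedelta(minutes=5)):
    """Return True if sample_time falls inside the logger's recording
    interval, padded on both sides by `tolerance`.

    The 5-minute default is a placeholder, not an agreed value.
    """
    return logger_start - tolerance <= sample_time <= logger_end + tolerance


# Hypothetical miniDOT / ctd-diver recording interval.
start = datetime(2024, 6, 1, 10, 0)
end = datetime(2024, 6, 1, 10, 30)

# Sample collected 3 minutes after the logger stopped: counts as same survey.
print(within_survey(datetime(2024, 6, 1, 10, 33), start, end))

# Sample collected 10 minutes after the logger stopped: does not.
print(within_survey(datetime(2024, 6, 1, 10, 40), start, end))
```

Padding the interval rather than comparing against a single timestamp means a sample taken shortly before the logger goes in the water, or shortly after it comes out, still matches.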

The questions this raises for me are:

timvdstap commented 2 months ago

To answer the questions:

The survey.csv will not include a sample_ids column. Lists of sample_ids in a single cell would make data entry more challenging than it needs to be. Instead, the Sampling Event Number will also be recorded in the derived data sheets, which will help nest the sample_ids.
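As a rough sketch of how that nesting could work, the snippet below groups sample ids by Sampling Event Number and looks them up per survey row. The column names (`sampling_event_number`, `site`, `nutrient_sample_id`) and all values are hypothetical, since the actual sheet headers are not settled here.

```python
import csv
import io
from collections import defaultdict

# Hypothetical contents of survey.csv: one row per sampling event,
# with no sample_ids column.
SURVEY_CSV = """sampling_event_number,site
1,site_a
2,site_b
"""

# Hypothetical derived data sheet: one row per sample, each carrying
# the Sampling Event Number it belongs to.
NUTRIENT_CSV = """sampling_event_number,nutrient_sample_id
1,NUT001
1,NUT002
2,NUT003
"""


def nest_sample_ids(derived_csv_text, id_column, key="sampling_event_number"):
    """Group the ids in a derived data sheet by Sampling Event Number."""
    nested = defaultdict(list)
    for row in csv.DictReader(io.StringIO(derived_csv_text)):
        nested[row[key]].append(row[id_column])
    return dict(nested)


nutrient_ids = nest_sample_ids(NUTRIENT_CSV, "nutrient_sample_id")

# Attach the nested ids to each survey row via the shared key.
for survey in csv.DictReader(io.StringIO(SURVEY_CSV)):
    event = survey["sampling_event_number"]
    print(event, survey["site"], nutrient_ids.get(event, []))
```

Each sample keeps its own row in the derived sheet, so nobody has to type a list of ids into a single cell, and the same helper would work for a chlorophyll sheet by passing a different `id_column`.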