MicroB3-IS / osd-analysis

Repository for all Ocean Sampling Day related source code with information on how-to acquire OSD data
Apache License 2.0

Update R script for contextual data import and cross-check with MC's data #23

Status: Closed (pbuttigieg closed this issue 9 years ago)

pbuttigieg commented 9 years ago

Update this script, currently at revision 177934453898c065064ffb2b6eb13ef52a48889b, to process the updated contextual data (2015-09-01) described here.

Check that the categorical data are consistent with MC's data set.

pbuttigieg commented 9 years ago

The R script is updated in 653d439d627e76059537a8bb03cb57ef28dc93da

pbuttigieg commented 9 years ago

Fixed a typo that prevented source() from working, in 828bf8fcd89a3db498a398d52c554aa94d96c55e

pbuttigieg commented 9 years ago

This table will be updated as the files are compared...

| Feature | MC data | OSD data | Comments |
| --- | --- | --- | --- |
| sample count | 150 | 203 | 147 have overlapping sample names, 55 differences in names |
| latitude and longitude | one entry per distance | start and stop entries per distance | seems the MC data uses OSD's "start_[lat,lon]" |
| latitude | | | 92 entries in the set difference between OSD and MC data. 53 attributable to missing samples |
| longitude | | | 82 entries in the set difference between OSD and MC data. 53 attributable to missing samples. Assuming OSD's start_lon is used by MC. |
| depth | seems to be sea floor depth | water depth | approach of calculating sea floor depth should be documented |
| ambiguous fields | F2, F3, F6, F8 | documented | seems some fields were blank in MC's data and some may be artefacts of processing. |
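
The overlap and set-difference counts in the table can be reproduced with plain set operations. The following is only a minimal sketch in Python (the script referenced in this issue is in R); the function name and the toy sample names are hypothetical placeholders, not the real MC or OSD sample lists.

```python
def compare_sample_names(mc_names, osd_names):
    """Return the shared names and the names unique to each data set."""
    mc, osd = set(mc_names), set(osd_names)
    return {
        "shared": sorted(mc & osd),       # names present in both sets
        "only_in_mc": sorted(mc - osd),   # set difference MC \ OSD
        "only_in_osd": sorted(osd - mc),  # set difference OSD \ MC
    }


# Hypothetical toy input, NOT the real sample names.
mc_names = ["OSD1_a", "OSD2_a", "OSD3_x"]
osd_names = ["OSD1_a", "OSD2_a", "OSD3_a", "OSD4_a"]

result = compare_sample_names(mc_names, osd_names)
for key, names in result.items():
    print(key, len(names), names)
```

Listing the names on each side, rather than only counting them, makes it easier to tell renamed samples apart from samples that are genuinely missing.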

pbuttigieg commented 9 years ago

Further notes...

MC's data uses a fixed number of decimal places, which sometimes rounds the submitted latitudes and longitudes, e.g.:

OSD99_2014-06-21_1m_NPL022: 45.7009 (MC) vs. 45.70092 (OSD)
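
Because of this rounding, an exact comparison of coordinates reports differences that are only formatting artefacts. One possible way around it, sketched here in Python, is to round both values to the same precision before comparing. The function name is hypothetical, and the default of four decimal places is an assumption inferred from the example above, not a documented property of MC's data.

```python
def matches_after_rounding(mc_value, osd_value, decimals=4):
    """Compare two coordinates after rounding both to `decimals` places.

    `decimals=4` is an assumption based on the example in this thread.
    """
    return round(mc_value, decimals) == round(osd_value, decimals)


# Values taken from the example above.
same = matches_after_rounding(45.7009, 45.70092)
# A hypothetical pair that differs beyond the rounding precision.
different = matches_after_rounding(45.7009, 45.7012)
print(same, different)
```

Entries that still differ after rounding are the ones worth inspecting by hand.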

MC's "Locality_OSD" column does not align with the "site_name" data in the OSD export. It seems that MC's names do not fit the lat/lon entries:

OSD99_2014-06-21_1m_NPL022 45.7009 13.7100 Hawaii Kakaako

(These coordinates are in the Northern Adriatic, near Slovenia.)
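
Mismatches of this kind can be caught with a coarse bounding-box check of each coordinate pair against its locality label. The Python sketch below is illustrative only: the function name is hypothetical, and the bounding box for Hawaii is a rough assumption rather than an authoritative boundary.

```python
# Rough, ASSUMED bounding boxes as (min_lat, max_lat, min_lon, max_lon).
# Illustrative only; a real check would use an authoritative gazetteer.
REGION_BOXES = {
    "Hawaii": (18.5, 22.5, -160.5, -154.5),
}


def coords_match_region(lat, lon, region):
    """Return True if (lat, lon) falls inside the assumed box for `region`."""
    min_lat, max_lat, min_lon, max_lon = REGION_BOXES[region]
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


# Coordinates from the example above, labelled "Hawaii Kakaako" in MC's data.
plausible = coords_match_region(45.7009, 13.7100, "Hawaii")
print(plausible)
```

A check like this only flags gross conflicts between names and coordinates; it cannot say which of the two is wrong.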

Many of the quantitative data values seem quite far off. The nutrient values are probably derived from gridded data.

| site name | OSD values | MC values |
| --- | --- | --- |
| OSD91_2014-06-21_2m_NPL022 | 107.93 | 0.000000 |
| OSD92_2014-06-21_2m_NPL022 | 105.01 | 0.000000 |
| OSD93_2014-06-21_2m_NPL022 | 111.76 | 0.000000 |

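Divergences like the three above could be flagged automatically by comparing the relative difference between paired values. This is a minimal Python sketch; the function name is hypothetical and the 10% threshold is an arbitrary assumption, not a value agreed in this thread.

```python
def flag_divergent(rows, rel_tol=0.1):
    """Return site names whose OSD and MC values differ by more than rel_tol.

    `rows` holds (site_name, osd_value, mc_value) tuples. The difference is
    taken relative to the larger magnitude. `rel_tol=0.1` is an assumption.
    """
    flagged = []
    for name, osd_value, mc_value in rows:
        scale = max(abs(osd_value), abs(mc_value))
        if scale > 0 and abs(osd_value - mc_value) / scale > rel_tol:
            flagged.append(name)
    return flagged


# The three rows quoted above.
rows = [
    ("OSD91_2014-06-21_2m_NPL022", 107.93, 0.0),
    ("OSD92_2014-06-21_2m_NPL022", 105.01, 0.0),
    ("OSD93_2014-06-21_2m_NPL022", 111.76, 0.0),
]
flagged = flag_divergent(rows)
print(flagged)
```

A zero in one data set against a clearly non-zero value in the other, as here, may indicate a missing value stored as zero rather than a measured one.
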
pbuttigieg commented 9 years ago

In general, there should be another import round, and all decisions (e.g. the number of digits to round to, and the use of only the geographic coordinates that correspond to the start points of sampling) should be documented on a wiki page in this repo. Comments on the divergence between sensed/gridded and measured values should also be included.

Name conflicts should be resolved and checks for other errors performed.