Problems: It's hard to collate data when...
a. the dates don't line up for some variables collected at the same site during the same season (e.g., water chem vs. YSI, water vs. sediment chem, etc.)
(bad solution? = ignore day/time and just merge by season, year, and site)
b. there are repeat measurements, which make it hard to merge data / assign a unique ID (e.g., two d18O values for the same site and date, for whatever reason)
(bad solutions? = average the two values (bad if the second measurement was taken because the first one was not good), or use only one (but how do we decide which?))
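A possible middle ground for problem (a), sketched here as an assumption rather than a decided approach: instead of dropping day/time, match each row to the closest-dated row from the other dataset at the same site, but only within a maximum gap (`max_days`, a hypothetical threshold to be chosen per variable pair). Unmatched rows are kept, and the gap in days is stored so users can filter matches later. The field names (`site`, `date`, `value`) and the example values are hypothetical. pandas users could get similar behavior from `pandas.merge_asof` with `by="site"`, `direction="nearest"` and a `tolerance`.

```python
from datetime import date


def match_nearest(left, right, max_days):
    """For each row in `left`, attach the `right` row from the same site whose
    date is closest, if it is within `max_days`. Unmatched left rows are kept
    with right fields set to None, so no left-hand measurements are dropped.
    A right row may be matched to more than one left row."""
    # Group right-hand rows by site for quick lookup.
    by_site = {}
    for row in right:
        by_site.setdefault(row["site"], []).append(row)

    merged = []
    for lrow in left:
        best = None  # (gap_in_days, right_row)
        for rrow in by_site.get(lrow["site"], []):
            gap = abs((rrow["date"] - lrow["date"]).days)
            # Keep the closest candidate; on a tie, the earlier date wins.
            if gap <= max_days and (
                best is None
                or gap < best[0]
                or (gap == best[0] and rrow["date"] < best[1]["date"])
            ):
                best = (gap, rrow)
        merged.append({
            "site": lrow["site"],
            "left_date": lrow["date"],
            "left_value": lrow["value"],
            "right_date": best[1]["date"] if best else None,
            "right_value": best[1]["value"] if best else None,
            "gap_days": best[0] if best else None,
        })
    return merged


# Hypothetical example: water chem rows matched to YSI rows.
chem = [
    {"site": "S1", "date": date(2021, 7, 1), "value": 0.42},
    {"site": "S2", "date": date(2021, 7, 3), "value": 0.17},
]
ysi = [
    {"site": "S1", "date": date(2021, 7, 2), "value": 18.5},
    {"site": "S1", "date": date(2021, 8, 20), "value": 21.0},
]
for row in match_nearest(chem, ysi, max_days=7):
    print(row["site"], row["right_value"], row["gap_days"])
# S1 18.5 1
# S2 None None
```

Keeping `gap_days` in the output means the choice of threshold is visible and reversible, rather than baked into the table.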
If we don't address these issues, the collated table will lose measurements that could otherwise be kept. One option: build multiple versions of the collated table, each with a different degree of data cleaning / intervention to make things match? Or keep intervention minimal and leave further collation up to the user.
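For problem (b), a minimal-intervention option (an assumption, not a settled choice) is to neither average nor drop repeats: number them within each (site, date, variable) group and build the unique ID from the key plus that replicate number. A `n_reps` column flags which keys have repeats, so a more cleaned-up version of the table could be derived from this one later. Field names, the `_r1` ID format, and the example values are hypothetical; the replicate number follows input order, so sorting first by something like run order (if recorded) would make it more meaningful. In pandas, `groupby(...).cumcount()` gives the same numbering.

```python
from collections import Counter


def assign_replicate_ids(rows, key_fields=("site", "date", "variable")):
    """Return copies of `rows` with `rep` (1, 2, ...) numbered within each group
    sharing `key_fields`, `n_reps` (group size), and a `sample_id` string.
    Input order is preserved; nothing is averaged or dropped."""

    def key_of(row):
        return tuple(row[f] for f in key_fields)

    totals = Counter(key_of(r) for r in rows)  # how many rows share each key
    seen = Counter()                            # running count per key
    out = []
    for row in rows:
        key = key_of(row)
        seen[key] += 1
        new = dict(row)  # copy so the input rows are not modified
        new["rep"] = seen[key]
        new["n_reps"] = totals[key]
        new["sample_id"] = "_".join(map(str, key)) + f"_r{seen[key]}"
        out.append(new)
    return out


# Hypothetical example: two d18O values for the same site and date.
rows = [
    {"site": "S1", "date": "2021-07-01", "variable": "d18O", "value": -8.1},
    {"site": "S1", "date": "2021-07-01", "variable": "d18O", "value": -7.6},
    {"site": "S2", "date": "2021-07-01", "variable": "d18O", "value": -7.9},
]
for r in assign_replicate_ids(rows):
    print(r["sample_id"], r["n_reps"], r["value"])
# S1_2021-07-01_d18O_r1 2 -8.1
# S1_2021-07-01_d18O_r2 2 -7.6
# S2_2021-07-01_d18O_r1 1 -7.9
```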