Smithsonian / CCN-Data-Library

The Coastal Carbon Network Data Library: An open-source database featuring carbon data from tidal wetlands around the world
https://serc.si.edu/coastalcarbon

Version 1.1.0 Updates #86

Closed jaxinewolfe closed 7 months ago

jaxinewolfe commented 10 months ago

Issue to document and resolve database updates for the release of Version 1.1.0

jaxinewolfe commented 8 months ago

Synthesis QAQC Results 11-10-2023

| n | test | result |
|---|------|--------|
| 1 | Core ID uniqueness | Check the following core_id(s) in the core-level data: 2B, 305, 398, 399, DJI_Djirnda_1, DJI_Djirnda_4, DJI_Djirnda_5, FURO_Furo Grande_1, FURO_Furo Grande_2, FURO_Furo Grande_3, FURO_Furo Grande_4, FURO_Furo Grande_5, FURO_Furo Grande_6 |
| 2 | Valid core ID links in core table | No core ID in depthseries table: WBWA1109_01PU, WBWA1109_02PU, WBWA1109_03PU, WBWA1109_04PU, NSOR1209_01PU, NSOR1209_02PU, NSOR1209_03PU, NSOR1209_04PU, NBOR1409_01PU, NBOR1409_02PU, NBOR1409_03PU, Copertino_unpublished_1 |
| 3 | Valid core ID links in depthseries table | No core ID in core table: PB1 |
| 4 | Test coordinate uniqueness | 1038 sets of coordinates are associated with more than one core. Check 'data/QA/duplicate_coordinates.csv' |
| 5 | Validity of column names in depthseries table | Undefined columns: date, pb210_crs_age, pb210_crs_age_sd |
| 6 | Validity of column names in cores table | Undefined columns: geomorphic_id |
| 7 | Validity of column names in sites table | Passed |
| 8 | Validity of column names in species table | Passed |
| 9 | Validity of column names in impacts table | Passed |
| 10 | Validity of column names in methods table | Undefined columns: ground_or_sieved_flag, pb210_background_assumption |
| 11 | Validity of column names in study_citations table | Undefined columns: keywords, day, issn, abstract, eprint, article-number |
| 12 | Validity of variable names in depthseries table | Undefined variables: dredge horizon |
| 13 | Validity of variable names in cores table | Undefined variables: subplot, plot, WGS84, riverine, palustrine, deltaic, brackish to fresh, bracish to saline, other, unvegetated, mudflat, scrub/shrub, peatland, hummock, hollow, river's edge, plain, submerged subtidal |
| 14 | Validity of variable names in sites table | Undefined variables: palustrine, unvegetated |
| 15 | Validity of variable names in species table | Passed |
| 16 | Validity of variable names in impacts table | Undefined variables: degraded, storm or wind, disturbed, restoring, managed, canalled, Intact, Degraded, Plantation, Restoration |
| 17 | Validity of variable names in methods table | Undefined variables: PVC tube or thin-walled metal tube, shovel, shovel core, polycarbonate tube, duplicate measurements, duplicate measurements, ground and sieved, not specified, total carbon difference after LOI, not specified, not specified, selected intervals |
| 18 | Validity of variable names in study_citations table | Undefined variables: primary source |

jaxinewolfe commented 8 months ago

Hello @HolmquistJ @cheneyr @BettsH ,

I am creating this issue so we can all be on the same page about the progress we make towards this next database update. I posted the results of the most recent synthesis test run above (running at 8100+ cores now!). There are certainly some things to resolve, but I think we're well on our way to a great update.

To speak to a few of these:

  1. The combination of study, site, and core ID will make these unique, but we can track these down if we need to and make the core IDs unique on their own (low priority, I'd say)
  2. Studies with core IDs missing from depthseries table: Turck_2014, Peck_et_al_2020, Marot_et_al_2020, and Copertino_unpublished (I think we just need to remove this last one)
  3. Thom_1992 has one core documented which is not in the depthseries
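
For anyone who wants to reproduce these three checks outside the synthesis run, here is a minimal sketch in Python. It is not the synthesis script itself: the rows and IDs are made up, and the column names (`study_id`, `site_id`, `core_id`) are assumed. It shows a core ID that repeats on its own but is unique once combined with study and site, plus the two-way link check between the core and depthseries tables.

```python
from collections import Counter

# Hypothetical rows for illustration; only the ID columns matter here.
cores = [
    {"study_id": "Study_A", "site_id": "Site_1", "core_id": "C1"},
    {"study_id": "Study_B", "site_id": "Site_9", "core_id": "C1"},
    {"study_id": "Study_B", "site_id": "Site_9", "core_id": "C2"},
]
depthseries = [
    {"study_id": "Study_A", "site_id": "Site_1", "core_id": "C1"},
    {"study_id": "Study_B", "site_id": "Site_9", "core_id": "C1"},
    {"study_id": "Study_B", "site_id": "Site_9", "core_id": "C3"},
]


def duplicated(rows, keys):
    """Return the key combinations that occur in more than one row."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return sorted(combo for combo, n in counts.items() if n > 1)


def unmatched(left, right, key="core_id"):
    """Return IDs present in `left` but absent from `right` (an anti-join)."""
    right_ids = {row[key] for row in right}
    return sorted({row[key] for row in left} - right_ids)


dup_core_only = duplicated(cores, ["core_id"])
dup_composite = duplicated(cores, ["study_id", "site_id", "core_id"])
cores_without_depths = unmatched(cores, depthseries)
depths_without_cores = unmatched(depthseries, cores)
```

In this toy data `C1` is flagged when checked alone but not under the composite key, which is the situation described in item 1.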

For undefined attributes and variables, some of these are new, and some have been hanging around. We may think about having a hackathon to work through some of these updates that require more team-based discussion. Edit: uncontrolled variables will be added readily (while avoiding redundancy). New attributes (most of which are related to dating) warrant further discussion.
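
As a rough illustration of what an "undefined column" check amounts to, here is a sketch in Python. The controlled attribute set below is a made-up subset for the example, not the actual CCN guidance, and `undefined_columns` is a hypothetical helper name. The table columns reuse the three depthseries names reported in test 5.

```python
# Hypothetical subset of controlled attributes, keyed by table name.
controlled_attributes = {
    "depthseries": {"study_id", "site_id", "core_id", "depth_min", "depth_max"},
}

# Column names as they might appear in a synthesized depthseries table.
depthseries_columns = [
    "study_id", "site_id", "core_id", "depth_min", "depth_max",
    "date", "pb210_crs_age", "pb210_crs_age_sd",
]


def undefined_columns(table, columns, guidance):
    """List columns, in their original order, that the guidance does not define."""
    allowed = guidance.get(table, set())
    return [col for col in columns if col not in allowed]


flagged = undefined_columns("depthseries", depthseries_columns, controlled_attributes)
```

Resolving a flagged name then means either adding it to the controlled list or renaming it to an existing attribute in the hook script.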

Other thoughts and goals for the update:

Study-specific observations:

Studies with modeled fraction carbon values: (check off once the values have been removed)

jaxinewolfe commented 8 months ago

@HolmquistJ I noticed that the following cores have something fishy going on with their depths. In every case, the total of all the sampled intervals (0-1, 1-2, 2-3cm etc) ends up being greater than the maximum depth of the core taken. What do you think about this?

[Image: Screen Shot 2023-10-30 at 10:26:28 PM]

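
One way to screen for this pattern is sketched below in Python. The data is invented, and the column names `depth_min`, `depth_max`, and a core-level `max_depth` are assumptions. It sums the thickness of every sampled interval per core and flags cores where that total exceeds the recorded maximum depth.

```python
from collections import defaultdict

# Hypothetical depthseries rows (depths in cm).
depthseries = [
    {"core_id": "C1", "depth_min": 0, "depth_max": 1},
    {"core_id": "C1", "depth_min": 1, "depth_max": 2},
    {"core_id": "C1", "depth_min": 2, "depth_max": 3},
    {"core_id": "C2", "depth_min": 0, "depth_max": 1},
    {"core_id": "C2", "depth_min": 1, "depth_max": 2},
    {"core_id": "C2", "depth_min": 2, "depth_max": 3},
]

# Hypothetical core-level maximum depths (cm), assumed to come from the core table.
core_max_depth = {"C1": 3, "C2": 2}


def sampled_thickness(rows):
    """Sum interval thickness (depth_max - depth_min) for each core."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["core_id"]] += row["depth_max"] - row["depth_min"]
    return dict(totals)


def cores_exceeding_max_depth(rows, max_depths):
    """Return cores whose summed intervals are thicker than their max depth."""
    totals = sampled_thickness(rows)
    return sorted(
        core for core, total in totals.items()
        if core in max_depths and total > max_depths[core]
    )


suspect_cores = cores_exceeding_max_depth(depthseries, core_max_depth)
```

Here `C2` is flagged because 3 cm of sampled intervals exceed its 2 cm maximum depth, while `C1` passes.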
jaxinewolfe commented 8 months ago

Missing core locations from the following studies:

  1. Thom_1992
  2. Schile-Beers_and_Megonigal_2017
  3. Nsombo_et_al_2016
  4. Eid_and_Shaltout_2016_Egypt
  5. Eid_et_al_2016_Saudi_Arabia
  6. Poppe_et_al_2019
  7. Marot_et_al_2020
  8. Kauffman_et_al_2020
  9. Copertino_unpublished
  10. Drexler_et_al_2013
  11. AntisanaEcologicalReserve_Soil
  12. Chingaza_Soil
  13. SWAMP Data-Soil carbon-Cauassu Leste Shrimp-2016-Brazil
  14. SWAMP Data-Soil carbon-Cauassu Oeste Shrimp-2016-Brazil
  15. SWAMP Data-Soil carbon-Cumbe Leste Camaro-2016-Brazil
  16. SWAMP Data-Soil carbon-Cumbe norte Camarao-2016-Brazil
  17. SWAMP Data-Soil carbon-Cilacap-2011
  18. Belshe_et_al_2019
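
A list like this can be regenerated after each synthesis run with something along these lines. This is a sketch only: the rows are invented, `latitude` and `longitude` are assumed column names, and `None` stands in for NA.

```python
from collections import Counter

# Hypothetical core-level rows; None marks a missing coordinate.
cores = [
    {"study_id": "Study_A", "core_id": "C1", "latitude": 38.9, "longitude": -76.5},
    {"study_id": "Study_A", "core_id": "C2", "latitude": None, "longitude": None},
    {"study_id": "Study_B", "core_id": "C3", "latitude": None, "longitude": -48.2},
    {"study_id": "Study_B", "core_id": "C4", "latitude": None, "longitude": None},
]


def studies_missing_coords(rows):
    """Count, per study, the cores lacking a latitude or a longitude."""
    counts = Counter(
        row["study_id"]
        for row in rows
        if row.get("latitude") is None or row.get("longitude") is None
    )
    return dict(counts)


missing = studies_missing_coords(cores)
```

Comparing the counts before and after a hook script update shows whether the new positional data actually made it through to the synthesis.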

jaxinewolfe commented 8 months ago

The following studies have one or more cores with no assigned habitat (even after the post-processing script is run). Check off once the hook scripts have been updated with habitat assignments (it helps if there is an associated publication to dig into).

cheneyr commented 7 months ago

@jaxinewolfe an update on the CIFOR missing core locations -

We now have site or sub-plot level positional data for all sites but 5:

  17. SWAMP Data-Soil carbon-Cauassu Leste Shrimp-2016-Brazil
  18. SWAMP Data-Soil carbon-Cauassu Oeste Shrimp-2016-Brazil
  19. SWAMP Data-Soil carbon-Cumbe Leste Camaro-2016-Brazil
  20. SWAMP Data-Soil carbon-Cumbe norte Camarao-2016-Brazil
  51. SWAMP Data-Soil carbon-Cilacap-2011

jaxinewolfe commented 7 months ago

@cheneyr Thanks for the update!! I ran the synthesis and it cut the number of NA coords in half, which is great 😎 I still see a few more sites needing coords - Was this most recent change carried through to the cifor_alt_cores.csv in the derivative folder? I've updated the comment above to reflect the reduced list of studies.

cheneyr commented 7 months ago

@jaxinewolfe I just pushed an update with the most recent changes and missing cores filled for Vaughn et al 2020 and Van Ardenne et al 2018. On my side we're only missing 37 core locations from 5 sites from the CIFOR_alt script

jaxinewolfe commented 7 months ago

Thanks @cheneyr ! Yay the studies missing coords are down to 18 now. Just two more updates for you (sorry for all the back and forth!):

| | study_id | core_id | n |
|---|----------|---------|---|
| 1 | SWAMP Data-Soil carbon-Djirnda-2014-Senegal | DJI_Djirnda_1 | 2 |
| 2 | SWAMP Data-Soil carbon-Djirnda-2014-Senegal | DJI_Djirnda_4 | 2 |
| 3 | SWAMP Data-Soil carbon-Djirnda-2014-Senegal | DJI_Djirnda_5 | 2 |
| 4 | SWAMP Data-Soil carbon-Furo Grande-2017-Brazil | FURO_Furo Grande_1 | 10 |
| 5 | SWAMP Data-Soil carbon-Furo Grande-2017-Brazil | FURO_Furo Grande_2 | 5 |
| 6 | SWAMP Data-Soil carbon-Furo Grande-2017-Brazil | FURO_Furo Grande_3 | 16 |
| 7 | SWAMP Data-Soil carbon-Furo Grande-2017-Brazil | FURO_Furo Grande_4 | 5 |
| 8 | SWAMP Data-Soil carbon-Furo Grande-2017-Brazil | FURO_Furo Grande_5 | 10 |
| 9 | SWAMP Data-Soil carbon-Furo Grande-2017-Brazil | FURO_Furo Grande_6 | 6 |

jaxinewolfe commented 7 months ago

@BettsH in the SWAMP data, there are some inland "peatland" cores that snuck in for South America (I think they're the mountainous peatlands, so not exactly tidal) - would you mind leaving these out?

BettsH commented 7 months ago

updated!