Smithsonian / CCN-Data-Library

The Coastal Carbon Network Data Library: An open-source database featuring carbon data from tidal wetlands around the world
https://serc.si.edu/coastalcarbon

Synthesis Version Updates #103

Closed by jaxinewolfe 3 months ago

jaxinewolfe commented 7 months ago

This update is scheduled for March 2024

Please continue working in the develop branch!

Overall Goals:

Additional goals:

jaxinewolfe commented 7 months ago

Resolving errors in existing hook scripts:

The following cores have a depth interval where the min and max are likely reversed:

| study_id | core_id |
| --- | --- |
| Thom_1992 | PB1 |
| DelVecchia_et_al_2014 | M1314 |
| Costa_et_al_2023 | PanamaCaribbean-Station Forest (SF)-1 |
| MacKenzie_et_al_2021 | Catanauan_216_714 |
| Sharma_et_al_2021 | Koh_Kohng_138_384 |
| Sharma_et_al_2021 | Koh_Kohng_138_386 |
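
A check along these lines could be used to find such intervals. This is only a minimal sketch in base R: it assumes the depthseries table stores interval bounds in columns named `depth_min` and `depth_max`, and the helper name `flag_reversed_depths` and the toy data are hypothetical, not part of the existing hook scripts.

```r
# Hypothetical helper: return the study_id/core_id pairs that have at least
# one interval whose depth_min exceeds its depth_max.
# Assumes columns named study_id, core_id, depth_min and depth_max.
flag_reversed_depths <- function(depthseries) {
  reversed <- !is.na(depthseries$depth_min) &
    !is.na(depthseries$depth_max) &
    depthseries$depth_min > depthseries$depth_max
  unique(depthseries[reversed, c("study_id", "core_id")])
}

# Toy data for illustration only (not values from the synthesis)
toy_depthseries <- data.frame(
  study_id  = c("Study_A", "Study_A", "Study_B"),
  core_id   = c("core_1", "core_1", "core_2"),
  depth_min = c(0, 20, 5),
  depth_max = c(10, 10, 15)
)

reversed_cores <- flag_reversed_depths(toy_depthseries)
print(reversed_cores)
```

Rows with a missing bound are skipped here on purpose, since NA depths are tracked as a separate problem.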

The following studies have intervals with NA depths:

"Nahlik_and_Fennessy_2016" "Langston_et_al_2022" "De_Iongh_et_al_1995" "Agawin_et_al_1996" "Townsend_and_Fonseca_1998" "Holmer_et_al_2007" "Van_Engeland_2010" "Boyd_et_al_2017"

Studies with NA Coords:

Studies with NA Habitat:

Studies with Modeled C:

Synthesis and Post-Processing

jaxinewolfe commented 6 months ago

We need a QAQC function to catch studies that have no associated citation

Some starter code:

    library(dplyr)  # provides %>% and filter()

    # Cores whose study_id has no matching entry in the study citations table
    no_citations <- ccrcn_synthesis$cores %>%
      filter(!(study_id %in% unique(ccrcn_synthesis$study_citations$study_id)))

    if (nrow(no_citations) > 0) {
      warning("NOTE: The above studies were removed because they did not have citation information present. Please review the CCN library synthesis to confirm that all synthesis studies have proper study citation information in '/data/CCN_synthesis/CCN_study_citations.csv' ")

      # Return the affected study IDs
      unique(no_citations$study_id)
    }
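
One way the starter code might be wrapped into a reusable QAQC function is sketched below. It uses base R only so it runs without dplyr. The function name `test_missing_citations`, its arguments, and the toy tables are hypothetical; the only assumption carried over from the starter code is that both tables have a `study_id` column.

```r
# Hypothetical QAQC helper: report study_ids present in the cores table but
# absent from the study_citations table. Warns if any are found and returns
# the offending IDs (an empty character vector if there are none).
test_missing_citations <- function(cores, study_citations) {
  missing_ids <- setdiff(
    unique(as.character(cores$study_id)),
    unique(as.character(study_citations$study_id))
  )
  if (length(missing_ids) > 0) {
    warning(
      "Studies without citation information: ",
      paste(missing_ids, collapse = ", ")
    )
  }
  missing_ids
}

# Toy tables for illustration only
toy_cores <- data.frame(
  study_id = c("Study_A", "Study_A", "Study_B"),
  core_id  = c("a1", "a2", "b1")
)
toy_citations <- data.frame(study_id = "Study_A")

# suppressWarnings() just keeps the example output quiet
missing_citations <- suppressWarnings(
  test_missing_citations(toy_cores, toy_citations)
)
print(missing_citations)
```

Unlike the starter code, this sketch only reports the studies. Whether they should also be dropped from the synthesis is left to the caller.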

jaxinewolfe commented 4 months ago

@cheneyr @BettsH

Here are the QAQC results for our current version of the synthesis. It looks like a lot, but I think it's fairly small stuff that we can knock out! For example, there's a bunch of columns starting with "..." which resulted from tables being written out with write.csv() without specifying row.names = F (idk why the default is set to TRUE, it's annoying), which is what stops it from adding a column of row number indices. If you spot stuff related to datasets you've worked on, you can go for those quick fixes, or we can chat about some of the more nuanced things in our meeting (or whenever).
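
For anyone who wants to see the write.csv() behavior in isolation, here is a small base R sketch. The toy table and temp files are made up for illustration. The unnamed first column in the header is what a reader such as readr::read_csv() would then likely repair to a name like "...1".

```r
# Toy table for illustration only
toy_table <- data.frame(
  study_id = c("Study_A", "Study_B"),
  core_id  = c("a1", "b1")
)

with_index    <- tempfile(fileext = ".csv")
without_index <- tempfile(fileext = ".csv")

# Default (row.names = TRUE): row numbers are written as an unnamed first column
write.csv(toy_table, with_index)

# row.names = FALSE: only the real columns are written
write.csv(toy_table, without_index, row.names = FALSE)

# Compare the header lines of the two files
header_with_index    <- readLines(with_index, n = 1)
header_without_index <- readLines(without_index, n = 1)
print(header_with_index)
print(header_without_index)
```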

Also, we are well past 10k cores, holy moly! 🎉

| index | test | result |
| --- | --- | --- |
| 1 | Core ID uniqueness | Check the following core_id(s) in the core-level data: 2B, 305, 398, 399, AL, B1, B2, B3, B4, G15, G4, G5, G9 |
| 2 | Valid core ID links in core table | No core ID in depthseries table: WBWA1109_01PU, WBWA1109_02PU, WBWA1109_03PU, WBWA1109_04PU, NSOR1209_01PU, NSOR1209_02PU, NSOR1209_03PU, NSOR1209_04PU, NBOR1409_01PU, NBOR1409_02PU, NBOR1409_03PU, Catlett_1m, Catlett_Transect, Goodwin_1m, Goodwin_Transect, Pamunkey_Transect, SweetHall_1m, SweetHall_Transect, Taskinas_Transect |
| 3 | Valid core ID links in depthseries table | No core ID in core table: PB1, RC_U_A, RC_M_A, PR_U_A, PR_M_A, W_U_A, W_M_A, F_U_A, F_M_A |
| 4 | Test coordinate uniqueness | 1373 sets of coordinates are associated with more than one core. Check 'data/QA/duplicate_coordinates.csv' |
| 5 | Validity of column names in depthseries table | Undefined columns: ...33, ...38, th234_activity, th234_activity_se, k40_activity, k40_activity_se, ...56, ...57, date, pb210_crs_age, pb210_crs_age_sd |
| 6 | Validity of column names in cores table | Undefined columns: ...29, salinity, ...31, ecological_condition_flag, ...37, ...38, core_date, core_position_method, geomorphic_id, ...42 |
| 7 | Validity of column names in sites table | Passed |
| 8 | Validity of column names in species table | Undefined columns: ...7, ...8 |
| 9 | Validity of column names in impacts table | Undefined columns: impact_notes, ...6 |
| 10 | Validity of column names in methods table | Undefined columns: ...30, ...32, ground_or_sieved_flag, ...35, pb210_background_assumption, ...37 |
| 11 | Validity of column names in study_citations table | Undefined columns: keywords, day, ...20, issue, ...22, issn, abstract, eprint, ...30, ...31, article-number |
| 12 | Validity of variable names in depthseries table | Passed |
| 13 | Validity of variable names in cores table | Undefined variables: WGS84, riverine, palustrine, deltaic, brackish to fresh, brackish to saline, other, mudflat, plain, submerged subtidal |
| 14 | Validity of variable names in sites table | Undefined variables: palustrine |
| 15 | Validity of variable names in species table | Passed |
| 16 | Validity of variable names in impacts table | Undefined variables: managed, restoring, canalled |
| 17 | Validity of variable names in methods table | Undefined variables: PVC tube or thin-walled metal tube, Eijkelkamp peat core sampler, shovel, shovel core, gouge corer, polycarbonate tube, duplicate measurements, duplicate measurements, ground and sieved, not specified, not specified, not specified, selected intervals |
| 18 | Validity of variable names in study_citations table | Undefined variables: primary source, article |

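
For reference, the two core ID link tests (rows 2 and 3) boil down to a set difference in each direction. The sketch below is not the actual QA function: the helper name `check_core_links` and the toy tables are hypothetical, and it matches on `core_id` alone, which is a simplification given that row 1 shows some core IDs are not unique.

```r
# Hypothetical helper: find core_ids that appear in only one of the two
# tables. Assumes both tables have a core_id column.
check_core_links <- function(cores, depthseries) {
  core_ids  <- unique(as.character(cores$core_id))
  depth_ids <- unique(as.character(depthseries$core_id))
  list(
    not_in_depthseries = setdiff(core_ids, depth_ids),  # in cores only
    not_in_cores       = setdiff(depth_ids, core_ids)   # in depthseries only
  )
}

# Toy tables for illustration only
toy_core_table  <- data.frame(core_id = c("c1", "c2", "c3"))
toy_depth_table <- data.frame(core_id = c("c1", "c1", "c4"))

link_check <- check_core_links(toy_core_table, toy_depth_table)
print(link_check)
```

Pairing `study_id` with `core_id` before taking the difference would avoid false matches between studies that reuse the same core ID.
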
jaxinewolfe commented 4 months ago

Synthesis QAQC Checks:

Note: the following have now been added to the synthesis report output

Depthseries

Cores

Bib

jaxinewolfe commented 4 months ago

@cheneyr

Thanks for renaming the citation tables! Some are still getting flagged in the synthesis QA and it looks like it's because the study_id was left out. (Though the citation table for Drake 2024 may still be missing from the derivative folder). So one more annoying edit there for the following:

[1] "Stahl_et_al_2024"             "Palinkas_and_Engelhardt_2024"
[3] "Palinkas_and_Cornwell_2024"   "Drake_et_al_2024"
[5] "Craft_2024"

jaxinewolfe commented 4 months ago

@BettsH Could you take a look at the coordinates for Bukoski et al. 2017 in the Sanderman synthesis? They were made fuzzy per the authors' request, but a few have ended up in Laos when they should be in Vietnam (so, a bit too fuzzy). Maybe check out the original paper or the supplementary data and/or use Google Earth Engine to see if we can update those so they at least get assigned the right country.

Cores in question: "M1566" "M1567" "M1568" "M1569" "M1570" "M1571" "M1572" "M1573" "M1574" "M1575" "M1576" "M1577" "M1578"

jaxinewolfe commented 3 months ago

@cheneyr two tasks for you (if you haven't already done them)!

Thank you!