Smithsonian / CCN-Data-Library

The Coastal Carbon Network Data Library: An open-source database featuring carbon data from tidal wetlands around the world
https://serc.si.edu/coastalcarbon

Hook: Marot et al 2022 #69

Closed by jaxinewolfe 7 months ago

jaxinewolfe commented 1 year ago

Hey @BettsH! I took another look at these files and worked out a way to automate the data table merging for soil properties and the dating info (except radiocarbon, which we should just read in and merge manually from the two Excel files). I was also able to merge the information from the FieldLog tables. There aren't any Field Logs for sites 2016-348-FA (16CCT04) and 2016-358-FA (16CCT07), so that info may need to come from their XML files. Hopefully the only manual work you'll have to do is fill out the methods table.

The script is in the Marot folder. I didn't output any of the tables it creates because they still need some work, and it wouldn't make sense to output a half-baked table just to read it in again. Still, it's better to clean up one merged table than to clean up each table individually. So! Go ahead and run the script and see where the curation can be taken from there. If you worked on this today, you will likely be able to apply any code you developed to the curation of these merged products.

One thing: in some cases the column names differed slightly between tables, so they ended up as separate columns when I joined the table rows. Since the values in these two columns won't overlap, you can merge them into one column using coalesce() within mutate(), e.g. ...mutate(new_col = coalesce(col1, col2)), and then drop the old incomplete columns. Don't bother merging columns that we won't need for the final tables, though.
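To illustrate the coalesce() pattern, here is a minimal sketch. The tables and column names (table_a, table_b, dry_bulk_density, DryBulkDensity, dbd) are made up for the example and are not the ones in the Marot script, and it assumes the rows were stacked with bind_rows():

```r
library(dplyr)

# Hypothetical tables (not the actual Marot data): the same measurement is
# stored under slightly different column names in each source table.
table_a <- data.frame(core_id = c("A1", "A2"), dry_bulk_density = c(0.41, 0.38))
table_b <- data.frame(core_id = c("B1", "B2"), DryBulkDensity = c(0.52, 0.47))

# Assumption: rows are stacked with bind_rows(), which keeps both column
# names and fills the gaps with NA.
merged <- bind_rows(table_a, table_b)

cleaned <- merged %>%
  # Take the first non-missing value across the two partial columns
  mutate(dbd = coalesce(dry_bulk_density, DryBulkDensity)) %>%
  # Drop the old incomplete columns
  select(-dry_bulk_density, -DryBulkDensity)

print(cleaned)
```

Because each row has a value in only one of the two columns, coalesce() never has to choose between competing values. If the columns could overlap, it would silently keep the first one, so that is worth checking before dropping anything.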

Happy curating! 🤓

BettsH commented 1 year ago

@jaxinewolfe I resolved the method IDs to reflect each of the coring methods with "shovel corer" as one of the types (even though this is not an option under coring_method in the library data structure Methods table). Another thing to note is that there are several occurrences of cores sharing the same lat/long: this is OK because the data sheets reflect this apparent duplication.

@HolmquistJ I seem to have answered all of the questions that I sent to you in my last update email regarding this hook, so no need to go back to that email.

jaxinewolfe commented 1 year ago

@BettsH That sounds good to me. Yes, the coordinate duplication is not uncommon, especially in cases where people take "replicate" cores. Thanks for all your work on this, it's not an easy hook by any means! Let me know when you've gotten to a point where you'd like me to look over the curation script.

@HolmquistJ shovel cores and surface samples are infrequent, but they do occur so we might consider adopting them into the controlled vocab for coring method. Perhaps in the new year, we can take some time to revisit our approach to uncontrolled vocabulary in the Library (documented here).

HolmquistJ commented 7 months ago

@BettsH There was an issue with this hook. Rose made an edit that caused all cores to be classified as marshes. I reverted to Henrey's original version.