This repository tracks efforts to map Neotoma dataset records onto the DarwinCore schema, to facilitate greater discovery, reuse and sustainability of the records archived in the Neotoma Paleoecological Database. This project is part of the EarthCube Integrative Activities proposal between Neotoma and the Paleobiology Database, and is one step toward uploading Neotoma records to BISON and GBIF.
Initial work on this project was made possible through collaboration at the Cyber4Paleo Community Development Workshop in Boulder, CO, in July 2016. Much of this work is archived in the Cyber4Paleo GitHub organization and its GitHub Pages site.
This work is carried out by the Earthlife Consortium, funded by NSF through the EarthCube initiative.
We welcome contributions from any individual, whether code, documentation, or issue tracking. All participants are expected to follow the code of conduct for this project.
Mapping the Neotoma Database structure onto the DarwinCore standard is relatively complex. While some of the data structure maps easily, the content of the database and the conceptual structure of paleoecological records are not consistently equivalent to the semantic structure of the DarwinCore schema. The Rmd describes some simple relationships in its markdown portion, based on a crosswalk started by Michael McClennen and extended by Jack Williams and Mark Uhen at the Cyber4Paleo Community Development Workshop. Simon Goring developed the Rmd and implemented the conversion of the database structure to the CSV file output.
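To illustrate the general idea of a crosswalk, here is a minimal Python sketch (the project itself uses R) that renames source columns to Darwin Core terms and writes CSV. The Neotoma-style column names below are hypothetical placeholders, not the actual Neotoma schema or the project's real mapping. The Darwin Core terms (`locality`, `decimalLatitude`, `decimalLongitude`, `scientificName`, `basisOfRecord`) are standard terms. Fields without a mapping (here `Depth`) are simply dropped, which reflects how some paleoecological content has no direct Darwin Core equivalent.

```python
import csv
import io

# Hypothetical crosswalk: illustrative Neotoma-style column names (NOT the real
# Neotoma schema) mapped to standard Darwin Core term names.
CROSSWALK = {
    "SiteName": "locality",
    "LatitudeNorth": "decimalLatitude",
    "LongitudeWest": "decimalLongitude",
    "TaxonName": "scientificName",
}

# Darwin Core terms given the same constant value for every fossil record.
CONSTANTS = {"basisOfRecord": "FossilSpecimen"}


def to_darwin_core(records):
    """Rename mapped fields to Darwin Core terms; unmapped fields are dropped."""
    out = []
    for rec in records:
        row = {dwc: rec[src] for src, dwc in CROSSWALK.items() if src in rec}
        row.update(CONSTANTS)
        out.append(row)
    return out


def write_csv(rows, handle):
    """Write Darwin Core rows as CSV with a fixed column order."""
    fields = list(CROSSWALK.values()) + list(CONSTANTS)
    writer = csv.DictWriter(handle, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)


# Made-up example record; "Depth" has no mapping and is omitted from the output.
sample = [{"SiteName": "Example Lake", "LatitudeNorth": 43.1,
           "LongitudeWest": -89.4, "TaxonName": "Picea", "Depth": 120}]
buf = io.StringIO()
write_csv(to_darwin_core(sample), buf)
print(buf.getvalue())
```

A flat dictionary only handles one-to-one renames; the harder cases described above (concepts that do not line up between the two schemas) need per-field transformation logic rather than a simple lookup.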
The database itself is available as a SQL Server snapshot from the Neotoma Paleoecological Database's website, or on figshare.org as part of the Neotoma Database Snapshot project.
With the snapshot loaded into your local server, replace the connection string in `functionalized_run.R` (around line 27) and the code should "just run", provided you have the required packages: `RODBC`, `neotoma`, `dplyr` and `tidyr`.
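As a rough guide to what a replacement connection string might look like, the Python sketch below assembles a DSN-less ODBC string for SQL Server, the kind of string RODBC's `odbcDriverConnect()` accepts. The server name, the database name `Neotoma`, and the driver name `ODBC Driver 17 for SQL Server` are all assumptions; use whatever matches your local installation and however the snapshot was restored. Whether `functionalized_run.R` expects this form or a DSN name is not stated here, so check the script before substituting.

```python
def sql_server_connection_string(server="localhost", database="Neotoma",
                                 driver="ODBC Driver 17 for SQL Server",
                                 trusted=True, uid=None, pwd=None):
    """Build a DSN-less ODBC connection string for SQL Server.

    All defaults are assumptions for a local snapshot restore.
    """
    # Driver names contain spaces, so ODBC expects them wrapped in braces.
    parts = [f"Driver={{{driver}}}", f"Server={server}", f"Database={database}"]
    if trusted:
        # Windows integrated authentication.
        parts.append("Trusted_Connection=yes")
    else:
        # SQL Server authentication requires explicit credentials.
        if uid is None or pwd is None:
            raise ValueError("uid and pwd are required when trusted=False")
        parts += [f"UID={uid}", f"PWD={pwd}"]
    return ";".join(parts) + ";"


print(sql_server_connection_string())
```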
We are working to develop the Rmd so that it is, in some sense, publishable as a data/methods paper. We welcome contributions that would assist in this effort. If you feel you could contribute significantly enough to be considered an author, please contact us first.

This work is supported by the National Science Foundation's EarthCube Initiative through NSF Award Numbers 1541002 and 1340301.