NeotomaDB / DwC-Mapping

A document explaining how we will map Neotoma against the DarwinCore Schema
MIT License

Mapping Neotoma against the DarwinCore Schema


This repository tracks efforts to map Neotoma dataset records against the DarwinCore schema, in order to facilitate greater data discovery, reuse and sustainability of records archived within the Neotoma Paleoecological Database. This project is part of the EarthCube Integrative Activities proposal between Neotoma and the Paleobiology Database, and is one step toward uploading Neotoma records to BISON and GBIF.

Initial work on this project was made possible through collaboration at the Cyber4Paleo Community Development Workshop in Boulder, CO, in July 2016. Much of this work is archived as part of the Cyber4Paleo GitHub organization and its GitHub Pages site.

This work is carried out by the Earthlife Consortium, funded by NSF through the EarthCube initiative.

Contributors

We welcome contributions from anyone, whether to code, documentation, or issue tracking. All participants are expected to follow the code of conduct for this project.

Description

Mapping the Neotoma Database structure onto the DarwinCore standard is relatively complex. While some of the data structure maps easily, the content of the database and the conceptual structure of paleoecological records are not consistently equivalent to the semantic structure of the DarwinCore schema. The Rmd file describes some simple relationships in its markdown portion, based on a cross-walk started by Michael McClennan and extended by Jack Williams and Mark Uhen at the Cyber4Paleo Community Development Workshop. Simon Goring developed the Rmd and implemented the actual conversion of the database structure to the CSV file output.
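To make the idea of a cross-walk concrete, here is a minimal sketch in base R. It is not the mapping used in the Rmd: the input rows are invented, the choice of Neotoma columns is an assumption, and `to_dwc()` is a hypothetical helper. The output column names are DarwinCore terms. Using `FossilSpecimen` for `basisOfRecord` and the bounding-box midpoint for coordinates are illustrative choices only.

```r
# Invented rows standing in for a joined Neotoma query result.
# Column names follow Neotoma conventions, but this selection is an assumption.
neotoma_rows <- data.frame(
  SampleID      = c(101, 102),
  TaxonName     = c("Picea", "Quercus"),
  SiteName      = c("Example Lake", "Example Bog"),
  LatitudeNorth = c(45.2, 43.1),
  LatitudeSouth = c(45.0, 43.1),
  LongitudeEast = c(-89.0, -90.4),
  LongitudeWest = c(-89.4, -90.4),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rename and derive columns so they match DarwinCore terms.
to_dwc <- function(x) {
  data.frame(
    occurrenceID     = paste0("neotoma:sample:", x$SampleID),
    basisOfRecord    = rep("FossilSpecimen", nrow(x)),
    scientificName   = x$TaxonName,
    locality         = x$SiteName,
    # Neotoma stores a bounding box per site; take its midpoint as a point location.
    decimalLatitude  = (x$LatitudeNorth + x$LatitudeSouth) / 2,
    decimalLongitude = (x$LongitudeEast + x$LongitudeWest) / 2,
    stringsAsFactors = FALSE
  )
}

dwc <- to_dwc(neotoma_rows)

# Write the result as CSV; a temporary directory keeps the sketch side-effect free.
out_file <- file.path(tempdir(), "dwc_occurrence.csv")
write.csv(dwc, out_file, row.names = FALSE)
```

The harder cases mentioned above, where Neotoma concepts have no clean DarwinCore equivalent, are exactly the ones a simple rename like this cannot handle.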

How to Use this Repository

The database itself is available as a SQL Server snapshot from the Neotoma Paleoecological Database's website, or on figshare.org as the Neotoma Database Snapshot project.

With the snapshot loaded into your local SQL Server instance, replace the connection string in functionalized_run.R (around line 27) with one that points to your server. The code should then run as is, provided the required R packages are installed: RODBC, neotoma, dplyr and tidyr.
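As a rough guide to what that setup might look like, the sketch below checks for the four packages and builds an ODBC connection string. Every value in the connection string (driver, server, database name, authentication) is a placeholder assumption and will differ on your machine, and `connect_neotoma()` is a hypothetical helper, not a function from functionalized_run.R.

```r
# Packages named in this README.
required <- c("RODBC", "neotoma", "dplyr", "tidyr")

# Find any that are not installed, without attaching them.
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]
if (length(missing_pkgs) > 0) {
  message("Install these packages first: ", paste(missing_pkgs, collapse = ", "))
}

# Placeholder connection string: all values here are assumptions.
conn_string <- paste0(
  "driver={SQL Server};",
  "server=localhost\\SQLEXPRESS;",
  "database=Neotoma;",
  "trusted_connection=true"
)

# Hypothetical helper that opens the connection and fails loudly if it cannot.
connect_neotoma <- function(conn_string) {
  con <- RODBC::odbcDriverConnect(conn_string)
  # odbcDriverConnect() returns -1 (not an RODBC object) when it fails.
  if (!inherits(con, "RODBC")) {
    stop("Could not connect; check the connection string.")
  }
  con
}

# Usage, once the snapshot is loaded on your server:
# con <- connect_neotoma(conn_string)
# head(RODBC::sqlTables(con))
# RODBC::odbcClose(con)
```

Listing the tables with `sqlTables()` is a quick way to confirm the snapshot restored correctly before running the full conversion.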

Key TODOs

Support

This work is supported by the National Science Foundation's EarthCube Initiative under NSF Award Numbers 1541002 and 1340301.