DataONEorg / sem-prov-design

Design documents for the Semantics and Provenance Working Group, DataONE Phase II
Apache License 2.0

Harvest MsTMIP NetCDF data and extract attribute metadata for review #116

Closed csjx closed 9 years ago

csjx commented 9 years ago

For each FGDC metadata file documenting the MsTMIP NetCDF model output data, we have FTP pointers to the data files. Download all of the NetCDF data to a VM accessible to mn-stage-ucsb-4 and extract the CF-based attribute metadata into either 1) NcML documents or 2) a CSV document, keyed on the PID of the associated science metadata document.
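A minimal sketch of option 2, assuming the attribute metadata has already been dumped to NcML: it flattens the global and per-variable attributes of one NcML document into CSV rows keyed by PID. The function names, the column layout, and the PID value are hypothetical; variables nested inside NcML groups are not handled, and attribute values are assumed to be in the `value` attribute.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace of NcML 2.2 documents.
NCML_NS = {"nc": "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"}


def ncml_attribute_rows(ncml_text, pid):
    """Yield (pid, variable, attribute, value) rows from one NcML document.

    `pid` is the identifier of the associated science metadata document
    (how it is looked up is outside this sketch).
    """
    root = ET.fromstring(ncml_text)
    # Global attributes are direct children of <netcdf>; the variable
    # column is left empty for them (an assumed convention).
    for attr in root.findall("nc:attribute", NCML_NS):
        yield (pid, "", attr.get("name"), attr.get("value", ""))
    # Per-variable attributes such as units or long_name.
    for var in root.findall("nc:variable", NCML_NS):
        for attr in var.findall("nc:attribute", NCML_NS):
            yield (pid, var.get("name"), attr.get("name"), attr.get("value", ""))


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["pid", "variable", "attribute", "value"])
    writer.writerows(rows)
    return buf.getvalue()
```

Calling `rows_to_csv(ncml_attribute_rows(text, pid))` once per file and concatenating the results (minus the repeated header) would give a single CSV for review.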

mobb commented 9 years ago

references #122

dlebauer commented 9 years ago

If it helps, there are some scripts for downloading (starting with wget.sh; see the README): https://github.com/ebimodeling/model-drivers/blob/master/met/narr/threehourly/wget.sh. The metadata for the concatenated files can be found at, e.g., https://www.betydb.org/inputs.json?id=301 and https://www.betydb.org/inputs.json?id=302

csjx commented 9 years ago

From my email back in April:

I’ve harvested the MsTMIP Model Output data in NetCDF format from the MsTMIP Member Node and put it on mn-stage-ucsb-4.test.dataone.org (in /var/www/mstmip). The total size is about 0.65 TB. Most files are in the 100 MB to 6 GB range, and two of them are 33 GB each.

I’ve generated NcML metadata for each of the .nc4 files and saved it as a sibling in the same directory as the original file, with .ncml.xml appended to the file name. Likewise, I’ve copied the science metadata as a sibling file, with .fgdc.xml appended to the file name.
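To make the sibling naming concrete, here is a small sketch that lists the expected metadata files for every .nc4 file under a directory and reports any that are missing. It assumes the suffixes are appended to the full data file name (for example, a hypothetical `a.nc4` would sit next to `a.nc4.ncml.xml` and `a.nc4.fgdc.xml`); the function names are illustrative only.

```python
from pathlib import Path


def sibling_metadata_paths(root):
    """Map each .nc4 file under root to its (NcML, FGDC) sibling paths.

    Assumes the suffix is appended to the complete data file name.
    """
    result = {}
    for data_file in sorted(Path(root).rglob("*.nc4")):
        result[data_file] = (
            data_file.with_name(data_file.name + ".ncml.xml"),
            data_file.with_name(data_file.name + ".fgdc.xml"),
        )
    return result


def missing_siblings(root):
    """Return the expected sibling paths that do not exist on disk."""
    return [
        path
        for pair in sibling_metadata_paths(root).values()
        for path in pair
        if not path.exists()
    ]
```

Running `missing_siblings("/var/www/mstmip")` on the staging host would be one way to confirm that every data file has both metadata siblings.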

All of the files are available over HTTP here:

https://mn-stage-ucsb-4.test.dataone.org/mstmip/

The directory structure is a copy of the structure found on the FTP site.