cboettig / neonstore

:package: A local content-based storage system for NEON data
https://cboettig.github.io/neonstore

Parse filenames using official docs #3

Closed cboettig closed 4 years ago

cboettig commented 4 years ago

Previous filename parsing was slapdash. This PR implements a name parser based on the description in https://data.neonscience.org/file-naming-conventions, and so should successfully parse a much larger fraction of NEON filenames than the current implementation.

NEON includes many files that are not described by that standards document, and for some files the standard has changed either before or since that doc was written (so a given filetype can be found under multiple conventions). I have deduced some of these from the name patterns and included both variants, e.g.:

https://github.com/cboettig/neonstore/blob/50f0f43b4233fc4e327f7663d1a78bae86fba4ad/R/neon_filename_parser.R#L98-L105
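The idea of accepting more than one convention for the same kind of file can be sketched as trying a list of patterns in order and returning the first match. The snippet below is a standalone illustration in Python, not the package's R code: the two patterns are simplified assumptions loosely modeled on the sensor-style and observation-style layouts in the naming-conventions doc, the helper `parse_filename` is hypothetical, and the example filenames are made up.

```python
import re

# Hypothetical, simplified patterns; NOT the expressions used in
# R/neon_filename_parser.R. Field names follow the naming-conventions doc.
PREFIX = (r"NEON\.(?P<domain>D\d{2})\.(?P<site>[A-Z]{4})"
          r"\.(?P<dpl>DP\d)\.(?P<prnum>\d{5})\.(?P<rev>\d{3})")
SUFFIX = (r"\.(?P<desc>\w+)\.(?P<month>\d{4}-\d{2})"
          r"\.(?P<pkgtype>basic|expanded)\.(?P<gentime>\d{8}T\d{6}Z)\.csv")

VARIANTS = [
    # Sensor-style: horizontal, vertical and timing indices precede the description.
    ("IS", re.compile(PREFIX + r"\.(?P<hor>\d{3})\.(?P<ver>\d{3})\.(?P<tmi>\w{3})" + SUFFIX)),
    # Observation-style: the description follows the revision directly.
    ("OS", re.compile(PREFIX + SUFFIX)),
]

def parse_filename(name):
    """Return a dict of fields from the first variant that matches, else None."""
    for label, pattern in VARIANTS:
        m = pattern.fullmatch(name)
        if m:
            return {"variant": label, **m.groupdict()}
    return None

# Made-up filenames for illustration only.
is_name = "NEON.D10.CPER.DP1.00041.001.002.506.001.ST_1_minute.2019-01.basic.20190806T185013Z.csv"
os_name = "NEON.D10.CPER.DP1.10003.001.brd_countdata.2019-06.basic.20191107T154457Z.csv"
is_fields = parse_filename(is_name)
os_fields = parse_filename(os_name)
```

Keeping the variants in an ordered list means a newly deduced convention can be supported by appending one more pattern, without touching the matching logic.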

This will still not parse every file found in NEON. In particular, files whose names do not begin with NEON are not parsed at all. Run the example code in filename-parsing-test.R to see a quick demonstration of parsing on all the NEON files generated at site CPER, including a list of names that do not parse.
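A check of this kind amounts to running the parser over a list of names and separating the ones that parse from the ones that do not. The following is a minimal Python sketch of that bookkeeping, not a port of filename-parsing-test.R: the stand-in `parse` function only recognises the leading NEON, domain and site fields, and both it and the file names are assumptions made for illustration.

```python
import re

# Stand-in parser (an assumption): recognises only names that start with
# NEON.<domain>.<site>. and returns None for everything else.
LEADING = re.compile(r"NEON\.(?P<domain>D\d{2})\.(?P<site>[A-Z]{4})\..+")

def parse(name):
    m = LEADING.fullmatch(name)
    return m.groupdict() if m else None

def split_by_parse(names, parser=parse):
    """Split names into {name: fields} for parsed names and a list of failures."""
    parsed, unparsed = {}, []
    for name in names:
        fields = parser(name)
        if fields is None:
            unparsed.append(name)
        else:
            parsed[name] = fields
    return parsed, unparsed

# Made-up names: one following the NEON prefix convention, one not.
names = [
    "NEON.D10.CPER.DP1.10003.001.brd_countdata.2019-06.basic.20191107T154457Z.csv",
    "made_up_file_without_prefix.h5",
]
parsed, unparsed = split_by_parse(names)
coverage = len(parsed) / len(names)  # fraction of names that parsed
```

Inspecting `unparsed` is what reveals which conventions are still missing from the parser.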

This PR does not change which fields are reported by neon_index(), which has a very IS / OS (instrument systems and observation systems csv) focus. In particular, AOP product filenames almost surely contain metadata that is necessary for analyses. Currently, the only way for users to surface that information is to call neon_parse_filenames() directly on the product file list, which returns a table with all recognized metadata fields. neon_read() is focused on the task of stacking the .csv files, and won't be much use for most AOP products. At the moment users are left with a lower-level workflow for these, which is perhaps sufficient.

cc @chrlaney