NCEAS / metajam

Bringing data and metadata togetheR
https://nceas.github.io/metajam/
Apache License 2.0

read in EML numHeaderLines and pass into read_d1_files #103

Open atn38 opened 5 years ago

atn38 commented 5 years ago

Hi! Great work on the package. I'm trying it out ahead of this year's EDI Hackathon, where we'll develop a tool to auto-generate visualizations from data+metadata. This is such a great first step.

I'm testing on this VCR-LTER package, which turns out to be a great test case. The EML doc specifies 17 columns, but read_d1_files returns two because of extra header lines. The EML doc does say how many header lines there are (22), so that information should ideally be used.

There's the ... option to pass in an argument (from #16). It'd be neat, though, if the numHeaderLines EML tag could be passed into skip when the default function read_csv is used. Alternatively (additionally?), read_d1_files could throw a warning if it doesn't find the number of columns expected from the EML doc and suggest the probable cause (extra header rows), so users know to pass an extra parameter.
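A minimal sketch of the first idea, assuming the EML document is available as XML and the table is read with readr::read_csv. The helper name eml_skip and the toy EML fragment and CSV are made up for illustration and are not part of metajam. It also assumes that the last header line holds the column names, so skip is numHeaderLines minus one (consistent with 22 header lines and skip = 21 in this thread).

```r
library(xml2)
library(readr)

# Hypothetical helper (not part of metajam): derive a read_csv `skip` value
# from the first <numHeaderLines> element found in an EML document.
eml_skip <- function(eml_xml) {
  node <- xml_find_first(eml_xml, ".//numHeaderLines")
  n_header <- xml_integer(node)  # NA if the element is missing
  # Assumption: the last header line holds the column names, so it is kept.
  if (is.na(n_header)) 0L else max(n_header - 1L, 0L)
}

# Toy EML fragment declaring three header lines (illustrative only)
eml <- read_xml(paste0(
  "<dataTable><physical><dataFormat><textFormat>",
  "<numHeaderLines>3</numHeaderLines>",
  "</textFormat></dataFormat></physical></dataTable>"
))

# Toy data file: two free-text lines, then the column names, then the data
tmp <- tempfile(fileext = ".csv")
writeLines(c("Title: toy", "Source: made up", "site,count", "A,1", "B,2"), tmp)

dat <- read_csv(tmp, skip = eml_skip(eml), col_types = cols())
```

A real package can describe several data tables, so a fuller version would need to pick the numHeaderLines of the entity that matches the downloaded file rather than the first one in the document.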

brunj7 commented 5 years ago

Hi @atn38,

Thank you for your interest in metajam and for your feedback!

Leveraging numHeaderLines from EML sounds like a good idea, as well as verifying the number of expected columns. We will look into this.
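As a rough illustration of the column check, here is a sketch in base R. The function check_column_count and its message wording are hypothetical, not existing metajam code, and the expected column count is assumed to have been extracted from the EML attribute list beforehand.

```r
# Hypothetical helper (not part of metajam): warn when the parsed table has a
# different number of columns than the metadata describes.
check_column_count <- function(data, n_expected) {
  n_found <- ncol(data)
  if (n_found != n_expected) {
    warning(sprintf(
      "Found %d columns but the metadata describes %d; the file may have extra header rows (try passing `skip`).",
      n_found, n_expected
    ), call. = FALSE)
  }
  invisible(n_found == n_expected)
}

# Toy example: a two-column table where the metadata describes 17 columns
parsed <- data.frame(X1 = "a", X2 = "b")
msg <- tryCatch(
  check_column_count(parsed, 17),
  warning = function(w) conditionMessage(w)
)
```

Here tryCatch captures the warning text in msg only so that the example runs quietly; inside a reader function the warning would simply be emitted to the user.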

In the meantime, as you mentioned, you can manually inspect the file to decide what to skip (note that the read_csv skip parameter will be 21 in this case):

library(metajam)

# Download
data_path <- download_d1_data("https://portal.edirepository.org/nis/dataviewer?packageid=knb-lter-vcr.236.10&entityid=27549758f3eeecd9be5344562e0340fb", ".")
#data_path

# Read table in
vcr_fish_count <- read_d1_files(data_path, skip=21)

# Look at the data
vcr_fish_count$data

Always eager to get feedback, so please keep adding new issues and feel free to contribute to metajam!