NEONScience / NEON-utilities

Utilities and scripts for working with NEON data. Currently: an R package with functions to join (stack) the month-by-site files in downloaded NEON data, to convert data to geoCSV format, and to download data from the API.
GNU Affero General Public License v3.0

NEONdataStackR not working with mos data off portal #7

Closed sokole closed 7 years ago

sokole commented 7 years ago

@chrlaney

Here's the error I get:

Error in `[.data.frame`(d, , c((nc + 1):(nc + 4), 1:nc)) : undefined columns selected

I'm trying to stack mosquito CO2 trapping data from D03

chrlaney commented 7 years ago

@sokole when did you download the data?

sokole commented 7 years ago

about 5 minutes ago. These are new data that were published to the portal. Maybe there's something new and weird going on.

chrlaney commented 7 years ago

Yes, that was my concern too - the data may be formatted in a way I didn't anticipate. I'll take a look at it but might not get to it until tomorrow.

chrlaney commented 7 years ago

Check out problem with mosquito borne pathogen data stacking.

sokole commented 7 years ago

I'm going to try different domains and see if the problem is universal. It seems to unzip most of the files, then hits an error (table with the wrong number of columns?). Might be a problem in the data though. I'm trying to check the data to make sure they published correctly, so maybe they didn't.

chrlaney commented 7 years ago

Thanks!

sokole commented 7 years ago

D01 fails too

chrlaney commented 7 years ago

Code for IS column reordering was interfering with OS data packages. Fixed with ff4bc6e. @sokole please help me test?
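For readers who hit the same error, here is a minimal sketch of how a reordering step like the one in the error message can fail on a table that never received the four extra columns. It is not the actual change made in ff4bc6e; the helper name `move_trailing_columns_first`, its arguments, and the example table are hypothetical and only illustrate one way to guard the reordering.

```r
# Hypothetical stand-in for an OS table that has no extra trailing columns.
d <- data.frame(domainID = "D03", siteID = "OSBS", stringsAsFactors = FALSE)
nc <- ncol(d)

# The indexing from the error message assumes columns nc+1 .. nc+4 exist.
# Here they do not, so the subset fails; we capture the message instead of stopping.
unguarded <- tryCatch(
  d[, c((nc + 1):(nc + 4), 1:nc)],
  error = function(e) conditionMessage(e)
)

# Guarded variant (an assumption, not the package's code): reorder only when
# the table really has n_original + n_added columns, otherwise return it as is.
move_trailing_columns_first <- function(d, n_original, n_added = 4) {
  if (ncol(d) != n_original + n_added) {
    return(d)
  }
  new_order <- c((n_original + 1):(n_original + n_added), seq_len(n_original))
  d[, new_order, drop = FALSE]
}

guarded <- move_trailing_columns_first(d, n_original = nc)
```

With the guard in place, a table that lacks the added columns passes through unchanged, while a table that has them gets those columns moved to the front.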

sokole commented 7 years ago

sure, I was just going to post that it's not just the mos data... the stacker doesn't work for any data sets for me now... so your explanation makes sense.

I'll re-install and see if it works now

sokole commented 7 years ago

@chrlaney it's working again. Thanks!

I was able to successfully stack the mosquitoes sampled with CO2 trap data. Also tested on the tick data, and that worked too.

scelmendorf commented 7 years ago

Perhaps a separate issue, but I reinstalled neon-utilities from GitHub this AM and I cannot stack the mosquito files. mos_archive_pooling stacks OK, but I get 'Error in full_join_impl(x, y, by$x, by$y, suffix$x, suffix$y, check_na_matches(na_matches)) : std::bad_alloc' when it stacks mos_expertTaxonomistIDProcessed.

scelmendorf commented 7 years ago

Also - same files, and I expect an unrelated issue, but when the stackByTable function reads in a string variable in a particular file that contains only the string 'F', it auto-interprets it as 'FALSE'. This is causing some odd behaviour for the files that contain only female (F) mosquitoes. Try stacking file NEON.D01.HARV.DP1.10043.001.mos_expertTaxonomistIDProcessed.2016-05.basic.20170720T152713Z if you want a reproducible example.
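The 'F' behaviour can be reproduced with base R alone, independent of the stacking code. The sketch below uses a made-up two-column table (the column names `sex` and `count` are assumptions, not the real file's fields) to show that `read.csv` guesses a logical type when every value in a column is 'F', and that declaring the column as character through `colClasses` keeps the original strings. Whether stackByTable reads files this way is not confirmed here.

```r
# Made-up CSV content in which every value of the 'sex' column is "F".
csv_text <- "sex,count\nF,12\nF,7\n"

# Default type guessing: a column holding only "F" is converted to logical FALSE.
guessed <- read.csv(text = csv_text)

# Declaring the column type up front keeps the strings as they were written.
as_text <- read.csv(text = csv_text, colClasses = c(sex = "character"))

class(guessed$sex)  # "logical"
class(as_text$sex)  # "character"
```

Passing `colClasses = "character"` for every column and converting numeric fields afterwards is a blunter alternative when the field list is not known in advance.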

chrlaney commented 7 years ago

Well, I've tried to duplicate your issue @scelmendorf but no luck. Made sure my code copy was up to date, downloaded a new mosquitos dataset (all sites, all months), cleared the R environment, etc. and it appears to stack fine. Could you send me the data package you used if you still have it?

sokole commented 7 years ago

Here are a couple more details of the problem, at least for me...

Hope this helps.

chrlaney commented 7 years ago

@scelmendorf & I narrowed down at least part of the issue to a few expertTaxonomistIDRaw (not expertTaxonomistIDProcessed) files that only have 2 columns - domainID and siteID - in the basic package. The full complement of fields for expertTaxonomistIDRaw exists in the expanded package, so those files stack correctly. The pub workbook needs to be updated, and then the data republished. For future issues I suggest that we attach the exact data package for which an error was thrown, so that it's easier to track down the problem. @sokole the reason that you only get 2 of 5 stacked tables is this error - it fully processed the first 2 tables in the list, then choked on the third.
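A quick way to spot files like these before stacking is to compare the header line of every file for a given table and flag the ones that differ from the majority. The sketch below is a hypothetical helper, not part of the package; the function name `find_odd_headers` and the assumption that each file is a plain CSV with a header on its first line are mine.

```r
# Hypothetical pre-check: return the files whose header line differs from the
# most common header among the files supplied for one table.
find_odd_headers <- function(paths) {
  # Read only the first line of each file; the result is named by file path.
  headers <- vapply(paths, function(p) readLines(p, n = 1), character(1))

  # The header shared by the largest number of files is taken as the expected one.
  most_common <- names(which.max(table(headers)))

  # Named character vector: names are file paths, values are the deviating headers.
  headers[headers != most_common]
}

# Usage (paths are placeholders):
# odd <- find_odd_headers(list.files("unzipped", pattern = "expertTaxonomistIDRaw",
#                                    full.names = TRUE, recursive = TRUE))
# print(names(odd))
```

A file that carries only domainID and siteID would show up in the result, which also gives a concrete file name to attach to a bug report.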

Closing this issue, but I will open a new one for the case in which 'F' is misinterpreted as 'FALSE' when a column's only data value is 'F'.