NEONScience / NEON-utilities

Utilities and scripts for working with NEON data. Currently: an R package with functions to join (stack) the month-by-site files in downloaded NEON data, to convert data to geoCSV format, and to download data from the API.
GNU Affero General Public License v3.0
57 stars 36 forks source link

merge issue on stack eddy when using level = dp01 #112

Closed rfiorella closed 3 years ago

rfiorella commented 3 years ago

Function stackEddy

Describe the bug Returned data frames in output list have a problem with the time variables timeBgn and timeEnd, possibly because it is trying to merge across measurement systems (e.g., the isoH2o and isoCo2 systems). Where it occurs, timeEnd is NA, and the actual observation data is split between rows that have a complete set of (timeBgn, timeEnd) values, and those that are missing timeEnd.

To Reproduce neonUtilities::stackEddy(inpath, level = "dp01", avg = 9)

(For inpath, I had been using a local directory that contains the monthly data files for ONAQ.)

Expected behavior A output list, where the first element of the list contained a list of data frames. The dataframes in this list would have complete sets of (timeBgn, timeEnd) values.

System (please complete the following information):

Additional context Discovered using neonUtilities_1.3.8, but upgrading to 2.0.0 doesn't solve the issue.

For our purposes I can create a workaround by also specifying var, but thought I'd report in any case assuming this is not the intended behavior.

cklunch commented 3 years ago

@rfiorella Thanks Rich! I'll take a look, but this isn't too surprising, getting the join to work correctly in the dp01 data has always been very finicky. I'm planning to send v2.0.1 to CRAN within the next couple of weeks, and I should be able to include this.

cklunch commented 3 years ago

@rfiorella Thanks again for pointing this out. @cflorian1 and I looked into it, and it's a little deeper than I thought. There are two separate issues: 1) If a sensor isn't delivering data, a single empty record per day is reported. I've updated stackEddy() to remove these filler records, but I don't think this is the issue you were focused on. 2) Something is up with the timestamps on the 9-minute isotope data. The CO2 and H2O isotope timestamps frequently have a mismatch of a few seconds. We aren't sure of the cause yet, but it's probably somewhere in the processing code. The internal team is looking into this, but it will probably be a little while before there's a resolution. In the meantime, it doesn't make sense to try to resolve the timestamps on the stacking end, since the cause is somewhere upstream.

So this behavior is going to stay as-is for now, we'll keep you posted. I'll leave this issue open as long as it's unresolved.

rfiorella commented 3 years ago

@cklunch @cflorian1 Thanks Claire and Chris!

(And just as a 'heads up' - we're going to be starting a package submission to CRAN in the next week or two of the code we've been using to calibrate the Atm Iso products. The package imports stackEddy from NEON-utilities, so it might start showing up on reverse import reports. Don't think any changes to stackEddy to fix this issue - if any are even required! - will break anything for either of us though).

cklunch commented 3 years ago

@rfiorella Thanks Rich! If you can hold off until neonUtilities 2.0.1 is live on CRAN, that would probably be best. I wouldn't expect it to have an impact, but you never know. I'm hoping to get the package into the CRAN queue later today.

rfiorella commented 3 years ago

@cklunch No problem! I have a lot of code clean up to do for CRAN before initial submission anyway :)

cklunch commented 3 years ago

@rfiorella neonUtilities 2.0.1 is live on CRAN. Thanks!

rfiorella commented 3 years ago

@cklunch @cflorian1 Hey Claire and Chris - A quick update that our package (NEONiso 0.4.0) is live on CRAN now.

There's an issue using stackEddy on our output files as of 2.0.1, but it's 100% on my end and it only shows up when using stackEddy on files that have gone through the dev branch of NEONiso. Also was able to isolate the change between 2.0.0 and 2.0.1 that causes the issues (we changed the name of a time variable) - should be an easy fix.

cklunch commented 3 years ago

@rfiorella This is great to hear! Would you be willing to submit the package to our Code Hub? https://www.neonscience.org/resources/code-hub We can make it more visible to other NEON users by listing it there.

Guidelines for submissions are here: https://www.neonscience.org/resources/code-hub/code-resources-guidelines and the submission form is here: https://www.neonscience.org/resources/code-hub/code-resources-submission

rfiorella commented 3 years ago

Sure thing! Submitted this morning to tier 1.

On Mar 22, 2021, at 7:28 AM, Claire Lunch @.**@.>> wrote:

@rfiorellahttps://github.com/rfiorella This is great to hear! Would you be willing to submit the package to our Code Hub? https://www.neonscience.org/resources/code-hub We can make it more visible to other NEON users by listing it there.

Guidelines for submissions are here: https://www.neonscience.org/resources/code-hub/code-resources-guidelines and the submission form is here: https://www.neonscience.org/resources/code-hub/code-resources-submission

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/NEONScience/NEON-utilities/issues/112#issuecomment-804060546, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AEH3BZVKIW2YOE7RLDSBMCLTE5AWJANCNFSM4XZB6EKA.

rfiorella commented 3 years ago

@cklunch Hi Claire -

Found a potential bug in stackEddy when working on trying to fix NEONiso to work better with neonUtilities >= 2.0.0 - stackEddy errors if var and avg are specified for level = 'dp01' variables that don't have anything under /dp01/ucrt or /dp01/qfqm.

An example: stackEddy(data.path, var = 'dlta18OH2oRefe', avg = 9, level = 'dp01')

returns: in timeSet[[q]] : subscript out of bounds.

I think the issue is if there are no ucrt or qfqm data, timeSet has length 1 @ L. 335 in stackEddy. Let me know if I could provide more info to help debug!

cklunch commented 3 years ago

@rfiorella Thanks Rich, I think your diagnosis is correct! It's a side effect of the changes I made to handle the empty single-day records. I'll try out a fix in the next couple of days.

cklunch commented 3 years ago

@rfiorella Fix pushed to GitHub. It will probably be a couple of months before the next CRAN release, let me know if that will be a problem!

rfiorella commented 3 years ago

Thanks @cklunch! I'll try it out - no problem with the release schedule.

rfiorella commented 3 years ago

Hi @cklunch !

A few more issues have arisen related to the error I posted on 3/26 above that seem to be related to base::merge() somewhere in stackEddy.

The following all produce a "fix.by(by.x, x): 'by' must specify uniquely valid columns" error (MacOS 11.3.1, R 4.0.4, reinstalled at commit 674e018 on branch 2.0.2):

stackEddy(data.path, var = 'dlta18OH2oRefe', avg = 3, level = 'dp01') stackEddy(data.path, var = 'dlta13CCo2Refe', avg = 9, level = 'dp01') stackEddy(data.path, var = 'rtioMoleDryCo2Refe', avg = 9, level = 'dp01')

I have a suitable workaround (do not specify var, and divide variables after stackEddy rather than during) - but wanted to provide notice that this function still might not be working as intended.

cklunch commented 3 years ago

@rfiorella Thanks for the alert! I'll take a look, and see about a fix in the next version. It's probably going to end up being 2.1.0 instead of 2.0.2, there are quite a few updates on deck. I'll report back here about what I find on this problem.

cklunch commented 3 years ago

@rfiorella Well that turned out to be an annoying couple of bugs! Just a couple of oversights in the code in handling the scenarios where there is only one type of data (as opposed to data + qfqm + ucrt etc). I expect it mostly affects the isotope data. In addition to the error you hit above, you may find the earlier code was throwing out a few records unnecessarily, those should be restored now. Sorry about the bugs, thanks so much for catching them!

The fix is on the 2.0.2 branch for now. It's looking good on my end, but if you could give it a try as well that would be fantastic. Thanks!

@cflorian1

cklunch commented 3 years ago

@rfiorella Fix is now on the master branch. Still doing some testing for v2.1.0, expecting to send to CRAN next week.

rfiorella commented 3 years ago

Sorry to be a bit slow returning to this - I'll test it out soon. Thanks!

cklunch commented 3 years ago

neonUtilities 2.1.0 is now on CRAN, including this fix.