NEONScience / NEON-utilities

Utilities and scripts for working with NEON data. Currently: an R package with functions to join (stack) the month-by-site files in downloaded NEON data, to convert data to geoCSV format, and to download data from the API.
GNU Affero General Public License v3.0

Multiple referenceElevations listed for the same sensor ID (HOR.VER) at the same time #92

vlahm closed 4 years ago

vlahm commented 4 years ago

Function loadByProduct

Describe the bug Depending on the site-month requested, different (conflicting) sensor position data are returned.

To Reproduce


    library(dplyr) #provides %>%, filter() and select() used below

    #retrieve ARIK surface water elevation data for 2020-02
    spos_2020_02 = neonUtilities::loadByProduct('DP1.20016.001',
        site='ARIK', startdate='2020-02', enddate='2020-02',
        package='basic', check.size=FALSE)

    #elevation of 1176.4 listed for sensor 102.100 on 2010-01-01 00:00:00.0
    spos_2020_02$sensor_positions_20016 %>%
        filter(HOR.VER == '102.100') %>%
        select(HOR.VER, start, referenceStart, referenceElevation)

    spos_2020_03 = neonUtilities::loadByProduct('DP1.20016.001',
        site='ARIK', startdate='2020-03', enddate='2020-03',
        package='basic', check.size=FALSE)

    #elevation of 1178.79 listed for sensor 102.100 on 2010-01-01 00:00:00.0
    #also, possibly no documentation for the difference between start/end and referenceStart/referenceEnd?
    spos_2020_03$sensor_positions_20016 %>%
        filter(HOR.VER == '102.100') %>%
        select(HOR.VER, start, referenceStart, referenceElevation)

**System (please complete the following information):**
 - OS: Ubuntu
 - OS Version: 18.04
 - R Version: 3.6.3

cklunch commented 4 years ago

@vlahm This can sometimes happen in cases where we've discovered location data were in error, and we've had to update the database. It's different from the situation where a sensor actually moved; those changes are tracked in the start and end dates. In this case the locations were corrected in between processing of the 2020-02 and 2020-03 data.

loadByProduct() does try to handle this, by retaining only the sensor_positions file with the most recent publication date. But if you download multiple months of data separately, as you did in the code you showed, you can still run into it. You can do the same check manually, using the publicationDate column in the sensor_positions file, and rely on the file with the most recent date.
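A minimal sketch of that manual check, in base R. The two small data frames stand in for the sensor_positions tables from separate downloads. The publicationDate values and their timestamp format are assumptions made for illustration, and `latest_positions()` is a hypothetical helper, not part of neonUtilities.

    # Toy stand-ins for sensor_positions tables from two separate downloads.
    # The publicationDate values below are invented for illustration.
    spos_a <- data.frame(
        HOR.VER = "102.100",
        referenceElevation = 1176.4,
        publicationDate = "20200301T000000Z",
        stringsAsFactors = FALSE
    )
    spos_b <- data.frame(
        HOR.VER = "102.100",
        referenceElevation = 1178.79,
        publicationDate = "20200401T000000Z",
        stringsAsFactors = FALSE
    )

    # Hypothetical helper: return the table with the newest publicationDate.
    latest_positions <- function(...) {
        tables <- list(...)
        # Assumed timestamp format; adjust if the column is formatted differently.
        newest <- vapply(tables, function(x) {
            parsed <- as.POSIXct(x$publicationDate,
                format = "%Y%m%dT%H%M%SZ", tz = "UTC")
            max(as.numeric(parsed))
        }, numeric(1))
        tables[[which.max(newest)]]
    }

    spos_latest <- latest_positions(spos_a, spos_b)

With real downloads, the arguments would be the sensor_positions tables from each call, for example `spos_2020_02$sensor_positions_20016` and `spos_2020_03$sensor_positions_20016`.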

We're prepping for a complete re-processing of all instrumented data later this year, which will clean up these discrepancies. Our infrastructure doesn't currently allow for on-the-fly re-processing and re-publication when metadata are updated, but that's something we're also looking into for the future.

vlahm commented 4 years ago

This makes sense. Thank you!