NEONScience / NEON-utilities

Utilities and scripts for working with NEON data. Currently: an R package with functions to join (stack) the month-by-site files in downloaded NEON data, to convert data to geoCSV format, and to download data from the API.
GNU Affero General Public License v3.0
57 stars 36 forks source link

possible bug in getDatatable(), full table not downloading #106

Closed srweintraub closed 3 years ago

srweintraub commented 3 years ago

Function getDatatable()

Describe the bug A user reports that this function is returning inconsistent numbers of records for the sls_soilChemistry table, even when she runs it back to back. I was not able to reproduce this, but Eric was. The user confirms that she has updated to the latest version of neonUtilities.

To Reproduce sls_soilChem <- getDatatable(dpid = "DP1.10086.001", data_table_name = 'sls_soilChemistry')

Expected behavior All records in the table to download, not a variable count.

System (please complete the following information): I did not get this information from the user, but since Eric can confirm the problem, it is perhaps not dependent on that?

srweintraub commented 3 years ago

@sokole looks like you may have to assign this one to yourself. let me know what you find so I can circle back with this end user.

cklunch commented 3 years ago

@srweintraub @sokole Thanks for the heads up, this is almost certainly an API rate limit problem. getDatatable() isn't in heavy use, if she's definitely on version 1.3.8 then apparently the rate limit fix isn't fully working there.

For now, using loadByProduct() instead will solve the problem. I'm debating how to handle access to a single table at a time in version 2.0 - I worry about users downloading data without the proper context, but I do understand the utility.

srweintraub commented 3 years ago

@cklunch Right, I did mention to her that loadByProduct() should not have the same issues, so that's fine for now. This is an unanticipated (to me at least) downside of the bundling. Let's say you just want to know which soil samples have C/N data, because you want to go pull the microbe data for those samples (like this end user). Having to download everything that's part of 10086 is a pain, it would be much faster to simply grab that one C/N table. With litter it will be even more annoying, since there is oh so much field data where we don't have chem. I can see users wanting to download chem data first, to maybe then decide which sites/years they actually want the field data for.

I wish I had thought of these things earlier. I still think in general, bundling makes things better, but there are definitely some disadvantages too. I thought a function that allows single table pulls would help and was glad to learn about getDatatable, but if you don't plan to support that function going forward...

Not that this is very important, but I can also see single-table pulling being useful for QC, but that's more of an internal thing.

cklunch commented 3 years ago

@srweintraub The demand clearly exists, it just makes me nervous. Since we have a major package revision looming anyway, I'm thinking about better ways to constrain it and/or loop it in to the other package functions. I'll keep you posted.

cklunch commented 3 years ago

neonUtilities 2.0.0 is now on CRAN, and this issue is resolved - getDatatable() is deprecated, and the functionality to download by table is available, with a warning, in loadByProduct().