NEONScience / NEON-utilities

Utilities and scripts for working with NEON data. Currently: an R package with functions to join (stack) the month-by-site files in downloaded NEON data, to convert data to geoCSV format, and to download data from the API.
GNU Affero General Public License v3.0

Delete stackByTable's temp folder after stacking is complete #91

Closed kevstyers closed 4 years ago

kevstyers commented 4 years ago

Is your feature request related to a problem? Please describe.

An issue I've run into when downloading data with neonUtilities::loadByProduct() is that the temp folder (Linux/Windows) fills up and causes disk space issues. It took me some time to identify why this was happening, but eventually I figured out that when I download data from NEON in R, the zipped folders linger on the drive. I then looked at the neonUtilities package to see where the files were being stored and found over 10 GB of zipped files in my AppData/Local/temp/ folder. Admittedly, I have been downloading all post-2018 TIS met data, so this is likely more data than the average user downloads and is somewhat of an edge case.

Describe the solution you'd like

I'm not too familiar with writing packages in R, but I think this should be a technically simple fix: once neonUtilities finishes stacking the files, it deletes the temp folder it made.

Describe alternatives you've considered

I can envision that some users will want to keep those zipped files so they do not have to re-download data they already have, so perhaps this behavior could be controlled by an argument to the function, e.g. .keep_temp_zips = FALSE.

Alternatively, I could keep deleting these files manually, or write a function as part of my download script that automatically deletes the R temp folders.

Additional context

This was the amount of data saved in tempdir() after downloading DP1.00001 from 2018 - 2020 for 22 sites. [image]
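For anyone wanting the manual-cleanup workaround in the meantime, here is a minimal sketch of such a function. It is not part of neonUtilities: the name `clean_temp_zips` and its arguments are made up for illustration, and it assumes the leftover files are .zip files somewhere under the current session's tempdir(). Files left behind by earlier R sessions sit in other temp folders and would need the `dir` argument pointed at them.

```r
# Hypothetical helper (not a neonUtilities function): delete leftover .zip
# files under a directory, by default the current session's temp directory.
clean_temp_zips <- function(dir = tempdir(), pattern = "\\.zip$", dry_run = FALSE) {
  # Find matching files, including those in subfolders
  zips <- list.files(dir, pattern = pattern, recursive = TRUE, full.names = TRUE)
  if (length(zips) == 0) {
    return(invisible(character(0)))
  }

  # Report how much space the matched files take up
  total_mb <- sum(file.size(zips), na.rm = TRUE) / 1024^2
  message(sprintf("Found %d zip file(s), %.1f MB in total", length(zips), total_mb))

  # With dry_run = TRUE nothing is deleted, only reported
  if (!dry_run) {
    unlink(zips)
  }
  invisible(zips)
}
```

Calling `clean_temp_zips(dry_run = TRUE)` first shows what would be removed; calling `clean_temp_zips()` after each download then deletes the files. Only files matching the pattern are touched, so stacked output saved elsewhere is left alone.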

cklunch commented 4 years ago

@kevstyers Thanks, I'll look into this. Deleting the temporary files is the intended behavior, and loadByProduct() is set up to do that, but you're the second user I've heard from recently who has found that it isn't working correctly anymore. It may be a Windows-related issue, and/or may be due to changes in the most recent versions of R. It should be an easy fix; I'll post an update soon.

cklunch commented 4 years ago

@kevstyers This is now fixed on GitHub. I'll be sending the new version to CRAN within the next week, and I'll close this issue when the new version is live. Thanks!

cklunch commented 4 years ago

@kevstyers Apologies, I forgot to come back to close this out. The fix is on CRAN. However, it turned out this bug had been blocking a different bug, which is now in play. If you download too many site-months (more than roughly 500-600), the code that deletes the temporary files chokes. This is the problem referenced in #94.

It's avoidable by downloading smaller batches of data, or you can install the version currently on GitHub, where it's fixed. I'm working toward a new CRAN release, but it will be a few weeks.
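As a rough illustration of the smaller-batches workaround, the sketch below splits a vector of sites into groups and calls the download function once per group, so each call covers fewer site-months. The helper `load_in_batches` and its `batch_size` and `download_fun` arguments are hypothetical and not part of neonUtilities; the default simply forwards `dpID`, `site`, and any extra arguments to neonUtilities::loadByProduct().

```r
# Hypothetical wrapper (not a neonUtilities function): download one batch of
# sites at a time so that each call stays well under the site-month limit.
load_in_batches <- function(dpID, sites, batch_size = 5,
                            download_fun = neonUtilities::loadByProduct, ...) {
  # Assign each site to a batch: sites 1..batch_size go to batch 1, and so on
  batch_id <- ceiling(seq_along(sites) / batch_size)
  batches <- split(sites, batch_id)

  # One download call per batch; returns a list with one element per batch
  lapply(batches, function(batch_sites) {
    download_fun(dpID = dpID, site = batch_sites, ...)
  })
}

# Example call (needs neonUtilities and network access; the site codes and
# dates are placeholders):
# res <- load_in_batches("DP1.00001.001", sites = c("HARV", "BART", "SCBI"),
#                        batch_size = 2, startdate = "2018-01",
#                        enddate = "2020-12", check.size = FALSE)
```

With roughly 36 months per site, batches of 5 sites come to about 180 site-months per call. Each element of the result is whatever the download function returns for that batch, so the tables still need to be combined across batches afterwards.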