Closed stephstewart02 closed 4 months ago
Hiya,
sorry to hear you are experiencing this issue. I am afraid I have no idea what is happening there. Could you post the console output of the download_ERA function up until the creation of the many temp files starts?
Cheers, E
Hi Erik, I'm pretty sure all the temp files are created when the aggregation starts. These are the lines I see in the console while all the temp files are being saved out:
`download_ERA() is starting. Depending on your specifications, this can take a significant time.
User 295984 for cds service added successfully in keychain
Staging 1 download(s).
0001_QS_Raw_testv2.nc download queried
Requesting data to the cds service with username 295984
staging data transfer at url endpoint or request id: 060c3bae-408d-4143-9e8f-a1ca2ffd68d1
timeout set to 10.0 hours
polling server for a data transfer
Downloading file |=============================================================================================================| 100%
moved temporary file to -> /home/sf/l1sas04/Data/0001_QS_Raw_testv2.nc
Delete data from queue for url endpoint or request id: https://cds.climate.copernicus.eu/api/v2/tasks/060c3bae-408d-4143-9e8f-a1ca2ffd68d1
Checking for known data issues.
Loading downloaded data for masking and aggregation.
Aggregating to temporal resolution of choice`
I have tried this both on Windows and on Linux, and have the same issue. I had to cut it off after creating almost a TB of temporary files as I was worried it would use up all the storage space on a shared server. For me the temp files are all saved in "/temp/RtmpSfK59C/raster/".
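For anyone hitting the same problem on a shared machine, the path above suggests the files are written to the raster package's temporary directory. The following is a minimal sketch, not a fix for the underlying issue, of how that directory could be redirected and cleaned up. The `scratch` location is a hypothetical placeholder, and it assumes the raster and terra packages are installed.

```r
# Hypothetical scratch directory; pick a location with enough free space.
scratch <- file.path(tempdir(), "raster_scratch")
dir.create(scratch, showWarnings = FALSE, recursive = TRUE)

# Point the raster package at the scratch directory for its temporary files.
raster::rasterOptions(tmpdir = scratch)

# List the temporary raster files, then remove them regardless of age (h = 0).
# Caution: objects that still point at these files become unusable afterwards.
raster::showTmpFiles()
raster::removeTmpFiles(h = 0)

# terra keeps its own temporary files; remove those of the current session.
terra::tmpFiles(current = TRUE, remove = TRUE)
```

Cleaning up only frees space after the fact; it does not stop a running call from writing new temporary files.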
Alright! I believe I found the culprit. It is the conversion of raster objects to SpatRasters for the terra output that I implemented some time ago (https://github.com/ErikKusch/KrigR/pull/45). I am already working towards a new deployment of KrigR which gets rid of this step, and I expect this release to happen in the next two to three months.
Obviously, this does not solve your issue right now. So, here is what I suggest as a workaround: stop download_ERA() after all raw data has been downloaded, then load the downloaded file into R yourself. In your case that would be: QS_Raw <- stack("/home/sf/l1sas04/Data/0001_QS_Raw_testv2.nc")
This will load the raw data downloaded from the CDS as a raster stack object. Would this be a viable solution for now?
I am afraid getting rid of the conversion step without bricking other essential functionality would be more hassle than it is worth at this point, given that a new release addressing this issue is coming up anyway.
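The workaround above could be wrapped as follows. This is only a sketch: `load_raw_era()` is a hypothetical helper and not part of KrigR, the path is the one from the console output in this thread, and reading netCDF files with the raster package assumes the ncdf4 package is installed.

```r
# Hypothetical helper, not part of KrigR: load an already-downloaded raw file.
load_raw_era <- function(path) {
  if (!file.exists(path)) {
    stop("Raw file not found: ", path)
  }
  # raster::stack() reads netCDF files through the ncdf4 package.
  raster::stack(path)
}

# Path taken from the console output in this thread; adjust it to your setup.
raw_path <- "/home/sf/l1sas04/Data/0001_QS_Raw_testv2.nc"
if (file.exists(raw_path)) {
  QS_Raw <- load_raw_era(raw_path)
  print(raster::nlayers(QS_Raw))  # number of layers read from the file
}
```

Checking the layer count against the number of requested hours is a quick way to confirm that the download completed before download_ERA() was stopped.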
Thanks for your speedy detective work! That works fine as a stopgap measure, but is there any additional formatting or manipulating that is done with the KrigR package? I ask because I have loaded in raw ERA5-Land data downloaded from the CDS API via Python using the stack function, but it isn't in a format that is compatible with a function I am trying to run in the other package I alluded to in my first message. The README for that package specifically says that the KrigR function gets the ERA5-Land data into an appropriate format, so I am trying to figure out how to get the raw data into the right format since I can't use the download_ERA function.
The current version of KrigR only ensures that time components are saved properly to the time-slot in netCDF files. That being said, you aren't doing any temporal aggregation so this should not affect the files KrigR would produce if we didn't need to have the stopgap in place.
However, I think we can actually avoid the stopgap altogether now with the latest version of KrigR on the development branch. I just finished a first re-deploy of the download and temporal aggregation functionality there. Kriging is disabled on the development branch, but it sounds like you won't need it. You can install the development version like so:
devtools::install_github("https://github.com/ErikKusch/KrigR", ref = "Development")
Note that the new download function is called CDownloadS() there.
On to the formatting specifics for stagg: the documentation seems to specify a RasterBrick, whereas KrigR produces SpatRasters. You can transform SpatRaster objects into RasterBrick objects like so:
raster::brick(SPATRASTEROBJECT)
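To illustrate the conversion without downloading anything, here is a small sketch using a toy SpatRaster built in memory. The object names and dimensions are made up for the example, and it assumes recent versions of the terra and raster packages are installed.

```r
# Toy SpatRaster with 3 layers on a 4 x 4 grid (stand-in for real ERA5-Land data).
toy_spat <- terra::rast(nrows = 4, ncols = 4, nlyrs = 3, vals = runif(48))

# Convert the SpatRaster to a RasterBrick.
toy_brick <- raster::brick(toy_spat)

# Inspect the result: class and number of layers.
print(class(toy_brick))
print(raster::nlayers(toy_brick))
```

Whether layer names and time stamps survive the conversion is worth checking on the real data before passing the object on.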
Please let me know if this resolves your issue :-)
Given the thumbs up on the proposed solution, I assume this has resolved your issue and I am closing this issue. Feel free to ping me again here to reopen it if you run into issues.
Hi @ErikKusch and thanks for your work developing this. Your package was recommended for use in tandem with another package to calculate growing degree days using ERA5-Land data. This requires hourly global data. As I understand it, this is the native temporal resolution of the ERA5 data, so I expected that there would be minimal processing necessary. However, when I run my code, hundreds of GBs of data are saved to the temp folder as files with .gri and .grd extensions, even when I pull only 2 days of global data (I stopped the call after many hours and 350 GBs of temporary files). I am fairly new to working with geospatial data, so any guidance would be appreciated. The code I am running is as follows:
QS_Raw <- download_ERA(
  Variable = "2m_temperature",
  DataSet = "era5-land",
  DateStart = "1995-01-02",
  DateStop = "1995-01-03",
  TResolution = "hour",
  TStep = 1,
  Dir = Dir.Data,
  FileName = "QS_Raw_testv2",
  API_User = API_User,
  API_Key = API_Key
)
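As a rough sanity check on the volumes reported in this thread, the size of one uncompressed copy of the requested data can be estimated with a few lines of base R. Two assumptions are not taken from this thread: a global ERA5-Land grid of 3600 x 1801 cells (0.1 degree resolution) and 4 bytes per stored value.

```r
# Assumption: ERA5-Land global grid at 0.1 degree resolution.
n_cols <- 3600
n_rows <- 1801
n_cells <- n_cols * n_rows

# Hourly time steps covered by the request above (both days in full).
hours <- seq(
  as.POSIXct("1995-01-02 00:00", tz = "UTC"),
  as.POSIXct("1995-01-03 23:00", tz = "UTC"),
  by = "hour"
)
n_layers <- length(hours)

# Assumption: 4 bytes per value (single-precision float), one uncompressed copy.
bytes_per_value <- 4
copy_gb <- n_cells * n_layers * bytes_per_value / 1e9

cat("Layers:", n_layers, "\n")
cat("One uncompressed copy:", round(copy_gb, 2), "GB\n")
cat("Copies needed to reach 350 GB:", round(350 / copy_gb), "\n")
```

Under these assumptions a single copy is on the order of a gigabyte, so hundreds of GBs of temporary files point to many intermediate copies being written rather than to the size of the request itself.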