NWTlter / NWT_CLM

Workflow scripts for running point simulations of CLM at Niwot Ridge using Tvan forcing data
1 stars 7 forks source link

troubled with download_EDI function #31

Closed wwieder closed 2 years ago

wwieder commented 2 years ago

I'm not able to download saddle precip data with the following code. Have permissions changed?

# Download precip data # From here: https://portal.edirepository.org/nis/mapbrowse?packageid=knb-lter-nwt.416.10 writeLines("Downloading Saddle Precip data from EDI...") saddle_precip_data_fp <- download_EDI(edi_id = saddle_precip_data, dest_dir = paste0(DirDnld, "/precip_data"), getNewData = getNewData)

Error in file(con, "r") : cannot open the connection to 'https://pasta.lternet.edu/package/eml/knb-lter-nwt/416' In addition: Warning message: In file(con, "r") :

wwieder commented 2 years ago

@scelmendorf it looks like @hhollandmoritz borrowed from this code? https://github.com/NWTlter/NWT_CLM/blob/d9300fce8f1941a072b7c7eb7527272ca99626f3/prepare_forcings_for_clm.R#L166
Can you help me or @katyarjay fix this?

hhollandmoritz commented 2 years ago

@wwieder @katyarjay I'm having trouble reproducing this error. I just cloned the repo again and pulled down the latest version of the code. It worked for me without any modifications.

I also ran just the download_EDI() function on its own to see if I could reproduce the error, by changing some of the settings, but I was still successful.

One thing that is different from your code above to mine is that I noticed that the most recent version on EDI is 10 rather than 11. So when I ran it, it downloaded version 416.11 rather than 416.10. But then I tinkered a bit and got it to download 10 rather than 11. So for me it still wasn't a problem.

But it is strange because the way the download_EDI() function is written, it should always download the latest version unless you set getNewData=FALSE, if you do that you should get a message A more recent version of the data (version <packageid>) is available. But since you have specified getNewData = FALSE, the latest version will not be downloaded.

The download_EDI() function should 1) Check to see if the most recent package is downloaded 2) if not, check getNewData, if false, it will use existing downloads and not try to download anything new, 3) if getNewData==TRUE download the latest version (first downloading the EML (getEML()), then reading it and downloading the csv file). So it should never try to download an earlier version of the data in the first place.

Sorry not to be more helpful in my response. But what I can confirm is:

  1. The connection exists and data can be downloaded from it.
  2. Something wonky may be going on with the specification of the version number
  3. 10 is the older version, the comment on https://github.com/NWTlter/NWT_CLM/blob/d9300fce8f1941a072b7c7eb7527272ca99626f3/prepare_forcings_for_clm.R#L666 should probably be updated to 11 avoid confusion
  4. the proper url that should be used for download should have the package version appended on it but the one that generated your error did not have the version in it. https://pasta.lternet.edu/package/eml/knb-lter-nwt/416/11 rather than https://pasta.lternet.edu/package/eml/knb-lter-nwt/416 the error about file downloads you received specified a url without a version number, so that's probably where the code is going wrong. Perhaps check that getCurrentVersion() gives 11 when you call getCurrentVersion("416")
hhollandmoritz commented 2 years ago

It does occur to me that perhaps we want to update download_EDI() at some point to allow one to override the "download only the latest" functionality so that if you wanted to force it to download an older version you could do that too. Ideally another thing to update is to make sure that the error message A more recent version of the data (version knb-lter-nwt.416.11) is available. But since you have specified getNewData = FALSE, the latest version will not be downloaded. is more informative, since if someone specifies getNewData=FALSE and they've never downloaded any data before, the script will error out at this line: https://github.com/NWTlter/NWT_CLM/blob/d9300fce8f1941a072b7c7eb7527272ca99626f3/prepare_forcings_for_clm.R#L688-L689

The error is a bit cryptic: Error in file(file, "rt") : invalid 'description' argument

Maybe an easy fix would be to update the message to users here https://github.com/NWTlter/NWT_CLM/blob/d9300fce8f1941a072b7c7eb7527272ca99626f3/prepare_forcings_for_clm.R#L204-L207 to read A more recent version of the data (version knb-lter-nwt.416.11) is available. But since you have specified getNewData = FALSE, the latest version will not be downloaded. If this is the first time you are running this script and you have never downloaded data, please set getNewData=TRUE and try again otherwise no precipitation data will be downloaded.

wwieder commented 2 years ago

This line from getCurrentVersion doesn't seems to work https://github.com/NWTlter/NWT_CLM/blob/d9300fce8f1941a072b7c7eb7527272ca99626f3/prepare_forcings_for_clm.R#L170

When I run it outside the getCurrentVersion function the error states URL 'https://pasta.lternet.edu/package/eml/knb-lter-nwt/416': status was 'Peer certificate cannot be authenticated with given CA certificates'

wwieder commented 2 years ago

Some quick google searching looks like this seems to have to do with my curl certificates, but I'm not really sure how to modify correctly?

hhollandmoritz commented 2 years ago

Here's a potential solution - but may have some drawbacks:

https://stackoverflow.com/questions/40397932/r-peer-certificate-cannot-be-authenticated-with-given-ca-certificates-windows

I'm not sure if this is something that could be solved with a different version of curl. But maybe it's worthwhile to try to set up a conda environment (or Docker??) to see if that might solve the problem more generally? - I don't know enough about how curl works to say for sure, but seems like if a conda environment works, it might solve the problem more generally. The only challenge is that Rstudio in conda is pretty bad at the moment, so we'd need to not only create a custom .yml file for the conda environment but also add information into the readme about how to point Rstudio to the correct version of R specified by the conda environment.

hhollandmoritz commented 2 years ago

Okay^ last comment was silly. It appears you've already got a conda environment set up, since I last paid close attention to the code. I can try to run the code from within the conda environment and see if I get the same problem you're having @wwieder. Are both you and @katyarjay running this from inside the environment?

wwieder commented 2 years ago

I haven't activated any conda environments when running this code, but I am more active with conda and python than I used to be (and it's not from a position of knowledge or understanding). I didn't know Rstudio cared about conda at all... @katyajay, are you able to download data from EDI without any issues? If so, it suggests it is a feature I've created for myself locally.

hhollandmoritz commented 2 years ago

Well then it's probably not a problem caused by conda, but potentially it could be solved by conda. Rstudio generally doesn't care about conda unless you tell it to, so no worries on that front. I'll experiment with a branch that installs r-packages via conda and see if we can make it work and then maybe you can test it Will to see if it solves your problem?

I looked into it for ~15 minutes and it seems like the biggest challenge is that reddyproc isn't in conda's r repo. But it's not too challenging in theory to add a package (I'm in the process of doing this for another project I'm on). So I can see if this is possible.

It seems like since there's already a conda environment in play, we might as well use it to it's highest potential. :)

Sent from Hannah's phone.

On Mon, Nov 8, 2021, 19:59 will wieder @.***> wrote:

I haven't activated any conda environments when running this code, but I am more active with conda and python than I used to be (and it's not from a position of knowledge or understanding). I didn't know Rstudio cared about conda at all... @katyajay, are you able to download data from EDI without any issues? If so, it suggests it is a feature I've created for myself locally.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/NWTlter/NWT_CLM/issues/31#issuecomment-963715150, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB4MVPE2QZ6RJNUOBTFTVPLULBW6TANCNFSM5HRD2BAQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

katyarjay commented 2 years ago

@wwieder Yes, I am able to download data from EDI without issues and was able to download the saddle precip data using this script.

wwieder commented 2 years ago

OK thanks. let me know if you get the Ameriflux data OK too.

On Tue, Nov 9, 2021 at 3:10 PM katyarjay @.***> wrote:

@wwieder https://github.com/wwieder Yes, I am able to download data from EDI without issues and was able to download the saddle precip data using this script.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/NWTlter/NWT_CLM/issues/31#issuecomment-964591605, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB5IWJBYYPNPQXU6M5ONTS3ULGL4HANCNFSM5HRD2BAQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

wwieder commented 2 years ago

corrected by #32 , but I still don't really understand why.