NASA-Openscapes / earthdata-cloud-cookbook

A tutorial book of workflows for research using NASA EarthData in the Cloud created by the NASA-Openscapes team
https://nasa-openscapes.github.io/earthdata-cloud-cookbook

Reticulate discussion Jan 17 #158

Closed jules32 closed 1 year ago

jules32 commented 1 year ago

Current status:

Separate R and Python code chunks that people can copy-paste into RMarkdown/Quarto.

image

Next steps

Interest in having one single R code chunk, using reticulate to run the python code so that people can put this in R scripts as well as RMarkdown (and just have one chunk to copy-paste)

jules32 commented 1 year ago

Cataloguing my process a bit as I go

I've updated the example for how-tos/find-data/programmatic.qmd to include the example with POCLOUD from Luis' AGU poster. This works nicely in our staging hub RStudio instance!

image

However, when I adapt this for R, I get an error from earthaccess search.py#L280

image

The error appears to come from the c() vector notation that I added for R (the call errored without it). When I omit the temporal argument and just run granules <- earthaccess$search_data(concept_id = "C2036880672-POCLOUD"), I do get return values (no error). I don't know enough about Python, earthaccess, or reticulate to troubleshoot further.

betolink commented 1 year ago

What is the function of that c? earthaccess transforms date strings into python dates, I'll take a look into reticulate to see if this is just a notation issue.

jules32 commented 1 year ago

Hi @betolink, c combines values into a vector or list. In R we need it if we're passing two values to the same variable https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/c

betolink commented 1 year ago

I see, what about trying list instead? According to https://rstudio.github.io/reticulate/ that's what maps to tuples, which is what the temporal parameter expects. Or you can run the entire thing in Python and just reuse the results (granules).

jules32 commented 1 year ago

Ok I will try list!

Or you can run the entire thing in Python and just reuse the results (granules).

I'd like to use reticulate to run Python so that people can run an R script without switching to python (some feedback from the Mentors to develop it this way first rather than switching between R and python code chunks in a notebook)

betolink commented 1 year ago

Sorry, by "the entire thing" I meant just getting to the granules. With earthaccess we can get the links to the data by querying CMR, and we can also get the S3 credentials based on the DAAC, so combining these two operations gives us a list of files and the keys to access them. Then we can use them in R, although last time we tried that we ran into some issues with the S3 client. Thanks to @yuvipanda, we now know that for most cases we just need a CMR token, and I bet most EO/remote sensing packages in R can handle HTTP headers! So we could just:

  1. use earthaccess to build the query and retrieve our URLs (say NetCDF or HDF5 files)
  2. use earthaccess to get a fresh user token (earthaccess can do it but I don't think it's exposed yet to the user)
  3. use any R package that accepts a list of URLs, adding the bearer token to the request headers.
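
The header-based access in the steps above could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not earthaccess itself: the URLs, the token value, and the helper name build_requests are all hypothetical placeholders, and the requests are only constructed, never sent.

```python
from urllib.request import Request


def build_requests(urls, token):
    """Attach a bearer token header to one request per data URL (hypothetical helper)."""
    headers = {"Authorization": f"Bearer {token}"}
    return [Request(url, headers=headers) for url in urls]


# Placeholder values: in practice the URLs would come from the CMR query
# and the token from the user's Earthdata login.
granule_urls = [
    "https://example.com/data/granule_001.nc",
    "https://example.com/data/granule_002.nc",
]
token = "EXAMPLE_TOKEN"

requests = build_requests(granule_urls, token)
```

The same idea carries over to R: any package that lets the caller set HTTP headers could be handed the URL list plus an Authorization header of the form "Bearer <token>".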

jules32 commented 1 year ago

Very cool! I'll dig into this more tomorrow morning, and excited to work with @BriannaLind @mjami00 Brianna Pagan and other R users here!

jules32 commented 1 year ago

Some more testing: with list we get the same error:

image

jules32 commented 1 year ago

Notes following our meeting with Tomasz Kalinowski and Luis Lopez -

Julie pushed this (commit) with the reticulate::tuple() syntax, which can be replaced once Luis updates the class at https://github.com/nsidc/earthaccess/blob/3998ea2f63142aa130acfdabbc16f41c7b83fec9/earthaccess/search.py#L134.
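
One plausible reading of why tuple() works where c() and list() did not can be sketched in plain Python. This is an assumption about the mechanism, not the actual earthaccess code: the functions temporal and apply_parameter below are hypothetical stand-ins, and the sketch assumes that R's c() and unnamed list() values arrive in Python as lists while reticulate's tuple() arrives as a tuple.

```python
from datetime import datetime


def temporal(date_from, date_to):
    """Stand-in for a two-argument temporal filter: parse both date strings."""
    fmt = "%Y-%m-%d"
    return (datetime.strptime(date_from, fmt), datetime.strptime(date_to, fmt))


def apply_parameter(value):
    """Hypothetical dispatcher: unpack tuples into separate arguments, pass anything else whole."""
    if isinstance(value, tuple):
        return temporal(*value)
    # A list is passed as ONE argument, so date_to is missing.
    return temporal(value)


# A Python tuple is unpacked into date_from and date_to.
start, end = apply_parameter(("2020-01-01", "2020-01-31"))

# A Python list is not unpacked, so the call raises TypeError.
try:
    apply_parameter(["2020-01-01", "2020-01-31"])
    list_error = None
except TypeError as err:
    list_error = str(err)
```

If the real code dispatches in a similar way, accepting lists as well as tuples on the Python side would let R users drop the explicit tuple() call.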

Julie will review the vignettes about reticulate, particularly https://rstudio.github.io/reticulate/articles/python_primer.html

Julie will also look into R options for https://docs.xarray.dev/en/stable/ (rspatial)