Open observingClouds opened 1 year ago
@RobertPincus any ideas how to go about this? Is the data also stored somewhere else, where we would have easier access?
I sent an email to podaac and hope they have an idea on how to solve this issue.
This worries me the most:
An “Earthdata Login” is required to access data files from within OPeNDAP-in-the-cloud. This service provided by the EOSDIS program is openly available to all free of charge except where governed by internal agreements. If you access OPeNDAP without being logged in, your Earthdata username and password will be requested.
@observingClouds Thanks for digging into this. I've accessed data using an Earthdata login in scripts in other projects. This relies on having an environment variable set with the token from Earthdata.
Could we create a CI account for this repo/organization, generate a token, and use it as a Github secret?
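A minimal sketch of how a CI job might pick up such a token, assuming the GitHub secret is exposed to the workflow as an environment variable called `EARTHDATA_TOKEN` (a hypothetical name) and that the server accepts the token as a bearer token:

```python
import os


def auth_headers(env_var="EARTHDATA_TOKEN", environ=None):
    """Build an Authorization header from a token stored in the environment.

    `EARTHDATA_TOKEN` is a hypothetical variable name; use whatever name the
    CI secret is mapped to.
    """
    environ = os.environ if environ is None else environ
    token = environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set {env_var} (e.g. from a GitHub secret) before running")
    return {"Authorization": f"Bearer {token}"}
```

The returned dictionary can be passed as `headers=` to an HTTP client. Failing early when the variable is missing makes a misconfigured workflow easier to spot than a login redirect later on.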
Technically it is probably possible, but every user of the catalog would need to create a token as well. If several services request such tokens, it adds a huge burden on everyone and makes the catalog much less convenient to use. For non-interactive usage of the catalog, for example, one would need to know beforehand which tokens need to be created. In this particular case, the tokens also seem to be valid for only one hour, so within one workflow you might need to request a new token.
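To illustrate the refresh problem, here is a rough sketch of a token cache that requests a new token shortly before the old one expires. The one-hour lifetime is taken from the observation above and may not hold in general; `fetch` is a hypothetical callable that returns a fresh token string.

```python
import time


class ExpiringToken:
    """Cache a token and fetch a new one shortly before it expires."""

    def __init__(self, fetch, lifetime_s=3600, margin_s=60, clock=time.monotonic):
        self._fetch = fetch            # hypothetical callable returning a token string
        self._lifetime_s = lifetime_s  # assumed validity period (one hour here)
        self._margin_s = margin_s      # refresh this many seconds before expiry
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin_s:
            self._token = self._fetch()
            self._expires_at = now + self._lifetime_s
        return self._token
```

A long-running workflow would call `get()` before each request instead of holding on to a single token.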
@observingClouds Of course it's nicest if the data doesn't require authorization or credentials.
Users only need to provide credentials if they're going to use the data, of course. Having to refresh the credentials hourly will be a pain - that's an especially unfortunate choice at JPL.
Hi folks, this is Jack McNelis from the PO.DAAC (jmcnelis@jpl.nasa.gov). I want to help you find a workable solution for maintaining this interface now that our datasets are hosted in the cloud.
I'm not familiar with your software, so it's hard to know what to recommend. An approach like the one mentioned by @RobertPincus should work. There's good documentation describing how to set up Earthdata Login authentication at this link: https://docs.opendap.org/index.php/DAP_Clients_-_Authentication
@jjmcnelis Thanks for being in touch. This repo contains an intake catalog - a map to remotely accessible resources that abstracts away the particular access details for Python users.
A couple of questions about getting Earthdata tokens:
1. Does the PO.DAAC and/or Earthdata have the concept of organizational, rather than personal, accounts?
Yes, you're permitted to register an Earthdata Login account for an organization and/or service.
2. Is it possible to refresh the authorization token programmatically, or does one have to go through a GUI?
Indeed, check out: https://urs.earthdata.nasa.gov/documentation/for_users/user_token#api I'm happy to share some python code if you'd rather not bother implementing it yourself.
@jjmcnelis We access the PO.DAAC regularly (at least once a week) to ensure we are still pointing to valid data. If you have Python examples of how to request a token, use it to access the data, and revoke it (so we don't ask for too many at once) in a single script that would fit our use case perfectly.
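The request/use/revoke cycle described above could look roughly like the sketch below. It assumes the Earthdata Login user-token API offers `POST /api/users/token` and `POST /api/users/revoke_token` and returns the token under the key `access_token`; these details should be checked against the linked documentation. `session` is any object with a `requests.Session`-like `post` method.

```python
from contextlib import contextmanager

URS = "https://urs.earthdata.nasa.gov"


@contextmanager
def temporary_token(session, username, password, base_url=URS):
    """Request a token, hand it to the caller, and revoke it afterwards."""
    # Assumed endpoint and response key; verify against the EDL token API docs.
    resp = session.post(f"{base_url}/api/users/token", auth=(username, password))
    resp.raise_for_status()
    token = resp.json()["access_token"]
    try:
        yield token
    finally:
        # Revoke even if the data access failed, so tokens do not pile up.
        session.post(
            f"{base_url}/api/users/revoke_token",
            params={"token": token},
            auth=(username, password),
        )


# Possible usage (not executed here; needs the `requests` package and credentials):
#   with requests.Session() as s, temporary_token(s, user, password) as token:
#       r = s.get(data_url, headers={"Authorization": f"Bearer {token}"})
```

Wrapping the cycle in a context manager keeps the revoke step from being forgotten when the weekly check raises an exception.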
Thanks, @RobertPincus. Will you please share an example endpoint you're hitting to do this? Is it CMR or OPeNDAP? That'll help me identify the most appropriate resource for this use case.
@jjmcnelis Here's an example: https://www.ncei.noaa.gov/thredds-ocean/dodsC/psl/atomic/p3
Used in this leaf of the catalog: https://github.com/eurec4a/eurec4a-intake/blob/master/P3/axbts.yaml
This is an OpenDAP endpoint; I think most of our data is hosted behind one OpenDAP server or another.
@jjmcnelis, thank you for your help! I really wish access to the data were more straightforward, which is something we try to accomplish with this catalog. I hope PO.DAAC will change this again, because it was very easy beforehand.
While I would like to have the original source integrated in the catalog, I think in the short-term the easiest is to just find a different resource or host the data elsewhere, where access is not restricted. I found some of the files at https://github.com/cgentemann/paper_software/tree/master/2020_ATOMIC_Salinity/data . We can easily access those files through intake. Unfortunately, these are not all Saildrone files though.
@cgentemann, do you know an alternative source by any chance? I also just want to raise your awareness that the data access to this particular set of data got more restrictive in the Year of Open Science. Maybe this is something NASA TOPS could address?
@observingClouds I'm sorry, but I don't know of an alternative source. Saildrone data are scattered around in part because of who funded what data and what licensing agreements were applied. The NASA funded data is open and freely available, but open doesn't always mean easy to access and this is a challenge for all datasets, not just Saildrone. Thanks for your comments, I will pass them along.
There is another issue with the new OPeNDAP server that currently makes it less than straightforward to use with pydap and intake: https://github.com/pydap/pydap/issues/188
```python
import os

from pydap.client import open_url
from pydap.cas.urs import setup_session

url = "https://opendap.earthdata.nasa.gov/collections/C2491772162-POCLOUD/granules/saildrone-gen_5-atomic_eurec4a_2020-sd1026-20200117T000000-20200302T235959-5_minutes-v1.1595997001389"
# Credentials are read from the DAP_USER and DAP_PASSWORD environment variables
setup_session(os.environ['DAP_USER'], os.environ['DAP_PASSWORD'], check_url=url)
```
fails with
UserWarning: Navigate to https://opendap.earthdata.nasa.gov/collections/C2491772162-POCLOUD/granules/saildrone-gen_5-atomic_eurec4a_2020-sd1026-20200117T000000-20200302T235959-5_minutes-v1.1595997001389, login and follow instructions. It is likely that you have to perform some one-time registration steps before acessing this data.
Something that does work but requires additional code and is not performant, because the entire dataset has to be downloaded, is:
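```python
import netrc

import aiohttp
import fsspec
import intake
from intake.catalog.local import LocalCatalogEntry

# Read the Earthdata Login credentials from ~/.netrc
(username, account, password) = netrc.netrc().authenticators("urs.earthdata.nasa.gov")
# Send the credentials with every HTTPS request made through fsspec
fsspec.config.conf['https'] = dict(client_kwargs={'auth': aiohttp.BasicAuth(username, password)})
d = {"SD-1060": LocalCatalogEntry('5min', '', args={'urlpath': 'https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/SAILDRONE_ATOMIC/saildrone-gen_5-atomic_eurec4a_2020-sd1060-20200117T000000-20200302T235959-5_minutes-v1.1595997115384.nc'}, driver='netcdf')}
# The original snippet did not show how `cat` was created; building it from `d` is an assumption
cat = intake.catalog.Catalog.from_dict(d)
cat['SD-1060'].to_dask()
```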
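For non-interactive runs, the credential lookup in the snippet above could be made a little more forgiving. The following is only a sketch: it tries `.netrc` first and falls back to the `DAP_USER` and `DAP_PASSWORD` environment variables used in the pydap example. The helper name `earthdata_credentials` is hypothetical.

```python
import netrc
import os


def earthdata_credentials(host="urs.earthdata.nasa.gov", netrc_file=None, environ=None):
    """Return (username, password) from .netrc, else from DAP_USER/DAP_PASSWORD."""
    environ = os.environ if environ is None else environ
    try:
        # netrc_file=None means the default ~/.netrc
        auth = netrc.netrc(netrc_file).authenticators(host)
    except (FileNotFoundError, netrc.NetrcParseError):
        auth = None
    if auth is not None:
        username, _account, password = auth
        return username, password
    try:
        return environ["DAP_USER"], environ["DAP_PASSWORD"]
    except KeyError:
        raise RuntimeError(
            f"No credentials for {host}: add it to .netrc or set DAP_USER and DAP_PASSWORD"
        ) from None
```

The explicit error message tells a catalog user which credentials are missing, rather than leaving them with a failed download.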
The JPL OPeNDAP service, which provided e.g. the Saildrone datasets, has been retired. Having followed the instructions on how to shift to the new system, I fear that access is now restricted by username and password, which would be a bummer.
Here is for example the new link to the SD-1060 dataset: https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/SAILDRONE_ATOMIC/saildrone-gen_5-atomic_eurec4a_2020-sd1060-20200117T000000-20200302T235959-5_minutes-v1.1595997115384.nc (can be found here).
However, I can only open it after entering credentials.
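A periodic link check could flag this situation automatically. The sketch below classifies an HTTP response as "login required" under the assumption that protected files either answer with 401/403 or redirect to the Earthdata Login host; the function name `needs_login` is hypothetical and the redirect behaviour has not been verified here.

```python
from urllib.parse import urlparse


def needs_login(status_code, location=None, login_host="urs.earthdata.nasa.gov"):
    """Guess whether a response means the resource is behind a login.

    `status_code` and `location` are the status and Location header of a
    request made without following redirects.
    """
    if status_code in (401, 403):
        return True
    if status_code in (301, 302, 303, 307, 308) and location:
        # A redirect to the login host is treated as "credentials required"
        return urlparse(location).hostname == login_host
    return False
```

Such a check lets a catalog test distinguish "the data moved" from "the data now requires credentials".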