CS-SI / eodag

Earth Observation Data Access Gateway
https://eodag.readthedocs.io
Apache License 2.0

Added new provider for NASA PODAAC products : Not able to download them #755

Open annesophie-cls opened 1 year ago

annesophie-cls commented 1 year ago

Hi,

I added a new provider for the STAC NASA PODAAC catalog: https://cmr.earthdata.nasa.gov/cloudstac/POCLOUD/ However, I cannot download products: I get a 401 Unauthorized error. From my web browser, I can download the product from the downloadLink without any problem.

Please have a look at the notebook screenshot, which tries to download this product: https://cmr.earthdata.nasa.gov/cloudstac/POCLOUD/search?ids=ascat_20230620_092700_metopb_55801_eps_o_250_3301_ovw.l2 Only the .png data is downloaded, not the .nc data.

[Screenshot: notebook_eodag_nasa]

And this is the provider configuration:


earthdata_podaac:
  priority: 0
  search:
    type: StacSearch
    results_entry: features
    api_endpoint: https://cmr.earthdata.nasa.gov/cloudstac/POCLOUD/search
    need_auth: false
    pagination:
      max_items_per_page: 500
    discover_metadata:
      auto_discovery: true
      metadata_pattern: '^[a-zA-Z0-9_:-]+$'
      search_param: '{{{{"query":{{{{"{metadata}":{{{{"eq":"{{{metadata}}}" }}}} }}}} }}}}'
      metadata_path: '$.properties.*'
    discover_product_types:
        fetch_url: https://cmr.earthdata.nasa.gov/cloudstac/POCLOUD/collections
        result_type: json
        results_entry: 'collections[*]'
        generic_product_type_id: '$.id'
        generic_product_type_parsable_properties:
          productType: '$.id'
        generic_product_type_parsable_metadata:
          abstract: '$.description'
          license: '$.license'
          title: '$.id'
          missionStartDate: '$.extent.temporal.interval[0][0]'
    metadata_mapping:
      productType:
        - '{{"collections":["{productType}"]}}'
        - '$.collection'
      title: '$.id'
      id:
        - '{{"ids":["{id}"]}}'
        - '$.id'
      collection: '$.collection'
      bbox: '$.bbox'
      geometry:
        - '{{"intersects":{geometry#to_geojson}}}'
        - '($.geometry.`str()`.`sub(/^None$/, POLYGON((180 -90, 180 90, -180 90, -180 -90, 180 -90)))`)|($.geometry[*])'
      completionTimeFromAscendingNode:
        - '{{"datetime":"{startTimeFromAscendingNode#to_iso_utc_datetime(seconds)}/{completionTimeFromAscendingNode#to_iso_utc_datetime(seconds)}"}}'
        - '$.properties.end_datetime'
      downloadLink: '$.assets.data.href'
      assets: '$.assets'
  products:
    GENERIC_PRODUCT_TYPE:
      productType: '{productType}'
  download:
    type: HTTPDownload
    base_uri: 'https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/'
    extract: true
    outputs_prefix: /work/scratch/****/eodagworkspace/
  auth:
    credentials:
       username: ****
       password: ****

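The doubled braces in the `metadata_mapping` search templates above follow Python format-string escaping: `{{` and `}}` stand for literal braces, while `{id}` or `{productType}` is a placeholder. Here is a minimal sketch of how such a template becomes a STAC search body, assuming plain `str.format` rendering. eodag's own formatter also handles `#converter` suffixes such as `{geometry#to_geojson}`, which this sketch does not cover. `render_query` is a hypothetical helper, not an eodag function, and `EXAMPLE_COLLECTION` is a placeholder collection name.

```python
import json

# Templates copied from the metadata_mapping section of the configuration.
ID_TEMPLATE = '{{"ids":["{id}"]}}'
PRODUCT_TYPE_TEMPLATE = '{{"collections":["{productType}"]}}'


def render_query(template: str, **values: str) -> dict:
    """Hypothetical helper: fill a template and parse the resulting JSON."""
    # str.format turns "{{" and "}}" into literal braces and fills "{name}".
    return json.loads(template.format(**values))


if __name__ == "__main__":
    body = {}
    body.update(
        render_query(
            ID_TEMPLATE,
            id="ascat_20230620_092700_metopb_55801_eps_o_250_3301_ovw.l2",
        )
    )
    # EXAMPLE_COLLECTION is a placeholder, not a real POCLOUD collection id.
    body.update(render_query(PRODUCT_TYPE_TEMPLATE, productType="EXAMPLE_COLLECTION"))
    print(json.dumps(body, indent=2))
```

Rendering a template by hand like this can help check that a mapping produces valid JSON before the provider is loaded.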
**Environment:**
 - Python version: 3.8.10
 - EODAG version: 2.10.0

Did I write something wrong in the provider configuration?

Thank you very much

sbrunato commented 1 year ago

Hello @ansotoo, the authentication plugin is missing from your configuration, but no existing eodag auth plugin seems to work with this provider. A new plugin inspired by https://urs.earthdata.nasa.gov/documentation/for_users/data_access/python has to be implemented (contributions by pull request are welcome!). Redirections should keep headers, using a mechanism like the one provided in the Earthdata documentation:

import requests


# Override requests.Session.rebuild_auth to keep headers when redirected
class SessionWithHeaderRedirection(requests.Session):
    AUTH_HOST = 'urs.earthdata.nasa.gov'

    def __init__(self, username, password):
        super().__init__()
        self.auth = (username, password)

    # Overrides from the library to keep headers when redirected to or from
    # the NASA auth host.
    def rebuild_auth(self, prepared_request, response):
        headers = prepared_request.headers
        url = prepared_request.url

        if 'Authorization' in headers:
            original_parsed = requests.utils.urlparse(response.request.url)
            redirect_parsed = requests.utils.urlparse(url)

            # Drop the header only when the redirect changes host and
            # neither side of the redirect is the NASA auth host.
            if (original_parsed.hostname != redirect_parsed.hostname) and \
                    redirect_parsed.hostname != self.AUTH_HOST and \
                    original_parsed.hostname != self.AUTH_HOST:
                del headers['Authorization']
        return
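
The rule applied in `rebuild_auth` above can be read in isolation: the `Authorization` header is dropped only when a redirect changes host and neither end of the redirect is the Earthdata login host. Below is a minimal standard-library sketch of that decision. `drops_authorization` is a hypothetical name used for illustration, and the file path in the example URLs is made up.

```python
from urllib.parse import urlparse

AUTH_HOST = "urs.earthdata.nasa.gov"


def drops_authorization(
    original_url: str, redirect_url: str, auth_host: str = AUTH_HOST
) -> bool:
    """Return True when the Authorization header would be removed."""
    original_host = urlparse(original_url).hostname
    redirect_host = urlparse(redirect_url).hostname
    # Same condition as in rebuild_auth: the hosts differ and neither
    # of them is the auth host.
    return (
        original_host != redirect_host
        and redirect_host != auth_host
        and original_host != auth_host
    )


if __name__ == "__main__":
    # Hypothetical file path under the base_uri from the configuration.
    data_url = (
        "https://archive.podaac.earthdata.nasa.gov/"
        "podaac-ops-cumulus-protected/file.nc"
    )
    login_url = "https://urs.earthdata.nasa.gov/oauth/authorize"
    print(drops_authorization(data_url, login_url))
    print(drops_authorization(data_url, "https://example.com/file.nc"))
```

With the default `requests.Session`, the header is removed on a host change, which would be consistent with the 401 reported when the data host redirects to the login host.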

annesophie-cls commented 1 year ago

Hi @sbrunato,

I wrote this plugin but it doesn't work. Could you help me with that?

from eodag.plugins.authentication.base import Authentication

import requests
from requests import Session

class SessionWithHeaderRedirection(Session):

    AUTH_HOST = 'urs.earthdata.nasa.gov'

    def __init__(self, username, password):
        super().__init__()
        self.auth = (username, password)

    # Overrides from the library to keep headers when redirected to or from the NASA auth host.
    def rebuild_auth(self, prepared_request, response):
        headers = prepared_request.headers
        url = prepared_request.url

        if 'Authorization' in headers:
            original_parsed = requests.utils.urlparse(response.request.url)
            redirect_parsed = requests.utils.urlparse(url)

            if (original_parsed.hostname != redirect_parsed.hostname) and \
                    redirect_parsed.hostname != self.AUTH_HOST and \
                    original_parsed.hostname != self.AUTH_HOST:
                del headers['Authorization']
        return

class NasaAuthPlugin(Authentication):

    def authenticate(self):
        """Authenticate"""
        self.validate_config_credentials()
        session = SessionWithHeaderRedirection(
            self.config.credentials["username"],
            self.config.credentials["password"],
        )
        return session.auth
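
Note that `authenticate()` above returns `session.auth`, which is only the `(username, password)` tuple set in `__init__`, so the `rebuild_auth` override is not carried along with it. To check the credentials and the redirect flow independently of eodag, one option is a standard-library opener. This is a minimal sketch, assuming the Earthdata login host answers with an HTTP Basic challenge and that the session is then tracked by cookies. `build_earthdata_opener` and `download` are hypothetical helper names.

```python
import shutil
from http.cookiejar import CookieJar
from urllib.request import (
    HTTPBasicAuthHandler,
    HTTPCookieProcessor,
    HTTPPasswordMgrWithDefaultRealm,
    OpenerDirector,
    build_opener,
)

AUTH_URL = "https://urs.earthdata.nasa.gov"


def build_earthdata_opener(username: str, password: str) -> OpenerDirector:
    """Build an opener that sends credentials only to the login host."""
    password_manager = HTTPPasswordMgrWithDefaultRealm()
    # A realm of None makes the credentials apply to any realm at AUTH_URL.
    password_manager.add_password(None, AUTH_URL, username, password)
    return build_opener(
        HTTPBasicAuthHandler(password_manager),
        # Keep cookies across redirects between the data and login hosts.
        HTTPCookieProcessor(CookieJar()),
    )


def download(opener: OpenerDirector, url: str, destination: str) -> None:
    """Stream the response body for url into the destination file."""
    with opener.open(url) as response, open(destination, "wb") as output:
        shutil.copyfileobj(response, output)
```

Calling `download(build_earthdata_opener(username, password), url, "product.nc")` with the `.nc` asset URL should show whether the 401 comes from the credentials themselves or from the way the plugin passes them on.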