Access Data from the LAADS Archive

noorvanbeers commented 2 years ago

Thank you for this great library! I would like to request the LAADS archive to be added, as I have not been able to add this myself. I saw from issue #3 that this is possible and has been implemented for another archive, however my attempts haven't been fruitful.

Is your feature request related to a problem? Please describe.

I am trying to access data in the MYDATML2 and MODATML2 collections in the LAADS archive, which are also available from CMR. This is currently not possible with the most recent version of modis_tools. I have attempted to add the LAADS archive in the same way that the NSIDC DAAC archive was added (downloading from the NSIDC DAAC archive is working for me):

1) In .constants.urls.py I have added "ladsweb.modaps.eosdis.nasa.gov", named LAADS_RESOURCE.
2) I have added this (URLs.LAADS_RESOURCE.value) to .granule_handler.py in the get_url_from_granule function, under URLs.NSIDC_RESOURCE.value.

The URL(s) generated this way are correct. When I print them, I can click them and download the file from the website. An example is shown below with the following inputs:

nigeria_bbox = [2.1448863675, 4.002583177, 15.289420717, 14.275061098]
nigeria_granules_terra = granule_client_terra.query(start_date="2019-02-02", end_date="2019-12-31", bounding_box=nigeria_bbox)
GranuleHandler.download_from_granules(nigeria_granules_terra, session, threads=1, path=os.getcwd()+"\\MODIS_data_nigeria\\")

https://ladsweb.modaps.eosdis.nasa.gov/archive/allData/61/MODATML2/2019/129/MODATML2.A2019129.2050.061.2019130073927.hdf

However, a MissingSchema (invalid URL) error is raised:

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\ProgramData\App-V\712CFDF0-0616-4F1D-91E1-998E20D5E558\1473798F-4223-4606-9549-B2B59538A13F\Root\VFS\ProgramFilesX64\JetBrains\PyCharm Community Edition 2021.2\plugins\python-ce\helpers\pydev\_pydev_bundle\pydev_umd.py", line 198, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "C:\ProgramData\App-V\712CFDF0-0616-4F1D-91E1-998E20D5E558\1473798F-4223-4606-9549-B2B59538A13F\Root\VFS\ProgramFilesX64\JetBrains\PyCharm Community Edition 2021.2\plugins\python-ce\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Users/beersehv/PycharmProjects/SpaceCapabilitiesConda/MODIS_data_retriever2.py", line 41, in <module>
    GranuleHandler.download_from_granules(nigeria_granules_terra, session, threads=1, path=os.getcwd()+"\\MODIS_data_nigeria\\")
  File "C:\Users\beersehv\Anaconda3\envs\SpaceCapabilitiesConda\lib\site-packages\modis_tools\granule_handler.py", line 54, in download_from_granules
    return cls.download_from_urls(
  File "C:\Users\beersehv\Anaconda3\envs\SpaceCapabilitiesConda\lib\site-packages\modis_tools\granule_handler.py", line 137, in download_from_urls
    req = cls._get(
  File "C:\Users\beersehv\Anaconda3\envs\SpaceCapabilitiesConda\lib\site-packages\modis_tools\granule_handler.py", line 173, in _get
    req = session.get(location, stream=stream)
  File "C:\Users\beersehv\Anaconda3\envs\SpaceCapabilitiesConda\lib\site-packages\requests\sessions.py", line 600, in get
    return self.request("GET", url, **kwargs)
  File "C:\Users\beersehv\Anaconda3\envs\SpaceCapabilitiesConda\lib\site-packages\requests\sessions.py", line 573, in request
    prep = self.prepare_request(req)
  File "C:\Users\beersehv\Anaconda3\envs\SpaceCapabilitiesConda\lib\site-packages\requests\sessions.py", line 484, in prepare_request
    p.prepare(
  File "C:\Users\beersehv\Anaconda3\envs\SpaceCapabilitiesConda\lib\site-packages\requests\models.py", line 368, in prepare
    self.prepare_url(url, params)
  File "C:\Users\beersehv\Anaconda3\envs\SpaceCapabilitiesConda\lib\site-packages\requests\models.py", line 439, in prepare_url
    raise MissingSchema(
requests.exceptions.MissingSchema: Invalid URL '/oauth/login?redirect=%2Farchive%2FallData%2F61%2FMODATML2%2F2019%2F129%2FMODATML2.A2019129.2050.061.2019130073927.hdf': No scheme supplied. Perhaps you meant http:///oauth/login?redirect=%2Farchive%2FallData%2F61%2FMODATML2%2F2019%2F129%2FMODATML2.A2019129.2050.061.2019130073927.hdf?
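
The URL in the error message starts with /oauth/login, which suggests that the Location header returned by the server is relative rather than absolute, and a relative URL cannot be requested on its own. A minimal sketch of how such a value could be resolved against the original request URL with the standard library; resolve_location is a hypothetical helper for illustration, not part of modis_tools:

```python
from urllib.parse import urljoin


def resolve_location(request_url: str, location: str) -> str:
    """Resolve a possibly relative Location header against the request URL.

    Hypothetical helper for illustration; not part of modis_tools.
    """
    # urljoin leaves absolute URLs untouched and fills in the scheme
    # and host for relative ones.
    return urljoin(request_url, location)


request_url = (
    "https://ladsweb.modaps.eosdis.nasa.gov/archive/allData/61/MODATML2/"
    "2019/129/MODATML2.A2019129.2050.061.2019130073927.hdf"
)
# Relative value copied from the error message above.
relative = (
    "/oauth/login?redirect=%2Farchive%2FallData%2F61%2FMODATML2%2F2019"
    "%2F129%2FMODATML2.A2019129.2050.061.2019130073927.hdf"
)
resolved = resolve_location(request_url, relative)
print(resolved)
```

The resolved URL has a scheme and host, so it no longer triggers MissingSchema, although whether the login redirect then succeeds still depends on the session's credentials.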

When I prefix the LAADS_RESOURCE with "https://", an exception is raised that no matching link is found.

  File "<input>", line 1, in <module>
  File "C:\ProgramData\App-V\712CFDF0-0616-4F1D-91E1-998E20D5E558\1473798F-4223-4606-9549-B2B59538A13F\Root\VFS\ProgramFilesX64\JetBrains\PyCharm Community Edition 2021.2\plugins\python-ce\helpers\pydev\_pydev_bundle\pydev_umd.py", line 198, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "C:\ProgramData\App-V\712CFDF0-0616-4F1D-91E1-998E20D5E558\1473798F-4223-4606-9549-B2B59538A13F\Root\VFS\ProgramFilesX64\JetBrains\PyCharm Community Edition 2021.2\plugins\python-ce\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Users/beersehv/PycharmProjects/SpaceCapabilitiesConda/MODIS_data_retriever2.py", line 41, in <module>
    GranuleHandler.download_from_granules(nigeria_granules_terra, session, threads=1, path=os.getcwd()+"\\MODIS_data_nigeria\\")
  File "C:\Users\beersehv\Anaconda3\envs\SpaceCapabilitiesConda\lib\site-packages\modis_tools\granule_handler.py", line 52, in download_from_granules
    urls = [cls.get_url_from_granule(x, "hdf") for x in granules]
  File "C:\Users\beersehv\Anaconda3\envs\SpaceCapabilitiesConda\lib\site-packages\modis_tools\granule_handler.py", line 52, in <listcomp>
    urls = [cls.get_url_from_granule(x, "hdf") for x in granules]
  File "C:\Users\beersehv\Anaconda3\envs\SpaceCapabilitiesConda\lib\site-packages\modis_tools\granule_handler.py", line 103, in get_url_from_granule
    raise Exception("No matching link found")
Exception: No matching link found
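
A likely explanation for this second error: get_url_from_granule compares the host of each link with the stored constants, and a host never includes a scheme, so a constant that starts with "https://" cannot match. A small sketch of that comparison, with urllib.parse standing in for the URL type used by modis_tools (an assumption made for illustration):

```python
from urllib.parse import urlsplit

LAADS_HOST = "ladsweb.modaps.eosdis.nasa.gov"

url = (
    "https://ladsweb.modaps.eosdis.nasa.gov/archive/allData/61/MODATML2/"
    "2019/129/MODATML2.A2019129.2050.061.2019130073927.hdf"
)
host = urlsplit(url).hostname  # host only, without the scheme

matches_bare = host == LAADS_HOST                   # bare host name
matches_prefixed = host == "https://" + LAADS_HOST  # constant with scheme
print(matches_bare, matches_prefixed)
```

Under this reading, the constant should stay a bare host name and the scheme has to be handled where the request is made.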

Describe the solution you'd like

If this could be implemented I would be very grateful! The library is excellent and I feel this added feature would be a small addition to implement!

noorvanbeers commented 2 years ago

I found a solution to my issue! The URL requested by the _get_location function in .granule_handler.py responds with a redirect, which was not being followed. I have now amended this function as shown below. Note that the database_LAADS boolean is added to the download_from_granules function and passed on to each subsequent function.

    @staticmethod
    def _get_location(url: HttpUrl, session: Session, database_LAADS: bool) -> str:
        """Make initial request to fetch file location from header."""
        split_result = urlsplit(url)
        https_url = split_result._replace(scheme="https").geturl()
        if database_LAADS:
            location_resp = session.get(https_url, allow_redirects=True)
            location = location_resp.url
        else:
            location_resp = session.get(https_url, allow_redirects=False)
            location = location_resp.headers.get("Location")
            if not location:
                raise FileNotFoundError("No file location found")
        return location

The other two amendments stated in my initial issue are 1) adding the LAADS resource URL in constants/urls.py:

""" URLs for the API """
from enum import Enum

class URLs(Enum):
    """URLs"""

    API: str = "cmr.earthdata.nasa.gov"
    URS: str = "urs.earthdata.nasa.gov"
    RESOURCE: str = "e4ftl01.cr.usgs.gov"
    NSIDC_RESOURCE: str = "n5eil01u.ecs.nsidc.org"
    LAADS_RESOURCE: str = "ladsweb.modaps.eosdis.nasa.gov"
    EARTHDATA: str = ".earthdata.nasa.gov"

And 2) adding it to the get_url_from_granule function in granule_handler.py:

    @staticmethod
    def get_url_from_granule(granule: Granule, ext: ParamType = "hdf") -> HttpUrl:
        """Return link for file extension from Earthdata resource."""
        for link in granule.links:
            if (
                link.href.host
                in [
                    URLs.RESOURCE.value,
                    URLs.NSIDC_RESOURCE.value,
                    URLs.LAADS_RESOURCE.value,
                ]
                and link.href.path.endswith(ext)
            ):
                return link.href
        raise Exception("No matching link found")
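
For readers who want to try the selection logic without installing the package, here is a self-contained sketch of the same idea. It works on plain URL strings instead of Granule objects, and pick_download_url and ALLOWED_HOSTS are hypothetical names used only for this illustration:

```python
from typing import Iterable
from urllib.parse import urlsplit

# Hosts taken from the URLs enum above.
ALLOWED_HOSTS = frozenset(
    {
        "e4ftl01.cr.usgs.gov",
        "n5eil01u.ecs.nsidc.org",
        "ladsweb.modaps.eosdis.nasa.gov",
    }
)


def pick_download_url(hrefs: Iterable[str], ext: str = "hdf") -> str:
    """Return the first link on an allowed host whose path ends with ext."""
    for href in hrefs:
        parts = urlsplit(href)
        if parts.hostname in ALLOWED_HOSTS and parts.path.endswith(ext):
            return href
    raise LookupError("No matching link found")


links = [
    "https://cmr.earthdata.nasa.gov/browse/example.jpg",  # made-up link
    "https://ladsweb.modaps.eosdis.nasa.gov/archive/allData/61/MODATML2/"
    "2019/129/MODATML2.A2019129.2050.061.2019130073927.hdf",
]
print(pick_download_url(links))
```

Keeping the hosts in a set means that supporting another archive only requires one more entry.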

As I suspected, it was an easy addition to make the LAADS archive accessible with the modis_tools package; thank you again for the library!

jamie-sgro commented 2 years ago

Hi @noorvanbeers, sorry we weren't able to get to your thread here until you found a resolution. Do you anticipate that the solution you've outlined requires a change to the modis-tools codebase or its documentation?

polpel commented 1 year ago

Hi @jamie-sgro, I found this issue and I'm just commenting to say that I also wanted to use modis-tools to access data from the LAADS Archive, and ended up using pretty much the same solution as @noorvanbeers (just checking if url.host == "ladsweb.modaps.eosdis.nasa.gov" instead of adding a new bool argument). I think it would be great if this simple fix could be added to the codebase.
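
A rough sketch of what that host-based variant might look like, for anyone landing here. It is written as a standalone function rather than the actual modis_tools method; is_laads and get_location are illustrative names, and session is assumed to behave like requests.Session:

```python
from urllib.parse import urlsplit

LAADS_HOST = "ladsweb.modaps.eosdis.nasa.gov"


def is_laads(url: str) -> bool:
    """Return True when the URL points at the LAADS archive host."""
    return urlsplit(url).hostname == LAADS_HOST


def get_location(url: str, session) -> str:
    """Fetch the file location, choosing the redirect policy from the host.

    `session` is assumed to behave like requests.Session.
    """
    https_url = urlsplit(url)._replace(scheme="https").geturl()
    if is_laads(https_url):
        # Let the session follow the redirect chain and keep the final URL.
        return session.get(https_url, allow_redirects=True).url
    # Other archives: read the file location from the redirect header.
    response = session.get(https_url, allow_redirects=False)
    location = response.headers.get("Location")
    if not location:
        raise FileNotFoundError("No file location found")
    return location
```

Deriving the behaviour from the URL keeps the public signatures unchanged, so callers of download_from_granules would not need an extra argument.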

jamie-sgro commented 1 year ago

Hi @polpel, thanks for documenting what worked for you here. If you'd like, I'd absolutely invite you to create a small PR with those changes in mind, and I'd have the team review it in short order. Otherwise, I've made a task to have our team revisit your note here, which will likely result in a new PR within the next week or so.

fraymio / modis-tools

Access Data from the LAADS Archive #20