carpentries-incubator / geospatial-python

Introduction to Geospatial Raster and Vector Data with Python
https://carpentries-incubator.github.io/geospatial-python/

Add a tutorial to access NASA CMR STAC api using pystac & stackstac #102

Closed srmsoumya closed 1 year ago

srmsoumya commented 2 years ago

In this tutorial, we learn to

  1. Select an AOI using leafmap
  2. Pull STAC items for HLS data from NASA CMR STAC API using pystac_client
  3. Use stackstac to create lazy xarray DataArrays, filter by cloud cover, and compute monthly mosaics
  4. Visualize the change over time
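Steps 2 and 3 could be sketched roughly as below. This is a minimal sketch, not the tutorial's actual code: the endpoint URL, the collection IDs, the band names, the EPSG code (UTM zone 36N) and the 20% cloud threshold are all assumptions, and the AOI is passed as a plain bounding box instead of being drawn with leafmap. The helper names `build_search_params` and `monthly_mosaics` are hypothetical.

```python
from datetime import date

# Assumed endpoint and collection IDs for HLS in NASA's CMR STAC catalog.
CMR_STAC_URL = "https://cmr.earthdata.nasa.gov/stac/LPCLOUD"
HLS_COLLECTIONS = ["HLSL30.v2.0", "HLSS30.v2.0"]


def build_search_params(bbox, start, end, collections=HLS_COLLECTIONS):
    """Validate the AOI and return keyword arguments for a STAC item search."""
    west, south, east, north = bbox
    if not (west < east and south < north):
        raise ValueError("bbox must be (west, south, east, north)")
    if start > end:
        raise ValueError("start must not be after end")
    return {
        "collections": list(collections),
        "bbox": [west, south, east, north],
        "datetime": f"{start.isoformat()}/{end.isoformat()}",
    }


def monthly_mosaics(params, assets=("B04", "B03", "B02"), epsg=32636, max_cloud=20):
    """Search, stack lazily, drop cloudy scenes, and reduce to monthly medians."""
    # Imported here so the helper above works without the geospatial stack.
    import pystac_client
    import stackstac

    client = pystac_client.Client.open(CMR_STAC_URL)
    items = client.search(**params).item_collection()
    # The stack is lazy: no pixels are read until .compute() is called.
    stack = stackstac.stack(
        items,
        assets=list(assets),
        bounds_latlon=params["bbox"],
        epsg=epsg,
        resolution=30,
    )
    # Client-side filter on the per-item cloud cover property.
    clear = stack[stack["eo:cloud_cover"] < max_cloud]
    return clear.resample(time="MS").median("time", keep_attrs=True)
```

Calling `monthly_mosaics(build_search_params(bbox, date(2021, 1, 1), date(2021, 12, 31)))` returns a lazy array; the remote assets are only read once `.compute()` is called on it.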

fnattino commented 2 years ago

Hi @srmsoumya, I really like the content of this episode! I tried running the code blocks to check whether the NASA CMR STAC index could be an alternative to the EarthSearch endpoint used in the data-access episode. However, I get many errors like the following when calling .compute() at the very end of the episode:

RuntimeError: Error opening 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T36QUK.2019225T081947.v2.0/HLS.L30.T36QUK.2019225T081947.v2.0.B04.tif': RasterioIOError("'/vsicurl/https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T36QUK.2019225T081947.v2.0/HLS.L30.T36QUK.2019225T081947.v2.0.B04.tif' not recognized as a supported file format.")

I could narrow this down to rasterio not being able to open the remote assets:

import rasterio
f = rasterio.open("https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T36QUK.2021190T081611.v2.0/HLS.S30.T36QUK.2021190T081611.v2.0.B8A.tif")
---------------------------------------------------------------------------
CPLE_OpenFailedError                      Traceback (most recent call last)
File rasterio/_base.pyx:261, in rasterio._base.DatasetBase.__init__()

File rasterio/_shim.pyx:78, in rasterio._shim.open_dataset()

File rasterio/_err.pyx:216, in rasterio._err.exc_wrap_pointer()

CPLE_OpenFailedError: '/vsicurl/https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T36QUK.2021190T081611.v2.0/HLS.S30.T36QUK.2021190T081611.v2.0.B8A.tif' not recognized as a supported file format.

During handling of the above exception, another exception occurred:

RasterioIOError                           Traceback (most recent call last)
Input In [23], in <module>
      1 import rasterio
----> 2 f = rasterio.open("https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T36QUK.2021190T081611.v2.0/HLS.S30.T36QUK.2021190T081611.v2.0.B8A.tif")

File /opt/miniconda3/envs/geospatial/lib/python3.10/site-packages/rasterio/env.py:437, in ensure_env_with_credentials.<locals>.wrapper(*args, **kwds)
    434     session = DummySession()
    436 with env_ctor(session=session):
--> 437     return f(*args, **kwds)

File /opt/miniconda3/envs/geospatial/lib/python3.10/site-packages/rasterio/__init__.py:220, in open(fp, mode, driver, width, height, count, crs, transform, dtype, nodata, sharing, **kwargs)
    216 # Create dataset instances and pass the given env, which will
    217 # be taken over by the dataset's context manager if it is not
    218 # None.
    219 if mode == 'r':
--> 220     s = DatasetReader(path, driver=driver, sharing=sharing, **kwargs)
    221 elif mode == "r+":
    222     s = get_writer_for_path(path, driver=driver)(
    223         path, mode, driver=driver, sharing=sharing, **kwargs
    224     )

File rasterio/_base.pyx:263, in rasterio._base.DatasetBase.__init__()

RasterioIOError: '/vsicurl/https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T36QUK.2021190T081611.v2.0/HLS.S30.T36QUK.2021190T081611.v2.0.B8A.tif' not recognized as a supported file format.

The error seems to suggest that the file extension is not recognised (maybe because of the multiple dots in the file name?). Specifying the driver (driver="COG") does not help. Any clue what is going wrong here? I am working with rasterio version 1.2.10 and GDAL version 3.4.1.

srmsoumya commented 2 years ago

@fnattino Did you set up a ~/.netrc file to access NASA CMR STAC data?

You can sign up here and run this script to set things up.
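For reference, a `.netrc` entry for Earthdata Login typically looks like `machine urs.earthdata.nasa.gov login <user> password <pass>`. The sketch below is not the script mentioned above; it is a minimal stdlib-only illustration, and the host name default and the helper names `write_netrc_entry` and `read_netrc_entry` are assumptions.

```python
import netrc
import os

# Assumed Earthdata Login host; adjust if your provider differs.
EARTHDATA_HOST = "urs.earthdata.nasa.gov"


def write_netrc_entry(path, login, password, machine=EARTHDATA_HOST):
    """Append a login entry to a netrc file and restrict it to the owner."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(f"machine {machine} login {login} password {password}\n")
    # Credentials are stored in plain text, so keep the file private.
    os.chmod(path, 0o600)


def read_netrc_entry(path, machine=EARTHDATA_HOST):
    """Return (login, password) for a machine, or None if it is missing."""
    entry = netrc.netrc(path).authenticators(machine)
    if entry is None:
        return None
    login, _account, password = entry
    return login, password
```

For real use, `path` would be `os.path.expanduser("~/.netrc")`. Passwords containing whitespace would need quoting and are not handled here.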

fnattino commented 2 years ago

Wonderful, seems to work indeed - thank you so much!
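One plausible reading of the original error, consistent with the `.netrc` fix but not confirmed in this thread: without credentials the server may answer with a login page or redirect instead of the GeoTIFF, and GDAL then reports the body as an unsupported format. A quick way to test that idea is to look at the first bytes of whatever was downloaded. The helper name `classify_payload` is hypothetical.

```python
def classify_payload(header: bytes) -> str:
    """Guess what a response body is from its first few bytes."""
    # Classic TIFF: byte-order mark followed by the magic number 42.
    if header[:4] in (b"II*\x00", b"MM\x00*"):
        return "tiff"
    # BigTIFF uses the magic number 43 instead.
    if header[:4] in (b"II+\x00", b"MM\x00+"):
        return "bigtiff"
    # HTML or XML bodies start with an angle bracket after optional whitespace.
    if header.lstrip()[:1] == b"<":
        return "markup"
    return "unknown"
```

Fetching the first few bytes of the asset URL with any HTTP client and passing them to `classify_payload` would show whether the response is image data or a web page.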

rbavery commented 2 years ago

@srmsoumya do you recall where you found the instruction to set these configs?

import os
os.environ["GDAL_HTTP_COOKIEFILE"] = "./cookies.txt"
os.environ["GDAL_HTTP_COOKIEJAR"] = "./cookies.txt"

Love that you found it! This was needed for me to use the CMR STAC API.
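`GDAL_HTTP_COOKIEFILE` is the file GDAL reads cookies from and `GDAL_HTTP_COOKIEJAR` is the file it writes them to; pointing both at the same file lets the login session persist between requests. A relative path such as `./cookies.txt` depends on the current working directory, so a small variant that resolves it to an absolute path may be more robust. This is a sketch; the helper name `configure_gdal_cookies` is hypothetical.

```python
import os


def configure_gdal_cookies(cookie_path="./cookies.txt"):
    """Point GDAL's HTTP cookie file and cookie jar at one absolute path."""
    absolute = os.path.abspath(os.path.expanduser(cookie_path))
    settings = {
        "GDAL_HTTP_COOKIEFILE": absolute,
        "GDAL_HTTP_COOKIEJAR": absolute,
    }
    # Export the settings to the process environment, where GDAL looks them up.
    os.environ.update(settings)
    return settings
```

Call `configure_gdal_cookies()` before opening any remote dataset with rasterio or stackstac.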

srmsoumya commented 2 years ago

@rbavery I had my fair share of trouble trying to access NASA CMR STAC and tried multiple things.

I guess I found this from one of the tutorials: https://nasa-openscapes.github.io/2021-Cloud-Workshop-AGU/how-tos/Earthdata_Cloud__Single_File__HTTPS_Access_COG_Example.html

rbavery commented 1 year ago

I'll look to finish this in November: https://github.com/carpentries-incubator/geospatial-python/issues/82#issuecomment-1286259558

Given this issue, we may not want to use CMR STAC as an example anymore and instead switch to something else, maybe the Sentinel-2 data. I'm not sure when the cloud cover filtering will be fixed: https://github.com/nasa/cmr-stac/issues/206

rbavery commented 1 year ago

closing since this is a bit stale and any material needs to be ported to the new lesson template in #158