WikiWatershed / mmw-geoprocessing

A Spark Job Server job for Model My Watershed geoprocessing.
Apache License 2.0
6 stars 6 forks source link

AWS 2-3: Investigate accessing AWS Annual 10m LULC dataset #112

Open rajadain opened 2 months ago

rajadain commented 2 months ago

This is the dataset: https://aws.amazon.com/marketplace/pp/prodview-llgtvqflvozwu

Test accessing this via GDAL / QGIS. Confirm if this requires specifying an AWS account.

Ensure the data can be subsetted to an Areas of Interest.

rajadain commented 1 month ago

The dataset is saved in us-west-2, we should be cognizant of associated data transfer costs.

rajadain commented 1 month ago

Browsing the dataset here: https://radiantearth.github.io/stac-browser/#/external/api.impactobservatory.com/stac-aws/collections/io-10m-annual-lulc, we see that the TIFFs are in an arrangement that spans the globe:

https://github.com/WikiWatershed/mmw-geoprocessing/assets/1430060/5d5afe77-789a-40e9-9909-c63709bfdc53

It's also not quite clear how to navigate the TIFF files. There is an index file: s3://io-10m-annual-lulc/io-10m-annual-lulc.ndjson with each newline-delimited row containing JSON like:

{
  "type": "Feature",
  "stac_version": "1.0.0",
  "id": "60F-2022",
  "properties": {
    "proj:bbox": [276230, 3789860, 723770, 4683690],
    "proj:epsg": 32760,
    "proj:shape": [89383, 44754],
    "end_datetime": "2023-01-01T00:00:00Z",
    "proj:geometry": {
      "type": "Polygon",
      "coordinates": [
        [
          [276230, 3789860],
          [723770, 3789860],
          [723770, 4683690],
          [276230, 4683690],
          [276230, 3789860]
        ]
      ]
    },
    "proj:projjson": {
      "id": {"code": 32760, "authority": "EPSG"},
      "name": "WGS 84 / UTM zone 60S",
      "type": "ProjectedCRS",
      "$schema": "https://proj.org/schemas/v0.4/projjson.schema.json",
      "base_crs": {
        "id": {"code": 4326, "authority": "EPSG"},
        "name": "WGS 84",
        "datum": {
          "name": "World Geodetic System 1984",
          "type": "GeodeticReferenceFrame",
          "ellipsoid": {
            "name": "WGS 84",
            "semi_major_axis": 6378137,
            "inverse_flattening": 298.257223563
          }
        },
        "coordinate_system": {
          "axis": [
            {
              "name": "Geodetic latitude",
              "unit": "degree",
              "direction": "north",
              "abbreviation": "Lat"
            },
            {
              "name": "Geodetic longitude",
              "unit": "degree",
              "direction": "east",
              "abbreviation": "Lon"
            }
          ],
          "subtype": "ellipsoidal"
        }
      },
      "conversion": {
        "name": "UTM zone 60S",
        "method": {
          "id": {"code": 9807, "authority": "EPSG"},
          "name": "Transverse Mercator"
        },
        "parameters": [
          {
            "id": {"code": 8801, "authority": "EPSG"},
            "name": "Latitude of natural origin",
            "unit": "degree",
            "value": 0
          },
          {
            "id": {"code": 8802, "authority": "EPSG"},
            "name": "Longitude of natural origin",
            "unit": "degree",
            "value": 177
          },
          {
            "id": {"code": 8805, "authority": "EPSG"},
            "name": "Scale factor at natural origin",
            "unit": "unity",
            "value": 0.9996
          },
          {
            "id": {"code": 8806, "authority": "EPSG"},
            "name": "False easting",
            "unit": "metre",
            "value": 500000
          },
          {
            "id": {"code": 8807, "authority": "EPSG"},
            "name": "False northing",
            "unit": "metre",
            "value": 10000000
          }
        ]
      },
      "coordinate_system": {
        "axis": [
          {
            "name": "Easting",
            "unit": "metre",
            "direction": "east",
            "abbreviation": ""
          },
          {
            "name": "Northing",
            "unit": "metre",
            "direction": "north",
            "abbreviation": ""
          }
        ],
        "subtype": "Cartesian"
      }
    },
    "proj:transform": [10, 0, 276230, 0, -10, 4683690, 0, 0, 1],
    "start_datetime": "2022-01-01T00:00:00Z",
    "io:supercell_id": "60F",
    "datetime": "2022-06-01T00:00:00Z"
  },
  "geometry": {
    "type": "MultiPolygon",
    "coordinates": [
      [
        [
          [180, -55.99999],
          [180, -48.00137],
          [180, -48.00096176128751],
          [179.99766, -47.96096],
          [179.71251, -47.96805],
          [179.42727, -47.97444],
          [179.14193, -47.98011],
          [178.8565, -47.98508],
          [178.57101, -47.98934],
          [178.28545, -47.9929],
          [177.99985, -47.99574],
          [177.71421, -47.99787],
          [177.42853, -47.99929],
          [177.14285, -48],
          [176.85715, -48],
          [176.57147, -47.99929],
          [176.28579, -47.99787],
          [176.00015, -47.99574],
          [175.71455, -47.9929],
          [175.42899, -47.98934],
          [175.1435, -47.98508],
          [174.85807, -47.98011],
          [174.57273, -47.97444],
          [174.28749, -47.96805],
          [174.00234, -47.96096],
          [173.97997, -48.34337],
          [173.95712, -48.72575],
          [173.93378, -49.10809],
          [173.90995, -49.4904],
          [173.8856, -49.87268],
          [173.86073, -50.25492],
          [173.83532, -50.63713],
          [173.80935, -51.0193],
          [173.7828, -51.40144],
          [173.75567, -51.78354],
          [173.72792, -52.16561],
          [173.69955, -52.54764],
          [173.67054, -52.92964],
          [173.64086, -53.3116],
          [173.6105, -53.69352],
          [173.57943, -54.07541],
          [173.54763, -54.45726],
          [173.51508, -54.83907],
          [173.48176, -55.22085],
          [173.44763, -55.60258],
          [173.41267, -55.98428],
          [173.75367, -55.99374],
          [174.09487, -56.00226],
          [174.43624, -56.00984],
          [174.77776, -56.01646],
          [175.11942, -56.02215],
          [175.46119, -56.02689],
          [175.80305, -56.03068],
          [176.14498, -56.03352],
          [176.48697, -56.03542],
          [176.82899, -56.03636],
          [177.17101, -56.03636],
          [177.51303, -56.03542],
          [177.85502, -56.03352],
          [178.19695, -56.03068],
          [178.53881, -56.02689],
          [178.88058, -56.02215],
          [179.22224, -56.01646],
          [179.56376, -56.00984],
          [179.90513, -56.00226],
          [180, -55.99999]
        ]
      ],
      [
        [
          [-180, -55.99999],
          [-179.75367, -55.99374],
          [-179.41267, -55.98428],
          [-179.44763, -55.60258],
          [-179.48176, -55.22085],
          [-179.51508, -54.83907],
          [-179.54763, -54.45726],
          [-179.57943, -54.07541],
          [-179.6105, -53.69352],
          [-179.64086, -53.3116],
          [-179.67054, -52.92964],
          [-179.69955, -52.54764],
          [-179.72792, -52.16561],
          [-179.75567, -51.78354],
          [-179.7828, -51.40144],
          [-179.80935, -51.0193],
          [-179.83532, -50.63713],
          [-179.86073, -50.25492],
          [-179.8856, -49.87268],
          [-179.90995, -49.4904],
          [-179.93378, -49.10809],
          [-179.95712, -48.72575],
          [-179.97997, -48.34337],
          [-180, -48.00096176128751],
          [-180, -48.00137],
          [-180, -55.99999]
        ]
      ]
    ]
  },
  "links": [],
  "assets": {
    "supercell": {
      "href": "https://s3.us-west-2.amazonaws.com/io-10m-annual-lulc/60F_2022.tif",
      "type": "image/tiff; application=geotiff; profile=cloud-optimized",
      "file:values": [
        {"values": [0], "summary": "No Data"},
        {"values": [1], "summary": "Water"},
        {"values": [2], "summary": "Trees"},
        {"values": [4], "summary": "Flooded vegetation"},
        {"values": [5], "summary": "Crops"},
        {"values": [7], "summary": "Built area"},
        {"values": [8], "summary": "Bare ground"},
        {"values": [9], "summary": "Snow/ice"},
        {"values": [10], "summary": "Clouds"},
        {"values": [11], "summary": "Rangeland"}
      ],
      "raster:bands": [{"nodata": 0, "spatial_resolution": 10}],
      "roles": ["data"]
    }
  },
  "bbox": [173.41267, -56.03636, -179.41267, -47.96096],
  "stac_extensions": [
    "https://stac-extensions.github.io/projection/v1.0.0/schema.json",
    "https://stac-extensions.github.io/file/v2.0.0/schema.json",
    "https://stac-extensions.github.io/raster/v1.1.0/schema.json"
  ],
  "collection": "io-10m-annual-lulc"
}
rajadain commented 1 month ago

After Impact Observatory fixed their API, I was successful in using it to load relevant GeoTIFFs in QGIS:

image

Thanks to @gadomski, here's a way to do the same using pystac-client:

stac-client search https://api.impactobservatory.com/stac-aws -c io-10m-annual-lulc --intersects "$(cat Lower\ Schuylkill\ River,\ HUC-10\ Watershed\ \(ID\ 0204020310\).geojson )" --max-items 10 | stac-asset download

So it seems that we can use STAC clients to identify the COGs that intersect with an area of interest. Next we need to pull just the bits from them that are in the AoI, and not the rest. This will be done via GeoTrellis. I'll next delve into the GeoTrellis documentation to study that.

rajadain commented 1 month ago

I was able to make a demo gist using the Python GIS stack to fetch and subset data for the above shape: https://gist.github.com/rajadain/45e00f2e9518350d12fdb2b92f5f38be. This clarifies the use case in my mind. I'd like to edit this to also use the correct color bands, and do a simple count operation. Then I'll move on to implementing this in GeoTrellis.

rajadain commented 1 month ago

Should use the colors specified here: https://www.impactobservatory.com/maps-for-good/

rajadain commented 1 month ago

This is a good example notebook that shows how to work with this data using other Python tools:

https://edsbook.org/notebooks/gallery/b128b282-dee7-44a7-bc21-f1fd21452a83/notebook

rajadain commented 1 month ago

My latest investigation is here: https://gist.github.com/rajadain/fae21522bfb4cfb2a9d28b35d4e3f573

I'm using this scheme to translate the NLCD values to Impact Observatory classes:

NLCD_TO_IO = {
    11: 1, # NLCD Open Water to IO Water
    12: 9, # NLCD Perennial Ice/Snow to IO Snow/Ice
    21: 7, # NLCD Developed, Open Space to IO Built area
    22: 7, # NLCD Developed, Low Intensity to IO Built area
    23: 7, # NLCD Developed, Medium Intensity to IO Built area
    24: 7, # NLCD Developed, High Intensity to IO Built area
    31: 8, # NLCD Barren Land (Rock/Sand/Clay) to IO Bare ground
    41: 2, # NLCD Deciduous Forest to IO Trees
    42: 2, # NLCD Evergreen Forest to IO Trees
    43: 2, # NLCD Mixed Forest to IO Trees
    52: 11, # NLCD Shrub/Scrub to IO Rangeland
    71: 11, # NLCD Grassland/Herbaceous to IO Rangeland
    81: 5, # NLCD Pasture/Hay to IO Crops
    82: 5, # NLCD Cultivated Crops to IO Crops
    90: 4, # NLCD Woody Wetlands to IO Flooded vegetation
    95: 4, # NLCD Emergent Herbaceous Wetlands to IO Flooded vegetation
}

For the Lower Schuylkill HUC-10 0204020310, this is the output from the IO Global Dataset:

image

and this is how its output compares to MMW's using NLCD 2019 translated to IO classes:

image

For the Conococheague-Opequon HUC-8 02070004, this is the output from the IO Global Dataset:

image

and this is how its output compares to MMW's using NLCD 2019 translated to IO classes:

image

These spot checks show that while there is general agreement, these two datasets are not identical by any means.

aufdenkampe commented 4 weeks ago

@rajadain, thanks for this analysis! I'm not too surprised that are discrepancies, given that the land use / land cover (LULC) classes have different definitions, especially for short vegetation (pasture, shrub, rangeland). Here's some extra into to on IO & NLCD classes for examining your initial crosswalk.

io_lulc_classes = {
    'No Data': 0, 
    'Water': 1, 
    'Trees': 2, 
    'Flooded vegetation': 4, 
    'Crops': 5, 
    'Built area': 7, 
    'Bare ground': 8, 
    'Snow/ice': 9, 
    'Clouds': 10, 
    'Rangeland': 11
}
aufdenkampe commented 4 weeks ago

@rajadain, one suggested edit to your crosswalk might be to change NLCD 81 Pasture/Hay from IO 5 Crops to IO 11 Rangeland:

81: 11, # NLCD Pasture/Hay to IO Rangeland

I'm guessing this update will substantially improve the comparison.

aufdenkampe commented 4 weeks ago

It seems like further exploration of the crosswalk will require close inspection of some of these areas vs satellite imagery, probably in QGIS.

I wonder if anyone else has already come up with an NCLD to IO crosswalk?

rajadain commented 4 weeks ago

Good idea, I was wondering about that too. Updated gist: https://gist.github.com/rajadain/1d3591a7a00c750466e3793bf1a9bcd2

Shape MMW vs IO if NLCD 81 mapped to IO 5 MMW vs IO if NLCD 81 mapped to IO 11
Lower Schuylkil HUC-10 0204020310 image Diff: 34.3% image Diff 24.8%
Conococheague-Opequon HUC-8 02070004 image Diff 30.3% image Diff 30.0%
rajadain commented 4 weeks ago

Also added a diff percentage above that divides the total absolute difference by the total MMW baseline, which shows that NLCD 81 to IO 11 is a better assignment.

aufdenkampe commented 4 weeks ago

@rajadain, thanks for showing the comparison and adding the overall difference.

It looks like cross walking "NLCD 81 pasture/hay" to "IO 11 Rangeland" isn't the perfect solution, as it overshoots the previously observed differences. That said, it is likely the best choice.

Frankly, the overall comparison is quite good. One wouldn't expect a perfect match, given the different resolution, the different satellites, the different processing methods, and the different class definitions. It's actually surprisingly good given all that.