Open rajadain opened 7 months ago
The dataset is saved in us-west-2
, we should be cognizant of associated data transfer costs.
Browsing the dataset here: https://radiantearth.github.io/stac-browser/#/external/api.impactobservatory.com/stac-aws/collections/io-10m-annual-lulc, we see that the TIFFs are in an arrangement that spans the globe:
It's also not quite clear how to navigate the TIFF files. There is an index file: s3://io-10m-annual-lulc/io-10m-annual-lulc.ndjson with each newline-delimited row containing JSON like:
{
"type": "Feature",
"stac_version": "1.0.0",
"id": "60F-2022",
"properties": {
"proj:bbox": [276230, 3789860, 723770, 4683690],
"proj:epsg": 32760,
"proj:shape": [89383, 44754],
"end_datetime": "2023-01-01T00:00:00Z",
"proj:geometry": {
"type": "Polygon",
"coordinates": [
[
[276230, 3789860],
[723770, 3789860],
[723770, 4683690],
[276230, 4683690],
[276230, 3789860]
]
]
},
"proj:projjson": {
"id": {"code": 32760, "authority": "EPSG"},
"name": "WGS 84 / UTM zone 60S",
"type": "ProjectedCRS",
"$schema": "https://proj.org/schemas/v0.4/projjson.schema.json",
"base_crs": {
"id": {"code": 4326, "authority": "EPSG"},
"name": "WGS 84",
"datum": {
"name": "World Geodetic System 1984",
"type": "GeodeticReferenceFrame",
"ellipsoid": {
"name": "WGS 84",
"semi_major_axis": 6378137,
"inverse_flattening": 298.257223563
}
},
"coordinate_system": {
"axis": [
{
"name": "Geodetic latitude",
"unit": "degree",
"direction": "north",
"abbreviation": "Lat"
},
{
"name": "Geodetic longitude",
"unit": "degree",
"direction": "east",
"abbreviation": "Lon"
}
],
"subtype": "ellipsoidal"
}
},
"conversion": {
"name": "UTM zone 60S",
"method": {
"id": {"code": 9807, "authority": "EPSG"},
"name": "Transverse Mercator"
},
"parameters": [
{
"id": {"code": 8801, "authority": "EPSG"},
"name": "Latitude of natural origin",
"unit": "degree",
"value": 0
},
{
"id": {"code": 8802, "authority": "EPSG"},
"name": "Longitude of natural origin",
"unit": "degree",
"value": 177
},
{
"id": {"code": 8805, "authority": "EPSG"},
"name": "Scale factor at natural origin",
"unit": "unity",
"value": 0.9996
},
{
"id": {"code": 8806, "authority": "EPSG"},
"name": "False easting",
"unit": "metre",
"value": 500000
},
{
"id": {"code": 8807, "authority": "EPSG"},
"name": "False northing",
"unit": "metre",
"value": 10000000
}
]
},
"coordinate_system": {
"axis": [
{
"name": "Easting",
"unit": "metre",
"direction": "east",
"abbreviation": ""
},
{
"name": "Northing",
"unit": "metre",
"direction": "north",
"abbreviation": ""
}
],
"subtype": "Cartesian"
}
},
"proj:transform": [10, 0, 276230, 0, -10, 4683690, 0, 0, 1],
"start_datetime": "2022-01-01T00:00:00Z",
"io:supercell_id": "60F",
"datetime": "2022-06-01T00:00:00Z"
},
"geometry": {
"type": "MultiPolygon",
"coordinates": [
[
[
[180, -55.99999],
[180, -48.00137],
[180, -48.00096176128751],
[179.99766, -47.96096],
[179.71251, -47.96805],
[179.42727, -47.97444],
[179.14193, -47.98011],
[178.8565, -47.98508],
[178.57101, -47.98934],
[178.28545, -47.9929],
[177.99985, -47.99574],
[177.71421, -47.99787],
[177.42853, -47.99929],
[177.14285, -48],
[176.85715, -48],
[176.57147, -47.99929],
[176.28579, -47.99787],
[176.00015, -47.99574],
[175.71455, -47.9929],
[175.42899, -47.98934],
[175.1435, -47.98508],
[174.85807, -47.98011],
[174.57273, -47.97444],
[174.28749, -47.96805],
[174.00234, -47.96096],
[173.97997, -48.34337],
[173.95712, -48.72575],
[173.93378, -49.10809],
[173.90995, -49.4904],
[173.8856, -49.87268],
[173.86073, -50.25492],
[173.83532, -50.63713],
[173.80935, -51.0193],
[173.7828, -51.40144],
[173.75567, -51.78354],
[173.72792, -52.16561],
[173.69955, -52.54764],
[173.67054, -52.92964],
[173.64086, -53.3116],
[173.6105, -53.69352],
[173.57943, -54.07541],
[173.54763, -54.45726],
[173.51508, -54.83907],
[173.48176, -55.22085],
[173.44763, -55.60258],
[173.41267, -55.98428],
[173.75367, -55.99374],
[174.09487, -56.00226],
[174.43624, -56.00984],
[174.77776, -56.01646],
[175.11942, -56.02215],
[175.46119, -56.02689],
[175.80305, -56.03068],
[176.14498, -56.03352],
[176.48697, -56.03542],
[176.82899, -56.03636],
[177.17101, -56.03636],
[177.51303, -56.03542],
[177.85502, -56.03352],
[178.19695, -56.03068],
[178.53881, -56.02689],
[178.88058, -56.02215],
[179.22224, -56.01646],
[179.56376, -56.00984],
[179.90513, -56.00226],
[180, -55.99999]
]
],
[
[
[-180, -55.99999],
[-179.75367, -55.99374],
[-179.41267, -55.98428],
[-179.44763, -55.60258],
[-179.48176, -55.22085],
[-179.51508, -54.83907],
[-179.54763, -54.45726],
[-179.57943, -54.07541],
[-179.6105, -53.69352],
[-179.64086, -53.3116],
[-179.67054, -52.92964],
[-179.69955, -52.54764],
[-179.72792, -52.16561],
[-179.75567, -51.78354],
[-179.7828, -51.40144],
[-179.80935, -51.0193],
[-179.83532, -50.63713],
[-179.86073, -50.25492],
[-179.8856, -49.87268],
[-179.90995, -49.4904],
[-179.93378, -49.10809],
[-179.95712, -48.72575],
[-179.97997, -48.34337],
[-180, -48.00096176128751],
[-180, -48.00137],
[-180, -55.99999]
]
]
]
},
"links": [],
"assets": {
"supercell": {
"href": "https://s3.us-west-2.amazonaws.com/io-10m-annual-lulc/60F_2022.tif",
"type": "image/tiff; application=geotiff; profile=cloud-optimized",
"file:values": [
{"values": [0], "summary": "No Data"},
{"values": [1], "summary": "Water"},
{"values": [2], "summary": "Trees"},
{"values": [4], "summary": "Flooded vegetation"},
{"values": [5], "summary": "Crops"},
{"values": [7], "summary": "Built area"},
{"values": [8], "summary": "Bare ground"},
{"values": [9], "summary": "Snow/ice"},
{"values": [10], "summary": "Clouds"},
{"values": [11], "summary": "Rangeland"}
],
"raster:bands": [{"nodata": 0, "spatial_resolution": 10}],
"roles": ["data"]
}
},
"bbox": [173.41267, -56.03636, -179.41267, -47.96096],
"stac_extensions": [
"https://stac-extensions.github.io/projection/v1.0.0/schema.json",
"https://stac-extensions.github.io/file/v2.0.0/schema.json",
"https://stac-extensions.github.io/raster/v1.1.0/schema.json"
],
"collection": "io-10m-annual-lulc"
}
After Impact Observatory fixed their API, I was successful in using it to load relevant GeoTIFFs in QGIS:
Thanks to @gadomski, here's a way to do the same using pystac-client:
stac-client search https://api.impactobservatory.com/stac-aws -c io-10m-annual-lulc --intersects "$(cat Lower\ Schuylkill\ River,\ HUC-10\ Watershed\ \(ID\ 0204020310\).geojson )" --max-items 10 | stac-asset download
So it seems that we can use STAC clients to identify the COGs that intersect with an area of interest. Next we need to pull just the bits from them that are in the AoI, and not the rest. This will be done via GeoTrellis. I'll next delve into the GeoTrellis documentation to study that.
I was able to make a demo gist using the Python GIS stack to fetch and subset data for the above shape: https://gist.github.com/rajadain/45e00f2e9518350d12fdb2b92f5f38be. This clarifies the use case in my mind. I'd like to edit this to also use the correct color bands, and do a simple count operation. Then I'll move on to implementing this in GeoTrellis.
Should use the colors specified here: https://www.impactobservatory.com/maps-for-good/
This is a good example notebook that shows how to work with this data using other Python tools:
https://edsbook.org/notebooks/gallery/b128b282-dee7-44a7-bc21-f1fd21452a83/notebook
My latest investigation is here: https://gist.github.com/rajadain/fae21522bfb4cfb2a9d28b35d4e3f573
I'm using this scheme to translate the NLCD values to Impact Observatory classes:
NLCD_TO_IO = {
11: 1, # NLCD Open Water to IO Water
12: 9, # NLCD Perennial Ice/Snow to IO Snow/Ice
21: 7, # NLCD Developed, Open Space to IO Built area
22: 7, # NLCD Developed, Low Intensity to IO Built area
23: 7, # NLCD Developed, Medium Intensity to IO Built area
24: 7, # NLCD Developed, High Intensity to IO Built area
31: 8, # NLCD Barren Land (Rock/Sand/Clay) to IO Bare ground
41: 2, # NLCD Deciduous Forest to IO Trees
42: 2, # NLCD Evergreen Forest to IO Trees
43: 2, # NLCD Mixed Forest to IO Trees
52: 11, # NLCD Shrub/Scrub to IO Rangeland
71: 11, # NLCD Grassland/Herbaceous to IO Rangeland
81: 5, # NLCD Pasture/Hay to IO Crops
82: 5, # NLCD Cultivated Crops to IO Crops
90: 4, # NLCD Woody Wetlands to IO Flooded vegetation
95: 4, # NLCD Emergent Herbaceous Wetlands to IO Flooded vegetation
}
For the Lower Schuylkill HUC-10 0204020310, this is the output from the IO Global Dataset:
and this is how its output compares to MMW's using NLCD 2019 translated to IO classes:
For the Conococheague-Opequon HUC-8 02070004, this is the output from the IO Global Dataset:
and this is how its output compares to MMW's using NLCD 2019 translated to IO classes:
These spot checks show that while there is general agreement, these two datasets are not identical by any means.
@rajadain, thanks for this analysis! I'm not too surprised that are discrepancies, given that the land use / land cover (LULC) classes have different definitions, especially for short vegetation (pasture, shrub, rangeland). Here's some extra into to on IO & NLCD classes for examining your initial crosswalk.
io_lulc_classes = {
'No Data': 0,
'Water': 1,
'Trees': 2,
'Flooded vegetation': 4,
'Crops': 5,
'Built area': 7,
'Bare ground': 8,
'Snow/ice': 9,
'Clouds': 10,
'Rangeland': 11
}
@rajadain, one suggested edit to your crosswalk might be to change NLCD 81 Pasture/Hay from IO 5 Crops to IO 11 Rangeland:
81: 11, # NLCD Pasture/Hay to IO Rangeland
I'm guessing this update will substantially improve the comparison.
It seems like further exploration of the crosswalk will require close inspection of some of these areas vs satellite imagery, probably in QGIS.
I wonder if anyone else has already come up with an NCLD to IO crosswalk?
Good idea, I was wondering about that too. Updated gist: https://gist.github.com/rajadain/1d3591a7a00c750466e3793bf1a9bcd2
Shape | MMW vs IO if NLCD 81 mapped to IO 5 | MMW vs IO if NLCD 81 mapped to IO 11 |
---|---|---|
Lower Schuylkil HUC-10 0204020310 | Diff: 34.3% | Diff 24.8% |
Conococheague-Opequon HUC-8 02070004 | Diff 30.3% | Diff 30.0% |
Also added a diff percentage above that divides the total absolute difference by the total MMW baseline, which shows that NLCD 81 to IO 11 is a better assignment.
@rajadain, thanks for showing the comparison and adding the overall difference.
It looks like cross walking "NLCD 81 pasture/hay" to "IO 11 Rangeland" isn't the perfect solution, as it overshoots the previously observed differences. That said, it is likely the best choice.
Frankly, the overall comparison is quite good. One wouldn't expect a perfect match, given the different resolution, the different satellites, the different processing methods, and the different class definitions. It's actually surprisingly good given all that.
This is the dataset: https://aws.amazon.com/marketplace/pp/prodview-llgtvqflvozwu
Test accessing this via GDAL / QGIS. Confirm if this requires specifying an AWS account.
Ensure the data can be subsetted to an Areas of Interest.