carbonplan / xrefcoord

MIT License
1 stars 0 forks source link

Understanding and implementing coordinate generation to geoxarray #12

Open djhoese opened 6 months ago

djhoese commented 6 months ago

I was pointed here by @maxrjones a long time ago as part of the NASA ACCESS19 Pangeo project. The main point of the conversation (over email) was that the coordinate generation done by this project could be something that the geoxarray project could do. I'm looking at implementing this but I was hoping I could get some more information to guide me. I'm trying to keep interfaces similar, if not exactly the same, with rioxarray, but also trying to avoid the rasterio/GDAL dependencies.

Looking at the main coordinate function here:

https://github.com/carbonplan/xrefcoord/blob/f7c46c845cb34175ab56a49a26941257a457c87c/xrefcoord/coords.py#L35-L46

  1. Why do imagecodecs have to be loaded here?
  2. What creates attrs? GDAL's python interfaces don't use the geotiff ModelPixelScale/ModelTiepoint names but refer to numbers like these as the geotransform. While I would definitely have support for the geotransform in geoxarray I'd be fine having support for the ModelPixelScale naming too, I'd just like to know what to reference if there is a standard tool generating this.

I might have more questions later, but any info toward the above would be a big help.

maxrjones commented 6 months ago

Hi @djhoese, thanks for following up about this! If the coordinate generation doesn't go into geoxarray, I think an appropriate place would be the new xarray backend for kerchunk (xref https://github.com/fsspec/kerchunk/pull/372). That would be better than a standalone accessor like xrefcoord IMO because the coordinates generated from the GeoTiff metadata will only be correct if there's no slicing done first, which motivates generating them while opening the dataset. Let me know if you have thoughts about which is better.

  1. Why do imagecodecs have to be loaded here?

Good question, it doesn't (e.g., https://github.com/carbonplan/xrefcoord/pull/13).

  1. What creates attrs? GDAL's python interfaces don't use the geotiff ModelPixelScale/ModelTiepoint names but refer to numbers like these as the geotransform. While I would definitely have support for the geotransform in geoxarray I'd be fine having support for the ModelPixelScale naming too, I'd just like to know what to reference if there is a standard tool generating this.

Kerchunk puts the GeoTiff metadata into .zattrs, which is loaded by the Zarr backend into attrs. Kerchunk uses Tifffile to extract the GeoTiff metadata, which is done here: https://github.com/cgohlke/tifffile/blob/5ced58caa74936ecc53f58ade94938e251e83138/tifffile/tifffile.py#L9841-L9922

djhoese commented 5 months ago

Just curious, is this tifffile loaded tiff -> DataArray in a separate library at all? Does tifffile have an xarray engine to load files with xr.open_dataset(..., engine="tifffile")? Or is this all in kerchunk currently?

djhoese commented 5 months ago

Oh and I see when I load a geotiff with tifffile it includes a PCSCitationGeoKey in the .geotiff_metadat. Any idea if this is placed somewhere in the generated DataArray? If so, I could load/create a pyproj CRS object from it (most likely).

maxrjones commented 5 months ago

Just curious, is this tifffile loaded tiff -> DataArray in a separate library at all? Does tifffile have an xarray engine to load files with xr.open_dataset(..., engine="tifffile")? Or is this all in kerchunk currently?

I don't know of a tifffile xarray backend. I use rioxarray.open_rasterio() for loading the geotiffs without kerchunk and the kerchunk backend for loading the reference files.

Oh and I see when I load a geotiff with tifffile it includes a PCSCitationGeoKey in the .geotiff_metadat. Any idea if this is placed somewhere in the generated DataArray? If so, I could load/create a pyproj CRS object from it (most likely).

It would be in .zattrs in the reference file. Here's an example from https://projectpythia.org/kerchunk-cookbook/notebooks/generating_references/GeoTIFF.html:

    ".zattrs": "{\"multiscales\":[{\"datasets\":[{\"path\":\"0\"},{\"path\":\"1\"},{\"path\":\"2\"},{\"path\":\"3\"},{\"path\":\"4\"},{\"path\":\"5\"}],\"metadata\":{},\"name\":\"\",\"version\":\"0.1\"}],\"GDAL_METADATA\":\"<GDALMetadata>\\n<Item name="Observation time" format="YYYYMMDDhhmm">202301010100<\\/Item>\\n<Item name="Quantity" unit="mm">Precipitation accumulation<\\/Item>\\n<Item name="Gain">0.010000<\\/Item>\\n<Item name="Offset">0.000000<\\/Item>\\n<Item name="Nodata">65535<\\/Item>\\n<Item name="Undetect">0<\\/Item>\\n<Item name="Accumulation time" unit="h">24<\\/Item>\\n<Item name="Temporal type">Past<\\/Item>\\n<Item name="Z=AR^B relation parameter A">200.000000<\\/Item>\\n<Item name="Z=AR^B relation parameter B">1.600000<\\/Item>\\n<\\/GDALMetadata>\\n\",\"KeyDirectoryVersion\":1,\"KeyRevision\":1,\"KeyRevisionMinor\":0,\"GTModelTypeGeoKey\":1,\"GTRasterTypeGeoKey\":1,\"GTCitationGeoKey\":\"ETRS89 \\/ TM35FIN(E,N)\",\"GeogCitationGeoKey\":\"ETRS89\",\"GeogAngularUnitsGeoKey\":9102,\"GeogTOWGS84GeoKey\":[0.0,0.0,0.0],\"ProjectedCSTypeGeoKey\":3067,\"ProjLinearUnitsGeoKey\":9001,\"ModelPixelScale\":[1169.2930568410832,1168.8701637541064,0.0],\"ModelTiepoint\":[0.0,0.0,0.0,-118331.36640835612,7907751.537263516,0.0]}",

Note this includes ProjectedCSTypeGeoKey rather than PCSCitationGeoKey - I think it'd be important to support both GeoTIFF key flavors.

When the kerchunk backend used to create a xarray dataset, the metadata gets loaded into .attrs:

storage_options = {
    "remote_protocol": "s3",
    "skip_instance_cache": True,
}  # options passed to fsspec
open_dataset_options = {"chunks": {}}  # opens passed to xarray

ds = xr.open_dataset(
    "references/202301012300.json",
    engine="kerchunk",
    storage_options=storage_options,
    open_dataset_options=open_dataset_options,
)
ds.attrs
{'multiscales': [{'datasets': [{'path': '0'},
    {'path': '1'},
    {'path': '2'},
    {'path': '3'},
    {'path': '4'},
    {'path': '5'}],
   'metadata': {},
   'name': '',
   'version': '0.1'}],
 'GDAL_METADATA': '<GDALMetadata>\n<Item name="Observation time" format="YYYYMMDDhhmm">202301012300</Item>\n<Item name="Quantity" unit="mm">Precipitation accumulation</Item>\n<Item name="Gain">0.010000</Item>\n<Item name="Offset">0.000000</Item>\n<Item name="Nodata">65535</Item>\n<Item name="Undetect">0</Item>\n<Item name="Accumulation time" unit="h">24</Item>\n<Item name="Temporal type">Past</Item>\n<Item name="Z=AR^B relation parameter A">200.000000</Item>\n<Item name="Z=AR^B relation parameter B">1.600000</Item>\n</GDALMetadata>\n',
 'KeyDirectoryVersion': 1,
 'KeyRevision': 1,
 'KeyRevisionMinor': 0,
 'GTModelTypeGeoKey': 1,
 'GTRasterTypeGeoKey': 1,
 'GTCitationGeoKey': 'ETRS89 [/](https://file+.vscode-resource.vscode-cdn.net/) TM35FIN(E,N)',
 'GeogCitationGeoKey': 'ETRS89',
 'GeogAngularUnitsGeoKey': 9102,
 'GeogTOWGS84GeoKey': [0.0, 0.0, 0.0],
 'ProjectedCSTypeGeoKey': 3067,
 'ProjLinearUnitsGeoKey': 9001,
 'ModelPixelScale': [1169.2930568410832, 1168.8701637541064, 0.0],
 'ModelTiepoint': [0.0, 0.0, 0.0, -118331.36640835612, 7907751.537263516, 0.0]}