corteva / rioxarray

geospatial xarray extension powered by rasterio
https://corteva.github.io/rioxarray

Envi header information is stripped on write #635

Open AndrewGuenther opened 1 year ago

AndrewGuenther commented 1 year ago

Code Sample

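import rioxarray as rx

# Read the ENVI file; the tag structure is flattened on read.
data = rx.open_rasterio("./test_file.envi")
# Write it back out with the ENVI driver; the ENVI header tags are missing here.
data.rio.to_raster("./output.envi", driver="ENVI")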

Problem description

In the code above, any tag under the ENVI namespace is stripped when the file is written back out. The data reads in fine, but because the tag structure is flattened on read, the namespace context is gone by the time the file is written. As a result, headers such as wavelengths, wavelength units, and acquisition time are lost.

Expected Output

I would expect the file written by a subsequent to_raster call to carry exactly the same header information as the file that was read in.
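One way to check this expectation is to compare the ENVI-namespace tags of the input and output files. The sketch below is only a small helper, not part of rioxarray: missing_tags is a hypothetical name, and the tag values are made up for illustration. With real files, the two dicts could be read with rasterio's dataset.tags(ns="ENVI").

```python
def missing_tags(before, after):
    """Return the keys in `before` that are absent from, or changed in, `after`."""
    return sorted(
        key
        for key, value in before.items()
        if key not in after or after[key] != value
    )


# Illustrative values only. With real files the dicts could come from
# rasterio, e.g. rasterio.open(path).tags(ns="ENVI").
input_tags = {"wavelength_units": "Nanometers", "interleave": "bsq"}
output_tags = {"interleave": "bsq"}

lost = missing_tags(input_tags, output_tags)
print(lost)
```

An empty list would mean every ENVI tag survived the round trip unchanged.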

Environment Information

rioxarray (0.13.3) deps:
  rasterio: 1.3.5.post1
    xarray: 2023.1.0
      GDAL: 3.5.3
      GEOS: 3.11.1
      PROJ: 9.0.1
 PROJ DATA: /home/andrew/.cache/pypoetry/virtualenvs/rioxarray-test-GzltKOLB-py3.8/lib/python3.8/site-packages/rasterio/proj_data
 GDAL DATA: /home/andrew/.cache/pypoetry/virtualenvs/rioxarray-test-GzltKOLB-py3.8/lib/python3.8/site-packages/rasterio/gdal_data

Other python deps:
     scipy: None
    pyproj: 3.4.1

System:
    python: 3.8.10 (default, Nov 14 2022, 12:59:47)  [GCC 9.4.0]
executable: /home/andrew/.cache/pypoetry/virtualenvs/rioxarray-test-GzltKOLB-py3.8/bin/python
   machine: Linux-5.15.0-58-generic-x86_64-with-glibc2.29

Installation method

poetry

AndrewGuenther commented 1 year ago

A simple solution would be to store an attribute recording which fields were read from the ENVI namespace and then automatically include those tags on write. The gotcha, however, is that if someone added attributes or band information that they wanted included in the ENVI header, they would need a way to indicate that.

Tracking of the read headers would probably go here: https://github.com/corteva/rioxarray/blob/master/rioxarray/_io.py#L713-L725
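A rough sketch of the tracking idea, using plain dicts in place of the real read and write paths. Everything here is an assumption for illustration: the "_envi_keys" attribute name and both helper functions are hypothetical and do not exist in rioxarray, and the tag values are made up.

```python
ENVI_KEYS_ATTR = "_envi_keys"  # hypothetical bookkeeping attribute name


def flatten_with_tracking(default_tags, envi_tags):
    """Merge both namespaces into one flat dict and remember the ENVI keys."""
    attrs = {**default_tags, **envi_tags}
    attrs[ENVI_KEYS_ATTR] = sorted(envi_tags)
    return attrs


def split_for_write(attrs):
    """Split flat attrs back into (default, envi) using the remembered keys."""
    remaining = dict(attrs)
    envi_keys = set(remaining.pop(ENVI_KEYS_ATTR, []))
    envi = {k: v for k, v in remaining.items() if k in envi_keys}
    default = {k: v for k, v in remaining.items() if k not in envi_keys}
    return default, envi


attrs = flatten_with_tracking(
    {"AREA_OR_POINT": "Area"},
    {"wavelength_units": "Nanometers"},
)
attrs["sensor_type"] = "Unknown"  # added by the user after reading

default, envi = split_for_write(attrs)
print(envi)
print(default)
```

Note that the user-added sensor_type ends up in the default namespace rather than the ENVI one, which is exactly the gotcha described above: anything added after the read has no way to opt in to the ENVI header.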

Another potential solution would be not to flatten the tag structure for general metadata at all. A user who wanted to include data in the ENVI namespace could then do so like this:

data.attrs["ENVI"]["wavelength units"]
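To make the nested layout concrete, here is a minimal sketch of how a writer might separate namespaced tags from flat ones. This is not current rioxarray behaviour: split_namespaces is a hypothetical helper, a plain dict stands in for data.attrs, and the values are made up.

```python
def split_namespaces(attrs):
    """Separate nested dicts (treated as tag namespaces) from flat tags."""
    namespaces = {k: dict(v) for k, v in attrs.items() if isinstance(v, dict)}
    default = {k: v for k, v in attrs.items() if not isinstance(v, dict)}
    return default, namespaces


# Stand-in for data.attrs if the ENVI namespace were kept nested on read.
attrs = {
    "AREA_OR_POINT": "Area",
    "ENVI": {"wavelength units": "Nanometers"},
}
# A user can add to the ENVI namespace explicitly.
attrs["ENVI"]["sensor type"] = "Unknown"

default, namespaces = split_namespaces(attrs)
print(default)
print(namespaces["ENVI"])
```

With this layout the namespace travels with the data, so no separate bookkeeping of which keys came from the ENVI header would be needed on write.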

Per-band information is a bit more difficult...

snowman2 commented 1 year ago

Tag writing is here

snowman2 commented 1 year ago

When there are multiple bands, the band_tags key can be added as a list of dicts.
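A small sketch of what building such a list could look like, assuming one dict per band in band order. build_band_tags is a hypothetical helper, and the wavelengths and tag names are illustrative. Whether the ENVI driver then writes these per-band tags into the header is not confirmed here.

```python
def build_band_tags(wavelengths, units):
    """Build one tag dict per band, in band order."""
    return [
        {"wavelength": str(wavelength), "wavelength_units": units}
        for wavelength in wavelengths
    ]


# Illustrative wavelengths for a three-band raster.
band_tags = build_band_tags([450.0, 550.0, 650.0], "Nanometers")
tags = {"band_tags": band_tags}
print(tags["band_tags"][0])

# With a three-band DataArray named `data`, the dict could then be passed
# through the `tags` argument of to_raster:
# data.rio.to_raster("./output.envi", driver="ENVI", tags=tags)
```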