Open-EO / openeo-geotrellis-extensions

Java/Scala extensions for Geotrellis, for use with OpenEO GeoPySpark backend.
Apache License 2.0
5 stars 4 forks source link

allow setting tiff scale + offset + custom metadata #317

Closed jdries closed 1 month ago

jdries commented 3 months ago

Add a format option to set tiff metadata. Some of these are regular tiff tags, others will have to be encoded as gdal band metadata.

The options in geotrellis are limited, but there are very basic tifftools available in linux: https://linux.die.net/man/1/tiffset The idea is to be able to set metadata tags without touching the rest of the file, avoiding a full rewrite. Ideally this is done right after writing the tiff in geotrellis.

Format option will have to be passed through via save_result.

VictorVerhaert commented 3 months ago

example of custom metadata: https://github.com/VITO-RS-Vegetation/lcfm-production/issues/18

jdries commented 3 months ago

For tiffset, we would need to add libtiff-tools to the container.

JorisCod commented 2 months ago

The current metadata, visible through gdalinfo, is:

Metadata: PROCESSING_SOFTWARE=0.39.0a1 AREA_OR_POINT=Area

both can stay, but what we are looking for is:

Metadata: AREA_OR_POINT=Area bands=['s2-B02-p10', 's2-B02-p25'] copyright=LCFM project 2020 / Contains modified Copernicus Sentinel data (2020) processed by LCFM consortium creation_time=2024-07-19 00:35:04.693848 license=CC-BY 4.0 - https://creativecommons.org/licenses/by/4.0/ product_crs=EPSG:32629 product_grid=Sentinel-2 UTM tiling grid product_tile=29TNE product_type=LSF monthly median composite for band B02 reference=TODO time_end=2020-02-29T23:59:59Z time_start=2020-02-01T00:00:00Z title=LCFM Monthly Land Surface Features (LSF-MONTHLY) product at 10m resolution for year 2020 version=v002-satio

So this would be something quite flexible. We are setting the metadata through rasterio (python) and passing a dictionary:

        if metadata:
            with rasterio.open(final_vrt_fn, 'r+') as dst:
                dst.update_tags(**metadata)

        if bands_names:
            with rasterio.open(final_vrt_fn, 'r+') as dst:
                for i, b in enumerate(bands_names):
                    dst.set_band_description(i + 1, b)

Additionally, there is a difference in how we set the band description and how OpenEO does it: OpenEO: Band 2 Block=64x64 Type=Int16, ColorInterp=Undefined NoData Value=-32768 Overviews: 1033x1542, 517x771, 259x386, 130x193, 65x97, 33x49 Metadata: DESCRIPTION=B02_P25

We: Band 17 Block=1024x1024 Type=UInt16, ColorInterp=Undefined Description = s2-B08-p25 NoData Value=65535 Offset: 0.0031999999191612, Scale:8.16687767161827e-06

The description in OpenEO is at metadata level, which makes that the band names are not displayed in the symbology in Qgis.

Relatedly, as you see in the output, there's an offset and scale on every band. In principle, the offset and scale can be different per band.

JorisCod commented 2 months ago

Just checking whether this issue is already planned? If it's too extensive, can the scaling and offset be done first and the metadata in a separate issue?

JorisCod commented 2 months ago

We set the scale and offset through gdal_translate: https://gdal.org/en/latest/programs/gdal_translate.html#cmdoption-gdal_translate-a_scale but there are likely other options.

bossie commented 2 months ago

Regardless of how we put them in the GeoTiff, will the user also provide values for scale/offset in the format options?

bossie commented 2 months ago

I could get this to work:

tiffset -s 42112 '<GDALMetadata>
  <Item name="PROCESSING_SOFTWARE">0.40.1a1</Item>
  <Item name="DESCRIPTION" sample="0">red</Item>
  <Item name="SCALE" sample="0" role="scale">1.23</Item>
  <Item name="OFFSET" sample="0" role="offset">4.56</Item>
</GDALMetadata>' test_load_stac_datacube_parameters.tif

where:

bossie commented 1 month ago

Runs locally but tests in Jenkins fail with a segmentation fault:

subprocess.CalledProcessError: Command '['tiffset', '-s', '42112', '0.40.1a1TileRowTileCol', '/var/lib/jenkins/workspace/_openeo-geopyspark-driver_PR-885/pytest-tmp/pytest-of-jenkins/pytest-0/test_separate_asset_per_band_l0/openEO_2021-06-05Z_TileRow.tif']' died with <Signals.SIGSEGV: 11>.

bossie commented 1 month ago

Possible method to get core dumps within the container: https://stackoverflow.com/a/72048923.

jdries commented 1 month ago

The segfault can be reproduced in kubernetes, but it does seem to depend on the exact tag and value, as I was able to set for instance the description tag.

dmesg entry for segfaults is not so helpfull:

[782319.609931] tiffset[2919160]: segfault at 6d ip 00007f1e26e96895 sp 00007fff6675e9b8 error 4 in libc-2.28.so[7f1e26d36000+1bc000]
[782319.612686] Code: c7 80 00 00 00 48 81 fa 80 00 00 00 77 bc c5 fe 7f 29 c5 fe 7f 71 e0 c5 fe 7f 79 c0 c5 7e 7f 41 a0 c4 c1 7e 7f 23 c5 f8 77 c3 <c5> fe 6f 26 c5 fe 6f 6e 20 c5 fe 6f 76 40 c5 fe 6f 7e 60 c5 7e 6f
[782340.241434] tiffset[2919401]: segfault at 6d ip 00007f95ff2b1895 sp 00007fff7ca95a68 error 4 in libc-2.28.so[7f95ff151000+1bc000]
[782340.242934] Code: c7 80 00 00 00 48 81 fa 80 00 00 00 77 bc c5 fe 7f 29 c5 fe 7f 71 e0 c5 fe 7f 79 c0 c5 7e 7f 41 a0 c4 c1 7e 7f 23 c5 f8 77 c3 <c5> fe 6f 26 c5 fe 6f 6e 20 c5 fe 6f 76 40 c5 fe 6f 7e 60 c5 7e 6f
jdries commented 1 month ago

Found a potential solution: you can also set tag values via a file, and that seems to avoid the segfault. This worked for me: tiffset -sf 42112 tag.txt ESA_WorldCover_10m_2021_v200_S55W071_S2RGBNIR_v2.tif

bossie commented 1 month ago

Oh dear.

That does seem to work. :+1:

bossie commented 1 month ago

Remaining work:

bossie commented 4 weeks ago

Specify band-specific scale, offset and other metadata in the GTiff format options like so:

{
  "bands_metadata": {
    "Flat:1": {
      "SCALE": 1.23
    },
    "Flat:2": {
      "OFFSET": 4.56
    },
    "Flat:3": {
      "SCALE": 7.89,
      "OFFSET": 10.11,
      "ARBITRARY": "value"
    }
  }
}
bossie commented 4 weeks ago

Available on CDSE-dev.

Example process graph:

{
  "process_graph": {
    "loadcollection1": {
      "arguments": {
        "bands": [
          "B04",
          "B03",
          "B02"
        ],
        "id": "SENTINEL2_L2A",
        "spatial_extent": {
          "east": 5.506705419643688,
          "north": 50.35658998840864,
          "south": 50.35253773281718,
          "west": 5.50057681627676
        },
        "temporal_extent": [
          "2024-04-04T00:00:00Z",
          "2024-05-04T00:00:00Z"
        ]
      },
      "process_id": "load_collection"
    },
    "saveresult1": {
      "arguments": {
        "data": {
          "from_node": "loadcollection1"
        },
        "format": "GTIFF",
        "options": {
          "bands_metadata": {
            "B04": {"SCALE": 1.23},
            "B03": {"OFFSET": 4.56},
            "B02": {"SCALE": 7.89, "OFFSET": 10.11}
          }
        }
      },
      "process_id": "save_result",
      "result": true
    }
  }
}

From gdalinfo on one of the assets:

Band 1 Block=128x128 Type=Int16, ColorInterp=Red
  Description = B04
  NoData Value=-32768
  Offset: 0,   Scale:1.23
Band 2 Block=128x128 Type=Int16, ColorInterp=Green
  Description = B03
  NoData Value=-32768
  Offset: 4.56,   Scale:1
Band 3 Block=128x128 Type=Int16, ColorInterp=Blue
  Description = B02
  NoData Value=-32768
  Offset: 10.11,   Scale:7.89