Closed jdries closed 1 month ago
example of custom metadata: https://github.com/VITO-RS-Vegetation/lcfm-production/issues/18
For tiffset, we would need to add libtiff-tools to the container.
The current metadata, visible through gdalinfo, is:
Metadata: PROCESSING_SOFTWARE=0.39.0a1 AREA_OR_POINT=Area
both can stay, but what we are looking for is:
Metadata: AREA_OR_POINT=Area bands=['s2-B02-p10', 's2-B02-p25'] copyright=LCFM project 2020 / Contains modified Copernicus Sentinel data (2020) processed by LCFM consortium creation_time=2024-07-19 00:35:04.693848 license=CC-BY 4.0 - https://creativecommons.org/licenses/by/4.0/ product_crs=EPSG:32629 product_grid=Sentinel-2 UTM tiling grid product_tile=29TNE product_type=LSF monthly median composite for band B02 reference=TODO time_end=2020-02-29T23:59:59Z time_start=2020-02-01T00:00:00Z title=LCFM Monthly Land Surface Features (LSF-MONTHLY) product at 10m resolution for year 2020 version=v002-satio
So this would be something quite flexible. We are setting the metadata through rasterio (python) and passing a dictionary:
if metadata:
with rasterio.open(final_vrt_fn, 'r+') as dst:
dst.update_tags(**metadata)
if bands_names:
with rasterio.open(final_vrt_fn, 'r+') as dst:
for i, b in enumerate(bands_names):
dst.set_band_description(i + 1, b)
Additionally, there is a difference in how we set the band description and how OpenEO does it: OpenEO: Band 2 Block=64x64 Type=Int16, ColorInterp=Undefined NoData Value=-32768 Overviews: 1033x1542, 517x771, 259x386, 130x193, 65x97, 33x49 Metadata: DESCRIPTION=B02_P25
We: Band 17 Block=1024x1024 Type=UInt16, ColorInterp=Undefined Description = s2-B08-p25 NoData Value=65535 Offset: 0.0031999999191612, Scale:8.16687767161827e-06
The description in OpenEO is at metadata level, which makes that the band names are not displayed in the symbology in Qgis.
Relatedly, as you see in the output, there's an offset and scale on every band. In principle, the offset and scale can be different per band.
Just checking whether this issue is already planned? If it's too extensive, can the scaling and offset be done first and the metadata in a separate issue?
We set the scale and offset through gdal_translate: https://gdal.org/en/latest/programs/gdal_translate.html#cmdoption-gdal_translate-a_scale but there are likely other options.
Regardless of how we put them in the GeoTiff, will the user also provide values for scale/offset in the format options?
I could get this to work:
tiffset -s 42112 '<GDALMetadata>
<Item name="PROCESSING_SOFTWARE">0.40.1a1</Item>
<Item name="DESCRIPTION" sample="0">red</Item>
<Item name="SCALE" sample="0" role="scale">1.23</Item>
<Item name="OFFSET" sample="0" role="offset">4.56</Item>
</GDALMetadata>' test_load_stac_datacube_parameters.tif
where:
sample
points to the band number;role
is necessary to get GDAL to treat SCALE/OFFSET as actual scale/offset (not supported by Geotrellis).Runs locally but tests in Jenkins fail with a segmentation fault:
subprocess.CalledProcessError: Command '['tiffset', '-s', '42112', '
', '/var/lib/jenkins/workspace/_openeo-geopyspark-driver_PR-885/pytest-tmp/pytest-of-jenkins/pytest-0/test_separate_asset_per_band_l0/openEO_2021-06-05Z_TileRow.tif']' died with <Signals.SIGSEGV: 11>. - 0.40.1a1
- TileRow
- TileCol
Possible method to get core dumps within the container: https://stackoverflow.com/a/72048923.
The segfault can be reproduced in kubernetes, but it does seem to depend on the exact tag and value, as I was able to set for instance the description tag.
dmesg entry for segfaults is not so helpfull:
[782319.609931] tiffset[2919160]: segfault at 6d ip 00007f1e26e96895 sp 00007fff6675e9b8 error 4 in libc-2.28.so[7f1e26d36000+1bc000]
[782319.612686] Code: c7 80 00 00 00 48 81 fa 80 00 00 00 77 bc c5 fe 7f 29 c5 fe 7f 71 e0 c5 fe 7f 79 c0 c5 7e 7f 41 a0 c4 c1 7e 7f 23 c5 f8 77 c3 <c5> fe 6f 26 c5 fe 6f 6e 20 c5 fe 6f 76 40 c5 fe 6f 7e 60 c5 7e 6f
[782340.241434] tiffset[2919401]: segfault at 6d ip 00007f95ff2b1895 sp 00007fff7ca95a68 error 4 in libc-2.28.so[7f95ff151000+1bc000]
[782340.242934] Code: c7 80 00 00 00 48 81 fa 80 00 00 00 77 bc c5 fe 7f 29 c5 fe 7f 71 e0 c5 fe 7f 79 c0 c5 7e 7f 41 a0 c4 c1 7e 7f 23 c5 f8 77 c3 <c5> fe 6f 26 c5 fe 6f 6e 20 c5 fe 6f 76 40 c5 fe 6f 7e 60 c5 7e 6f
Found a potential solution: you can also set tag values via a file, and that seems to avoid the segfault.
This worked for me:
tiffset -sf 42112 tag.txt ESA_WorldCover_10m_2021_v200_S55W071_S2RGBNIR_v2.tif
Oh dear.
That does seem to work. :+1:
Remaining work:
separate_asset_per_band
;Specify band-specific scale, offset and other metadata in the GTiff
format options like so:
{
"bands_metadata": {
"Flat:1": {
"SCALE": 1.23
},
"Flat:2": {
"OFFSET": 4.56
},
"Flat:3": {
"SCALE": 7.89,
"OFFSET": 10.11,
"ARBITRARY": "value"
}
}
}
Available on CDSE-dev.
Example process graph:
{
"process_graph": {
"loadcollection1": {
"arguments": {
"bands": [
"B04",
"B03",
"B02"
],
"id": "SENTINEL2_L2A",
"spatial_extent": {
"east": 5.506705419643688,
"north": 50.35658998840864,
"south": 50.35253773281718,
"west": 5.50057681627676
},
"temporal_extent": [
"2024-04-04T00:00:00Z",
"2024-05-04T00:00:00Z"
]
},
"process_id": "load_collection"
},
"saveresult1": {
"arguments": {
"data": {
"from_node": "loadcollection1"
},
"format": "GTIFF",
"options": {
"bands_metadata": {
"B04": {"SCALE": 1.23},
"B03": {"OFFSET": 4.56},
"B02": {"SCALE": 7.89, "OFFSET": 10.11}
}
}
},
"process_id": "save_result",
"result": true
}
}
}
From gdalinfo
on one of the assets:
Band 1 Block=128x128 Type=Int16, ColorInterp=Red
Description = B04
NoData Value=-32768
Offset: 0, Scale:1.23
Band 2 Block=128x128 Type=Int16, ColorInterp=Green
Description = B03
NoData Value=-32768
Offset: 4.56, Scale:1
Band 3 Block=128x128 Type=Int16, ColorInterp=Blue
Description = B02
NoData Value=-32768
Offset: 10.11, Scale:7.89
Add a format option to set tiff metadata. Some of these are regular tiff tags, others will have to be encoded as gdal band metadata.
The options in geotrellis are limited, but there are very basic tifftools available in linux: https://linux.die.net/man/1/tiffset The idea is to be able to set metadata tags without touching the rest of the file, avoiding a full rewrite. Ideally this is done right after writing the tiff in geotrellis.
Format option will have to be passed through via save_result.