SeaBee-no / documentation

Repo for all SeaBee documentation
https://seabee-no.github.io/documentation/

Improve `seabeepy.geo.standardise_orthophoto` #52

Closed JamesSample closed 2 months ago

JamesSample commented 3 months ago

This function needs extending and improving:

  1. The NoData value should be set consistently across all output grids, regardless of the NoData value in the input grid. At present, the function just assigns a user-specified NoData value, but without adjusting/rescaling pixel values accordingly. This leads to issues e.g. bright white seabird plumage being classified as NoData, or genuine NoData values in Pix4D imagery appearing as black.
  2. It looks like the current function just extracts the first three bands and drops the rest, which is no good for MS data.
  3. The band order is not the same as previously agreed. Need to check what band order is produced by default by ODM and Pix4D. If they're the same, I suggest we stick with this. If not, we need to standardise.
  4. If the band order becomes non-standard, we need to update the publishing workflow on GeoNode so that RGB bands are displayed by default (i.e. the image renders as close to "normal" as possible).

JamesSample commented 2 months ago

Closed (tentatively) by this commit.

@knl88 for info.

This has turned out to be more fiddly than I hoped; SeaBee orthomosaics are more variable than expected. I haven't checked everything, but based on some manual exploration I have found the following:

1. Orthomosaics from ODM

From what I have seen, datasets produced automatically by ODM have complete metadata embedded in the GeoTIFF. This makes it possible to read band names etc. unambiguously, which is nice.

1.1. RGB datasets

For all datasets I have checked so far, RGB mosaics from ODM are 8-bit unsigned integer grids with four bands (RGBA). The data bands (RGB) use the full range of available values (0 to 255) and NoData is represented by the alpha channel, where 0 corresponds to NoData and 255 corresponds to Data.

However, NoData cells in the data bands are also assigned a value of 255. This causes problems in some software. For example ArcGIS Desktop fails to recognise the alpha channel correctly and instead displays NoData cells with values of (255, 255, 255), which is indistinguishable from valid cells in bright white areas. I am not sure whether this is due to ODM not writing the metadata correctly, or to ArcGIS not reading it correctly.

1.2. Multispectral datasets

The MS datasets from ODM that I have checked are 16-bit unsigned integer grids with 8 bands:

B, G, R, NIR, RedEdge, Panchromatic, LWIR, Alpha

As with the RGB grids, NoData is strictly represented by zeros in the alpha channel, but NoData cells are also assigned 65535 in the data bands, which confuses some software.
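
As a rough illustration (not the code from the commit), the ambiguity can be removed by making zero the only NoData value in the data bands: genuine zeros are shifted to 1, then every cell with alpha equal to zero is set to zero. The helper name `apply_odm_alpha` and the `(bands, rows, cols)` array layout are assumptions made for this sketch.

```python
import numpy as np


def apply_odm_alpha(data: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    """Return a copy of `data` in which 0 is used only for NoData.

    Hypothetical helper. `data` has shape (bands, rows, cols) and `alpha`
    has shape (rows, cols), with alpha == 0 marking NoData.
    """
    out = data.copy()
    out[out == 0] = 1        # keep genuinely dark valid pixels distinct from NoData
    out[:, alpha == 0] = 0   # cells flagged by the alpha channel become 0 in every band
    return out


# Three cells: a white valid pixel, a black valid pixel and a NoData pixel
# that ODM has filled with 255 in the data bands.
rgb = np.array([[[255, 0, 255]]] * 3, dtype=np.uint8)   # shape (3, 1, 3)
alpha = np.array([[255, 255, 0]], dtype=np.uint8)       # shape (1, 3)
fixed = apply_odm_alpha(rgb, alpha)
```

After this step the white pixel is unchanged, the black pixel becomes (1, 1, 1) and the NoData pixel becomes (0, 0, 0), so software that ignores the alpha channel can still use 0 as the NoData value.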

2. Orthomosaics from Pix4D

These seem more variable than the output from ODM and the embedded metadata is often missing or incomplete. I assume this is because these mosaics have often been created manually in e.g. ArcGIS using slightly different workflows. The general pattern is described below, but there are probably exceptions that I have not yet discovered that will need additional handling.

2.1. RGB datasets

As with ODM, these are usually 8-bit unsigned integer grids with bands ordered as RGBA. According to the metadata, NoData is encoded as values of zero in the alpha channel. NoData cells in the data bands are also usually assigned a value of zero. However, I have seen a few examples where cells that should clearly be NoData are only represented as zeros in the data bands (i.e. they are classified as Data in the alpha channel). As above, I am not sure whether this is a bug in Pix4D or an issue with some aspect of the manual GIS workflow.

2.2. Multispectral datasets

Most MS datasets I have checked from Pix4D are 32-bit float grids with 5 bands:

NIR, RedEdge, R, G, B

I suspect these grids have been created manually following the SeaBee specification, so the "raw" output from Pix4D (e.g. from earlier missions) might be different. These grids do not have an alpha band and the NoData value is undefined. However, it appears to be 0 in all bands.
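
For grids like these, a NoData mask has to be inferred from the pixel values. The sketch below is only one possible reading of the behaviour described in this thread: it assumes a cell is NoData when every data band is zero, or when an alpha band is present and is zero for that cell. The function name `pix4d_nodata_mask` and the `(bands, rows, cols)` layout are hypothetical.

```python
from typing import Optional

import numpy as np


def pix4d_nodata_mask(data: np.ndarray, alpha: Optional[np.ndarray] = None) -> np.ndarray:
    """Return a boolean (rows, cols) array that is True for NoData cells.

    Assumption for this sketch: a cell is NoData if all data bands are zero,
    or if an alpha band is supplied and equals zero there.
    """
    mask = np.all(data == 0, axis=0)
    if alpha is not None:
        mask |= alpha == 0
    return mask


# 5-band float grid with three cells: all zeros, zero in one band only, all non-zero.
ms = np.array([[[0.0, 0.0, 0.3]]] * 5, dtype=np.float32)  # shape (5, 1, 3)
ms[2, 0, 1] = 0.4                                         # second cell is not all-zero
mask_no_alpha = pix4d_nodata_mask(ms)

# The same grid with an alpha band that flags the last cell as NoData.
alpha = np.array([[255, 255, 0]], dtype=np.uint8)
mask_with_alpha = pix4d_nodata_mask(ms, alpha)
```

Requiring all bands to be zero avoids discarding valid cells that happen to be zero in a single band, at the cost of missing NoData cells that are only partly zeroed.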

3. Updated workflow

The code added in this commit does the following:

  1. Sets the default band order as

    (NIR)   (RedEdge)   Red   Green   Blue
  2. Attempts to read the band names and NoData values from the GeoTIFF metadata. If this is not available or incomplete, it "guesses" the following:

    • If the dataset has 4 bands, assume R, G, B, A.
    • If the dataset has 5 bands, assume NIR, RedEdge, R, G, B. Note that this assumption may be incorrect for some early Pix4D datasets.
  3. For Pix4D datasets, values of 0 in either the data bands or the alpha channel are interpreted as NoData. For datasets from ODM, values of zero in the data bands are first shifted to 1, and then cell values where the alpha channel is zero are set to zero.

  4. All datasets are converted to 8-bit unsigned integer grids (with value scaling), where the NoData value is zero and valid data ranges from 1 to 255. The output is either a 3-band (R, G, B) image or a 5-band (NIR, RedEdge, R, G, B) image. Any existing overview layers are discarded and rebuilt, and the file saved as a cloud-optimised GeoTIFF with LZW compression.
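
The band-guessing, reordering and rescaling steps above can be sketched roughly as follows. This is not the code from the commit: the function names are hypothetical, and the scaling shown (a single linear min/max stretch over all valid cells and bands) is an assumption, since the thread does not say how values are scaled.

```python
import numpy as np

TARGET_ORDER = ["NIR", "RedEdge", "R", "G", "B"]  # default order from step 1


def guess_band_names(n_bands: int) -> list:
    """Fallback band names when the GeoTIFF metadata is missing (step 2)."""
    guesses = {4: ["R", "G", "B", "A"], 5: ["NIR", "RedEdge", "R", "G", "B"]}
    if n_bands not in guesses:
        raise ValueError(f"Cannot guess band names for {n_bands} bands")
    return guesses[n_bands]


def target_indices(names: list) -> list:
    """Indices that reorder `names` into TARGET_ORDER, skipping absent bands."""
    return [names.index(name) for name in TARGET_ORDER if name in names]


def scale_to_uint8(data: np.ndarray, nodata_mask: np.ndarray) -> np.ndarray:
    """Rescale valid cells linearly to 1..255 and write 0 for NoData (step 4).

    Assumption: one global min/max stretch across all bands.
    """
    out = np.zeros(data.shape, dtype=np.uint8)
    valid = ~nodata_mask
    if not valid.any():
        return out
    vals = data[:, valid].astype(np.float64)
    lo, hi = vals.min(), vals.max()
    if hi > lo:
        scaled = 1 + (vals - lo) / (hi - lo) * 254
    else:
        scaled = np.ones_like(vals)  # constant image: map everything to 1
    out[:, valid] = np.round(scaled).astype(np.uint8)
    return out


# ODM-style multispectral band names reordered to the default order.
odm_names = ["B", "G", "R", "NIR", "RedEdge", "Panchromatic", "LWIR", "Alpha"]
odm_order = target_indices(odm_names)

# 5-band float grid: first cell is NoData, the others hold 0.2 and 1.0.
grid = np.array([[[0.0, 0.2, 1.0]]] * 5, dtype=np.float32)
nodata = np.array([[True, False, False]])
scaled = scale_to_uint8(grid, nodata)
```

Here `odm_order` selects NIR, RedEdge, R, G and B from the 8-band ODM layout and drops the remaining bands, while `scaled` keeps 0 for the NoData cell and maps the smallest and largest valid values to 1 and 255.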


Hopefully this will handle most datasets sensibly. It should at least fix the annoying black or white NoData surrounds that appear on GeoNode for some of our missions.

Further work may be required to handle edge cases e.g. old Pix4D datasets, or images from new/uncommon sensors.

knl88 commented 2 months ago

Great explanation @JamesSample