Open theroggy opened 1 year ago
Hi, interesting point you make here.
It's indeed confusing that DataCube.dimension_labels
in Python client returns another DataCube
object, which makes little sense.
On the other hand, a DataCube
in the python client is not a concrete data cube but a wrapper around "virtual" process graph representation. The data cube only becomes concrete at the back-end side, when "executing" the virtual data cube (e.g. by starting a batch job from it). Likewise the DataCube
methods (in general) do not or can not return concrete data, like a concrete list of band names as you suggest.
That being said, a minimal improvement we can do here is not returning a DataCube
, but another class/object that makes clear it's a non-concrete/virtual wrapper for something that can be executed to, e.g. a list.
Another thing: if you really want to inspect e.g. the bands from a given DataCube
without first executing anything on the back-end, you can use the metadata
attribute of a data cube. E.g.
>>> cube.metadata.band_dimension
BandDimension(type='bands', name='bands', bands=[Band(name='VH', common_name=None, wavelength_um=None, aliases=None, gsd=None), Band(name='VV', common_name=None, wavelength_um=None, aliases=None, gsd=None)])
Strictly speaking I don't need the band names client-side. I want to loop over all bands to apply a specific linear_scale_range depending on the band name, but earlier on in the script I already add and/or rename some bands depending on some other logic, so it takes some extra code to keep track of the bands available in the cube.
Using the metadata object indeed seems like a nice workaround, or even a good solution. I applied it already in the sample code below:
for band in [band.name for band in cube.metadata.band_dimension.bands]:
if band.startswith("B02") or band.startswith("B03") or band.startswith("B04"):
band_rescaled = cube.filter_bands(band).linear_scale_range(0, 2000, 0, 255)
elif band.startswith("B08"):
band_rescaled = cube.filter_bands(band).linear_scale_range(0, 5000, 0, 255)
else:
raise ValueError(f"rescale not implemented for band {band}")
Normally I think it would also be possible to keep everything server side by using DataCube.dimension_labels("bands")
, and then possibly use text_begins
instead of .startswith
to do the band checks because this would return a Processbuilder (or something similar) rather than a real string array... but I don't think this is strictly better than your suggestion using metadata for my specific case?
Possibly there is a cleaner way of doing this, without using an explicit loop... but I don't know how.
That snippet looks fine at first sight. However, it's important to know what you do with the produced band_rescaled
values . I guess you left them out of the snippet to keep it compact. If you do something like merge_cubes
after this, that should be fine.
Normally I think it would also be possible to keep everything server side by using DataCube.dimension_labels("bands"), and then possibly use text_begins instead
Again it depends on what you want to do in detail, but I'm afraid this will be a dead end as openEO process graphs can not express everything we're used from languages like Python (there are no for-loops for example).
Possibly there is a cleaner way of doing this, without using an explicit loop... but I don't know how.
If it's just about applying a different scaling per group of bands as your snippet suggests, it can be done a bit more efficient as follows:
bands = [band.name for band in cube.metadata.band_dimension.bands]
bands2000 = [b for b in bands if any(b.startswith(p) for p in ["B02", "B03", "B04"])]
cube2000 = cube.filter_bands(bands2000).linear_scale_range(0, 2000, 0, 255)
bands5000 = [b for b in bands if any(b.startswith(p) for p in ["B08"])]
cube5000 = cube.filter_bands(bands5000).linear_scale_range(0, 5000, 0, 255)
scaled_cube = cube2000.merge_cubes(cube5000)
I would avoid an extra merge cubes, and go for an apply_dimension over bands. In apply_dimension, you will need indices, so you'll want to convert list of bands to list of indices and then perform scaling accordingly.
I indeed merge the rescaled bands back together with merge_cubes().
Thanks for the advice!!!
I wonder... I don't seem to find any documentation pointing to cube.metadata
. Is this work in progress, is it not really adviced to use it or didn't I search well enough?
I wonder... I don't seem to find any documentation pointing to cube.metadata. Is this work in progress, is it not really adviced to use it or didn't I search well enough?
it's a public attribute, and we won't remove/change it overnight, so you're free to use it.
It's indeed not documented at https://open-eo.github.io/openeo-python-client/api.html because, roughly speaking, it wasn't intended to be an abstraction level a normal user should interact with.
However, the use case discussed above, "get band names of a cube through cube.metadata.band_dimension()
", is a valid question, so we should indeed expose this in the docs
DataCube.dimension_labels()
now seems to return a DataCube in the python client: DataCube.dimension_labelsI think it should return an array of strings (e.g. the band names for dimension "bands"). According to the general process documentation this should also be the case: processes.dimension_labels