OSGeo / grass

GRASS GIS - free and open-source geospatial processing engine
https://grass.osgeo.org
Other
849 stars 308 forks source link

[Bug] Band references need a better name and definition #1868

Closed wenzeslaus closed 2 years ago

wenzeslaus commented 3 years ago

The issue

The term band reference was introduced with #63. There, perhaps, the word reference was referring to the fact that the text is referring to the band description stored in the file g.bands gives access to. However, now an arbitrary band reference can be stored for any raster map, i.e., it is more general than just bands from a predefined (or possibly in the future, registered) list of bands associated with particular sensors.

The band references can now be used (optionally assuming #1866) to label any rasters (raster maps or bands) in an imagery group so that signatures are associated with given set of bands rather than just a fixed set of raster data.

Going even further, the usage does not have to limited to image processing. For example, in modeling, standardized names such as topographic__elevation and sea_surface_air__temperature are used to describe inputs and outputs of models (see Landlab Standard Name Definitions and CSDMS Standard Names).

Overall, band references quickly outgrew their original definition and there is a potential to have good name suggesting the potential use. At the same time, there is a potential for ending up with a cryptic term which is difficult to explain and difficult to link to its application.

The issues with the current name are:

  1. The word band, although often used quite generally, might be tight too much to remote sensing and may overlap with other usages of the term.
  2. The word reference is the cryptic part. Name, label, band and id are words people may have different ideas about, but they generally be somewhat right. Reference, on the other hand, is a very special term, loaded for some, meaningless for others.
  3. The term band reference seems to be too long, resulting in modules and functions using different combinations of band and bandref in the interface instead of full band reference. Notably, bandref is not a word. The word band by itself may be too ambiguous. Interestingly, I don't think there is reference used as the only word, although that's really the noun in the term.

Since the current system is not part of any release yet, now is the time to get it right.

Naming options

Expected resolution

  1. We decide on a name for what these things are and even if it stays as band references we are sure that that's how we want to call them.
  2. The usage is consistent in documentation and in the interface (option names, function names).

Additional context

We discussed this during the last call, but decided to open this as an issue to discuss this in written form and asynchronously.

mlennert commented 3 years ago

Thanks, @wenzeslaus , for launching this.

My vote goes to the 'semantic' family as it seems the clearest about what it actually is about, with highest preference for 'semantic label'.

marisn commented 3 years ago

My vote goes to the 'semantic' family as it seems the clearest about what it actually is about, with highest preference for 'semantic label'.

But this suffers from the same length issue – how to shorten it as having a module parameter "semantic_label" is a bit verbose. Otherwise I also like the "semantic" idea.

wenzeslaus commented 3 years ago

Here is how semantic label would look like in action.

Interface of modules

r.support wrapper/i.band replacement:

r.semantic.label map=T33UVR_20180506T100031_B01 label=S2_1
r.semantic.label map=elevation label=topographic__elevation

Low-level r.support call:

r.support map=my_landuse semantic_label=CORINE_LULC

Same call in Python syntax:

gs.run_command("r.support", map="my_landuse", semantic_label="CORINE_LULC")

Examples of documentation

These are texts of examples from t.rast.mapcalc and t.rast.list rewritten using semantic label and filtering by semantic label.

Filtering by semantic label t.rast.mapcalc supports filtering by semantic label similarly to t.rast.list. In example below a new STRDS will be created and filled by NDVI products: ...For more information about semantic labels see g.bands module. ... Semantic label can be assigned to raster maps by r.semantic.label module or even when registering raster maps into STRDS by t.register...Name of STRDS can be extended by semantic label used for filtering. Name of STRDS and semantic label is split by a single dot...Note that filtering by semantic label is supported by all temporal modules...Also note that only STRDS can be filtered by semantic label, see r.semantic.label for current limitations.

Notably, I replaced all "band reference filtering" by "filtered by semantic label" and simplified several places which contained "band reference concept" and "band reference identifier" because "semantic label[s]" didn't seem to need that 3rd word there.

An abstract example would then look like:

# filter rasters in STRDS by semantic label
t.rast.list input=dataset_name.semantic_label

Pseudo-examples of future usage

Scenario 1 (not so far from the current usage): Creating semantic labels and then using them to access rasters in a imagery group instead specifying the raster name and the user made a typo in the word blue.

r.semantic.label map=ortho_r label=red
r.semantic.label map=ortho_g label=green
r.semantic.label map=ortho_b label=bluu  # typo here
i.group input=ortho_r,ortho_g,ortho_b group=ortho subgroup=ortho
d.rgb red=ortho.red green=ortho.green red=ortho.blue

This may give: ERROR: Imagery group <ortho> does not contain a band with semantic label <blue>.

Scenario 2 (futuristic): Passing wrong raster as elevation:

# works
r.watershed elevation=elevation1 slope=slope

# works
r.semantic.label map=elevation2 label=topographic__elevation
r.watershed elevation=elevation2 ...

# gives a warning
r.semantic.label map=elevation3 label=vegetation__dead_biomass
r.watershed elevation=elevation3 ...

This may give: WARNING: Raster <elevation3> has semantic label 'vegetation__dead_biomass' but elevation input is recommended to have quantity 'elevation', not 'dead_biomass' (or alternatively if the double underscore does separate object and quantity: ...is recommended to be none or one of elevation, topographic__elevation, land_surface__elevation).

veroandreo commented 3 years ago

I like semantic label and also the proposed name for the module starting with r since we can add labels to any raster maps.

wenzeslaus commented 3 years ago
mlennert commented 3 years ago

On 15/09/21 20:02, Māris Nartišs wrote:

My vote goes to the 'semantic' family as it seems the clearest about
what it actually is about, with highest preference for 'semantic label'.

But this suffers from the same length issue – how to shorten it as having a module parameter "semantic_label" is a bit verbose.

I don't have issues with long module parameters as you can shorten them in CLI usage (e.g. semlab=).

So the only ones suffering from long parameter names are those using the module in Python or other languages where this automatic interpretation of shortened parameter names does not exist. But this then comes back to a more general discussion of whether in programming clear variable names should prime over programmers laziness or not ;-)

marisn commented 3 years ago

But this then comes back to a more general discussion of whether in programming clear variable names should prime over programmers laziness or not ;-)

When a programming language with whitespace significance is in use (Python), variable and function name length is a legitimate concern and not only a preference.

Should I prepare a PR replacing "band reference" with "semantic label"?

ninsbl commented 3 years ago

While I have been sceptical regarding the need to rename band references (as those of you who were on the call may know), I now see the benefit of renaming. So consider me convinced (again).

The concept of semantic labels can use different refernces than known bands and link up to various standards, like e.g.: https://cfconventions.org/Data/cf-standard-names/current/build/cf-standard-name-table.html

So in short, I also support the "semantic label" naming.

wenzeslaus commented 3 years ago

But this then comes back to a more general discussion of whether in programming clear variable names should prime over programmers laziness or not ;-)

When a programming language with whitespace significance is in use (Python), variable and function name length is a legitimate concern and not only a preference.

In Python, readable names are preferred. Readable may mean different things, but it usually defaults to long ones. Formatting practices like Black take long into account. Function name or parameter name plus value often ends up by itself on a line.

wenzeslaus commented 3 years ago

Should I prepare a PR replacing "band reference" with "semantic label"?

From my perspective, yes, please. That would be great.

I don't have a better name suggestion except for the option of using tag instead of label... and nobody got excited by grassness.

mlennert commented 3 years ago

As this should ideally be done before we branch out GRASS GIS 8, @marisn please go ahead with renaming.

landam commented 3 years ago

Let's start with PR changing API (grass.bandref -> grass.semanticlabel ?), modules and documentation. @marisn Does it fit to your plans? On the top of that we can create separated PR for TGRASS (I can do it).

marisn commented 3 years ago

Let's start with PR changing API (grass.bandref -> grass.semanticlabel ?), modules and documentation. @marisn Does it fit to your plans? On the top of that we can create separated PR for TGRASS (I can do it).

I started to work on a PR (#1928; I have problem finding free time lately). I am a bit stuck as I have no idea what to do with python BandReferenceReader, g.bands, i.bands. Suggestions welcome.

marisn commented 3 years ago

One of problems with BandReferenceReader, g.bands and i.band is that they where skipped in #1796 (along with updating documentation!) and thus still mandate <shortcut>_<band> form. They need a rewrite to remove unnecessary code.