CatalogueOfLife / coldp


Change the environment vocabulary #85

Open mdoering opened 1 month ago

mdoering commented 1 month ago

ColDP inherited 4 values for environment from Catalogue of Life: brackish, freshwater, marine and terrestrial.

The concise Global Typology for Earth's Ecosystems recognises 4 core realms, which correspond to our environments. It does not include brackish, which I also doubt is worth tracking as an environment of its own. But it does have an additional Subterranean realm, which makes a lot of sense and would complement our vocabulary nicely.


Most of their core realms are linked by transitional realms. Our brackish would be one of these, and in ColDP it can already be expressed by giving both values (e.g. marine and freshwater).


I suggest adding Subterranean and removing Brackish, which is used by 16,145 of the 939,117 COL taxa that have any environment given. Of those, only 4,399 are brackish only:

               environments               | count  
------------------------------------------+--------
 {BRACKISH}                               |   4399
 {BRACKISH,FRESHWATER}                    |   1913
 {BRACKISH,FRESHWATER,MARINE}             |   2561
 {BRACKISH,FRESHWATER,MARINE,TERRESTRIAL} |    701
 {BRACKISH,FRESHWATER,TERRESTRIAL}        |    158
 {BRACKISH,MARINE}                        |   6107
 {BRACKISH,MARINE,TERRESTRIAL}            |    165
 {BRACKISH,TERRESTRIAL}                   |    141
 {FRESHWATER}                             |  65212
 {FRESHWATER,MARINE}                      |    928
 {FRESHWATER,MARINE,TERRESTRIAL}          |    431
 {FRESHWATER,TERRESTRIAL}                 |   9829
 {MARINE}                                 | 352722
 {MARINE,TERRESTRIAL}                     |    803
 {TERRESTRIAL}                            | 493047
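The totals quoted above can be re-derived from the table. A minimal sketch in Python, with the counts copied by hand from the query output (the COL query that produced them is not part of this issue); `summarise` is a hypothetical helper that returns the overall total, the number of taxa carrying a given value, and the number carrying only that value:

```python
# Taxon counts per combination of environment values, copied from the table above.
counts = {
    frozenset({"BRACKISH"}): 4399,
    frozenset({"BRACKISH", "FRESHWATER"}): 1913,
    frozenset({"BRACKISH", "FRESHWATER", "MARINE"}): 2561,
    frozenset({"BRACKISH", "FRESHWATER", "MARINE", "TERRESTRIAL"}): 701,
    frozenset({"BRACKISH", "FRESHWATER", "TERRESTRIAL"}): 158,
    frozenset({"BRACKISH", "MARINE"}): 6107,
    frozenset({"BRACKISH", "MARINE", "TERRESTRIAL"}): 165,
    frozenset({"BRACKISH", "TERRESTRIAL"}): 141,
    frozenset({"FRESHWATER"}): 65212,
    frozenset({"FRESHWATER", "MARINE"}): 928,
    frozenset({"FRESHWATER", "MARINE", "TERRESTRIAL"}): 431,
    frozenset({"FRESHWATER", "TERRESTRIAL"}): 9829,
    frozenset({"MARINE"}): 352722,
    frozenset({"MARINE", "TERRESTRIAL"}): 803,
    frozenset({"TERRESTRIAL"}): 493047,
}


def summarise(counts: dict[frozenset, int], value: str) -> tuple[int, int, int]:
    """Return (all taxa, taxa using `value`, taxa using only `value`)."""
    total = sum(counts.values())
    with_value = sum(n for envs, n in counts.items() if value in envs)
    # Taxa whose only value is `value` would be left with no environment if it were dropped.
    only_value = sum(n for envs, n in counts.items() if envs == {value})
    return total, with_value, only_value


print(summarise(counts, "BRACKISH"))  # (939117, 16145, 4399)
```

The same helper answers the equivalent question for any other value, e.g. how many taxa are marine only.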

mdoering commented 1 month ago

@dhobern @yroskov @thomasstjerne @CecSve @gdower @MattBlissett any opinion?

yroskov commented 1 month ago

It would be nice to adopt the entire Global Ecosystem Typology in CoL. The Global Typology (with 3 top levels) looks very reasonable to me. It's a pity it wasn't available when CoL decided to include the Environment field in the Standard Dataset.

However, the proposal to move to a global ecosystem typology should be submitted to all GSD providers for discussion (as was done with the existing 4-value scheme). The GSDs' voice is the foundation: if they don't have suitable environmental data, the scheme itself is meaningless in CoL.

mjy commented 1 month ago

Should ENVO be a part of this consideration?

mdoering commented 1 month ago

Indeed it would be nice to have finer-grained values, but I would not dare to introduce them at this stage. The proposal to remove brackish and add subterranean is a rather small change and, as the data above shows, would not impact most GSDs. But I agree that discussing it with GSDs, and potentially also with users, makes sense.

yroskov commented 1 month ago

As I remember, during the original CoL discussions, ecologists (from WoRMS?) insisted on having "brackish" as a distinct environment that cannot be expressed as "marine" + "freshwater" ("m + f" was my formalist proposal, which was rejected at the time).

If the 4,399 brackish-only species in CoL mostly come from WoRMS, we need to hear their views before taking further steps.

dhobern commented 1 month ago

I agree that the Global Ecosystem Typology looks great. We would need the 4 core and the 6 transitional realms (which would restore brackish in a more structured way). Individual datasets could go further, but the main function at the global level would be to slice taxa by the viable contexts in which they may be found, whether or not that is optimal habitat. We wouldn't want to flag every species that strays into Urban and industrial ecosystems as having that as part of its habitat.

mdoering commented 1 month ago

If COL adopted the entire typology, we could always derive the 4 core realms from any more detailed values a GSD might supply, thus offering a consistent global view across the realms. We could even map the existing COL values to the typology to keep backwards compatibility. It would not be a mandatory change for any GSD. Thinking about it more, this seems the better choice: we would adopt an existing, well-defined standard while remaining backwards compatible for GSDs.
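The idea of deriving core realms from more detailed values, while mapping the legacy COL values into the typology, can be sketched as follows. This is a hypothetical illustration only: the realm labels (core realms plus transitional realms written as hyphenated combinations of core realms) are made up for the example and are not the official Global Ecosystem Typology codes, and mapping brackish to a freshwater-marine transitional realm is an assumption following the suggestion earlier in this thread.

```python
# Hypothetical labels: core realms, and transitional realms written as
# hyphenated combinations of them. Not the official typology codes.
CORE_REALMS = {"terrestrial", "freshwater", "marine", "subterranean"}

# Assumed mapping from the legacy COL/ColDP environment values to realm labels.
LEGACY_TO_REALM = {
    "terrestrial": "terrestrial",
    "freshwater": "freshwater",
    "marine": "marine",
    "brackish": "freshwater-marine",  # brackish treated as a transitional realm
}


def core_realms(value: str) -> set[str]:
    """Derive the core realms covered by one legacy or typology value."""
    label = LEGACY_TO_REALM.get(value.lower(), value.lower())
    parts = set(label.split("-"))
    unknown = parts - CORE_REALMS
    if unknown:
        raise ValueError(f"unknown realm(s): {sorted(unknown)}")
    return parts


def core_view(values: list[str]) -> set[str]:
    """Union of the core realms over all environment values given for a taxon."""
    result: set[str] = set()
    for value in values:
        result |= core_realms(value)
    return result
```

For example, `core_view(["BRACKISH"])` gives `{"freshwater", "marine"}`, so legacy data and richer typology values end up in the same core-realm view without any GSD having to change its data.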

Who can best drive this forward, and how?

CecSve commented 1 month ago

We have submitted a task group charter to the DwC maintenance group and are still awaiting feedback. The task group aims to add Realm and Biome terms (and environmentalMaterial) to DwC, with the IUCN typology categories as the basis of a controlled vocabulary. We are also aligning the work with ENVO, and the idea is to update ENVO with the IUCN categories as well.

It would be great if we could eventually index those fields and enable occurrence search by both realm and biome.

mdoering commented 1 month ago

If the IUCN typology is the way to go, and even ENVO will be adapted to it, I would suggest adopting that vocabulary for COL/ColDP now. It would not break existing GSD data, which we can map to the typology, but it would allow GSDs who want to use a richer vocabulary to do so already.