CatalogueOfLife / general

The Catalogue of Life

Extend CoL ranks #30

Closed mdoering closed 4 years ago

mdoering commented 7 years ago

The current CoL uses a fixed set of ranks that limits its use and misses some important groups. It is suggested to extend the ranks to cover the following, bold=existing CoL ranks:

kingdom phylum subphylum class subclass order suborder superfamily family subfamily tribe genus subgenus

It is not required to use all ranks in every group, but use tribes and subfamilies for example as appears useful within the respective group.

See also the following issues in GBIF as a background: http://dev.gbif.org/issues/browse/POR-2781 http://dev.gbif.org/issues/browse/POR-325

mjy commented 7 years ago

The NOMEN ontology includes URIs for ranks, scoped to individual codes; it would be great to use that resource as the core. It could be extended if needed.

M

mdoering commented 7 years ago

Thanks Matt. I would hope to use the same rank enumeration across codes though, and not use URIs, internally at least. A mapping to them would be nice, I agree. There are only very, very few ranks that conflict between codes. Having them in one enumeration allows for much simpler comparison and ordering. We have been doing that for years at GBIF without trouble.
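As an illustration of the "one enumeration" point, here is a minimal Python sketch. The `Rank` class, its numeric values, and the helper `sort_by_rank` are assumptions made for this example, not the GBIF or CoL implementation; the rank names are the ones proposed in the opening comment.

```python
from enum import IntEnum


class Rank(IntEnum):
    """Hypothetical code-independent rank enumeration, highest rank first."""
    KINGDOM = 1
    PHYLUM = 2
    SUBPHYLUM = 3
    CLASS = 4
    SUBCLASS = 5
    ORDER = 6
    SUBORDER = 7
    SUPERFAMILY = 8
    FAMILY = 9
    SUBFAMILY = 10
    TRIBE = 11
    GENUS = 12
    SUBGENUS = 13


def is_higher(a: Rank, b: Rank) -> bool:
    """Return True if rank `a` sits above rank `b` in the hierarchy."""
    return a < b


def sort_by_rank(taxa):
    """Sort (name, rank) pairs from highest to lowest rank."""
    return sorted(taxa, key=lambda pair: pair[1])


taxa = [("Aedes", Rank.GENUS), ("Culicidae", Rank.FAMILY), ("Culicinae", Rank.SUBFAMILY)]
ordered = sort_by_rank(taxa)
```

Because every rank lives in one ordered type, comparison and sorting need no per-code lookup; a separate mapping to external identifiers can still be layered on top.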

mjy commented 7 years ago

It is indeed a trade-off, and IMO it mostly depends on what you want that "rank" to represent.

Is the rank a rank for a "natural" classification or a "nomenclatural" classification? If the latter, you want NOMEN; if the former (which is likely the case, since CoL is a list of species, not names), then it is much less of an issue.

If the latter (nomenclatural) then these considerations should be taken into account:

M

mdoering commented 7 years ago

A nomenclaturalCode property in addition to the rank will tell you which code it adheres to and which rules & semantics to follow. So when it comes to exchange we should be able to link to a NOMEN URI. In an application that is not ontology or linked-data based, URIs are rather unwieldy.
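A rough sketch of such a mapping, in Python, might look like the following. Everything here is hypothetical: the `rank_uri` function and the `RANK_URIS` table are invented for illustration, and the URIs use a placeholder `example.org` base rather than real NOMEN identifiers, which would have to be taken from the ontology itself.

```python
from typing import Optional

# Placeholder base, NOT a real NOMEN namespace; substitute URIs from the ontology.
PLACEHOLDER_BASE = "https://example.org/nomen/"

# Hypothetical lookup table keyed by (nomenclatural code, rank name).
RANK_URIS = {
    ("ICZN", "family"): PLACEHOLDER_BASE + "iczn/family",
    ("ICZN", "tribe"): PLACEHOLDER_BASE + "iczn/tribe",
    ("ICN", "family"): PLACEHOLDER_BASE + "icn/family",
    ("ICN", "tribe"): PLACEHOLDER_BASE + "icn/tribe",
}


def rank_uri(code: str, rank: str) -> Optional[str]:
    """Return the URI for a rank under a given code, or None if unmapped."""
    return RANK_URIS.get((code.upper(), rank.lower()))
```

Internally the application keeps a plain rank plus nomenclaturalCode; the URI is only looked up at exchange time, and an unmapped pair simply yields `None`.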

In any case this issue is not about the technology or how to represent ranks. It is for the CoL team to decide which higher ranks it would like to see populated in the management classification. Currently the set is pretty much restricted to the major Linnean ranks covered by dwc terms, and I believe this is not sufficient.

mjy commented 7 years ago

On Mon, Oct 16, 2017 at 9:24 AM, Markus Döring notifications@github.com wrote:

A nomenclaturalCode property in addition to the rank will tell you which code it adheres to and which rules & semantics to follow. So when it comes to exchange we should be able to link to a NOMEN URI. In an application that is not ontology or linked-data based, URIs are rather unwieldy.

I agree, there are many ways to implement a mapping that ultimately returns a URI.

In any case this issue is not about the technology or how to represent ranks. It is for the CoL team to decide which higher ranks it would like to see populated in the management classification. Currently the set is pretty much restricted to the major Linnean ranks covered by dwc terms, and I believe this is not sufficient.

Well understood. BUT- I expect CoL to want to improve and understand their semantics at every possible decision process. I have seen this issue come up with every single data migration (obviously including this one), every single implementation of a taxonomic database, it always happens, it always leads to way more discussion than it should. The underlying issue is precisely an issue about how to represent ranks, so it seems like the right time to open the issue to the awareness of the CoL (likely continuing through some other venue).

If one agrees to use a unified representation of ranks, then this issue isn't even raised: you (the software implementer) select ranks as you need them from the controlled vocabulary/ontology, knowing that whatever you pick, the managing body has already OKed. Want to include some but not others? Fine, just make sure they come from the standard. Any improvements to the system then get filtered back to the "standard" so that everybody else benefits (not just GBIF). So, what I'm saying is, CoL should bless/improve NOMEN, then you can pick and choose whatever you want in CoLPlus, but I'm a little biased ;).

M

ThierryBourgoin commented 7 years ago

Hi Markus, Matt,

Not sure the following is relevant, but a few remarks...

Yes, an ontology of ranks is probably already an issue, but in practice linking data to the ranks would probably also be a nightmare if there are too many.

The fixed number of ranks, particularly between families and genera, was chosen to avoid sterile discussions about classification preferences (ranking) where, according to the respective 'authorities' or regional/local usage, a given taxon would be a tribe or a subtribe for some, a subfamily for others, etc. In the GBIF issue you sent, for instance, with the example of my favorite group [ ;-) ], I don't recognize Auchenorrhyncha as a suborder and I keep Cicadomorpha and Fulgoromorpha separate as suborders, because there is still a debate about Auchenorrhyncha, and I don't think it will finish soon.

The family rank appeared to be a good anchor point for reconciling the different opinions. If we agree to add new family-group ranks, we will rather quickly have to face the problem of managing multiple classifications. It will be necessary to provide a stable internal reference classification (= management classification / not a public populated standardized classification!...) down to tribe level to structure and manage all changes.

I'm personally not against it, but I think it will complicate the task, so I would probably not recommend it, or would add just a few ranks for the most demanding usages. For instance, from what I can see in Entomology, I would prefer to add the tribe (not in the list you provided, Markus) rather than the subfamily; the tribe seems to me generally more stable and easier to 'manipulate'/move in the classification. How these tribes are organized into subfamilies is the playground of many phylogeneticists, and depending on the morphological characters, the genes and their number, the possible addition of fossils with many missing data, or the software and parameters they choose, you will find different groupings… It will be almost impossible to report/follow everything that is published and to make everyone happy.

At the other end of the classification, I foresee that the supra-familial level (at least in Zoology) will also be very complex, if not worse, particularly now with the inclusion of fossils.

However, I do think it is probably good and necessary to prepare for the future and to give ourselves the possibility to open the door easily when/where it becomes necessary. So I share your idea, Markus, but in that perspective I would rather be prepared to allow any intermediate ranks, whether or not they have a proper level recognition (ontology).

Was it the point you wanted to discuss Markus? ;-) Thierry

mjy commented 7 years ago

To clarify slightly-

I'm not suggesting the CoL use any specific number of ranks; it could use 3, 10, or 100. I'm suggesting that those ranks that are used can be cross-referenced to a controlled vocabulary/standard.

Agreed, I'm pretty sure this is not the issue Markus wanted to raise :)

mdoering commented 7 years ago

Thanks Thierry, yes, that is exactly the point I wanted to discuss! And I had indeed included tribe in my list above, but wrongly at the end. I have fixed that now, and I do think tribe and subfamily are the most important ranks to be added in some groups. Asteraceae, for example, really benefits from tribes.

I also fully agree that the infrastructure should be able to deal with any number of ranks. If we would like to use the CoL and an extended/provisional catalogue as a backbone, it would be very useful to include at least the names of key groups that people search on. Instead of adding them as fully recognized taxa in the hierarchy (where I share your concern about getting too deep into taxonomic debates), we have explored at GBIF including them in a way similar to synonyms, with a special status and pointing to the next higher accepted "regular" rank. Subfamilies could be linked to the right family that way, and searches would at least take you to the family instead of nowhere. This is especially important for occurrences in GBIF with even higher ranks like subphyla or subclasses, when all we get is scientificName=Vertebrata or Radiolaria and we cannot place them anywhere. Even worse, Vertebrata is also used for 2 genera: https://www.gbif.org/species/search?q=Vertebrata
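The synonym-like approach described above could be sketched roughly as follows in Python. The `NameUsage` class, the `"intermediate"` status value, the `points_to` field, and the `resolve` function are all assumptions made for illustration and do not reflect the actual GBIF data model.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class NameUsage:
    name: str
    rank: str
    status: str                      # "accepted", or the hypothetical "intermediate"
    points_to: Optional[str] = None  # next higher accepted taxon, for non-accepted entries


# Tiny in-memory index: the subfamily is searchable but not a node in the hierarchy.
INDEX: Dict[str, NameUsage] = {
    "Culicidae": NameUsage("Culicidae", "family", "accepted"),
    "Culicinae": NameUsage("Culicinae", "subfamily", "intermediate", points_to="Culicidae"),
}


def resolve(name: str, index: Dict[str, NameUsage]) -> Optional[NameUsage]:
    """Follow pointers until an accepted usage is reached; None if there is none."""
    usage = index.get(name)
    seen = set()
    while usage is not None and usage.status != "accepted":
        if usage.name in seen:  # guard against circular pointers
            return None
        seen.add(usage.name)
        usage = index.get(usage.points_to) if usage.points_to else None
    return usage
```

A search for the subfamily then lands on its family rather than returning nothing, while the accepted hierarchy itself stays limited to the "regular" ranks.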

The question touches on what use cases the CoL wants to support. CoL+ does have a goal to also broaden names coverage in the provisional catalog, so I think we need to get the names in there at least.

rdmpage commented 7 years ago

@ThierryBourgoin I think there are some groups where additional ranks will be essential if CoL/GBIF is to be taken seriously for those groups, such as insects of medical importance (see e.g., http://dev.gbif.org/issues/browse/POR-3182 ). From what little I've seen, anyone working with mosquitoes, biting midges, etc. routinely uses subgenera to help make sense of large, unwieldy genera. If you want to provide something of value to those communities, it would make sense to be able to handle names in a way that they recognise as being useful.

mdoering commented 7 years ago

btw, see also #31 for the management classification

mdoering commented 7 years ago

Ruggiero et al., for example, do use more ranks for the classification:

ThierryBourgoin commented 7 years ago

Hi Rod, agreed. In some cases it will be easy to say it is necessary, in others… But at least for the former, yes, it is necessary to provide this service. They are the only ones who know which level of classification will be pertinent for their analysis. So yes, let us be more practical… Th.

mdoering commented 4 years ago

The use of additional ranks as needed has been approved by the Global Team. The exact ranks to be used are up to the editorial team of the respective group and can vary across the catalogue.