BioSchemas / specifications

Issue tracker, technical wiki, and example markup
https://bioschemas.org

Taxon and TaxonName #309

Closed AlasdairGray closed 1 year ago

AlasdairGray commented 5 years ago

During the F2F there was a discussion as to whether Taxon and TaxonName are one and the same type or two separate types. This was to do with TaxonNames changing over time.

@qgroom, can you add more details here about your idea of having separate Taxon and TaxonName types?

@frmichel did you consider this when you were designing the Taxon type?

qgroom commented 5 years ago

A scientific Latin name for an organism is an entity in itself. It has an author, a protologue, a date of publication and one or more type specimens. It is not, however, a taxon. A taxon can have four legs, but doesn't have an author.

The rules for Latin names are detailed in the International Code of Nomenclature for Algae, Fungi, and Plants (ICN), the International Code of Zoological Nomenclature (ICZN) and the International Code of Nomenclature of Bacteria (ICNB). There are a number of databases that detail the published Latin names. However, these are not databases of taxa. Taxa have many synonyms, and there is continual disagreement and flux in the accepted Latin name for a taxon.

The taxon is a scientific concept, and the earliest valid name is chosen as a label for this concept. However, opinions differ as to what the valid name is, and for many taxa it is not stable. In biological databases we link our data to what we think should be the accepted Latin name, and we might also store synonyms and vernacular names. However, someone with a different opinion might use a different synonym as the accepted name. This happens all the time.

So we have the situation where a specimen can be the nomenclatural type specimen of a valid Latin name, but the accepted Latin name of that specimen is another name.

qgroom commented 5 years ago

@nickynicolson, what do you think about using bioschemas for IPNI?

stylesm commented 5 years ago

My question would be: is there an authority for taxa?

If so, a Taxon type could have a URL to an identifier, which can change over time.

If so, this makes referencing taxonomies a lot easier, because instead of hard-coding TaxonNames or even just String values for taxa into schema markup, they could reference the relevant Taxon entity, and if that changes in future, that is all handled by the third-party taxon authority.

qgroom commented 5 years ago

Names can have identifiers more easily than taxa. Various databases put identifiers on taxa, but it is difficult, as it is like trying to put an identifier on a religion. The authority for a taxon would be something like a Flora or monograph where a name is accepted for a taxon and the taxon is defined, so that it is clearly circumscribed. However, there are many Floras and monographs, and they often conflict in their interpretation of the taxon. The identifier for a taxon is the Latin name; adding another identifier just adds to the synonymy that already exists. This issue comes up a lot in biodiversity informatics and there is no easy solution.

frmichel commented 5 years ago

Hi guys,

Sorry for not responding earlier, I'm actually at the Web Conference in San Francisco.

So, the short answer is, as @qgroom has explained, that Taxon and TaxonName are not one and the same type.

There are endless discussions in the biodiversity community about the exact definition of terms such as taxon, taxon concept, taxon name, taxon name usage (see https://github.com/tdwg/tnc/issues/1).

I think we all agree that Bioschemas should not get into such experts' debates, but instead should remain at a general level where there is consensus. We aim at defining terms to mark up web pages, we do not aim at producing a rich domain ontology.

From this perspective, whether we need a TaxonName type depends on foreseen use cases. The typical use case I can think of is natural history museums annotating web pages about the different species they have in their collections. Hence taxa, not taxonomic names. This is why the proposition I made was only about a Taxon type and profile. Yet, as @qgroom mentioned, some databases (and thus probably their portals) provide taxon names. In this case, the TaxonName type is required too.

So we can go for the solution with two terms. This is sufficient to cover a broad range of use cases, and simple enough to avoid hurting domain experts (hopefully!). If the group agrees on this, I can start updating the taxon example using a TaxonName type.

Franck.

qgroom commented 5 years ago

I'd be happiest with both terms, but for simplicity, if only one of these terms is used, it should be TaxonName, rather than Taxon. In some cases the taxon name is used as a label for the taxon, in other cases it is actually referring to the name, but this can't work the other way around. A name can signify a taxon, but a taxon can't signify a name. I am Quentin, but the name Quentin is not me.

frmichel commented 5 years ago

Hi all,

Here is a proposition for a TaxonName type, and how it may relate to the Taxon type.

  1. TaxonName: instead of importing DwC properties, we may use only common schema.org properties: name (the scientific name, without author or date), author (authorship information, if known), plus an optional taxonRank property that we have already defined for Taxon. Should there be anything else?

  2. In the Taxon type, I think we need to create two new properties to relate the taxon to its accepted/valid name and its synonyms: we can't use schema:name or schema:alternateName, which only take strings. The same goes for dwc:scientificName. So I propose scientificName (or e.g. referenceName) to denote the accepted/valid name, and synonym (or e.g. synonymName, synonymScientificName).

  3. For the sake of simplicity/usability, perhaps we should also allow the markup to describe a Taxon with literals only, using name and alternateName. The two descriptions are not mutually exclusive.

Below is the existing beluga example, updated with this proposition (I have left out the end of the example about images, vernacular names, parent taxon, etc.).

{
    "@context": [
        "http://schema.org",
        {
            "dwc": "http://rs.tdwg.org/dwc/terms/",
            "dwc:vernacularName": { "@container": "@language" }
        }
    ],

    "@type" : "Taxon",
    "additionalType": [ "dwc:Taxon", "http://rs.tdwg.org/ontology/voc/TaxonConcept#TaxonConcept" ],
    "identifier": "60932",
    "mainEntityOfPage": "https://inpn.mnhn.fr/espece/cd_nom/60932?lg=en",

    ### The simple version ###
    "name": "Delphinapterus leucas (Pallas, 1776)",
    "alternateName": [
        "Balaena albicans Muller, 1776",
        "Beluga catodon Gray, 1846"
    ],

    ### The rich version ###
    "scientificName": {
        "@type" : "TaxonName",
        "name": "Delphinapterus leucas",
        "author": "(Pallas, 1776)",
        "taxonRank": "species"
    },

    "synonym": [
        {   "@type" : "TaxonName",
            "name": "Balaena albicans",
            "author": "Muller, 1776"
        },
        {   "@type" : "TaxonName",
            "name": "Beluga catodon",
            "author": "Gray, 1846"
        }
    ],
   ...
}
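The simple and rich versions carry the same information, so one can be derived from the other. A minimal Python sketch (not part of the proposal), assuming the literal string is just the TaxonName name followed by its author string, and assuming the property names scientificName and synonym from the example above; the function names are hypothetical:

```python
def taxon_name_label(taxon_name: dict) -> str:
    """Join a TaxonName's name and optional author into one literal string."""
    name = taxon_name["name"]
    author = taxon_name.get("author")
    return f"{name} {author}" if author else name


def simple_version(taxon: dict) -> dict:
    """Derive the literal-only properties (name, alternateName) from the rich version.

    Assumes "scientificName" holds the accepted name and "synonym" the synonyms.
    """
    result = {"name": taxon_name_label(taxon["scientificName"])}
    synonyms = taxon.get("synonym", [])
    if synonyms:
        result["alternateName"] = [taxon_name_label(s) for s in synonyms]
    return result


beluga = {
    "@type": "Taxon",
    "scientificName": {"@type": "TaxonName", "name": "Delphinapterus leucas",
                       "author": "(Pallas, 1776)", "taxonRank": "species"},
    "synonym": [
        {"@type": "TaxonName", "name": "Balaena albicans", "author": "Muller, 1776"},
        {"@type": "TaxonName", "name": "Beluga catodon", "author": "Gray, 1846"},
    ],
}

print(simple_version(beluga))
```

This reproduces the "name" and "alternateName" values of the simple version, which is one argument for publishing the rich version and letting consumers flatten it when they only need labels.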

What do you think? All suggestions/propositions welcome.

Franck.

yvanlebras commented 5 years ago

Very interesting discussion. I totally agree with "Bioschemas should not get into such experts' debates, but instead should remain at a general level where there is consensus. We aim at defining terms to mark up web pages, we do not aim at producing a rich domain ontology." And looking at your proposals, it seems to me that the rich version of the beluga example is better. Maybe we also have to pay attention to 'taxonRank', as there can be divergence here too (hopefully not at the species level).

qgroom commented 5 years ago

I too prefer the rich version. Not being a bioschemas expert I have questions, but I can see from his responses that @frmichel understands the issues, so if you say this will work then I'm happy to run with it.

frmichel commented 5 years ago

Not being a bioschemas expert I have questions, but I can see from his responses that @frmichel understands the issues, so if you say this will work then I'm happy to run with it.

@qgroom, well I'm a computer guy, not a biodiversity guy. So please do not hesitate to go ahead with your questions, I would feel safer if you guys question this proposition! ;)

Maybe we also have to pay attention to 'taxonRank', as there can be divergence here too (hopefully not at the species level)

@yvanlebras, can you be more specific about this issue?

qgroom commented 5 years ago

...please do not hesitate to go ahead with your questions, I would feel safer if you guys question this proposition! ;)

My questions just relate to how this will work in practice and what applications might be built on it. So far I've only seen Google Dataset Search as a place where Schemas are used. How much of what we suggest depends on the potential applications that are built on them?

That's a rather vague question I know.

As long as the core concepts are right in Bioschemas and the implementation can stay quite adaptable I'm inclined to go with the shoot-first-ask-questions-later approach. There are other initiatives taking a more detailed approach to data modelling.

Maybe we also have to pay attention to 'taxonRank', as there can be divergence here too (hopefully not at the species level)

I suspect it is needed. Sometimes it is determinable from the name, but not always. It is very simple information to provide so should not be a block to use of Taxon.

The controlled vocabulary for taxonRank is something to consider. We should stick to the ranks in the International Codes of Nomenclature, e.g. https://www.iapt-taxon.org/nomen/pages/main/art_4.html

GBIF has a workable controlled vocab here http://rs.gbif.org/vocabulary/gbif/rank.xml

Can controlled vocabularies be defined in a schema?

yvanlebras commented 5 years ago

To go further, as it appears to me that even scientific Latin names are revised (if I understand correctly, sometimes the Latin grammar has to be corrected...), maybe a good approach is to make it possible to specify which taxonomic reference we are using (NCBI Taxonomy? WoRMS? GBIF? French TAXREF?), and then maybe provide a mapping between these references...

qgroom commented 5 years ago

Accepted names change for all sorts of reasons. Sometimes it is just to correct spelling/grammar. Sometimes it is the taxonomic priority of one name over another. Sometimes new evidence moves the circumscription of a species or genus (merges and splits). These changes are sometimes clear, but often whether a taxon should be split or merged is just down to taxonomic opinion. There is sometimes no "right" answer, just an opinion of where the boundaries lie.

maybe a good approach is to make it possible to specify which taxonomic reference we are using (NCBI Taxonomy? WoRMS? GBIF? French TAXREF?), and then maybe provide a mapping between these references...

So yes, it is a good idea to say who accepted a name as the right one. However, the aggregated datasets you mention are not usually the original source of these opinions. In a Schemas situation, is this necessary, given that by definition the person(s) accepting the name is the website creator? The original taxonomic opinion probably comes from a scientific paper or book. Isn't there already a way to link a statement to a citation within Schemas?

yvanlebras commented 5 years ago

I'm not sure I fully understand, but I think I partially agree with this. But don't you think that in Bioschemas a taxon markup could be associated with a "system" where the taxon/taxonName is detailed? And maybe such a system could give the original scientific reference... But maybe I'm wrong...

qgroom commented 5 years ago

This is where you get into the distinction between the taxon and the taxon name. The literature and characteristics of a taxon are separate from the literature and characteristics of a name.

To give a practical example: I recently published this paper https://doi.org/10.3897/phytokeys.119.33280. It is essentially not about the taxon we call Oxalis bowiei, but about the name and the type specimen we associate with the name Oxalis bowiei W.T.Aiton ex G.Don. If we try to tie name information too closely to taxon information, it gets quite difficult to understand what refers to what.

It comes down to a taxon name being a fixed entity and a taxon being a more fluid hypothesis. For the small number of organisms we eat and use as models, the name and the taxon are largely inseparable in the way we use them, but for the vast majority of organisms the situation is more fluid.

frmichel commented 5 years ago

Hey guys, thanks for the stimulating discussion! Let me try to summarize.

About taxonRank using a controlled vocabulary:

The current definition we have proposed in the taxon type says "The taxonomic rank of this taxon given preferably as a URI from a controlled vocabulary – (typically the ranks from TDWG TaxonRank ontology or equivalent Wikidata URIs).".

Possible values can be a URL, string or PropertyValue. In the example I used several of them:

    "taxonRank": [
        "http://rs.tdwg.org/ontology/voc/TaxonRank#Species",
        "http://www.wikidata.org/entity/Q7432",
        "species"
    ]

I don't believe that the exhaustive list should be defined as additional terms in schema.org. The whole idea of Bioschemas is precisely to reuse what exists in well-adopted vocabularies instead of recreating everything. Besides, here too, not everybody agrees: intermediate ranks like sub- and super- appear in some classifications but not in others. So, putting it this way leaves room for flexibility.
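The flip side of allowing several vocabularies is that consumers have to reconcile them. A minimal Python sketch, assuming a hypothetical lookup table that covers only the two URIs used in the example above (a real consumer would load a complete vocabulary); PropertyValue rank values are deliberately not handled:

```python
# Hypothetical lookup table: only the two rank URIs from the example above.
RANK_URIS = {
    "http://rs.tdwg.org/ontology/voc/TaxonRank#Species": "species",
    "http://www.wikidata.org/entity/Q7432": "species",
}


def normalize_ranks(value) -> set:
    """Collapse a taxonRank value (a string/URI or a list of them) to a set of labels.

    Known URIs are mapped to plain labels, unknown URIs are kept verbatim so that
    nothing is silently dropped, and plain strings are lower-cased.
    """
    values = value if isinstance(value, list) else [value]
    labels = set()
    for v in values:
        if v in RANK_URIS:
            labels.add(RANK_URIS[v])
        elif v.startswith(("http://", "https://")):
            labels.add(v)  # unknown URI: keep as-is
        else:
            labels.add(v.strip().lower())  # free-text label
    return labels


ranks = [
    "http://rs.tdwg.org/ontology/voc/TaxonRank#Species",
    "http://www.wikidata.org/entity/Q7432",
    "species",
]
print(normalize_ranks(ranks))  # {'species'}
```

If the three values in one markup block ever normalized to more than one label, that would signal an inconsistency worth reporting rather than silently picking one.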

Plus, the taxon profile may recommend a preferred controlled vocabulary. Although I must admit that, along with discussions and changes, the distinction between the Taxon type and the Taxon profile becomes thinner and thinner. The profile now essentially adds marginality for each property (minimum = mandatory, recommended, optional).

About names vs. taxa

My understanding is that names do NOT change. They are coined once and for all in a publication, with a taxonomic rank, and won't change, ever. Only the way taxa "use" them will change along with recompositions, merges, splits, etc. Hence the need to denote not only the accepted/valid name but also the synonyms. I'm not sure if this is fully sufficient, but denoting both the accepted/valid name and the synonyms gives a pretty good idea of the author's opinion about the taxon circumscription.

[Quentin]: it is a good idea to say who accepted a name as the right one. (...). In a Schemas situation, is this necessary, given that by definition the person(s) accepting the name is the website creator? The original taxonomic opinion probably comes from a scientific paper or book. Isn't there already a way to link a statement to a citation within Schemas?

Good point. Indeed, linking to this information, if known, would be interesting. In the Taxon type, we could use description and disambiguatingDescription for this purpose. These are simple text strings, but better than nothing. More formally, we could use the isBasedOn property to link to a ScholarlyArticle (CreativeWork > Article > ScholarlyArticle).

[Yvan]: But don't you think that in Bioschemas a taxon markup could be associated with a "system" where the taxon/taxonName is detailed? And maybe such a system could give the original scientific reference.

Tough one. As Quentin wrote, GBIF, WoRMS and others are "just" aggregators: they do not make a choice, they just report what others say. But I know that TAXREF keeps track of the publication upon which the proposed opinion is based. Here we could typically use the isBasedOn property, as I wrote above.

About applications exploiting the markup

[Quentin] My questions just relate to how this will work in practice and what applications might be built on it. So far I've only seen Google Dataset Search as a place where Schemas are used. How much of what we suggest depends on the potential applications that are built on them?

That's a general question about Bioschemas, in fact. @qgroom, you may have a look at the examples in the Deploy page. Regarding the Taxon/TaxonName terms specifically, I have no answer yet: I plant the egg hoping for chickens to "emerge" eventually. Google Dataset Search specifically exploits the Dataset term. In the future, I foresee use cases where it would be an entry point for applications wanting to discover datasets (related to a certain taxon, for instance) and later do something with them. Not very concrete, I admit...

frmichel commented 5 years ago

Hey guys,

I've created a Taxon draft profile 0.5 together with examples to address the issues we have discussed here. I describe the changes below, they are exemplified in Delphinapterus leucas_jsonld_0.5_full.json.

Don't hesitate to comment here and/or directly update the profile specification.

Identifier from authority databases

To answer one of Carl Boettiger's remarks, I propose to use a PropertyValue to link the taxon to its equivalent identifiers within other authority databases.

There are many properties like that in Wikidata: GBIF id, EOL id, TAXREF id etc. I've listed some of them in the specification.
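Because each PropertyValue carries the Wikidata property as its propertyID, a consumer can pick out one authority's identifier without relying on the free-text name. A minimal Python sketch; the helper names are hypothetical, and the WoRMS property P850 and value 137115 are taken from the beluga markup later in this thread:

```python
def property_value(name: str, property_id: str, value: str) -> dict:
    """Build a schema.org PropertyValue describing one external identifier."""
    return {"@type": "PropertyValue", "name": name,
            "propertyID": property_id, "value": value}


def find_identifier(taxon: dict, property_id: str):
    """Return the value of the identifier whose propertyID matches, or None.

    "identifier" may be a plain string, a single PropertyValue, or a list of both.
    """
    identifiers = taxon.get("identifier", [])
    if not isinstance(identifiers, list):
        identifiers = [identifiers]
    for ident in identifiers:
        if isinstance(ident, dict) and ident.get("propertyID") == property_id:
            return ident.get("value")
    return None


# Wikidata property used for WoRMS ids in the beluga example.
WORMS_ID = "https://www.wikidata.org/entity/P850"

beluga = {
    "@type": "Taxon",
    "name": "Delphinapterus leucas (Pallas, 1776)",
    "identifier": ["60932", property_value("WoRMS id", WORMS_ID, "137115")],
}
print(find_identifier(beluga, WORMS_ID))  # 137115
```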

Publication asserting the taxon circumscription/hypothesis

I've added a Bioschemas description for disambiguatingDescription (text value only): "Can be used to specify the taxon circumscription retained in this description"

Also, as I suggested above, I've added the schema.org isBasedOn property to link to a CreativeWork: "A CreativeWork, such as a scholarly article, asserting the status of the accepted/valid name and synonyms, retained for the taxon circumscription".

TaxonName

For now, we have submitted a Taxon type for inclusion in schema.org, where the accepted/valid name and synonyms are denoted as text using properties name and alternateName.

I assume that, in the next round, we shall submit the TaxonName type together with two new properties to link a taxon to its names: scientificName and alternateScientificName (the counterparts of name and alternateName). I also created a new TaxonName profile specification based on the TaxonName type, we shall also discuss the details of this one... eventually.

I suggest that name should still be mandatory in the Taxon profile (even though it may be accompanied by a scientificName); scientificName is recommended, alternateName and alternateScientificName are optional.
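The marginality suggested here can be checked mechanically. A minimal Python sketch, assuming the draft levels above (name minimum, scientificName recommended, alternateName and alternateScientificName optional); the function and message format are hypothetical, not part of any Bioschemas tooling:

```python
# Marginality as suggested in this comment for the draft Taxon profile.
TAXON_PROFILE = {
    "minimum": ["name"],
    "recommended": ["scientificName"],
    "optional": ["alternateName", "alternateScientificName"],
}


def check_profile(markup: dict, profile: dict) -> list:
    """Report missing properties: errors for minimum ones, warnings for
    recommended ones. Optional properties are never reported."""
    problems = []
    for prop in profile["minimum"]:
        if prop not in markup:
            problems.append(f"error: missing mandatory property '{prop}'")
    for prop in profile["recommended"]:
        if prop not in markup:
            problems.append(f"warning: missing recommended property '{prop}'")
    return problems


print(check_profile({"@type": "Taxon", "name": "Delphinapterus leucas (Pallas, 1776)"},
                    TAXON_PROFILE))
```

Markup with only the literal name passes with a single warning, which matches the intent of keeping name mandatory while nudging publishers toward scientificName.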

Franck.

qgroom commented 5 years ago

the word 'alternate' is only synonymous with 'alternative' in North American English. In British English it doesn't make sense, because we only use it in the sense of alternating. I suggest using 'alternative'.

qgroom commented 5 years ago

In the TaxonName profile the requirement for the name to be accepted or valid should be dropped. Many synonyms are either not accepted and/or invalid.

frmichel commented 5 years ago

the word 'alternate' is only synonymous with 'alternative' in North American English. In British English it doesn't make sense, because we only use it in the sense of alternating. I suggest using 'alternative'.

Didn't know that. I used the word "alternate" to mimic what already exists in http://Schema.org/Thing, which was most likely proposed by English speakers from North America. Nevertheless, I feel like it would look somewhat inconsistent to use alternateName on one side and alternativeScientificName on the other. Don't you agree?

frmichel commented 5 years ago

In the TaxonName profile the requirement for the name to be accepted or valid should be dropped. Many synonyms are either not accepted and/or invalid.

That's an erroneous copy/paste. I fixed it in the name property description, and added this precision in the "Specification info" tab too.

AlasdairGray commented 5 years ago

the word 'alternate' is only synonymous with 'alternative' in North American English. In British English it doesn't make sense, because we only use it in the sense of alternating. I suggest using 'alternative'.

Didn't know that. I used the word "alternate" to mimic what already exists in http://Schema.org/Thing, which was most likely proposed by English speakers from North America. Nevertheless, I feel like it would look somewhat inconsistent to use alternateName on one side and alternativeScientificName on the other. Don't you agree?

I agree that we should be consistent with the existing schema.org naming convention.

qgroom commented 5 years ago

Yes, I'll just have to live with it.

frmichel commented 4 years ago

Hi all,

It's been 6 months since we had this discussion. Maybe it's time to move forward.

As a first step, we may now generate the draft pages: @AlasdairGray, can you remind me how to create the following pages?

Also, @AlasdairGray, I'm not sure yet when the new Taxon term will be endorsed by schema.org, but if we expect it is still gonna take quite some time, what about trying to move directly to the new pair (Taxon, TaxonName)?

Also, the current Taxon profile is v0.3 which is one version behind the current Taxon type v0.3_RC. The current profile should now be v0.4

Franck.

qgroom commented 4 years ago

Hi @frmichel, I'm also keen to get this progressed. Let me know if there is anything I can do. Quentin

AlasdairGray commented 4 years ago

It's been 6 months since we had this discussion. Maybe it's time to move forward.

Also, @AlasdairGray, I'm not sure yet when the new Taxon term will be endorsed by schema.org, but if we expect it is still gonna take quite some time, what about trying to move directly to the new pair (Taxon, TaxonName)?

We should move this work forward, particularly if this will lead to more use of the markup. This will help strengthen the case for their inclusion in schema.org.

As a first step, we may now generate the draft pages: @AlasdairGray, can you remind me how to create the following pages?

  1. new type and profile pages for the TaxonName specification
  2. Taxon type 0.4 draft
  3. Taxon profile 0.5 draft

The approach is different for the type and profile. The management group have been writing up and formalising the processes that we have been following. Hopefully these instructions can be followed to create the next draft of the profile.

For the type, it is probably easiest if you provide the definitions here and I'll update the appropriate files.

Also, the current Taxon profile is v0.3 which is one version behind the current Taxon type v0.3_RC. The current profile should now be v0.4

It would definitely be good to update the latest release. Again the management group are formalising this process. As such, we would like to see at least two live deployments of the revised profile so that we can ensure that it is implementable.

frmichel commented 4 years ago

On Wed, Feb 12, 2020 at 5:18 PM Carl Boettiger wrote:

In the spirit of demonstrating adoption, I think it would be great if the recommendation reflected greater alignment with existing namespaces that are widely used in taxonomy, such as Darwin Core, https://dwc.tdwg.org/terms/#taxon .

I think this would greatly facilitate adoption. For instance, the current specification provides no mechanism to disambiguate synonyms (https://dwc.tdwg.org/terms/#dwc:taxonomicStatus, https://dwc.tdwg.org/terms/#dwc:acceptedNameUsageID) or taxonomic concepts. I'm also unclear on the utility of childTaxon and hasDefinedTerm in the current bioschemas spec. Apologies if I've missed the boat on these discussions already, but these are certainly barriers to me in using bioschemas over an existing namespace like Darwin Core. (Also cc'ing Rob Guralnick on this who has far more expertise than I in this area and could speak more broadly to the potential for adoption of https://bioschemas.org/types/Taxon/0.3-RELEASE-2019_11_18/).

(...) identifiers are of course the solution, the point is that you need two different identifiers and you need to know which is which. Here's a quick DarwinCore example:

{
    "taxonID": "ITIS:1000254",
    "scientificName": "Rollandia micropterum",
    "acceptedNameUsageID": "ITIS:562791",
    "taxonomicStatus": "synonym",
    "vernacularName": "Titicaca Grebe"
}

We don't need taxonomicStatus explicitly here, since it is implied by the fact that the accepted ID (acceptedNameUsageID) is not the same as the taxonID for this name. But we do need two identifiers, and we need to know which one is which. It's not clear to me how the above would be represented in the schema.org proposal. (Of course one could say "don't use synonyms!", but we may as well then say "don't use scientific names, just use accepted identifiers". We live in a world that uses scientific names, so we need mechanisms that can acknowledge that some names are synonyms.)
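The inference described here (a name whose acceptedNameUsageID differs from its taxonID is a synonym) is easy to express in code. A minimal Python sketch; returning "unknown" when acceptedNameUsageID is missing is an assumption of this sketch, not a Darwin Core rule:

```python
def implied_status(record: dict) -> str:
    """Return taxonomicStatus, inferring it from the two identifiers when absent."""
    if "taxonomicStatus" in record:
        return record["taxonomicStatus"]
    accepted = record.get("acceptedNameUsageID")
    if accepted is None:
        return "unknown"  # assumption: no basis for inference
    return "accepted" if accepted == record.get("taxonID") else "synonym"


grebe = {
    "taxonID": "ITIS:1000254",
    "scientificName": "Rollandia micropterum",
    "acceptedNameUsageID": "ITIS:562791",
    "vernacularName": "Titicaca Grebe",
}
print(implied_status(grebe))  # synonym
```

The point it makes concrete: the inference only works because the record carries two identifiers and says which is which.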

frmichel commented 4 years ago

On Fri, Feb 14, 2020 at 6:32 Franck Michel wrote:

Dear Carl, Leyla (+ Quentin who shall certainly be interested in this),

I agree that we should make an effort to better explain how the current recommendation aligns with existing vocabularies, specifically Darwin Core.

I'll try to describe how we can solve that. I'm sorry this email is pretty long, but I don't know how to be clear and short at the same time ;)

There were quite a few discussions at the beginning about what the Taxon term should refer to: a taxon concept? A taxon name usage? etc. Even experts do not always agree on the definitions of those terms. So we agreed on two principles:

A taxon (instance of type Taxon) is associated with an accepted (or valid) name (schema:name), 0 to any number of synonyms (schema:alternateName), and identifiers from other DBs:

"@type" : "Taxon",
"additionalType": [ "dwc:Taxon", "http://rs.tdwg.org/ontology/voc/TaxonConcept#TaxonConcept" ],
"name": "Delphinapterus leucas (Pallas, 1776)",
"alternateName": [ "Balaena albicans Muller, 1776", "Beluga catodon Gray, 1846" ],
"identifier": [
    {   "@type": "PropertyValue",
        "name": "WoRMS id",
        "propertyID": "https://www.wikidata.org/entity/P850",
        "value": "137115"
    }
]

In further discussions, we agreed that modelling only taxa was not sufficient, as some databases/portals describe scientific names, not taxa. So we started defining the TaxonName term (which is not yet published on the website, but I'm on it...). This term makes it possible to give more specific information about a name. Hence the creation of two new properties, schema:scientificName and schema:alternateScientificName, which are the counterparts of schema:name and schema:alternateName but take an object of type TaxonName instead of a string. One would typically use one pair of properties or the other, though both may be used simultaneously:

"name": "Delphinapterus leucas (Pallas, 1776)",
"alternateName": [ "Balaena albicans Muller, 1776" ],

"scientificName": {
    "@type" : "TaxonName",
    "name": "Delphinapterus leucas",
    "author": "(Pallas, 1776)"
},
"alternateScientificName": [
    {   "@type" : "TaxonName",
        "name": "Balaena albicans",
        "author": "Muller, 1776"
    }
]

Now, how does this compare with Darwin Core? The problem is that Darwin Core RDF terms describe names and name usages, not taxa. In the example you provide: { "taxonID": "ITIS:1000254", "scientificName": "Rollandia micropterum", "acceptedNameUsageID": "ITIS:562791", "taxonomicStatus": "synonym", "vernacularName": "Titicaca Grebe" }

"ITIS:1000254" actually represents a taxon's name which happens to be a synonym of "ITIS:562791", therefore the need for acceptedNameUsageID and taxonomicStatus. With the Taxon and TaxonName terms, we could write the same thing by first denoting a Taxon with an accepted name (scientificName) and a synonym (alternateScientificName), like this:

"@type" : "Taxon",
"scientificName": {
    "@type" : "TaxonName",
    "identifier": {
        "@type": "PropertyValue",
        "name": "ITIS id",
        "value": "562791"
    }
},
"alternateScientificName": [
    {   "@type" : "TaxonName",
        "name" : "Rollandia micropterum",
        "identifier": {
            "@type": "PropertyValue",
            "name": "ITIS id",
            "value": "1000254"
        }
    }
]
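The hand-written mapping above can be automated. A minimal Python sketch, assuming identifiers come in "prefix:local" form (as in "ITIS:1000254") and that the PropertyValue name is "<prefix> id"; vernacularName and other Darwin Core fields are simply dropped, and the function names are hypothetical:

```python
def curie_to_property_value(curie: str) -> dict:
    """Turn a prefixed identifier such as "ITIS:562791" into a PropertyValue."""
    prefix, _, local = curie.partition(":")
    return {"@type": "PropertyValue", "name": f"{prefix} id", "value": local}


def dwc_to_taxon(record: dict) -> dict:
    """Map one Darwin Core name record onto the proposed Taxon/TaxonName terms."""
    name_id = record["taxonID"]
    accepted_id = record.get("acceptedNameUsageID", name_id)
    this_name = {
        "@type": "TaxonName",
        "name": record["scientificName"],
        "identifier": curie_to_property_value(name_id),
    }
    if accepted_id == name_id:
        # The record is itself the accepted name.
        return {"@type": "Taxon", "scientificName": this_name}
    # The record is a synonym: the accepted name is known only by its identifier.
    return {
        "@type": "Taxon",
        "scientificName": {"@type": "TaxonName",
                           "identifier": curie_to_property_value(accepted_id)},
        "alternateScientificName": [this_name],
    }


grebe = {"taxonID": "ITIS:1000254", "scientificName": "Rollandia micropterum",
         "acceptedNameUsageID": "ITIS:562791", "taxonomicStatus": "synonym"}
taxon = dwc_to_taxon(grebe)
```

The output has the same shape as the markup above, which shows that the two Darwin Core identifiers end up on two distinct TaxonName nodes, each clearly labelled as accepted or synonym by the property that holds it.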

Still, this seems a bit cumbersome, since you just want to represent names but have to denote a Taxon. So, one option could be to have a new pair of properties, hasSynonym/synonymOf, to denote relationships between TaxonName instances only:

"@type" : "TaxonName",
"name" : "Rollandia micropterum",
"identifier": {
    "@type": "PropertyValue",
    "name": "ITIS id",
    "value": "1000254"
},
"synonymOf": {
    "@type" : "TaxonName",
    "identifier": {
        "@type": "PropertyValue",
        "name": "ITIS id",
        "value": "562791"
    }
}
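Since hasSynonym and synonymOf would be inverses, markup only needs to state one direction and a consumer can materialize the other. A minimal Python sketch, assuming (hypothetically) that each TaxonName node has an "@id" and that synonymOf references the accepted name by that "@id" rather than by an embedded identifier:

```python
from collections import defaultdict


def has_synonym_index(names: list) -> dict:
    """Invert synonymOf links: map each accepted name's @id to its synonyms' @ids."""
    index = defaultdict(list)
    for node in names:
        target = node.get("synonymOf")
        if target is not None:
            index[target].append(node["@id"])
    return dict(index)


names = [
    {"@type": "TaxonName", "@id": "ITIS:1000254",
     "name": "Rollandia micropterum", "synonymOf": "ITIS:562791"},
    {"@type": "TaxonName", "@id": "ITIS:562791"},
]
print(has_synonym_index(names))  # {'ITIS:562791': ['ITIS:1000254']}
```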

What do you think? Would that work for you?

Franck.

frmichel commented 4 years ago

On Fri, 14 Feb 2020 at 18:37, Carl Boettiger wrote:

Hi Franck,

Thanks for the detailed reply, and please let me know if we should move this discussion over to a GitHub issue. Apologies, I wasn't up to speed on discussions more recent than what is on the Bioschemas website.

I have reviewed the threads you link, and I very much share the sentiments and objectives you have all voiced there and in this thread (avoid the debates, leverage existing schema.org vocabulary whenever possible). Unfortunately, I'm afraid the new proposals sound quite confusing. It seems the proposal to create a new TaxonName implicitly means that Taxon is supposed to effectively mean "TaxonConcept"? I agree TaxonConcept is not an area of consensus; its main purpose is to allow for discussion in a world where different authorities have conflicting/overlapping notions of TaxonConcept, and I'm really not sure we want to go that route.

If Taxon is not meant as "the concept of taxon", then I don't see how it is different from a TaxonName. (This is made even more confusing by the fact that "name" is also a property of a taxon.) I think this new proposal is much more confusing than the original! I acknowledge that the "concept" of a taxon is different from a name, but I think we would be better off not attempting to define a class/type for "TaxonConcept" (since, afaik, the experts haven't done that), and we should let the proposal of "@type": "schema:Taxon" mean a name, which is how most people see it. (At its simplest, we should think of "Taxon" as merely a name/label we apply to an individual specimen, and not worry about defining the 'class of all such specimens'.)

Defining the inverse pair hasSynonym & synonymOf sounds reasonable, though I do worry a bit about the complexity. That is, taxonomically, hasSynonym implies it is a property of an "accepted name", while synonymOf sounds like a property of "the synonym", but in English synonyms are symmetric; there's no "accepted" one. I wonder if (paralleling the Darwin Core terms) it would be better to use an optional property "acceptedName" (and not define an inverse property).

"@type" : "Taxon",
"name" : "Rollandia micropterum",
"@id": "http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=1000254",
"acceptedName": {
    "@type": "Taxon",
    "name": "Rollandia microptera",
    "@id": "http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=562791"
}
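With a single acceptedName link, finding the accepted name for any record means following that link until it stops. A minimal Python sketch, assuming nodes are kept in a dict keyed by "@id" and that acceptedName is either an embedded node with an "@id" or a bare "@id" string; it follows chains defensively and raises on cycles:

```python
def resolve_accepted(node_id: str, nodes: dict) -> str:
    """Follow acceptedName links from node_id until reaching a node without one."""
    seen = set()
    current = node_id
    while True:
        if current in seen:
            raise ValueError(f"cycle in acceptedName links at {current}")
        seen.add(current)
        accepted = nodes.get(current, {}).get("acceptedName")
        if accepted is None:
            return current
        current = accepted["@id"] if isinstance(accepted, dict) else accepted


ITIS = "http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value="
SYN, ACC = ITIS + "1000254", ITIS + "562791"
nodes = {
    SYN: {"@type": "Taxon", "name": "Rollandia micropterum", "@id": SYN,
          "acceptedName": {"@type": "Taxon", "name": "Rollandia microptera", "@id": ACC}},
}
print(resolve_accepted(SYN, nodes) == ACC)  # True
```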

Does that make sense?

Apologies, not trying to open a can of worms here, just aspiring to the same goals of avoiding debate and re-using existing terms!

frmichel commented 4 years ago

On Sat, Feb 15, 2020 at 4:15 AM Quentin Groom wrote:

Hi Carl, Franck, Alasdair and all,

At least for me, the taxonName term was created to support findability for taxonomic name registries, such as ZooBank, MycoBank and IPNI. As these databases do not keep track of taxa, they would be poorly supported by the use of a taxon term in place of a taxonName term. Having said that, I would avoid modelling biological taxonomy and nomenclature in Bioschemas, because it's quite a minefield. Therefore, I would keep the relationship between taxon and taxonName as simple as possible. It should be simple enough to support findability of resources on the internet, but it is never going to be rich enough to support an understanding of the nuances of taxonomic concepts and their interrelationships with taxonNames.

For me, one would use taxonName when your data relates to the publication and typification of a name, but use taxon when your data is primarily about the traits of the taxon and other biological features. Clearly, there are overlaps. I particularly see either option being useful for specimens, but again it depends on the use case.

I'm not sure if this helps the discussion, but that's my 2 cents' worth.

Quentin

frmichel commented 4 years ago

On Sat, Feb 15, 2020 at 16:37 Matt Yoder wrote:

Hi all,

Just diving into this discussion so my apologies if I'm rehashing things that have been worked out (I'm certain I am), please ignore if so.

What I see from the outset are needs that conflict, sometimes significantly. As Quentin and others noted, these fall into two categories: 1) compatibility, i.e. things need to work with concepts that have existed and been implemented, and 2) clarification, i.e. the ability to use terms consistently, and therefore comparably, in a meaningful way. I suggest that anything that emerges from this effort be (strongly) biased towards 2, even at a partial or significant cost to 1. I fear that terms that support the confusion between name and concept (a distinction that isn't that difficult if you step back) are going to keep our efforts blurred and interoperability unresolved. I'm seeing precisely this happen in large ongoing efforts that I won't name. Users of terms, importantly (but far from exclusively) the technical teams that implement databases, tools, etc., need to work to stop blurring the lines. Getting there is going to be a long, slow educational process, but decisions by parties like this one can help get us there.

I know of no system that currently handles the semantics perfectly (this may be impossible), but I do know several ideas are emerging/have emerged:

1) If your data model does not distinguish names from concepts, your system is going to run OK for a while, then see serious problems that frustrate everybody, internal and external. These can be problems as simple as trying to keep track of what software code in your system does what (in fact, this is our prime reason for keeping the two separate in our group's efforts).
2) There is "synonym" and there is nomenclatural synonymy. Trying to dance between the two is going to cause problems as in 1). We've created NOMEN (https://github.com/SpeciesFileGroup/nomen) to let us isolate and handle the latter. It is OK for only taxonomists to know about nomenclatural synonymy and its nuances; not everybody has to know everything. We've buried the complexities of using NOMEN in interfaces that taxonomists understand.
3) Systems that require nomenclature before concepts can be instantiated are going to fail. For example, users need to capture data about undescribed taxa, and not everyone wants or needs to understand nomenclature.
4) Using new terms, even if foreign, can help people begin to understand the distinction between names and concepts. We use "Otu" for taxon concept and "TaxonName" for taxon name. This term has historical baggage, but curators/scientists get our new use with very little explanation. Do not fear injecting new terms into the world!
5) Practically, when describing the difference between TaxonName and Otu, we ask people to run through a little test:

Just my 2c as well!

Cheers, Matt

cboettig commented 4 years ago

at least for me, the taxonName term was created to support findability for taxonomic names registries, such as Zoobank, Mycobank and IPNI. ...

@qgroom yes, I totally see/share that use case. But wouldn't it be sufficient to let the "name" attribute of Taxon be the taxonName, and add an identifier attribute using the identifier of one of those registries (ITIS in my example being such a registry), if the user wishes to indicate that the name use conforms to a particular registry?

For me, one would use taxonName when your data relates to the publication and typification of a name, but use taxon when your data is primarily about the traits of the taxon and other biological features.

Aye, there's the rub. Of course traits and biological features are features of specimens, not of taxa, and yet as scientists we ignore this all the time. I'm not wild about supporting this use case. (A taxon can have an 'average body mass', but it doesn't have "a body mass". Though sometimes we associate even more meta concepts to taxa that aren't even properties of taxa alone -- like R0 for a disease, which also reflects the human behavioral context).

frmichel commented 4 years ago

@cboettig: wouldn't it be sufficient to let the "name" attribute of Taxon be the taxonName, and add an identifier attribute using the identifier of one of those registries (ITIS in my example being such a registry), if the user wishes to indicate that the name use conforms to a particular registry?

In this case, the problem is that the ID should be attached to the name, not the taxon. But we cannot do that if the name is just a string (schema:name), it has to be an object of its own, hence the example:

{
    "@type": "TaxonName",
    "name": "Rollandia micropterum",
    "identifier": {
        "@type": "PropertyValue",
        "name": "ITIS id",
        "value": "1000254"
    }
}
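To illustrate why the identifier belongs on a TaxonName object rather than on a plain string, here is a small Python sketch that reads the ITIS id back out of such a node. Matching on the PropertyValue name "ITIS id" simply mirrors the example; the helper names are hypothetical, and a real consumer might prefer a more stable key such as propertyID:

```python
from typing import Any, Dict, List, Optional


def identifiers(node: Dict[str, Any]) -> List[Any]:
    """Normalize schema:identifier, which may be a single value or a list."""
    value = node.get("identifier", [])
    return value if isinstance(value, list) else [value]


def find_identifier(node: Dict[str, Any], label: str) -> Optional[str]:
    """Return the value of the first PropertyValue identifier named `label`.

    Plain-string identifiers (Text or URL) are skipped.
    """
    for ident in identifiers(node):
        if isinstance(ident, dict) and ident.get("name") == label:
            return ident.get("value")
    return None


taxon_name = {
    "@type": "TaxonName",
    "name": "Rollandia micropterum",
    "identifier": {"@type": "PropertyValue", "name": "ITIS id", "value": "1000254"},
}

print(find_identifier(taxon_name, "ITIS id"))  # 1000254
print(find_identifier(taxon_name, "GBIF id"))  # None
```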

@cboettig: Of course traits and biological features are features of specimens, not of taxa, and yet as scientists we ignore this all the time. I'm not wild about supporting this use case. (A taxon can have an 'average body mass', but it doesn't have "a body mass". Though sometimes we associate even more meta concepts to taxa that aren't even properties of taxa alone

You are right, and this is probably where we should draw the "red line" not to cross in Bioschemas.

The whole problem is to figure out where that line lies, between a simple/usable and a rich/accurate vocabulary. I agree that distinguishing between Taxon and TaxonName may not keep things as simple as they could be. Indeed, as far as I understand, Taxon in this case closely maps to a taxon concept. Still, I feel that we do not get into subtleties where there would be no consensus. In other words, I'd say that this distinction remains on the simple/usable side of the line.

Plus, terms must be usable by a spectrum of people with very different backgrounds. When people don't really know whether they are dealing with a name or a taxon, I'd say that they are most likely dealing with a taxon. In that case, they should use the Taxon term, which is flexible enough and does not require a TaxonName. On the other hand, people dealing specifically with names (e.g. Zoobank, Mycobank and IPNI, as @qgroom said) probably understand pretty well what they are doing, and they would use TaxonName without hesitation.

As Matt Yoder explained (if I understood correctly), trying to be compatible with DwC (which has only a Taxon term) bears the risk of maintaining confusion and thus hampering future data integration. So if we go for two separate terms, maybe the relationship between Taxon and TaxonName should be changed/renamed, the definitions improved, and more substantial guidelines clearly stated.

frmichel commented 4 years ago

A separate comment about the synonym term problem:

@cboettig: hasSynonym implies it is a property of an "accepted name", while synonymOf sounds like a property of "the synonym", but in English "synonyms" are symmetric; there's no "accepted" one. I wonder if (paralleling the Darwin Core terms) it would be better to use the optional property "acceptedName" (and not define an inverse property).

Right. So, instead of the hasSynonym/synonymOf which does not clarify which one is the accepted one, we could have something like this:

{
    "@type": "TaxonName",
    "name": "Rollandia micropterum",
    "identifier": {
        "@type": "PropertyValue",
        "name": "ITIS id",
        "value": "1000254"
    },
    "acceptedName": {
        "@type": "TaxonName",
        "identifier": {
            "@type": "PropertyValue",
            "name": "ITIS id",
            "value": "562791"
        }
    }
}

But then, I'm afraid it would be confusing that we would use { name: string } or { scientificName: { taxon name } } for Taxon, whereas acceptedName would be reserved for TaxonName.

AlasdairGray commented 4 years ago

Sorry for coming to this fascinating debate late, and thanks for all the contributions to it.

On the issue of borrowing terms from DwC, this would be perfectly appropriate for Schema.org: most of the terms around Dataset have been lifted from the DCAT vocabulary. Given that DwC is a widely agreed standard here, I think this would be appropriate.

I've had to reread this thread a few times and draft several responses, but I think that I'm finally getting my head around this.

The idea of the existing Taxon type is to have a representation of the concept within a taxonomy. However, as stated by @qgroom at the start of this issue, the TaxonName is not this but is something that is coined by a scientist. Would it therefore make sense for TaxonName to inherit from schema:CreativeWork?

If I have indeed got this correct, then the need for the TaxonName is to allow a resource marked up as a Taxon to have multiple TaxonNames. We would then need to support in the Taxon profile a Taxon having multiple TaxonNames.

At the moment I don't see a complete working example going down this route. Can we get a few examples that show resources from different databases marked up as Taxon with their corresponding TaxonNames?

frmichel commented 4 years ago

Just to answer @AlasdairGray's question:

(...) the need for the TaxonName is to allow a resource marked up as a Taxon to have multiple TaxonNames. (...) At the moment I don't see a complete working example going down this route. Can we get a few examples that show resources from different databases marked up as Taxon with their corresponding TaxonNames?

I think the full example with Taxon v5-draft illustrates this use case.

AlasdairGray commented 4 years ago

Are there separate web pages about each of the scientific names that we could reference with an @id attribute?

mjy commented 4 years ago

If I can find the time, I will try to approximate the TaxonWorks model against the Taxon v5-draft; we certainly have data that covers its scope, and likely more.

Being pedantic here:

Glancing through I see one issue that might benefit from some clarification, 'taxonRank'. Here, I see it refers to Taxon:

    "taxonRank": [
      "http://rs.tdwg.org/ontology/voc/TaxonRank#Species",
      "http://www.wikidata.org/entity/Q7432",
      "species"
    ],

And elsewhere I see it applying to TaxonName.

Semantically, this seems to be a contradiction? We know that things like "Species" apply to both TaxonName and Taxon, but I think there are important distinctions. In the former case, nomenclatural rules apply at certain levels (and groups of levels), and these rules imply specific consequences. In the latter case (Taxon), their assignment is completely arbitrary/subjective. We assign Rank to TaxonNames because this has "legal" consequences. We assign Rank to Taxa because it helps us remember where things are (and that's about it; see the counterexamples from phylogenetically based clades).
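The taxonRank value quoted above mixes IRIs from two vocabularies with a plain-text label. A hypothetical Python helper that separates the two kinds of value is sketched below. Note that nothing in the values themselves says whether a rank is asserted nomenclaturally or taxonomically, so under one shared property the only signal a consumer would have is the @type of the node that carries it:

```python
from typing import List, Tuple, Union
from urllib.parse import urlparse


def split_ranks(rank: Union[str, List[str]]) -> Tuple[List[str], List[str]]:
    """Split taxonRank values into (IRIs, plain-text labels)."""
    values = rank if isinstance(rank, list) else [rank]
    iris: List[str] = []
    labels: List[str] = []
    for value in values:
        parsed = urlparse(value)
        # Treat absolute http(s) URLs as IRIs, everything else as a label.
        if parsed.scheme in ("http", "https") and parsed.netloc:
            iris.append(value)
        else:
            labels.append(value)
    return iris, labels


iris, labels = split_ranks([
    "http://rs.tdwg.org/ontology/voc/TaxonRank#Species",
    "http://www.wikidata.org/entity/Q7432",
    "species",
])
print(iris)    # the TDWG and Wikidata IRIs
print(labels)  # ['species']
```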

frmichel commented 4 years ago

I'm not sure I get your point here. Whether it applies to a name or a taxon, a taxonomic rank is just a rank. Do you mean that we should consider different "species" ranks depending on the object they are assigned to?

ljgarcia commented 1 year ago

Hi @frmichel any news about this issue? Thanks

frmichel commented 1 year ago

Hi @ljgarcia, I've left this work behind for too long for sure. I have on my todo list to continue the work on TaxonName by following Alasdair's advice: add a paragraph in the howtos, then a community tutorial. I'll try to do that during early 2023.

frmichel commented 1 year ago

@all, it's been a while since our last discussions on this topic; it would be good to move towards a first release of the TaxonName type and profile. Let me try to summarize the current situation.

Motivation for Taxon vs. TaxonName: A Taxon instance is associated with an accepted/valid name (schema:name "string") and zero or more synonym names (schema:alternateName "string"), as well as identifiers of equivalent taxa in other databases (schema:identifier). Some databases/portals (such as Zoobank, Mycobank and IPNI) describe scientific names but do not keep track of taxa, so they would be poorly supported by a Taxon term alone. We therefore defined the TaxonName term to support findability for these taxonomic names registries. Two new properties link a Taxon to its TaxonNames, schema:scientificName and schema:alternateScientificName, which are the counterparts of schema:name and schema:alternateName but take a TaxonName instance as their object.
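A sketch, in Python and assuming plain JSON-LD dicts, of a Taxon carrying its names both as strings (name, alternateName) and as TaxonName objects (scientificName, alternateScientificName), as summarized above. The helper functions are hypothetical, and the names reuse the Rollandia example from earlier in the thread, with Rollandia microptera as the accepted name:

```python
import json
from typing import Any, Dict, List


def make_taxon_name(name: str) -> Dict[str, Any]:
    """Build a minimal TaxonName node (hypothetical helper)."""
    return {"@type": "TaxonName", "name": name}


def build_taxon(accepted: str, synonyms: List[str]) -> Dict[str, Any]:
    """Build a Taxon with its names both as strings and as TaxonName nodes."""
    return {
        # The current drafts use the schema.org namespace for both types.
        "@context": "http://schema.org/",
        "@type": "Taxon",
        "name": accepted,                             # accepted/valid name
        "alternateName": synonyms,                    # synonyms, as strings
        "scientificName": make_taxon_name(accepted),  # same, as a TaxonName
        "alternateScientificName": [make_taxon_name(s) for s in synonyms],
    }


def all_name_strings(taxon: Dict[str, Any]) -> List[str]:
    """Collect the accepted name followed by its synonyms from the TaxonName nodes."""
    names = [taxon["scientificName"]["name"]]
    names += [n["name"] for n in taxon.get("alternateScientificName", [])]
    return names


taxon = build_taxon("Rollandia microptera", ["Rollandia micropterum"])
print(json.dumps(taxon, indent=2))
print(all_name_strings(taxon))  # ['Rollandia microptera', 'Rollandia micropterum']
```

Keeping the string properties alongside the TaxonName objects lets consumers that only understand schema:name still pick up the accepted name.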

Usage: One would use Taxon when the data is primarily about biology in the broad sense: organisms, biological features, and possibly traits, although traits arguably relate to a specimen, not necessarily to the taxon. One would use TaxonName when the data relates to the publication and typification of a name, specifically as it pertains to the application of a Code of nomenclature.

Open issues: The point in Bioschemas is essentially to support findability of resources related to taxa and/or taxon names on the internet, not to support an understanding of the nuances of taxonomic concepts and their interrelationships with taxon names. Nevertheless, reviewing the thread I identify 2 issues that were raised but did not come to a conclusion yet:

1. A Taxon can point to the accepted/valid name and synonyms. But should TaxonName have a property to represent the relationship between an accepted name and its synonyms? A proposed solution is to have a schema:acceptedName property (a wording using "synonym" was proposed but finally deemed ambiguous because it does not tell which one is the accepted one):

{
    "@type": "TaxonName",
    "name": "Rollandia micropterum",
    "identifier": {
        "@type": "PropertyValue",
        "name": "ITIS id",
        "value": "1000254"
    },
    "acceptedName": {
        "@type": "TaxonName",
        "name": "Rollandia microptera",
        "identifier": {
            "@type": "PropertyValue",
            "name": "ITIS id",
            "value": "562791"
        }
    }
}

2. Taxon rank vs. Name rank (remark from Matt). The same schema:taxonRank property applies to both Taxon and TaxonName types. However:

I'm not sure how to proceed further. Maybe a vote would help, although there is no such feature on GitHub. Anyway, please do comment further on these questions, and don't hesitate to raise other issues that I left behind if you feel they still need discussing.
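Regarding open issue 1, one argument for a single, one-directional acceptedName property is that the reverse view (all synonyms of a given accepted name) can be computed by whoever harvests the markup. A minimal Python sketch, assuming TaxonName records shaped like the example in issue 1 and keyed by their ITIS id; the function names are hypothetical:

```python
from collections import defaultdict
from typing import Any, Dict, List


def itis_id(node: Dict[str, Any]) -> str:
    """Read the ITIS id from a single PropertyValue identifier, as in the example."""
    return node["identifier"]["value"]


def synonyms_by_accepted(names: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Invert acceptedName: map each accepted name's ITIS id to its synonyms."""
    index: Dict[str, List[str]] = defaultdict(list)
    for record in names:
        accepted = record.get("acceptedName")
        if accepted is not None:
            index[itis_id(accepted)].append(record["name"])
    return dict(index)


records = [
    {
        "@type": "TaxonName",
        "name": "Rollandia micropterum",
        "identifier": {"@type": "PropertyValue", "name": "ITIS id", "value": "1000254"},
        "acceptedName": {
            "@type": "TaxonName",
            "name": "Rollandia microptera",
            "identifier": {"@type": "PropertyValue", "name": "ITIS id", "value": "562791"},
        },
    },
]

print(synonyms_by_accepted(records))  # {'562791': ['Rollandia micropterum']}
```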

frmichel commented 1 year ago

@qgroom @ljgarcia @mjy @cboettig @stylesm @yvanlebras,

Dear all, just an addition, I've written 2 documents that were just published on the main website to describe the Taxon and TaxonName terms:

It would be nice if some of you could review those. Don't hesitate to make changes directly on GitHub by submitting pull requests: https://github.com/BioSchemas/bioschemas.github.io/tree/master/pages/_tutorials/howto and https://github.com/BioSchemas/bioschemas.github.io/tree/master/pages/_tutorials/community

ljgarcia commented 1 year ago

Hi @qgroom @mjy @cboettig @stylesm @yvanlebras

The tutorials are live at https://bioschemas.org/tutorials/howto/howto_right_profile#38-taxon-taxonname and https://bioschemas.org/tutorials/community/biodiversity

They both look fine to me, but biodiversity is not my area.

frmichel commented 1 year ago

Hi all, @ljgarcia @qgroom @mjy @cboettig @stylesm @yvanlebras,

In the community meeting on Feb. 27th, we decided to give another 3 weeks for feedback before we move TaxonName to release status (together with another release of Taxon).

Therefore, we kindly ask you to provide your comments, if any, by March 20th. Thx.

ivanmicetic commented 1 year ago

Hi all, @frmichel @gtsueng we have agreed to proceed with RELEASE 1.0 for Taxon and TaxonName profiles and types.

frmichel commented 1 year ago

Hi @ivanmicetic, ok this is good news.

ljgarcia commented 1 year ago

Hi @frmichel in preparation for the release (it can be done after the deadline given to the community for comments), there are two things you also need to prepare.

gtsueng commented 1 year ago

@ljgarcia -- @frmichel updated the examples to the latest draft. Since there will be no difference between the latest draft and the release (aside from version number), I can copy/paste/rename those with the release after April 28th.

frmichel commented 1 year ago

Hi @ljgarcia and @gtsueng, here is a quick update. The latest draft versions are Taxon 0.8 and TaxonName 0.2.

Examples: I have a doubt about the namespaces we use for the types: Taxon exists in schema.org, but TaxonName does not yet. For now, we use the http://schema.org namespace for both. But should we use http://schema.org/Taxon together with https://bioschemas.org/TaxonName?
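For the namespace question, one option is to declare both prefixes in the @context and write compact IRIs. The sketch below (Python, with a hypothetical `expand` helper and context) only shows which full type IRIs each choice yields; it implements a simplified form of JSON-LD compact IRI expansion and takes no position on which namespace TaxonName should use:

```python
from typing import Dict

# Hypothetical context declaring both namespaces mentioned above.
CONTEXT: Dict[str, str] = {
    "schema": "http://schema.org/",
    "bioschemas": "https://bioschemas.org/",
}


def expand(term: str, context: Dict[str, str]) -> str:
    """Expand a compact IRI such as "schema:Taxon" using a prefix map.

    Simplified JSON-LD compact IRI expansion: terms whose prefix is not
    in the context (e.g. full "http://" IRIs) are returned unchanged.
    """
    prefix, sep, local = term.partition(":")
    if sep and prefix in context:
        return context[prefix] + local
    return term


taxon_name_node = {
    "@context": CONTEXT,
    "@type": "bioschemas:TaxonName",
    "schema:name": "Rollandia micropterum",
}

print(expand("schema:Taxon", CONTEXT))            # http://schema.org/Taxon
print(expand(taxon_name_node["@type"], CONTEXT))  # https://bioschemas.org/TaxonName
```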

Live deploys: We have 2 deployments with Taxon 0.7 and TaxonName 0.1. The only difference in both profiles is the recommended sameAs property; there is no new minimum property. So both deployments are already compatible with the latest drafts. I assume we should contact the maintainers once the profiles are released to ask them to update the conformsTo values. Right?
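Once the profiles are released, the update asked of deployments amounts to changing the conformsTo value. A minimal Python sketch, assuming the Dublin Core conformsTo IRI and a profile URL pattern of the form https://bioschemas.org/profiles/<Profile>/<version>; both the pattern and the version labels below are assumptions for illustration, and `set_profile` is a hypothetical helper:

```python
import copy
from typing import Any, Dict

# Assumed IRIs: Dublin Core conformsTo and a Bioschemas profile URL pattern.
CONFORMS_TO = "http://purl.org/dc/terms/conformsTo"
PROFILE_BASE = "https://bioschemas.org/profiles/"


def set_profile(node: Dict[str, Any], profile: str, version: str) -> Dict[str, Any]:
    """Return a copy of `node` whose conformsTo points at profile/version."""
    updated = copy.deepcopy(node)
    updated[CONFORMS_TO] = {"@id": f"{PROFILE_BASE}{profile}/{version}"}
    return updated


deployed = {
    "@type": "Taxon",
    "name": "Rollandia microptera",
    CONFORMS_TO: {"@id": PROFILE_BASE + "Taxon/0.7-DRAFT"},  # hypothetical label
}
released = set_profile(deployed, "Taxon", "1.0-RELEASE")     # hypothetical label

print(released[CONFORMS_TO]["@id"])  # https://bioschemas.org/profiles/Taxon/1.0-RELEASE
print(deployed[CONFORMS_TO]["@id"])  # unchanged: .../Taxon/0.7-DRAFT
```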

Training material: I've contributed a tutorial for both Taxon and TaxonName that will need an update with regard to conformsTo, as well as a section in the Choose a profile page that does not need any update.