globalbioticinteractions / nomer

maps identifiers and names to other identifiers and names
GNU General Public License v3.0

suggest to use RCC5 logic theory to communicate taxonomic alignments: https://cs.gonzaga.edu/faculty/bowers/papers/bowers-onisw-08.pdf #114

Open jhpoelen opened 1 year ago

jhpoelen commented 1 year ago

@n8upham shared:

RCC5 is the logic theory used to communicate taxonomic alignments: https://cs.gonzaga.edu/faculty/bowers/papers/bowers-onisw-08.pdf

related to #93 .

jhpoelen commented 1 year ago

@n8upham, @jar398's https://github.com/jar398/listtools references RCC5 syntax. Would you happen to know the status of the tool? I noticed that the last commit was made in May 2022.

jhpoelen commented 1 year ago

figure 5 from

Thau, D., Bowers, S. and Ludäscher, B., 2008, October. Merging taxonomies under RCC-5 algebraic articulations. In Proceedings of the 2nd international workshop on Ontologies and information systems for the semantic web (pp. 47-54).

[Screenshot of figure 5]

jar398 commented 1 year ago

Jorrit, RCC5 is a logical theory, not a syntax. That figure you copied is pretty much the whole story on the theory. Euler/X puts a particular syntax on it. I have not yet found documentation for Euler/X but I suspect it does exist. The TCS 2 vocabulary establishes a different syntax (because the vocabulary can be used in a CSV or TSV file), but this is still unpublished work in progress.

jhpoelen commented 1 year ago

@jar398 thanks for your notes.

I've updated my description to reference RCC5 as a logic theory.

I'll have to dig up the Euler/X and TCS 2 vocabulary syntaxes you referenced.

Hoping to somehow shoehorn prov-o into it to help describe the process of making these logic claims.

jhpoelen commented 1 year ago

Looks like https://github.com/EulerProject/EulerX was last updated in 2017. Is it still used?

jar398 commented 1 year ago

It's so simple, there is so very little to it, that there is very little cost in making up your own syntax, or folding it into the ambient syntax of some body of data (e.g. RDF or CSV). I see no advantage in reusing Euler/X syntax and little advantage in reusing the TCS2 vocabulary. We're just talking 5 binary operators. (It is convenient to have a way to capture N-way disjointness as well, to express that a set of taxa in a polytomy are siblings in a hierarchy, but usually there is a hierarchy handy and you can just refer to it for these disjointness axioms.)

jar398 commented 1 year ago

I think Euler/X has an update in progress (python 2 to python 3) but it hasn't been checked in yet because of some snag. The syntax hasn't changed.

jar398 commented 1 year ago

Obviously if you want to run the Euler/X program you need to provide input in a form it likes. But I thought you were just talking about the syntax.

I'm not sure Euler/X is the best notation for communicating alignments. In particular it doesn't give a way to provide metadata of any kind, either for the entire alignment, for the hierarchies being aligned, for particular taxa, or for the individual articulations themselves. All of these could benefit from metadata. For this reason CSV/TSV and RDF would be better choices.
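
To make the metadata point concrete, here is a minimal sketch of an articulation table serialized as TSV. The column names are assumptions made up for illustration; they are not taken from Euler/X or the TCS 2 vocabulary, and `X`, `A`, `B` are placeholders.

```python
import csv
import io

# Hypothetical column names; not part of Euler/X or TCS 2.
FIELDS = [
    "left_taxon", "left_source", "relation",
    "right_taxon", "right_source", "asserted_by", "note",
]


def write_alignment(rows):
    """Serialize articulation rows (dicts keyed by FIELDS) as TSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=FIELDS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# One articulation with per-row metadata; all values are placeholders.
rows = [
    {
        "left_taxon": "X", "left_source": "A", "relation": "=",
        "right_taxon": "X", "right_source": "B",
        "asserted_by": "someone", "note": "",
    }
]
tsv_text = write_alignment(rows)
print(tsv_text)
```

Each articulation carries its own source and author columns, which is the kind of metadata the bare Euler/X notation has no place for.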

I think the problem of alignment interchange falls in the purview of the TDWG TNC interest group, but it is usually better to standardize work that has already been done and proved outside any standardization process, than to initiate a standards process for something that is unprecedented. However TNC could be a good group in which to incubate systems for alignment interchange, since some of the participants care about this problem. (As do I!)

jhpoelen commented 1 year ago

≡ congruence symbol

⊊ proper inclusion

⊋ proper inverse inclusion

⊕ partial overlap

! exclusion

jhpoelen commented 1 year ago

In trying to figure out intuitive language around these relations -

N congruent with M | N compatible with M (shorthand) | N ≡ M

N is properly included in M | N part of M (shorthand) | N ⊊ M

N is properly inversely included in M | N has part M (shorthand) | N ⊋ M

N partially overlaps with M | N ⊕ M

N excludes M | N ! M

@jar398 do you have any ideas on how to make the RCC5 logic terms a little more appealing for English speakers?

jar398 commented 1 year ago

I don't see any way to avoid an education step, explaining whatever symbols or words are used, and that can introduce words or symbols by fiat. E.g. Euler/X uses 'overlaps' and just resigns itself to the fact that the word is being used in a special way and users/readers will have to be alerted. Fortunately the explanation is simple enough it can be put (e.g.) in a figure caption.

ASCII versions are helpful because it seems the computing world's conversion to Unicode is not complete. I put some non-ASCII in a spreadsheet recently and it got mangled when someone in our group tried to look at it. So I've switched to ASCII, following Euler/X. It's very retro but easier than trying to reconfigure everyone's workstations.

There is some value in copying what some other source does (e.g. Euler/X) in case a user/reader has already encountered it - less reeducation.

jhpoelen commented 1 year ago

Ok, using Euler/X, I imagine the textual descriptions are captured in the examples at https://github.com/EulerProject/EulerX/tree/master/example and https://github.com/EulerProject/EulerX/wiki/Euler-Reasoning-Tool among other locations.

N congruent with M | N equals M (EulerX) | N ≡ M

N is properly included in M | N is_included_in M (EulerX) | N ⊊ M

N is properly inversely included in M | N includes M (EulerX) | N ⊋ M

N partially overlaps with M | N overlaps M (EulerX) | N ⊕ M

N excludes M | N disjoint M (EulerX) | N ! M

Are these the kind of ASCII versions (e.g., equals, is_included_in, includes, overlaps, disjoint) you were referring to?

jar398 commented 1 year ago

The words in the examples work in Euler/X, but symbols are easier to read in a spreadsheet and are also accepted. They are = < > >< !

jhpoelen commented 1 year ago

@jar398 thanks for clarifying!

So now we have:

N congruent with M | N ≡ M | N equals M (EulerX) | N = M (EulerX)

N is properly included in M | N ⊊ M | N is_included_in M (EulerX) | N < M (EulerX)

N is properly inversely included in M | N ⊋ M | N includes M (EulerX) | N > M (EulerX)

N partially overlaps with M | N ⊕ M | N overlaps M (EulerX) | N >< M (EulerX)

N excludes M | N ! M | N disjoint M (EulerX) | N ! M (EulerX)

I can see how ASCII would be a little easier in communication than the fancy Unicode characters.

n8upham commented 1 year ago

Nice! Yeah this is helpful that you’re digging into this also Jorrit. And great to see the notation all laid out. Are you thinking of implementing RCC5 within GloBI via nomer?

jhpoelen commented 1 year ago

@n8upham I like the idea of using RCC5 within GloBI via nomer for relating concepts. Perhaps RCC5 can be used in combination with the Provenance Ontology to say things like:

Nathan U. claimed that Homo sapiens < Mammalia based on MDD v1.9 doi:10.5281/zenodo.6407053

or

Nathan U. claimed that Homo sapiens >< Chiroptera based on MDD v1.9 doi:10.5281/zenodo.6407053

or

Nathan U. claimed that Homo sapiens = NCBI:9606 based on MDD v1.9 doi:10.5281/zenodo.6407053 and NCBI Taxonomy vX.Y

Is this what you had in mind?

How would you propose to express these supporting claims?

How would you express doubtful claims?

See, e.g., #100 @jtmiller28 -

[...] When I was doing some quality control on my resolved names with multiple mappings I happened to notice that one of the names I was considering remapping would remap to a "Doubtful" name, (see Ilex california , http://www.worldfloraonline.org/taxon/wfo-0001265295). ) [...]

n8upham commented 1 year ago

Yes, this seems excellent, Jorrit: it would basically be an “identified by” type of field, with evidence cited, but in this case for taxonomic equivalencies rather than specimen identifications. -n

jar398 commented 1 year ago

I think it's valuable to be clear about the semantics of statements like these. If I say "Nathan U. claimed that Homo sapiens >< Chiroptera" then that assumes that you, the reader, reading what I said, will know what I meant by "Homo sapiens" and "Chiroptera". Which you won't; i.e., the semantics of those names are unclear. When designating a taxon or taxon concept there should always be a reference to a taxonomy that constrains what is meant by the name ("cite your source"). So the meaning you want is:

Nathan U. claimed that {Homo sapiens sensu MDD v1.9 doi:10.5281/zenodo.6407053 >< Chiroptera sensu MDD v1.9 doi:10.5281/zenodo.6407053, according to MDD v1.9 doi:10.5281/zenodo.6407053}

which is different from the first formulation. The long form appears to be redundant but it is not. Figure out how you want to encode this, if such a form is too verbose - that's a separate problem.

It's not clear that you need to say that 'Nathan U. claimed' this because there may be no other way to interpret the claim made in the source; there is no way to apply judgment to it because the source is perfectly clear (e.g. DwC parentNameUsageID clearly means < if there are at least two children). It is simply a matter of fact that the source says [...] >< [...] or whatever. It is the source itself that is making the judgment.
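
The parent-pointer reading above can be sketched as follows. The `(taxon_id, parent_id)` pairs stand in for Darwin Core `taxonID` and `parentNameUsageID` values, the identifiers are hypothetical, and the "at least two children" rule is the one stated in this comment.

```python
from collections import defaultdict


def proper_inclusions(records):
    """Derive 'child < parent' claims from (taxon_id, parent_id) pairs.

    A claim is emitted only when the parent has at least two children,
    following the rule of thumb in the comment above.
    """
    children = defaultdict(list)
    for taxon_id, parent_id in records:
        if parent_id:
            children[parent_id].append(taxon_id)

    claims = []
    for parent_id, kids in children.items():
        if len(kids) >= 2:
            claims.extend((kid, "<", parent_id) for kid in kids)
    return claims


# Hypothetical identifiers: g1 has two children, g2 only one.
records = [
    ("s1", "g1"),
    ("s2", "g1"),
    ("s3", "g2"),
    ("g1", None),
    ("g2", None),
]
claims = proper_inclusions(records)
print(claims)
```

For the single-child parent no claim is produced, since the source alone does not say whether the child is smaller than or congruent with its parent.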

I'm also very uneasy about words like "identify" and even "designate" in connection with a taxon. When we talk about a taxon we almost always just say something that constrains the interpretation of the name we use for the taxon. This is true even for 'sensu' names. It is nearly impossible to be exact in specifying a taxon (to 'identify' it). The best we can ever do (and what we should always do) is the 'sensu' form, which identifies the text of a set of claims around the taxon - which itself is only a constraint, not an 'identification'. (This is elementary model theory.)

The only time it's ever OK to do without a 'sensu' is if you're overwhelmingly sure the reader will interpret a name without any confusion. So we don't need to say doi:10.1234/5678 sensu Crossref (as if that would help!). And we don't need sensu in Darwin Core files or similar taxonomy-specific tables since the data set itself is a source of claims about the taxon. We have an implicit 'sensu D' on all names, where D is the data set.

If we're connecting information from multiple sources, we need an alignment, which is a set of judgments expressed in terms of 'sensu A' and 'sensu B' names, e.g. an alignment might say that X sensu A = X sensu B. If the alignment is only in Nathan's head, then we say that Nathan (on given date) says that X sensu A = X sensu B , etc. but it's better practice for an alignment between entire taxonomies to be stored for reference (by DOI or whatever).
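
One possible encoding of 'sensu' names and attributed articulations, as a minimal sketch. The class and field names are assumptions for illustration; `X`, `A`, and `B` are the placeholders used above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaxonRef:
    """A name together with the source that constrains its meaning."""

    name: str
    sensu: str

    def __str__(self):
        return f"{self.name} sensu {self.sensu}"


@dataclass(frozen=True)
class Articulation:
    """An RCC-5 statement between two 'sensu' names, optionally attributed."""

    left: TaxonRef
    relation: str  # one of: = < > >< !
    right: TaxonRef
    asserted_by: Optional[str] = None

    def __str__(self):
        core = f"{self.left} {self.relation} {self.right}"
        if self.asserted_by:
            return f"{self.asserted_by} says that {core}"
        return core


plain = Articulation(TaxonRef("X", "A"), "=", TaxonRef("X", "B"))
attributed = Articulation(
    TaxonRef("X", "A"), "=", TaxonRef("X", "B"), asserted_by="Nathan"
)
print(plain)
print(attributed)
```

Making `sensu` a required field means a bare name cannot be written down without saying which source constrains it.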

jar398 commented 1 year ago

Maybe by "Nathan U. claimed that Homo sapiens >< Chiroptera" the intent is that these names are meant to be interpreted sensu GloBI. In that case it's best to refer to an alignment between GloBI and MDD explaining how you get from GloBI names and meanings to MDD names and meanings. The names might be used in different ways in the two sources, or different names might be used.

jhpoelen commented 1 year ago

@n8upham @jar398 Yes yes! Hoping to work towards an implementation using these ideas. I'd say we need something in addition to DOIs to point to specific digital resources. . .

I'd like to work on specific examples first, then move to more complex ones.

jhpoelen commented 1 year ago

Today, as I was working on a data review of Ollerton et al. 2022 [1] (see https://github.com/CatalogueOfLife/backend/issues/1171 and https://github.com/globalbioticinteractions/nomer/issues/124), I might have stumbled across a neat use case for describing relations between name strings and the taxonomic concepts they are claimed to be related to.

In the case of https://github.com/globalbioticinteractions/nomer/issues/124 - There's a name Aglais io (Linnaeus 1758) that recently popped up in the Catalogue of Life after they incorporated the Global Lepidoptera Index [2] over the summer.

@n8upham @jar398 do you think this would be a suitable use case to work out how to describe the many layers of who said what when re: Aglais io (Linnaeus 1758) ?

references

[1] Ollerton, J., Trunschke, J., Havens, K., Landaverde-González, P., Keller, A., Gilpin, A.-M., Rodrigo Rech, A., Baronio, G. J., Phillips, B. J., Mackin, C., Stanley, D. A., Treanore, E., Baker, E., Rotheray, E. L., Erickson, E., Fornoff, F., Brearley, F. Q., Ballantyne, G., Iossa, G., Stone, G. N., Bartomeus, I., Stockan, J. A., Leguizamón, J., Prendergast, K., Rowley, L., Giovanetti, M., de Oliveira Bueno, R., Wesselingh, R. A., Mallinger, R., Edmondson, S., Howard, S. R., Leonhardt, S. D., Rojas-Nossa, S. V., Brett, M., Joaqui, T., Antoniazzi, R., Burton, V. J., Feng, H.-H., Tian, Z.-X., Xu, Q., Zhang, C., Shi, C.-L., Huang, S.-Q., Cole, L. J., Bendifallah, L., Ellis, E. E., Hegland, S. J., Straffon Díaz, S., Lander, T. A., Mayr, A. V., Dawson, R., Eeraerts, M., Armbruster, W. S., Walton, B., Adjlane, N., Falk, S., Mata, L., Goncalves Geiger, A., Carvell, C., Wallace, C., Ratto, F., Barberis, M., Kahane, F., Connop, S., Stip, A., Sigrist, M. R., Vereecken, N. J., Klein, A.-M., Baldock, K., & Arnold, S. E. J. (2022). Pollinator-flower interactions in gardens during the COVID-19 pandemic lockdown of 2020. Journal of Pollination Ecology, 31, 87–96. https://doi.org/10.26786/1920-7603(2022)695

[2] Beccaloni, G., Scoble, M., Kitching, I., Simonsen, T., Robinson, G., Pitkin, B., Hine, A., Lyal, C., Ollerenshaw, J., Wing, P., & Hobern, D. (2022). Global Lepidoptera Index. In O. Bánki, Y. Roskov, M. Döring, G. Ower, L. Vandepitte, D. Hobern, D. Remsen, P. Schalk, R. E. DeWalt, M. Keping, J. Miller, T. Orrell, R. Aalbu, R. Adlard, E. M. Adriaenssens, C. Aedo, E. Aescht, N. Akkari, S. Alexander, et al., Catalogue of Life Checklist (Version 2022-10-19). https://doi.org/10.48580/dfqf-49xk

jhpoelen commented 1 year ago

btw Plazi's treatment bank lists the name as Aglais io Linnaeus, 1758 not Aglais io (Linnaeus, 1758) as reported by Catalogue of Life. For some reason, I find it strange that Linnaeus revised his own description that early. @n8upham @jar398 any idea why the parentheses were added by the Catalogue of Life?

https://tb.plazi.org/GgServer/html/6530EECCB822584BA9594682BA79A09D

n8upham commented 1 year ago

Hey Jorrit - I’m not sure which one is correct in this case (with parentheses or without), but it is indeed possible for a name to be (Linnaeus, 1758) — see: https://www.mammaldiversity.org/explore.html#genus=Caluromys&species=philander&id=1000015 vs https://www.mammaldiversity.org/explore.html#genus=Didelphis&species=marsupialis&id=1000021

This just means that the species was originally described (by Linnaeus) in a genus other than the currently recognized one. For the current example: "It was first described by Swedish zoologist Carl Linnaeus as Didelphis philander in the 10th edition of Systema Naturae (1758). It was given its present binomial name, Caluromys philander, by American zoologist Joel Asaph Allen in 1900” https://en.wikipedia.org/wiki/Bare-tailed_woolly_opossum
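
As a rough sketch, the parentheses convention described here can be detected with a regular expression. The function name is made up, and the pattern only covers the simple "Author, year" shapes quoted in this thread, not the full range of authorship strings.

```python
import re

# Matches a trailing "(... 1758)" style authorship at the end of the string.
_PARENTHESIZED = re.compile(r"\([^()]+\d{4}\)$")


def authorship_in_parentheses(name_with_authorship):
    """True if the trailing authorship is wrapped in parentheses.

    Under the convention explained above, parentheses signal that the
    species was originally described in a different genus.
    """
    return bool(_PARENTHESIZED.search(name_with_authorship.strip()))


print(authorship_in_parentheses("Aglais io (Linnaeus 1758)"))
print(authorship_in_parentheses("Aglais io Linnaeus, 1758"))
```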

jhpoelen commented 1 year ago

@n8upham thanks for clarifying. I guess I misunderstood the parentheses notation. For some reason, I thought the parentheses contained the name of the newer publication, instead of the older one. Lots to learn still . . .

jhpoelen commented 1 year ago

For examples of X ! Y, see https://github.com/jhpoelen/msw-plazi/commit/91a6addcf5a758eaf14df2f0f9fed48674e6b972#r96666538 .

First example includes:

Example for expressing:

"Abrocoma bennetti is not found in ITIS"

could be:

Abrocoma bennetti ! ITIS

jar398 commented 1 year ago

I am not keen on metonymy (in this case, using a database as a region). Is it spelled out somewhere that when interpreting ITIS as a taxon or region what is meant is ... well what would it be, the union of all of the tip taxa? That wouldn't work, e.g. it would leave out species for which there are subspecies. It's not the union of all the taxa, because Abrocoma is in ITIS and Abrocoma bennetti has to be in Abrocoma.

Unless... you are talking about regions in a space of names, rather than the way we've used RCC-5 so far in TDWG which is regions in a space of individual organisms. That would be highly nonstandard and would have to be specified somewhere.

Also it just says "ITIS" without saying which snapshot of it is intended.

jhpoelen commented 1 year ago

@jar398 thanks for taking the time to respond.

I agree that my shorthand "ITIS" didn't provide much-needed context.

Also it just says "ITIS" without saying which snapshot of it is intended.

I agree, and the specific version of ITIS can be inferred from the "claimBy" citation in the full annotations.csv (e.g., https://github.com/jhpoelen/msw-plazi/blob/main/annotations.csv#L34):

Salim JA, Poelen JH (2022). globalbioticinteractions/nomer: 0.4.7 (0.4.7). Zenodo. https://doi.org/10.5281/zenodo.7411758

This specific version of Nomer uses:

https://github.com/globalbioticinteractions/nomer/blob/a5114b84c255a97d21c1df5747c77e947e57a470/nomer/src/main/resources/org/globalbioticinteractions/nomer/default.properties#L85

which points to:

Poelen, Jorrit H. (2022). Nomer Corpus of Taxonomic Resources hash://sha256/dac5911a81fb605fab012e90c98b37e990a076d77f9264fdb38ec7f379d82108 hash://md5/46fab6751bafd4de4f49aaa8c511e39d (0.9) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7405576

which includes a specific version of an ITIS data product retrieved from https://www.itis.gov/downloads/itisMSSql.zip at some well-defined point in time.

I'd be open to updating the reference to define the ITIS region of the space of individual organisms they organize.

Perhaps something like:

ITIS@hash://sha256/dac5911a81fb605fab012e90c98b37e990a076d77f9264fdb38ec7f379d82108

?

or a more GitHub-like succinct hash notation

ITIS@dac5911

with a longer reference available somewhere close by?
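
A minimal sketch of that shorthand, assuming a 7-character prefix as in the example above. The function names are hypothetical; the `resolve` step stands in for the "longer reference available somewhere close by".

```python
SHA256_PREFIX = "hash://sha256/"


def shorthand(label, content_id, length=7):
    """Build a short label such as 'ITIS@dac5911' from a full content id."""
    if not content_id.startswith(SHA256_PREFIX):
        raise ValueError("expected a hash://sha256/ content id")
    digest = content_id[len(SHA256_PREFIX):]
    return f"{label}@{digest[:length]}"


def resolve(short, known_ids):
    """Expand a short label back to the one full content id it matches."""
    _label, _, prefix = short.partition("@")
    matches = [
        cid for cid in known_ids if cid.startswith(SHA256_PREFIX + prefix)
    ]
    if len(matches) != 1:
        raise LookupError(f"{short} matches {len(matches)} known ids")
    return matches[0]


full_id = (
    "hash://sha256/"
    "dac5911a81fb605fab012e90c98b37e990a076d77f9264fdb38ec7f379d82108"
)
short = shorthand("ITIS", full_id)
print(short)
print(resolve(short, [full_id]))
```

A short prefix is only unambiguous relative to a list of known identifiers, which is why `resolve` refuses to guess when it finds zero or several matches.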

jhpoelen commented 1 year ago

I am not keen on metonymy (in this case, using a database as a region). Is it spelled out somewhere that when interpreting ITIS as a taxon or region what is meant is ... well what would it be, the union of all of the tip taxa? That wouldn't work, e.g. it would leave out species for which there are subspecies. It's not the union of all the taxa, because Abrocoma is in ITIS and Abrocoma bennetti has to be in Abrocoma.

I'll probably have to think about this a little more.

In my initial try, I saw the shorthand "ITIS" as the region consisting of all taxonomic concept definitions (or organisms) known to ITIS as referred to by their taxonomic name.

So, in my (naive?) mind,

Abrocoma bennetti ! ITIS

meant, the taxonomic concept as referenced by Abrocoma bennetti is not part of ITIS (i.e., ITIS@dac5911).

However, the following name (notice the different suffix), is part of ITIS

Abrocoma bennettii < ITIS@dac5911

And, yes, the region of concepts included in the genus is known, which may be expressed as:

Abrocoma < ITIS@dac5911

I am sure I am making all sorts of mistakes here, so I'd welcome your insights as I am stumbling my way into a more compact notation to relate names from one corpus (in this case Plazi's interpretation of Mammal Species of the World) to another (ITIS@dac5911).

jhpoelen commented 1 year ago

Note that I should probably be more explicit in defining the origin of

"Abrocoma bennetti"

which may be expressed as:

Plazi Community. (2022). Plazi Treatments XML Archive hash://sha256/3cfd60b8b19e76d208377537835de92efdb5b945a6a71765b74ed2fe22298b42 hash://md5/594923284e3eb9965b8cbad149c76cd0f (hash://sha256/3cfd60b8b19e76d208377537835de92efdb5b945a6a71765b74ed2fe22298b42) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7443343

with possible shorthand Plazi@3cfd60b ?

So, a more specific way of saying would be something like

"Abrocoma bennetti" as found in Plazi@3cfd60b .

More specifically, you can cite the exact location of the reference to the taxonomic concept by using notation also present in annotations.csv -

cut:zip:hash://sha256/56caf9620cd58df6fb517dfb7cd01e2e81e54a41eef2f562c5eddfbd70ba6197!/treatments-xml-main/data/E1/1F/87/E11F878EFFCAFFC5FF50F949FDC9DA9C.xml!/b4465-4482

which is similar to saying, select characters 4465-4482 of zip entry treatments-xml-main/data/E1/1F/87/E11F878EFFCAFFC5FF50F949FDC9DA9C.xml in content identified by hash://sha256/56caf9620cd58df6fb517dfb7cd01e2e81e54a41eef2f562c5eddfbd70ba6197 . And this content can be traced back to a product generated by https://github.com/plazi/treatments-xml .

preston cat 'cut:zip:hash://sha256/56caf9620cd58df6fb517dfb7cd01e2e81e54a41eef2f562c5eddfbd70ba6197!/treatments-xml-main/data/E1/1F/87/E11F878EFFCAFFC5FF50F949FDC9DA9C.xml!/b4465-4482'

would produce output:

Abrocoma bennetti
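
For readers who want to take such a reference apart without running Preston, here is a minimal parsing sketch. It only splits the text of the notation shown above; it does not call Preston, and how the two range numbers are interpreted (inclusive or not) is left to Preston. The names `CutZipRef` and `parse_cut_zip` are made up for illustration.

```python
import re
from typing import NamedTuple


class CutZipRef(NamedTuple):
    content_id: str  # hash URI of the zip archive
    entry_path: str  # path of the entry inside the zip
    start: int       # first number of the b<start>-<end> range
    end: int         # second number of the b<start>-<end> range


def parse_cut_zip(reference):
    """Split 'cut:zip:<hash>!/<entry>!/b<start>-<end>' into its parts."""
    head, entry_path, byte_range = reference.split("!/")
    if not head.startswith("cut:zip:"):
        raise ValueError("expected a cut:zip: reference")
    match = re.fullmatch(r"b(\d+)-(\d+)", byte_range)
    if match is None:
        raise ValueError("expected a range like b4465-4482")
    return CutZipRef(
        content_id=head[len("cut:zip:"):],
        entry_path=entry_path,
        start=int(match.group(1)),
        end=int(match.group(2)),
    )


ref = parse_cut_zip(
    "cut:zip:hash://sha256/"
    "56caf9620cd58df6fb517dfb7cd01e2e81e54a41eef2f562c5eddfbd70ba6197"
    "!/treatments-xml-main/data/E1/1F/87/E11F878EFFCAFFC5FF50F949FDC9DA9C.xml"
    "!/b4465-4482"
)
print(ref)
```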

jhpoelen commented 1 year ago

by the way @jar398 I don't expect you to respond, but would be grateful if you do - you've contributed so much already.

jar398 commented 1 year ago

Using RCC-5 to talk about inclusion of a name in a name list is inappropriate. The only way you could have it make sense is if you documented exactly which individual organisms are in the ITIS region and which aren't. I suggested a couple of ways you might try to do this and neither one works. What you want is a different relationship, one that holds between a name (or record, etc.) and a database snapshot, or something like that. Not RCC-5, which is for taxa; ITIS is not a taxon. If you need additional columns or tables, so be it.

jhpoelen commented 1 year ago

Ok, clearly I need some time to better understand the RCC-5 mechanics.

At risk of showing yet another way I don't fully understand RCC-5,

would it be more appropriate to say something like:

Abrocoma bennetti ! Animalia

and

Abrocoma bennettii < Animalia

because there's no known concept referred to by Abrocoma bennetti, but Abrocoma bennettii is known.

In my shorthand, I was trying to refer to ITIS as a taxonomic space, not a flat name list or database snapshot.

jar398 commented 1 year ago

It's not about the mechanics, it's about semantics. The best way to think of RCC-5 is via some model consisting of a space (a set) of somethings, such that an RCC-5 region is just some subset of the space. The points could be points in geographic space (perhaps designated by lat/long coordinates), but in a taxonomy context the points are individual organisms. You don't have a region and can't use something with an RCC-5 relation unless you have said what points (individual organisms) you are talking about. Then ! says the regions/sets are disjoint, < is for inclusion, etc. If you want to put ITIS to the right of ! you have to say what points you are talking about so that you can interpret it as a region. You have not done so and I don't think you should try to.
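
The set-based reading above can be written down directly. In this minimal sketch a region is a non-empty set of points, and the points are hypothetical organism labels; the returned symbols are the Euler/X ones used earlier in the thread.

```python
def rcc5(a, b):
    """Return the RCC-5 relation between two regions given as sets of points."""
    a, b = frozenset(a), frozenset(b)
    if not a or not b:
        raise ValueError("regions are assumed to be non-empty")
    if a == b:
        return "="   # congruent
    if a < b:
        return "<"   # a is properly included in b
    if a > b:
        return ">"   # a properly includes b
    if a & b:
        return "><"  # partial overlap
    return "!"       # disjoint


# Hypothetical individual organisms o1..o4.
print(rcc5({"o1"}, {"o1", "o2"}))
print(rcc5({"o1", "o2"}, {"o2", "o3"}))
print(rcc5({"o1"}, {"o4"}))
```

Exactly one of the five relations holds for any pair of non-empty regions, but only once it is settled which points each region contains.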

Suppose someone said, e.g., ITIS < COL, and that there was some doubt as to whether this was true. To determine whether it's the case, you would have to ask: is every point in region/taxon ITIS also a point in region/taxon COL? Where the points are individual butterflies and sharks and so on? You'd have to specify which individuals are in ITIS and which are in COL. That is not the kind of question you care about here. And I bet you'd get logical inconsistencies if you went this route.

In talking about whether names are in databases, RCC-5 is simply the wrong logic. Say what you mean, instead.

RCC-5 has a very weak toehold in the taxonomy business. I'd like to see it used more, but not to the point of inappropriate use that would lead to incorrect entailments - that would be a threat to RCC-5, not a success.

jhpoelen commented 1 year ago

@jar398 thanks for taking the time to humor me in grappling with this topic.

It's not about the mechanics, it's about semantics.

Yes, by mechanics, I probably meant semantics. I appreciate your specific use of language.

The best way to think of RCC-5 is via some model consisting of a space (a set) of somethings, such that an RCC-5 region is just some subset of the space.

Yes, this is how I understand the idea of RCC-5 also. Perhaps this is where my misunderstanding of RCC-5 shows, because I agree with you, and I don't understand why the RCC-5 usage in my examples doesn't make sense to you.

You don't have a region and can't use something with an RCC-5 relation unless you have said what points (individual organisms) you are talking about.

In my mind, the Animalia refers to a region of all taxa included in Animalia as claimed by ITIS.

So,

Abrocoma bennetti ! Animalia 

can be expanded to:

Abrocoma bennetti ! { ..., Homo sapiens, ..., Abrocoma bennettii, ... }  

with this, wouldn't you be able to say: Abrocoma bennettii < { ..., Homo sapiens, ..., Abrocoma bennettii, ... } ?

where:

Animalia relates to ITIS:202423

Homo sapiens relates to ITIS:180092 , and

Abrocoma bennettii refers to ITIS:584787

In talking about whether names are in databases, RCC-5 is simply the wrong logic. Say what you mean, instead.

I am desperately trying to say what I mean, and I consider the statements above as relations between taxa as defined in some space constructed by our Plazi/ITIS colleagues and their contributors.

I fear I might have a fundamentally different (incorrect even?) perspective of what ITIS is.

Curious to hear your thoughts and specific counter examples if you care to share. If you don't think this is going anywhere or is taking too much of your attention/energy, please do share, I'll have to find some other way to relate taxonomic spaces.

jar398 commented 1 year ago

You have not told me how to answer the question of whether a specific beetle (e.g. one crawling on my desk) is in the ITIS region. To repeat what I said before, if Abrocoma (the region) is in ITIS (whatever region you intend by that), then every organism in that genus is in ITIS, even if its species is not listed in ITIS. Membership in a taxon has to be determined by circumscription, not by looking at ITIS or any other aggregator. Animalia is in ITIS. That would mean that every individual in the Animalia taxon (given by... circumscription or characters) is in the ITIS region. Therefore Abrocoma bennettii is in ITIS. This just doesn't agree with the way you are using it.
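
The entailment in this argument (A < B and B < C forces A < C, so A ! C cannot also hold) can be illustrated by brute force. The sketch below enumerates all non-empty regions over a tiny hypothetical universe of four points; it illustrates the rule rather than proving it for every universe.

```python
from itertools import combinations


def relation(a, b):
    """RCC-5 relation between two non-empty frozensets."""
    if a == b:
        return "="
    if a < b:
        return "<"
    if a > b:
        return ">"
    return "><" if a & b else "!"


def regions(universe):
    """All non-empty subsets of the universe."""
    items = sorted(universe)
    return [
        frozenset(combo)
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


def compose(r1, r2, universe):
    """Relations possible between A and C when A r1 B and B r2 C."""
    found = set()
    all_regions = regions(universe)
    for a in all_regions:
        for b in all_regions:
            if relation(a, b) != r1:
                continue
            for c in all_regions:
                if relation(b, c) == r2:
                    found.add(relation(a, c))
    return found


# Four hypothetical organisms, numbered 0..3.
possible = compose("<", "<", range(4))
print(possible)
```

With a species region inside a genus region inside some larger region, the only relation left between the species and the larger region is `<`, which is why a `!` claim for the same pair would be contradictory.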

I'm not asking what ITIS is (although it is very clearly not a taxon, taxon concept, name, region, etc.). I'm saying that if you want to say something coherent, you have to provide a definition for determining membership of an individual organism in the taxon / region that you are associating with ITIS. (The association is by metonymy, which I already told you I don't like - a database is not a taxon and is not a region, and what you propose is confusing to human readers, and confusing to formal reasoners). And even if you did have such a method, which you have not given, it would still be a bad idea.

The RCC-5 space used in taxonomy is the space of organisms, not the space of database records or names or taxon concepts or anything else human-made. You are clearly trying to recycle these relationships for a very different purpose (taxon concepts in sets of taxon concepts? or something like that) and it just looks gratuitous to me.

Why are you so insistent on trying to reuse RCC5 for this purpose, for which it was clearly not intended? Instead of using relationships intended to make scientific statements, which you are not doing, why not just write that you are talking about membership of a name in a database, or whatever it is you're actually doing?

jar398 commented 1 year ago

Another way to say it: using RCC-5 for anything other than the specimen/organism memberships of taxon concepts would be destructive to the cause of getting people to use it in its main use case of biologically comparing taxon concepts. ITIS is not a taxon concept, and you seem to be talking about regions populated with things that are not bona fide biological entities. Your own definition doesn't work, for the reason I've given above twice. If we stick to the original use case we'll have a better shot at making the original use case succeed in the community. Anything else is sabotage.

jhpoelen commented 1 year ago

@jar398 thanks for the animated comments.

I'll have to sit on this a little bit.