globalbioticinteractions / name-alignment-template

align names with known taxonomic resources
https://big-bee-network.github.io/name-alignment-workshop
Creative Commons Zero v1.0 Universal

add PBDB as an additional catalogue #9

ljwalker closed this issue 1 year ago

ljwalker commented 1 year ago

Would it be possible to add the PBDB as another catalogue to this tool? The Paleo-Data WG thinks this would be a helpful feature for their community.

jhpoelen commented 1 year ago

hey @ljwalker - sounds like a good idea to add PBDB to the mix. I poked around on the website and could not find a download link to get all the data. Would you happen to know whether they offer their entire db as a single download?

jhpoelen commented 1 year ago

In their documentation, I did find this section:

https://paleobiodb.org/data1.2/changelog_doc.html#CHANGES%20IN%20VERSION%201.2%20b2

curl -L "https://paleobiodb.org/data1.2/taxa/list.csv?all_records"\
 | gzip\
 > paleobiodb-taxa.csv.gz

which produced the attached taxa gzipped csv file.

paleobiodb-taxa.csv.gz

The first 10 lines of this file look like:

| orig_no | taxon_no | record_type | flags | taxon_rank | taxon_name | difference | accepted_no | accepted_rank | accepted_name | parent_no | reference_no | is_extant | n_occs |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 1 | txn | V | kingdom | Eukaryota | obsolete variant of | 1 | unranked clade | Eukaryota | 28595 | 64730 | extant | 1525224 |
| 1 | 306691 | txn | | unranked clade | Eukaryota | | 1 | unranked clade | Eukaryota | 28595 | 64730 | extant | 1525224 |
| 2 | 2 | txn | | subkingdom | Metazoa | subjective synonym of | 67091 | kingdom | Animalia | 212579 | 35740 | extant | 1274652 |
| 3 | 3 | txn | | phylum | Actinopoda | | 3 | phylum | Actinopoda | 1 | 6930 | extant | 0 |
| 4 | 4 | txn | | phylum | Radiolaria | | 4 | phylum | Radiolaria | 213089 | 6930 | extant | 50798 |
| 5 | 5 | txn | | order | Spumellaria | | 5 | order | Spumellaria | 111594 | 6930 | extant | 19197 |
| 5 | 111595 | txn | V | suborder | Spumellaria | obsolete variant of | 5 | order | Spumellaria | 111594 | 25851 | extant | 19197 |
| 6 | 6 | txn | | genus | Acaeniotyle | | 6 | genus | Acaeniotyle | 84726 | 6930 | extinct | 312 |
| 7 | 7 | txn | | genus | Acanthopyle | | 7 | genus | Acanthopyle | 5 | 6930 | extinct | 0 |

Is this the kind of taxonomic information you'd like to align your names to?

(see attached screenshot)

ljwalker commented 1 year ago

> Is this the kind of taxonomic information you'd like to align your names to?

Yes. The only fields that aren't necessary to retain would be record_type, container_no, reference_no, and n_occs. If is_extant might be helpful to GloBI or others, you could keep that in, too.
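A minimal sketch of how the fields named above could be trimmed, assuming the taxa list has been saved as tab-separated text. The helper name drop_columns and the inline sample rows are illustrative only and are not part of Nomer or the PBDB API. Note that container_no does not appear in the header shown earlier in this thread, so names that are absent from the header are simply ignored.

```shell
#!/usr/bin/env bash
# drop_columns: remove the named columns (comma-separated list) from TSV on stdin.
# Hypothetical helper for illustration; columns are matched by header name.
drop_columns() {
  awk -F'\t' -v OFS='\t' -v drop="$1" '
    BEGIN {
      n = split(drop, names, ",")
      for (i = 1; i <= n; i++) unwanted[names[i]] = 1
    }
    NR == 1 {
      # Record the positions of the columns to keep, based on the header row.
      for (i = 1; i <= NF; i++) if (!($i in unwanted)) keep[++k] = i
    }
    {
      line = ""
      for (j = 1; j <= k; j++) line = line (j > 1 ? OFS : "") $(keep[j])
      print line
    }'
}

# Header and one record copied from the listing earlier in this thread.
printf 'orig_no\ttaxon_no\trecord_type\tflags\ttaxon_rank\ttaxon_name\tdifference\taccepted_no\taccepted_rank\taccepted_name\tparent_no\treference_no\tis_extant\tn_occs\n6\t6\ttxn\t\tgenus\tAcaeniotyle\t\t6\tgenus\tAcaeniotyle\t84726\t6930\textinct\t312\n' \
  | drop_columns 'record_type,container_no,reference_no,n_occs'
```

With the sample above, three of the fourteen columns are dropped and the remaining eleven, including is_extant, are passed through unchanged.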

ljwalker commented 1 year ago

> hey @ljwalker - sounds like a good idea to add PBDB to the mix. I poked around on the website and could not find a download link to get all the data. Would you happen to know whether they offer their entire db as a single download?

I'm not sure, but @markuhen might have more advice on this.

markuhen commented 1 year ago

All,

Thanks for the inquiry.

The answer is (of course) yes and no. You can use our API to get any or all of the data. You can also hit our GitHub (https://github.com/paleobiodb) to get all of our code. If that doesn’t work for you, check with Shanan Peters for other options. He knows the nuts and bolts much better than I!

Thanks,

Mark

Mark D. Uhen Professor & Chair George Mason University AOES Geology MSN 6C5 Fairfax, VA 22030 Phone: 703-993-5264 Fax: 703-993-3535

In reply to Lindsay Walker's message of Thursday, January 26, 2023 at 3:13 PM.


jhpoelen commented 1 year ago

@ljwalker @markuhen thanks for your input . . .

I've created a first pass at PBDB integration for Nomer as suggested.

Hoping to include support in the next Nomer release. Until then, I'll keep this issue open.

T. rex example below, created using Nomer, mlr, and bash scripting.

The name alignment tool would hide all this plumbing. . .

I am sure I made some mistakes in the integration, so I am eager to get your feedback as soon as PBDB is available in the name alignment tool.

echo -e "\tTyrannosaurus rex"\
 | nomer append --properties my.properties --include-header pbdb\
 | mlr --itsvlite --omd cat
providedExternalId:
providedName: Tyrannosaurus rex
relationName: HAS_ACCEPTED_NAME
resolvedExternalId: PBDB:54833
resolvedName: Tyrannosaurus rex
resolvedAuthorship: H. F. Osborn 1905
resolvedRank: species
resolvedCommonNames:
resolvedPath: Life | Eukaryota | Opisthokonta | Animalia | Bilateria | Eubilateria | Deuterostomia | Chordata | Vertebrata | Gnathostomata | Osteichthyes | Sarcopterygii | Dipnotetrapodomorpha | Tetrapodomorpha | Tetrapoda | Reptiliomorpha | Anthracosauria | Amphibiosauria | Cotylosauria | Amniota | Sauropsida | Reptilia | Eureptilia | Romeriida | Diapsida | Archosauromorpha | Crocopoda | Archosauriformes | Eucrocopoda | Archosauria | Avemetatarsalia | Ornithodira | Dinosauromorpha | Dinosauriformes | Dinosauria | Saurischia | Theropoda | Neotheropoda | Averostra | Tetanurae | Coelurosauria | Tyrannosauroidea | Tyrannosauridae | Tyrannosaurinae | Tyrannosaurus | Tyrannosaurus rex
resolvedPathIds: PBDB:28595 | PBDB:1 | PBDB:212579 | PBDB:67091 | PBDB:67103 | PBDB:272902 | PBDB:67145 | PBDB:33815 | PBDB:67149 | PBDB:67344 | PBDB:34881 | PBDB:67348 | PBDB:219195 | PBDB:77135 | PBDB:53190 | PBDB:125547 | PBDB:37177 | PBDB:465406 | PBDB:56749 | PBDB:53189 | PBDB:135358 | PBDB:36322 | PBDB:92204 | PBDB:99771 | PBDB:37768 | PBDB:38182 | PBDB:347446 | PBDB:57091 | PBDB:347458 | PBDB:38215 | PBDB:144376 | PBDB:57250 | PBDB:53366 | PBDB:53207 | PBDB:52775 | PBDB:38505 | PBDB:38513 | PBDB:56397 | PBDB:92043 | PBDB:53374 | PBDB:53940 | PBDB:58837 | PBDB:38606 | PBDB:65352 | PBDB:38613 | PBDB:54833
resolvedPathNames: unranked clade | kingdom | unranked clade | unranked clade | unranked clade | unranked clade | unranked clade | phylum | subphylum | superclass | class | unranked clade | subclass | unranked clade | unranked clade | unranked clade | unranked clade | subclass | suborder | unranked clade | unranked clade | class | subclass | unranked clade | subclass | infraclass | unranked clade | unranked clade | unranked clade | order | informal | unranked clade | unranked clade | unranked clade | order | order | order | unranked clade | unranked clade | unranked clade | suborder | superfamily | family | subfamily | genus | species
resolvedPathAuthorships:
resolvedExternalUrl: https://paleobiodb.org/classic/checkTaxonInfo?taxon_no=54833
resolvedCatalogName:

markuhen commented 1 year ago

Jorrit,

It looks pretty straightforward and correct to me, but I think this is a pretty nomenclaturally easy test case.

Maybe try a couple of trickier ones? Here are a few that might be more of a challenge:

Dorudon atrox
Eocetus schweinfurthi
Sophianacetus commenticus

These all have some twists in how the name is resolved. Hopefully they will work too!

Thanks,

Mark


In reply to Jorrit Poelen's message of Thursday, January 26, 2023 at 4:20 PM.


jhpoelen commented 1 year ago

@markuhen thanks for sharing your examples.

using:

echo -e "\tDorudon atrox\n\tEocetus schweinfurthi\n\tSophianacetus commenticus\n"\
 | nomer append --properties my.properties --include-header pbdb\
 | mlr --itsvlite --omd cat

I generated the following table

providedExternalId:
providedName: Dorudon atrox
relationName: HAS_ACCEPTED_NAME
resolvedExternalId: PBDB:53288
resolvedName: Dorudon atrox
resolvedAuthorship: C. W. Andrews 1906
resolvedRank: species
resolvedCommonNames:
resolvedPath: Life | Eukaryota | Opisthokonta | Animalia | Bilateria | Eubilateria | Deuterostomia | Chordata | Vertebrata | Gnathostomata | Osteichthyes | Sarcopterygii | Dipnotetrapodomorpha | Tetrapodomorpha | Tetrapoda | Reptiliomorpha | Anthracosauria | Amphibiosauria | Cotylosauria | Amniota | Synapsida | Eupelycosauria | Therapsida | Cynodontia | Epicynodontia | Eucynodontia | Probainognathia | Mammaliamorpha | Mammaliaformes | Mammalia | Theriimorpha | Theriiformes | Trechnotheria | Cladotheria | Zatheria | Tribosphenida | Theria | Eutheria | Placentalia | Boreoeutheria | Laurasiatheria | Scrotifera | Euungulata | Artiodactylamorpha | Artiodactyla | Cetacea | Pelagiceti | Basilosauridae | Dorudon | Dorudon atrox
resolvedPathIds: PBDB:28595 | PBDB:1 | PBDB:212579 | PBDB:67091 | PBDB:67103 | PBDB:272902 | PBDB:67145 | PBDB:33815 | PBDB:67149 | PBDB:67344 | PBDB:34881 | PBDB:67348 | PBDB:219195 | PBDB:77135 | PBDB:53190 | PBDB:125547 | PBDB:37177 | PBDB:465406 | PBDB:56749 | PBDB:53189 | PBDB:38882 | PBDB:91793 | PBDB:38935 | PBDB:39168 | PBDB:67366 | PBDB:39183 | PBDB:67452 | PBDB:67455 | PBDB:67456 | PBDB:36651 | PBDB:108999 | PBDB:137608 | PBDB:64369 | PBDB:57710 | PBDB:155048 | PBDB:97116 | PBDB:39860 | PBDB:182911 | PBDB:91965 | PBDB:192313 | PBDB:92585 | PBDB:192307 | PBDB:192312 | PBDB:159627 | PBDB:87634 | PBDB:36652 | PBDB:134057 | PBDB:42936 | PBDB:36709 | PBDB:53288
resolvedPathNames: unranked clade | kingdom | unranked clade | unranked clade | unranked clade | unranked clade | unranked clade | phylum | subphylum | superclass | class | unranked clade | subclass | unranked clade | unranked clade | unranked clade | unranked clade | subclass | suborder | unranked clade | subclass | suborder | superorder | family | unranked clade | infraorder | unranked clade | unranked clade | unranked clade | class | order | order | unranked clade | unranked clade | unranked clade | infraclass | subclass | subclass | unranked clade | unranked clade | unranked clade | unranked clade | unranked clade | unranked clade | unranked clade | order | unranked clade | family | genus | species
resolvedPathAuthorships:
resolvedExternalUrl: https://paleobiodb.org/classic/checkTaxonInfo?taxon_no=53288
resolvedCatalogName:

providedExternalId:
providedName: Eocetus schweinfurthi
relationName: HAS_ACCEPTED_NAME
resolvedExternalId: PBDB:53994
resolvedName: Eocetus schweinfurthi
resolvedAuthorship: E. Fraas 1904
resolvedRank: species
resolvedCommonNames:
resolvedPath: Life | Eukaryota | Opisthokonta | Animalia | Bilateria | Eubilateria | Deuterostomia | Chordata | Vertebrata | Gnathostomata | Osteichthyes | Sarcopterygii | Dipnotetrapodomorpha | Tetrapodomorpha | Tetrapoda | Reptiliomorpha | Anthracosauria | Amphibiosauria | Cotylosauria | Amniota | Synapsida | Eupelycosauria | Therapsida | Cynodontia | Epicynodontia | Eucynodontia | Probainognathia | Mammaliamorpha | Mammaliaformes | Mammalia | Theriimorpha | Theriiformes | Trechnotheria | Cladotheria | Zatheria | Tribosphenida | Theria | Eutheria | Placentalia | Boreoeutheria | Laurasiatheria | Scrotifera | Euungulata | Artiodactylamorpha | Artiodactyla | Cetacea | Protocetidae | Eocetus | Eocetus schweinfurthi
resolvedPathIds: PBDB:28595 | PBDB:1 | PBDB:212579 | PBDB:67091 | PBDB:67103 | PBDB:272902 | PBDB:67145 | PBDB:33815 | PBDB:67149 | PBDB:67344 | PBDB:34881 | PBDB:67348 | PBDB:219195 | PBDB:77135 | PBDB:53190 | PBDB:125547 | PBDB:37177 | PBDB:465406 | PBDB:56749 | PBDB:53189 | PBDB:38882 | PBDB:91793 | PBDB:38935 | PBDB:39168 | PBDB:67366 | PBDB:39183 | PBDB:67452 | PBDB:67455 | PBDB:67456 | PBDB:36651 | PBDB:108999 | PBDB:137608 | PBDB:64369 | PBDB:57710 | PBDB:155048 | PBDB:97116 | PBDB:39860 | PBDB:182911 | PBDB:91965 | PBDB:192313 | PBDB:92585 | PBDB:192307 | PBDB:192312 | PBDB:159627 | PBDB:87634 | PBDB:36652 | PBDB:42934 | PBDB:36711 | PBDB:53994
resolvedPathNames: unranked clade | kingdom | unranked clade | unranked clade | unranked clade | unranked clade | unranked clade | phylum | subphylum | superclass | class | unranked clade | subclass | unranked clade | unranked clade | unranked clade | unranked clade | subclass | suborder | unranked clade | subclass | suborder | superorder | family | unranked clade | infraorder | unranked clade | unranked clade | unranked clade | class | order | order | unranked clade | unranked clade | unranked clade | infraclass | subclass | subclass | unranked clade | unranked clade | unranked clade | unranked clade | unranked clade | unranked clade | unranked clade | order | family | genus | species
resolvedPathAuthorships:
resolvedExternalUrl: https://paleobiodb.org/classic/checkTaxonInfo?taxon_no=53994
resolvedCatalogName:

providedExternalId:
providedName: Sophianacetus commenticus
relationName: NONE
resolvedExternalId:
resolvedName: Sophianacetus commenticus
resolvedAuthorship:
resolvedRank:
resolvedCommonNames:
resolvedPath:
resolvedPathIds:
resolvedPathNames:
resolvedPathAuthorships:
resolvedExternalUrl:
resolvedCatalogName:

Note that synonyms are related to their accepted names, but not vice versa.

jhpoelen commented 1 year ago

@markuhen @ljwalker please do share what you expected the results to be in your example.

ljwalker commented 1 year ago

This output is generally aligned with the results I would expect, without being knowledgeable of the taxa Mark suggested.

markuhen commented 1 year ago

Jorrit,

These look good, except for Sophianacetus. Maybe I included a typo. Here is the basionym: Mediocris commenticius (https://paleobiodb.org/classic/checkTaxonInfo?taxon_no=71649&is_real_user=1). It was later recombined into Sophianacetus commenticius.

I just wanted to see if the names got resolved properly. Maybe try Mediocris commenticius and see if it resolves to Sophianacetus commenticius.

Thanks,

Mark


In reply to Jorrit Poelen's message of Thursday, January 26, 2023 at 5:03 PM.


jhpoelen commented 1 year ago

Thanks for sharing!

via

echo -e "\tMediocris commenticius"\
 | nomer append pbdb --include-header\
 | mlr --itsvlite --omd cat 

I found:

providedExternalId:
providedName: Mediocris commenticius
relationName: HAS_ACCEPTED_NAME
resolvedExternalId: PBDB:71649
resolvedName: Mediocris commenticius
resolvedAuthorship: E. Kazár 2005
resolvedRank: species
resolvedCommonNames:
resolvedPath: Life | Eukaryota | Opisthokonta | Animalia | Bilateria | Eubilateria | Deuterostomia | Chordata | Vertebrata | Gnathostomata | Osteichthyes | Sarcopterygii | Dipnotetrapodomorpha | Tetrapodomorpha | Tetrapoda | Reptiliomorpha | Anthracosauria | Amphibiosauria | Cotylosauria | Amniota | Synapsida | Eupelycosauria | Therapsida | Cynodontia | Epicynodontia | Eucynodontia | Probainognathia | Mammaliamorpha | Mammaliaformes | Mammalia | Theriimorpha | Theriiformes | Trechnotheria | Cladotheria | Zatheria | Tribosphenida | Theria | Eutheria | Placentalia | Boreoeutheria | Laurasiatheria | Scrotifera | Euungulata | Artiodactylamorpha | Artiodactyla | Cetacea | Pelagiceti | Neoceti | Odontoceti | Delphinida | Kentriodontidae | Pithanodelphinae | Sophianacetus | Mediocris commenticius
resolvedPathIds: PBDB:28595 | PBDB:1 | PBDB:212579 | PBDB:67091 | PBDB:67103 | PBDB:272902 | PBDB:67145 | PBDB:33815 | PBDB:67149 | PBDB:67344 | PBDB:34881 | PBDB:67348 | PBDB:219195 | PBDB:77135 | PBDB:53190 | PBDB:125547 | PBDB:37177 | PBDB:465406 | PBDB:56749 | PBDB:53189 | PBDB:38882 | PBDB:91793 | PBDB:38935 | PBDB:39168 | PBDB:67366 | PBDB:39183 | PBDB:67452 | PBDB:67455 | PBDB:67456 | PBDB:36651 | PBDB:108999 | PBDB:137608 | PBDB:64369 | PBDB:57710 | PBDB:155048 | PBDB:97116 | PBDB:39860 | PBDB:182911 | PBDB:91965 | PBDB:192313 | PBDB:92585 | PBDB:192307 | PBDB:192312 | PBDB:159627 | PBDB:87634 | PBDB:36652 | PBDB:134057 | PBDB:63145 | PBDB:42937 | PBDB:63496 | PBDB:42938 | PBDB:63523 | PBDB:82590 | PBDB:71649
resolvedPathNames: unranked clade | kingdom | unranked clade | unranked clade | unranked clade | unranked clade | unranked clade | phylum | subphylum | superclass | class | unranked clade | subclass | unranked clade | unranked clade | unranked clade | unranked clade | subclass | suborder | unranked clade | subclass | suborder | superorder | family | unranked clade | infraorder | unranked clade | unranked clade | unranked clade | class | order | order | unranked clade | unranked clade | unranked clade | infraclass | subclass | subclass | unranked clade | unranked clade | unranked clade | unranked clade | unranked clade | unranked clade | unranked clade | order | unranked clade | unranked clade | suborder | infraorder | family | subfamily | genus | species
resolvedPathAuthorships:
resolvedExternalUrl: https://paleobiodb.org/classic/checkTaxonInfo?taxon_no=71649

and, unexpectedly, the URL associated with the accepted taxon id, https://paleobiodb.org/classic/checkTaxonInfo?taxon_no=71649, resolves to your expected "Sophianacetus commenticius" (see screenshot below).

However, Nomer produced Mediocris commenticius instead.

Digging into the source data, I found

preston cat 'line:hash://sha256/0177efc9dfa3828224533cbc8c6567a8d8b325c8430a25c756a42f712aa5b0d7!/L1,L80523'\
 | mlr --itsvlite --omd cat 
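| orig_no | taxon_no | record_type | flags | taxon_rank | taxon_name | difference | accepted_no | accepted_rank | accepted_name | parent_no | reference_no | is_extant | n_occs |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 71649 | 71649 | txn | V | species | Mediocris commenticius | recombined as | 71649 | species | Sophianacetus commenticius | 82590 | 16675 | extinct | 1 |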

where hash://sha256/0177efc9dfa3828224533cbc8c6567a8d8b325c8430a25c756a42f712aa5b0d7 is the content identifier for the alias https://paleobiodb.org/data1.2/taxa/list.tsv?all_records, L1 selects the header, and L80523 selects the line in the acquired file that contains the name Mediocris commenticius.
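For readers without Preston at hand, the same kind of line selection can be approximated on a local copy of the file with plain awk. This is only a sketch of the idea and says nothing about how Preston implements its line: queries; the function name select_lines and the sample input are made up for illustration.

```shell
#!/usr/bin/env bash
# select_lines: print only the given 1-based line numbers from stdin,
# in the order they occur in the input.
# Hypothetical helper; not a Preston command.
select_lines() {
  awk -v wanted="$*" '
    BEGIN {
      n = split(wanted, nums, " ")
      for (i = 1; i <= n; i++) want[nums[i]] = 1
    }
    NR in want'
}

# Select the header (line 1) and one record (line 3) from a small sample.
printf 'header\nfirst\nsecond\nthird\n' | select_lines 1 3
```

On a locally downloaded taxa file, the equivalent of the query above would be to pass 1 and 80523 as the line numbers, assuming the local copy is byte-for-byte the same content.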

Note that the id for the provided taxon, 71649, is the same as the id for the "recombined" species Sophianacetus commenticius. Because Nomer assumes that each id is assigned to only a single name, the first definition is chosen.

Is it expected that the two names are assigned to the same id? What unique identifier should I use instead to distinguish the provided name (e.g., Mediocris commenticius) from the resolved name Sophianacetus commenticius?

image
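One possible way to tell the two names apart, sketched under the assumption that the taxon_name, difference, and accepted_name columns carry the distinction even when taxon_no and accepted_no coincide, as they do in the record shown above. This is not how Nomer resolves names; the helper accepted_name_for is hypothetical, and the sample rows reuse only the columns needed here.

```shell
#!/usr/bin/env bash
# accepted_name_for: look up a name in PBDB taxa TSV (stdin, with header) and
# print "provided<TAB>difference<TAB>accepted" for each matching record.
# Hypothetical helper for illustration; columns are located by header name.
accepted_name_for() {
  awk -F'\t' -v OFS='\t' -v name="$1" '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    $(col["taxon_name"]) == name {
      diff = $(col["difference"])
      if (diff == "") diff = "(none)"   # the name is already the accepted one
      print $(col["taxon_name"]), diff, $(col["accepted_name"])
    }'
}

# Values taken from the record for taxon_no 71649 shown above.
printf 'taxon_no\ttaxon_name\tdifference\taccepted_no\taccepted_name\n71649\tMediocris commenticius\trecombined as\t71649\tSophianacetus commenticius\n' \
  | accepted_name_for 'Mediocris commenticius'
```

Keying on the name pair rather than on the shared number sidesteps the one-name-per-id assumption, at the cost of relying on exact string matches for names.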

jhpoelen commented 1 year ago

Nomer v0.4.9 should now support PBDB. I am sure there are some bugs in it, so please do let me know about your (unexpected) results in using the name alignment tool.

Here are some examples:

https://github.com/ljwalker/name-alignment-tool/pull/1

and

https://github.com/jhpoelen/name-alignment-pbdb

markuhen commented 1 year ago

Jorrit,

I’m not 100% sure, but I think the taxonID number is the actual taxon, the thing we are trying to name. So, since both Mediocris commenticius and Sophianacetus commenticius are names for the same thing, it makes sense for them both to map to the same taxonID. It’s just that Mediocris commenticius is a junior subjective synonym of Sophianacetus commenticius because the genus name Mediocris was preoccupied by another taxon.

Thanks,

Mark


In reply to Jorrit Poelen's message of Thursday, January 26, 2023 at 5:43 PM.

Thanks for sharing!

via

echo -e "\tMediocris commenticius"\

| nomer append pbdb --include-header\

| mlr --icsv --omd cat

I found: providedExternalId providedName relationName resolvedExternalId resolvedName resolvedAuthorship resolvedRank resolvedCommonNames resolvedPath resolvedPathIds resolvedPathNames resolvedPathAuthorships resolvedExternalUrl Mediocris commenticius HAS_ACCEPTED_NAME PBDB:71649 Mediocris commenticius E. Kazár 2005 species Life | Eukaryota | Opisthokonta | Animalia | Bilateria | Eubilateria | Deuterostomia | Chordata | Vertebrata | Gnathostomata | Osteichthyes | Sarcopterygii | Dipnotetrapodomorpha | Tetrapodomorpha | Tetrapoda | Reptiliomorpha | Anthracosauria | Amphibiosauria | Cotylosauria | Amniota | Synapsida | Eupelycosauria | Therapsida | Cynodontia | Epicynodontia | Eucynodontia | Probainognathia | Mammaliamorpha | Mammaliaformes | Mammalia | Theriimorpha | Theriiformes | Trechnotheria | Cladotheria | Zatheria | Tribosphenida | Theria | Eutheria | Placentalia | Boreoeutheria | Laurasiatheria | Scrotifera | Euungulata | Artiodactylamorpha | Artiodactyla | Cetacea | Pelagiceti | Neoceti | Odontoceti | Delphinida | Kentriodontidae | Pithanodelphinae | Sophianacetus | Mediocris commenticius PBDB:28595 | PBDB:1 | PBDB:212579 | PBDB:67091 | PBDB:67103 | PBDB:272902 | PBDB:67145 | PBDB:33815 | PBDB:67149 | PBDB:67344 | PBDB:34881 | PBDB:67348 | PBDB:219195 | PBDB:77135 | PBDB:53190 | PBDB:125547 | PBDB:37177 | PBDB:465406 | PBDB:56749 | PBDB:53189 | PBDB:38882 | PBDB:91793 | PBDB:38935 | PBDB:39168 | PBDB:67366 | PBDB:39183 | PBDB:67452 | PBDB:67455 | PBDB:67456 | PBDB:36651 | PBDB:108999 | PBDB:137608 | PBDB:64369 | PBDB:57710 | PBDB:155048 | PBDB:97116 | PBDB:39860 | PBDB:182911 | PBDB:91965 | PBDB:192313 | PBDB:92585 | PBDB:192307 | PBDB:192312 | PBDB:159627 | PBDB:87634 | PBDB:36652 | PBDB:134057 | PBDB:63145 | PBDB:42937 | PBDB:63496 | PBDB:42938 | PBDB:63523 | PBDB:82590 | PBDB:71649 unranked clade | kingdom | unranked clade | unranked clade | unranked clade | unranked clade | unranked clade | phylum | subphylum | superclass | class | unranked clade | 
subclass | unranked clade | unranked clade | unranked clade | unranked clade | subclass | suborder | unranked clade | subclass | suborder | superorder | family | unranked clade | infraorder | unranked clade | unranked clade | unranked clade | class | order | order | unranked clade | unranked clade | unranked clade | infraclass | subclass | subclass | unranked clade | unranked clade | unranked clade | unranked clade | unranked clade | unranked clade | unranked clade | order | unranked clade | unranked clade | suborder | infraorder | family | subfamily | genus | species https://paleobiodb.org/classic/checkTaxonInfo?taxon_no=71649https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpaleobiodb.org%2Fclassic%2FcheckTaxonInfo%3Ftaxon_no%3D71649&data=05%7C01%7Cmuhen%40gmu.edu%7Cfcc885a63d0548b8973508daffeeb457%7C9e857255df574c47a0c00546460380cb%7C0%7C0%7C638103697938494413%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=3EYsNUKXAd5pDEEDMvC2au5uNl%2F2DVohivyNHPPnkUc%3D&reserved=0

and, unexpectedly, the URL associated with the accepted taxon id, https://paleobiodb.org/classic/checkTaxonInfo?taxon_no=71649, resolves to your expected "Sophianacetus commenticius" (see screenshot below).

However, Nomer produced Mediocris commenticius instead.

Digging into the source data, I found

$ preston cat hash://sha256/0177efc9dfa3828224533cbc8c6567a8d8b325c8430a25c756a42f712aa5b0d7 | grep -n "Mediocris commenticius"
80523:71649 71649 txn V species Mediocris commenticius recombined as 71649 species Sophianacetus commenticius 82590 16675 extinct 1

Note that the id of the provided taxon, 71649, is the same as the id of the "recombined" species Sophianacetus commenticius. Because Nomer assumes that each id refers to a single name, the first definition is chosen.
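To make the collision concrete, here is a minimal Python sketch. It is not Nomer's actual code; it only assumes, as described above, that an index keeps the first name registered for an id. The column names come from the PBDB taxa list header; the tab-separated layout of the row and the helper `index_first_wins` are assumptions made for illustration.

```python
import csv
import io

# Column names taken from the PBDB taxa list header.
COLUMNS = [
    "orig_no", "taxon_no", "record_type", "flags", "taxon_rank",
    "taxon_name", "difference", "accepted_no", "accepted_rank",
    "accepted_name", "parent_no", "reference_no", "is_extant", "n_occs",
]

# The row found in the source data, rebuilt as tab-separated text
# (the exact delimiter of the original file is an assumption here).
ROW = "\t".join([
    "71649", "71649", "txn", "V", "species", "Mediocris commenticius",
    "recombined as", "71649", "species", "Sophianacetus commenticius",
    "82590", "16675", "extinct", "1",
])


def index_first_wins(rows):
    """Hypothetical index: map id -> name, keeping the first name per id."""
    names_by_id = {}
    for row in rows:
        # Register the provided name first, then the accepted name.
        # setdefault leaves an existing entry untouched.
        names_by_id.setdefault(row["taxon_no"], row["taxon_name"])
        names_by_id.setdefault(row["accepted_no"], row["accepted_name"])
    return names_by_id


rows = list(csv.DictReader(io.StringIO(ROW), fieldnames=COLUMNS, delimiter="\t"))
names_by_id = index_first_wins(rows)
print(names_by_id["71649"])
```

Under these assumptions the lookup for 71649 returns Mediocris commenticius, because the accepted name Sophianacetus commenticius arrives second under the same id and is dropped.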

Is it expected that the two names are assigned to the same id? What unique identifier should I use instead to distinguish the provided name (e.g., Mediocris commenticius) from the resolved name Sophianacetus commenticius?
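One possible way to keep the two names apart is sketched below in Python. It assumes that the pair (id, name) is unique within the PBDB taxa list, which has not been confirmed in this thread. The `NameKey` type and the `link_names` helper are hypothetical and not part of Nomer or PBDB.

```python
from typing import NamedTuple


class NameKey(NamedTuple):
    """Hypothetical composite key: a PBDB number plus the name string."""
    taxon_no: str
    name: str


def link_names(taxon_no, taxon_name, accepted_no, accepted_name):
    """Return (provided key, accepted key) for one taxa list row."""
    return NameKey(taxon_no, taxon_name), NameKey(accepted_no, accepted_name)


provided, accepted = link_names(
    "71649", "Mediocris commenticius",
    "71649", "Sophianacetus commenticius",
)

# Both keys share the id but remain distinct dictionary entries.
accepted_for = {provided: accepted}
print(accepted_for[provided].name)
```

With the name included in the key, the provided name and the accepted name no longer overwrite each other even though both carry id 71649.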

[image] https://user-images.githubusercontent.com/1084872/214966331-46f30973-c49e-43aa-86ff-369c209c3bdf.png


jhpoelen commented 1 year ago

@markuhen distinguishing between a name and a taxon makes sense to me. What would be a reliable way to reference names? Name and authorship? Or do you have another sort of name identifier?

Thanks for taking the time to help me understand the Paleobiology Database - must be quite a treat to help maintain a resource like this!

jhpoelen commented 1 year ago

@ljwalker I am sure that your idea to add PBDB to the name alignment template might stir up some conversations.

Would it be an idea to chat about this during the Paleo happy hour instead of lengthy github issue threads?

Let me know what your preferred way of communicating is. Happy to stick with github issues, and open to alternatives.

markuhen commented 1 year ago

All,

I am happy to Zoom in if you need some PBDB insight.

Thanks,

Mark



jhpoelen commented 1 year ago

@ljwalker Thanks for sharing the reference to Holly Little's TDWG presentation in our video chat today:

Little H, Byrd C, Karim T, Krimmel E, Norton B (2022) Extinct Taxa in an Extant World: Working towards better fossil taxonomic representation. Biodiversity Information Science and Standards 6: e94417. https://doi.org/10.3897/biss.6.94417

I added some of the name examples to my list of PBDB-relevant names at:

https://github.com/jhpoelen/name-alignment-pbdb

https://github.com/jhpoelen/name-alignment-pbdb/blob/main/names.csv (see list below)

Also, I've enabled the taxonomic schemas that were mentioned in the talk: Catalogue of Life, GBIF, Paleobiology Database.

Hoping to add WoRMS as soon as their resource gets published as a whole dataset. Until then, resolving names via the WoRMS API works only for smaller name lists, due to network overhead and API response times.

With added examples and context, I got a better idea of what Paleo folks are looking for. But . . . I am sure there's more to it than what I currently know (or think I know).

Chondrichthyes
Elasmobranchii
Wiwaxia currugata (Matthew, 1889)
Tipuloidae
Ammonoidea
Eukaryota
Opisthokonta
Animalia
Bilateria
Eubilateria
Deuterostomia
Chordata
Vertebrata
Gnathostomata
Osteichthyes
Sarcopterygii
Dipnotetrapodomorpha
Tetrapodomorpha
Tetrapoda
Reptiliomorpha
Anthracosauria
Amphibiosauria
Cotylosauria
Amniota
Synapsida
Eupelycosauria
Edaphosauridae
Edaphosaurus

jhpoelen commented 1 year ago

A first pass at supporting PBDB through the Name Alignment Template (see https://github.com/globalbioticinteractions/name-alignment-template) has been added.

Please open new issues for ideas to improve and/or fix.