CatalogueOfLife / data

Repository for COL content

Zoobank - scientificNameAuthorship empty values filled with the string: <Unspecified Agent> #711

Open camiplata opened 1 year ago

camiplata commented 1 year ago

In dwc:scientificNameAuthorship, empty values are filled with the string "< Unspecified Agent >". 46902 records (11%) are affected, including 43299 at genus rank, 173 species, and 1 subspecies.

See: https://www.checklistbank.org/dataset/2037/taxon/bbb21db2-66ac-47ea-836f-805854c026d8
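
A count like the one above could be reproduced with a short script. The following is only a minimal sketch, assuming a tab-separated taxon file with `taxonRank` and `scientificNameAuthorship` columns; the function name `count_placeholder_by_rank` and the sample rows are hypothetical and are not taken from the ZooBank export.

```python
import csv
import io
import re
from collections import Counter

# Matches the placeholder with or without inner spaces,
# e.g. "<Unspecified Agent>" or "< Unspecified Agent >".
PLACEHOLDER = re.compile(r"^<\s*Unspecified Agent\s*>$")


def count_placeholder_by_rank(tsv_text):
    """Count rows whose authorship is the placeholder, grouped by taxonRank."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts = Counter()
    for row in reader:
        authorship = (row.get("scientificNameAuthorship") or "").strip()
        if PLACEHOLDER.match(authorship):
            counts[row.get("taxonRank", "")] += 1
    return counts


# Hypothetical sample rows, for illustration only.
sample = (
    "taxonID\ttaxonRank\tscientificName\tscientificNameAuthorship\n"
    "1\tgenus\tAus\t< Unspecified Agent >\n"
    "2\tspecies\tAus bus\tSmith, 1900\n"
    "3\tgenus\tCus\t<Unspecified Agent>\n"
)
print(count_placeholder_by_rank(sample))
```

For a real file, the same function could be given the contents of the taxon core file read from disk; the column names would need to match that file's header.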

mdoering commented 1 year ago

There are also many copies of the same names, e.g. Doradidae: https://www.checklistbank.org/dataset/2037/name/e47fc3b7-d540-4b5a-b1ec-03680f259d42

mdoering commented 1 year ago

@deepreef any chance to remove the unspecified agent from the export? There are also other bad names that use [none] or odd accordingTo values like sensu 1999.… 1999:

mdoering commented 1 year ago

see also https://github.com/CatalogueOfLife/xcol/issues/34 and https://github.com/CatalogueOfLife/xcol/issues/32

deepreef commented 1 year ago

Many thanks for pointing these out. There is a LOT of clean-up we need to do, and that will be included in the grant proposal later this year. But if there are particularly problematic records, like the ones above, I can probably set aside some time next weekend to clean them up.

I'm not sure exactly what you mean by removing the Unspecified Agent from the export -- do you mean in terms of that agent as an author of exported publications?

camiplata commented 1 year ago

yes @deepreef, we mean to remove Unspecified Agent as an author to avoid combinations such as:

[Screenshot, 2023-07-31 at 8:33:08 a.m.]

deepreef commented 1 year ago

OK, I can change the way the author is presented when not known. This is a tag we use to indicate that we have not tracked down the original publication. But if it would be better to present a Null value in the output, I can certainly do that.
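
As an illustration of presenting a null value instead of the tag, here is a minimal sketch of the substitution. It assumes the placeholder occupies the whole authorship field; the function name `clean_authorship` is hypothetical and does not describe how the ZooBank export is actually built.

```python
import re

# Assumption: the placeholder fills the entire field, with optional
# spaces inside the angle brackets.
PLACEHOLDER = re.compile(r"^\s*<\s*Unspecified Agent\s*>\s*$")


def clean_authorship(value):
    """Return an empty string for missing or placeholder authorship values."""
    if value is None or PLACEHOLDER.match(value):
        return ""
    return value.strip()


print(repr(clean_authorship("< Unspecified Agent >")))
print(repr(clean_authorship("Smith, 1900")))
```

Real authorship strings pass through unchanged apart from trimmed whitespace, so the function could be applied to every row of an export without touching known authors.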

camiplata commented 2 weeks ago

Dear @deepreef, we would like to know whether it is possible to remove the placeholder from the Authorship field in the next version of the ZooBank dataset. If needed, we can provide assistance with these changes.

deepreef commented 2 weeks ago

Yes! But... we're still in the process of migrating our web server to a much more robust infrastructure. This was supposed to have been completed last year, but there have been delays. The IPT needs to be re-established on the server that currently holds the live database. It's on the to-do list, but I'm not sure exactly when it will happen. I will make sure the "<Unspecified Agent>" is removed from Authorship in the next IPT export -- just not sure when that will happen.

camiplata commented 1 day ago

Thanks for your reply, @deepreef. All the best for the migration.