CatalogueOfLife / data

Repository for COL content

Bad extinct flags #696

Open aoern opened 2 months ago

aoern commented 2 months ago

Many extinct taxa either have a wrong value for the 'extinct' property or have no value set at all.

There are two rules about extinct taxa that can be easily checked automatically:

  1. If a taxon is extinct, all its descendants are extinct as well.
  2. If all the descendants of a taxon are extinct, the taxon is extinct as well.
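Both rules can be checked with a single walk over the classification tree. The following is a minimal Python sketch, not the program mentioned in this issue and not CoL backend code. It assumes the tree is available in memory as a hypothetical mapping from each taxon ID to its child IDs, with `extinct` holding True, False, or None (property not set). The function names and sample taxa are made up for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical in-memory model: taxon ID -> child IDs, and
# taxon ID -> True / False / None (None = 'extinct' not set).
Tree = Dict[str, List[str]]
Extinct = Dict[str, Optional[bool]]


def rule1_violations(tree: Tree, extinct: Extinct, root: str) -> List[str]:
    """Rule 1: every descendant of an extinct taxon must itself be extinct."""
    violations: List[str] = []

    def walk(taxon: str, under_extinct: bool) -> None:
        is_extinct = extinct.get(taxon) is True
        if under_extinct and not is_extinct:
            violations.append(taxon)  # explicit False or unset below an extinct ancestor
        for child in tree.get(taxon, []):
            walk(child, under_extinct or is_extinct)

    walk(root, False)
    return violations


def rule2_violations(tree: Tree, extinct: Extinct, root: str) -> List[str]:
    """Rule 2: a taxon whose descendants are all extinct must be extinct too."""
    violations: List[str] = []

    def all_below_extinct(taxon: str) -> bool:
        children = tree.get(taxon, [])
        result = True
        for child in children:
            # Recurse first so violations deeper in the tree are always found.
            subtree_ok = all_below_extinct(child)
            result = result and subtree_ok and extinct.get(child) is True
        # Leaves have no descendants, so the rule only applies to taxa with children.
        if children and result and extinct.get(taxon) is not True:
            violations.append(taxon)
        return result

    all_below_extinct(root)
    return violations


# Made-up example: FamA is extinct but Sp2 is unset (rule 1);
# FamB's only child is extinct but FamB is False (rule 2).
tree = {"Order": ["FamA", "FamB"], "FamA": ["Sp1", "Sp2"], "FamB": ["Sp3"]}
extinct = {"Order": None, "FamA": True, "Sp1": True, "Sp2": None,
           "FamB": False, "Sp3": True}

print(rule1_violations(tree, extinct, "Order"))  # ['Sp2']
print(rule2_violations(tree, extinct, "Order"))  # ['FamB']
```

Plain recursion is fine at taxonomic depths; a much deeper tree would need an explicit stack to stay under Python's recursion limit.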

I wrote a simple program to run these checks against the whole database. Rule 1 was violated by 5315 taxa: 470 of these descendant taxa had 'extinct' set to false and 4845 had no value for the property. Rule 2 was violated by 1974 taxa: 346 of these parent taxa had 'extinct' set to false and 1628 had no value. So, in total, 7289 taxa have a wrong value for the 'extinct' property.
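The breakdown of violators into "false" and "no value" can be reproduced with a small tally. This is a hedged Python sketch that assumes the hypothetical `extinct` mapping (taxon ID to True, False, or None) and a list of violating taxon IDs; `tally_by_value` is an illustrative name, not part of any CoL tool. The last lines only re-add the figures reported in this issue.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def tally_by_value(violators: Iterable[str],
                   extinct: Dict[str, Optional[bool]]) -> Counter:
    """Split violating taxa by their current 'extinct' value."""
    counts: Counter = Counter()
    for taxon in violators:
        value = extinct.get(taxon)  # a missing key counts as unset
        if value is False:
            counts["false"] += 1
        elif value is None:
            counts["unset"] += 1
        else:
            counts["true"] += 1  # not expected for a genuine rule 1/2 violation
    return counts


# The figures reported in this issue, to show how the totals combine.
reported = {
    "rule 1": Counter(false=470, unset=4845),
    "rule 2": Counter(false=346, unset=1628),
}
for rule, counts in reported.items():
    print(rule, sum(counts.values()))  # rule 1 5315, then rule 2 1974
print("total", sum(sum(c.values()) for c in reported.values()))  # total 7289
```

Summing the two rule totals may count a taxon twice if it violates both rules (for example, a non-extinct taxon below an extinct ancestor whose own descendants are all extinct). A set union of the two violator lists would give the number of distinct taxa.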

I wonder whether these checks could help when importing data into the CoL database.

mdoering commented 2 months ago

Yes, programmatically setting extinct during syncs is a long-standing open issue. See https://github.com/CatalogueOfLife/backend/issues/289 and https://github.com/CatalogueOfLife/backend/issues/1024

mdoering commented 2 months ago

And it is a great idea to use your rules to flag issues for those taxa that violate them! Opened a ticket: https://github.com/CatalogueOfLife/backend/issues/1349