gbif / pipelines

Pipelines for data processing (GBIF and LivingAtlases)
Apache License 2.0

Homo sapiens occurrences: remove / prevent indexing of any non-fossil records #489

Open ahahn-gbif opened 3 years ago

ahahn-gbif commented 3 years ago

Out of legal and ethical concerns, GBIF should not index or serve occurrences of Homo sapiens. The only possible exception is extinct taxa and fossil records (e.g. H. sapiens neanderthalensis), https://www.gbif.org/occurrence/taxonomy?basis_of_record=FOSSIL_SPECIMEN&taxon_key=2436436

In that sense, we need to

MattBlissett commented 3 years ago

Excluding preserved specimens (but keeping fossil specimens) would remove 1359 records from 56 datasets from 51 publishers. These seem relevant for biodiversity research (hominid zoology etc.), which is presumably why the zoological museums retain the specimens, and why they database and share them.

Without these specimens, several museums would be unable to build a complete portal for their collections using our API.

Should the full criteria be speciesKey!=2436436 OR BoR=Fossil OR BoR=PreservedSpecimen? (Neanderthals etc would necessarily be fossil records, or ancient DNA, but I don't know what BoR we would suggest for that.)
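A minimal sketch of that criterion as a record filter, not the actual pipelines code. It assumes each record is a plain dict with `speciesKey` and `basisOfRecord` fields and that BoR uses the GBIF vocabulary values (`FOSSIL_SPECIMEN`, `PRESERVED_SPECIMEN`). The function name, the `allowed_bor` parameter and the sample records are hypothetical and only there for illustration.

```python
# Taxon key taken from the occurrence link earlier in this thread.
HOMO_SAPIENS_KEY = 2436436

# BoR values that would still be indexed for Homo sapiens under the
# criterion proposed above (assumed GBIF vocabulary spellings).
ALLOWED_BOR = frozenset({"FOSSIL_SPECIMEN", "PRESERVED_SPECIMEN"})


def should_index(record, allowed_bor=ALLOWED_BOR):
    """Keep a record if speciesKey != 2436436 OR its BoR is in allowed_bor."""
    if record.get("speciesKey") != HOMO_SAPIENS_KEY:
        return True
    return record.get("basisOfRecord") in allowed_bor


# Made-up sample records, not real GBIF data.
records = [
    {"speciesKey": 1, "basisOfRecord": "HUMAN_OBSERVATION"},
    {"speciesKey": HOMO_SAPIENS_KEY, "basisOfRecord": "HUMAN_OBSERVATION"},
    {"speciesKey": HOMO_SAPIENS_KEY, "basisOfRecord": "FOSSIL_SPECIMEN"},
    {"speciesKey": HOMO_SAPIENS_KEY, "basisOfRecord": "PRESERVED_SPECIMEN"},
]

kept = [r for r in records if should_index(r)]

# Stricter variant: fossils are the only exception.
kept_fossil_only = [
    r for r in records if should_index(r, allowed_bor={"FOSSIL_SPECIMEN"})
]
```

Passing a different `allowed_bor` set makes it easy to compare the two options under discussion (keeping preserved specimens or not) against the same data.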

ahahn-gbif commented 3 years ago

I agree that we will need to communicate this appropriately, since we let this regression slip by unaddressed for too long.

Arguments against maintaining BoR=PreservedSpecimen: Preserved specimens (unless mislabeled fossils) would by their nature be recent materials (from roughly the last 400 years, more likely the last 200), sampled from people. This moves out of the field of (early) hominid zoology, and into the risky area of material with either a medical or a colonial past. There are some moral / ethical considerations around such samples in zoological collections, e.g. leading to requests for the return of ancestral remains.

It is not our call to evaluate or judge these cases, or advise collections how to handle this. However, GBIF's mission is not focused on support for anthropological studies. At the cost of losing a relatively minor number of records, we can return to a clear distinction of what biodiversity data includes/excludes at that line.

Regarding complete portals for museum collections via our API, we also exclude a number of other things (artifacts, cultural collections etc) that museums hold. I don't see that as a particularly strong argument for keeping these records.

Number check: I arrive at a count of 53,127 records that would be removed.