EOL / publishing

This repository contains rails code for the new version of EOL (TRAMEA).

'scientific names' found in 'canonical' property in graphdb #29

Open jar398 opened 5 years ago

jar398 commented 5 years ago

I assume the 'canonical' property of a Page node in the graphdb is what GBIF calls 'canonicalName' and is meant to be a 'scientificName' (sensu TDWG) stripped of authority and year information. This assumption is based on the 'canonical' properties that I've examined, nearly all of which are 'canonical names'.

However there are a few 'canonical' properties that have 'scientificNames' as values. I believe these to be erroneous. I don't know what part of the system is responsible, but they shouldn't end up in the graphdb.

Here are some examples (with a few correct values for comparison):

page_id,parent_id,canonical
4524441,4523943,Copelemur tutus
4524442,4523943,Copelemur australotutus
4524447,4524417,"Notharctus Leidy, 1870"
44132859,4524447,"Notharctus crassus (Marsh, 1872)"
4524452,4524447,"Notharctus robustior (Leidy, 1872)"
44074592,4524447,Notharctus limosus Gazin
4524456,4524447,"Notharctus robinsoni Gingerich, 1979"
4524455,4524447,"Notharctus venticolus Osborn, 1902"
47049460,4524447,"Notharctus anceps (Marsh, 1872)"
...
1037705,4467384,"Notocitellus adocetus (Merriam, 1903)"
39042460,1037705,"Notocitellus adocetus subsp. infernatus (Alvarez & Ram?-rez-P., 1968)"
39042461,1037705,"Notocitellus adocetus subsp. arceliae (Villa-R., 1942)"
28539864,1037705,Notocitellus adocetus subsp. adocetus
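A rough way to find more rows like these is a heuristic check on the exported CSV. The sketch below is only an assumption about what "looks like a scientificName" means: it flags values containing a year, parentheses, a comma, an ampersand, or a capitalized token after the genus. The function names and the sample data layout are hypothetical, not part of the EOL codebase, and the heuristic does not flag values that carry only a rank marker such as "subsp.".

```python
import csv
import io
import re

# Four-digit years in a plausible range for nomenclatural authorship.
YEAR = re.compile(r"\b(1[5-9]\d{2}|20\d{2})\b")


def looks_like_scientific_name(canonical: str) -> bool:
    """Heuristic: True if the value seems to carry authorship or a year."""
    if YEAR.search(canonical) or any(ch in canonical for ch in "(),&"):
        return True
    tokens = canonical.split()
    # After the genus, a capitalized token is probably an author name.
    return any(tok[0].isupper() for tok in tokens[1:])


def flag_rows(csv_text: str) -> list[dict]:
    """Return the rows whose 'canonical' column looks like a scientificName."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if looks_like_scientific_name(row["canonical"])]


# A few rows copied from the examples above.
SAMPLE = '''page_id,parent_id,canonical
4524441,4523943,Copelemur tutus
4524447,4524417,"Notharctus Leidy, 1870"
44074592,4524447,Notharctus limosus Gazin
28539864,1037705,Notocitellus adocetus subsp. adocetus
'''

if __name__ == "__main__":
    for row in flag_rows(SAMPLE):
        print(row["page_id"], row["canonical"])
```

On the sample rows this flags pages 4524447 and 44074592 and leaves the other two alone. A real name parser would be more reliable than this pattern matching; the sketch is only meant for a quick first pass over an export.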

Reported by Ray Ma.

JRice commented 5 years ago

There was a brief problem with the tool that parses these names; it's likely that this will resolve once those resources are re-published. It's worth checking on the resources these are associated with, though, to ensure they have been handled properly (or to make note that they need re-processing)...

jhammock commented 5 years ago

It looks like the resource underlying this issue is the Dynamic Hierarchy. Does that harvest square with the timing of the GNP problem? (Last summer, I think? The datestamps on the resource are confusing.)