EOL / deprecated_eol_php_code

Encyclopedia of Life
http://eol.org/
Other

Investigate/fix "taxa without parents" in the pbdb hierarchy #151

Open KatjaSchulz opened 8 years ago

KatjaSchulz commented 8 years ago

@eliagbayani: @JRice alerted us that the PBDB hierarchy has a lot of "root nodes" (370), i.e., hierarchy entries that don't have any parent. Since this is a continually updated, community-curated hierarchy, it's expected that a few species, genera, families, etc. might be dangling around the root until somebody takes mercy on them and gives them a parent. But I don't remember there being quite so many. A spot check of some of the affected taxa shows that they do have parents at the source, e.g.:
https://paleobiodb.org/data1.1/taxa/single.json?id=170202&show=attr
https://paleobiodb.org/data1.1/taxa/single.json?id=282936&show=attr
https://paleobiodb.org/data1.1/taxa/single.json?id=14054&show=attr
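As an illustration only, here is a minimal sketch of how one might separate genuine roots from taxa whose parent is simply missing from the file. It assumes each taxon row carries Darwin Core-style `taxonID` and `parentNameUsageID` keys. That is an assumption about the file layout; the function name `find_orphans` is hypothetical.

```python
from typing import Dict, List, Set


def find_orphans(taxa: List[Dict[str, str]]) -> Dict[str, Set[str]]:
    """Classify taxa that would end up as root nodes in the imported hierarchy.

    Assumes (hypothetically) that every row has 'taxonID' and
    'parentNameUsageID' keys, as in a Darwin Core taxon file.
    """
    known_ids = {row["taxonID"] for row in taxa}
    no_parent: Set[str] = set()       # parent field empty: a genuine root in the file
    missing_parent: Set[str] = set()  # parent ID given, but no row exists for it
    for row in taxa:
        parent = (row.get("parentNameUsageID") or "").strip()
        if not parent:
            no_parent.add(row["taxonID"])
        elif parent not in known_ids:
            missing_parent.add(row["taxonID"])
    return {"no_parent": no_parent, "missing_parent": missing_parent}
```

If most of the reported roots land in `missing_parent` rather than `no_parent`, the data is incomplete rather than the source lacking parents.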

Is this just a matter of our data being old, and it will fix itself once we re-run the connector? Or is there something about these taxa that makes it difficult for our connector to catch their parents?

eliagbayani commented 8 years ago

Hi @KatjaSchulz, this one fell through the cracks. I'm investigating now.

eliagbayani commented 8 years ago

Hi @KatjaSchulz, I found the problem. Many of the parent IDs (e.g. 100759, 282934, 170201) referenced in their CSV dump don't have their own taxon entries in the CSV. I've fixed this by using the API (e.g. https://paleobiodb.org/data1.1/taxa/single.json?id=100759&show=attr) to get the info for those taxa and include them in the EOL-generated taxon.tab. The latest EOL DwC-A for this resource has been generated.
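The fix described above can be sketched roughly as follows. This is not the connector's actual code. `add_missing_parents` and `api_url` are hypothetical names. `fetch` is a hypothetical callable that turns one taxon ID into a row, for example by reading the JSON at the API URL quoted above. The mapping from that JSON to row fields is left out because the thread doesn't give the response field names. The loop repeats because a fetched parent may itself point to a parent that isn't in the CSV.

```python
from typing import Callable, Dict, List, Optional, Set


def api_url(taxon_id: str) -> str:
    # URL pattern copied from the links quoted in this thread.
    return f"https://paleobiodb.org/data1.1/taxa/single.json?id={taxon_id}&show=attr"


def _parent(row: Dict[str, str]) -> str:
    return (row.get("parentNameUsageID") or "").strip()


def add_missing_parents(
    taxa: List[Dict[str, str]],
    fetch: Callable[[str], Optional[Dict[str, str]]],
) -> List[Dict[str, str]]:
    """Return a copy of `taxa` with rows added for referenced-but-absent parents.

    `fetch` (hypothetical) returns a row with 'taxonID' and
    'parentNameUsageID' for a given ID, or None if there is no record.
    """
    rows = list(taxa)  # don't mutate the caller's list
    known: Set[str] = {r["taxonID"] for r in rows}
    tried: Set[str] = set()  # IDs already requested, so the loop always ends
    while True:
        missing = {_parent(r) for r in rows if _parent(r)} - known - tried
        if not missing:
            return rows
        for pid in sorted(missing):
            tried.add(pid)
            row = fetch(pid)
            if row is not None:
                rows.append(row)
                known.add(row["taxonID"])
```

Passing `fetch` in as a parameter keeps the gap-filling logic testable offline, for instance with a plain dict's `.get` standing in for the API.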

This resource has now been uploaded to the server and is ready for harvesting. I have not set the resource to force-harvest. I'm cc'ing @jhammock (Jen) because I know there is a queue in which resources are prioritized for harvesting.

Latest stats:
- taxon: 263815
- vernacularname (en): 3911
- measurementorfact: 1103650
- occurrence: 1103650

A copy of the archive file is here.

jhammock commented 8 years ago

Thanks, Eli! Queued up.

KatjaSchulz commented 8 years ago

It's a big one, but it would be great to get it reharvested soon if we can.