gbif / backbone-feedback


Huge amounts of data with not even a kingdom #479

Open gbif-portal opened 7 years ago

gbif-portal commented 7 years ago

Huge amounts of data with not even a kingdom

When you explore the records within this, there are a large number of seemingly good scientific names which need to be added or synonymised in the backbone. The map shows a good geographic coverage too.

This seems like it might prove to be a low-cost area for a decent improvement.


fbitem-species0
Reported by: @timrobertson100
System: Chrome 60.0.3112 / Mac OS X 10.11.5
Referer: https://www.gbif.org/species/0
Window size: width 1804 - height 895

mdoering commented 7 years ago

Yes, there are 2 kinds of improvements here.

1) Classifying the 13,380 unclassified taxa in the backbone, especially the 191 families with no higher classification and maybe some of the 4200 genera: https://www.gbif.org/species/search?rank=SPECIES&highertaxon_key=0&status=ACCEPTED&status=DOUBTFUL

2) Getting in touch with the occurrence datasets that provide no scientific name whatsoever, e.g. this bat dataset, whose records are all mammals: https://api.gbif.org/v1/occurrence/1586035241/verbatim
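The counts in point 1 could in principle be re-checked per rank against the species search API rather than the portal. A minimal sketch, assuming the API accepts the same filters as the portal link under the camel-case name `highertaxonKey` and that `limit=0` returns only the total; the helper name `unclassified_search_url` is made up for illustration:

```python
from urllib.parse import urlencode

# Assumed endpoint for the public GBIF species search API.
API_BASE = "https://api.gbif.org/v1/species/search"


def unclassified_search_url(rank, statuses=("ACCEPTED", "DOUBTFUL"), limit=0):
    """Build a search URL for taxa of `rank` sitting directly under key 0.

    Hypothetical helper: key 0 is the "no higher taxon" bucket used in the
    portal links in this thread; limit=0 is assumed to return only the count.
    """
    params = [("rank", rank), ("highertaxonKey", 0)]
    # Repeat the status parameter once per value, as the portal link does.
    params += [("status", status) for status in statuses]
    params.append(("limit", limit))
    return f"{API_BASE}?{urlencode(params)}"


if __name__ == "__main__":
    for rank in ("FAMILY", "GENUS", "SPECIES"):
        print(unclassified_search_url(rank))
```

Fetching each URL and reading the `count` field of the JSON response (an assumption about the response shape) would give one number per rank to track over time.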

mdoering commented 7 years ago

191 unclassified families.txt

mdoering commented 7 years ago

The species with most occurrences seems to be a bird: https://www.gbif.org/species/4408583

coming from FauEu which has no ranks for the higher taxa: https://www.gbif.org/species/123258242

Created an issue (https://github.com/gbif/checklistbank/issues/37) for improving nub builds in such cases.

timrobertson100 commented 7 years ago

Perhaps we need to get helpdesk to approach those providers and get more complete data mapping too?

On Thu, Aug 24, 2017 at 11:09 AM, Markus Döring notifications@github.com wrote:

The species with most occurrences seems to be a bird: https://www.gbif.org/species/4408583

coming from FauEu which has no ranks for the higher taxa: https://www.gbif.org/species/123258242

Created an issue https://github.com/gbif/checklistbank/issues/37 for improving nub builds in such cases.


mdoering commented 7 years ago

@gbif/content there are a few large datasets with many unmatched records: https://www.gbif.org/occurrence/datasets?taxon_key=0

Some of them could easily be fixed by adding at least a kingdom to the data, e.g. Plantae for the Oslo Vascular Plant Herbarium.

ahahn-gbif commented 7 years ago

Thanks, we'll take a look and check with publishers.

ahahn-gbif commented 7 years ago

Contacted the four largest contributors by email; will follow up with the others later.

mdoering commented 6 years ago

Names and many higher ranks in Fauna Europaea have just been fixed: gbif/checklistbank#43

mdoering commented 6 years ago

191 families with no kingdom left: https://www.gbif.org/species/search?rank=FAMILY&highertaxon_key=0

mdoering commented 6 years ago

There are still nearly 44 million occurrences in incertae sedis.

timrobertson100 commented 6 years ago

What about opening a public spreadsheet with Kingdom to Family for everything unmatched, so we can crowdsource it? We could add other families with missing intermediate ranks too.
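One way such a spreadsheet could be seeded is as a CSV with one row per unmatched family and empty columns for the ranks above it. This is only a sketch: the column set (including the `contributor` and `source` columns), the function name, and the placeholder family names are all assumptions for illustration, not anything agreed in this thread.

```python
import csv
import io

# Ranks from Kingdom down to Family, as suggested in the comment above.
RANK_COLUMNS = ["kingdom", "phylum", "class", "order", "family"]


def write_template(families, out):
    """Write a blank classification template, one row per unique family.

    Higher ranks are left empty for contributors to fill in; the extra
    `contributor` and `source` columns are hypothetical additions.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(RANK_COLUMNS + ["contributor", "source"])
    for name in sorted(set(families)):
        writer.writerow([""] * 4 + [name, "", ""])


if __name__ == "__main__":
    buffer = io.StringIO()
    # Placeholder names, not real unmatched families.
    write_template(["Exampleidae", "Anotheridae"], buffer)
    print(buffer.getvalue())
```

The resulting file can be imported into any shared spreadsheet tool; the list of families would come from the unclassified-families search linked earlier.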

MortenHofft commented 6 years ago

44? I only see 4.5 million https://www.gbif.org/occurrence/search?taxon_key=0 not counting individualCount :)

mdoering commented 6 years ago

4.5 looks much more reasonable, I like that :) I was only briefly looking at the 43,904,723 GEOREFERENCED RECORDS on the species page here which appears to be incorrect: https://www.gbif.org/species/0

MortenHofft commented 6 years ago

Those 43 million correspond to what is in the map, I assume: https://api.gbif.org/v2/map/occurrence/density/capabilities.json?taxonKey=0

mdoering commented 6 years ago

Yes, but it should be less than the total occurrences. @MattBlissett any idea what's wrong with maps?
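The invariant described here (georeferenced records on the map can never exceed the total occurrence count for the same taxon) could be turned into a small sanity check. A sketch, assuming the map capabilities JSON exposes a `total` field and the occurrence search JSON exposes a `count` field; the function name is hypothetical and the 4,500,000 figure is only the rounded "4.5 million" quoted above:

```python
def map_total_exceeds_search_count(capabilities, search_response):
    """Return True when the map reports more records than the search does.

    `capabilities` is the parsed capabilities.json payload (assumed to have
    a `total` key); `search_response` is the parsed occurrence search
    payload (assumed to have a `count` key).
    """
    return capabilities["total"] > search_response["count"]


if __name__ == "__main__":
    # Figures quoted in this thread; the search count is approximate.
    capabilities = {"total": 43_904_723}
    search_response = {"count": 4_500_000}
    if map_total_exceeds_search_count(capabilities, search_response):
        print("Map total is larger than the occurrence count: inconsistent")
```

With the numbers from this thread the check flags the inconsistency, which matches the observation that the species page figure appears to be incorrect.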

ahahn-gbif commented 2 years ago

contacted

ahahn-gbif commented 2 years ago

https://www.gbif.org/dataset/e2bc2f00-62f3-4fd4-b9f3-89c030bca07a confirmed under investigation (helpdesk)