gbif / portal-feedback

User feedback for the GBIF API, website and published data. You can ask questions here. 🗨❓

Hundreds of clearly incorrect bird species included for Nigeria (Lagos city specifically) #3529

Open tapirus07 opened 3 years ago

tapirus07 commented 3 years ago

After downloading bird species records for several cities around the world, I noticed that Lagos, NG has over 900 bird species, which is completely impossible for an urban area. Digging deeper, I found that about 600 species were recorded just once, and all of them are completely outside their geographical ranges (endemics from North America, South America, Australia, Europe). I also noticed that all of these anomalous records come from a single day (10-FEB-2018) and a single location (6.4N, 3.3E), and were contributed by "The Nigerian Conservation Foundation (NCF)".

Either someone introduced over 600 bird species from all over the world into a wetland in Nigeria, or this dataset needs to be corrected ASAP. For comparison, for this same area (Lagos, NG), eBird records 294 species, and Avibase expects 500 species as the maximum possible (given the species' geographical ranges). In other words, NCF has provided very bad data.
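The check described above (many distinct species tied to one date and one coordinate pair) can be sketched in a few lines. This is only an illustration, assuming the occurrence records have already been loaded as dictionaries keyed by the Darwin Core field names `species`, `eventDate`, `decimalLatitude` and `decimalLongitude`. The function names, the threshold and the toy records are invented for the example and are not the real NCF data.

```python
from collections import defaultdict


def species_per_event(records):
    """Map (eventDate, latitude, longitude) to the set of species recorded there."""
    groups = defaultdict(set)
    for rec in records:
        key = (rec["eventDate"], rec["decimalLatitude"], rec["decimalLongitude"])
        groups[key].add(rec["species"])
    return groups


def flag_suspect_events(records, max_species=3):
    """Return (event, species count) pairs above max_species, largest first."""
    groups = species_per_event(records)
    flagged = [(key, len(spp)) for key, spp in groups.items() if len(spp) > max_species]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Invented toy records: four species share one date and point, two do not.
toy_records = [
    {"species": "species_a", "eventDate": "2018-02-10", "decimalLatitude": 6.4, "decimalLongitude": 3.3},
    {"species": "species_b", "eventDate": "2018-02-10", "decimalLatitude": 6.4, "decimalLongitude": 3.3},
    {"species": "species_c", "eventDate": "2018-02-10", "decimalLatitude": 6.4, "decimalLongitude": 3.3},
    {"species": "species_d", "eventDate": "2018-02-10", "decimalLatitude": 6.4, "decimalLongitude": 3.3},
    {"species": "species_a", "eventDate": "2019-05-01", "decimalLatitude": 6.5, "decimalLongitude": 3.4},
    {"species": "species_e", "eventDate": "2019-05-02", "decimalLatitude": 6.6, "decimalLongitude": 3.5},
]

suspects = flag_suspect_events(toy_records, max_species=3)
```

With real data the threshold would need to be far higher than 3 and chosen with local knowledge, since a genuine survey day at a rich wetland can legitimately list many species.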

Thanks!

jlegind commented 3 years ago

Thank you for the report. We will investigate.

jlegind commented 3 years ago

The dataset contact has kindly responded:

Thanks very much for your feedback. There is obviously a mix-up in the dataset with the suggested species. We will work to pull that dataset down and review it with the correct entries. Kindly walk us through the process of reviewing the dataset, please. As for the question of 900 species in Lagos, these species are not recorded for Lagos alone but for Nigeria. Lagos is identified as the administrative location even though some species were actually identified in some wetlands in Lagos.

Joseph

Director, Technical Programmes Department Nigerian Conservation Foundation

tapirus07 commented 3 years ago

Thanks for your attention. What I meant was that the data retrieved through rgbif contain hundreds of bird species (from all parts of the world, and not native to Nigeria) attributed to ONE single location WITHIN LAGOS CITY. This information was provided by your foundation. So I am telling you that, according to your data, there are over 900 bird species within Lagos city, not in the whole of Nigeria. This is impossible! If you take all bird records from Nigeria and then keep only those records (with geographical coordinates) that fall within the administrative borders of Lagos city, you get over 900 species. Hundreds of these species (recorded within Lagos city, specifically at this wetland in Lagos at 6.4N, 3.3E on 10-FEB-2018) are endemic to other continents; they do not occur on the African continent at all. You need to look into why the data you provided tell us that dozens of Amazon birds were "human observed" at this location within Lagos city. This is just one example.

Beyond that, rgbif users expect the geographical coordinates linked to species records to be the exact point of observation (as far as possible), not an arbitrary administrative location. If you do not have real coordinates, leave this information empty; otherwise these data will distort the many scientific studies that rely on the accuracy of geographical coordinates. Just attribute the record to the country (Nigeria) or to the city's name, but do not supply coordinates if you do not have them accurately.

Unfortunately, there are too many bad records (over 600 species wrongly reported at a single location) for me to help you review them now. My suggestion is to exclude all records from location [6.4N, 3.3E] and day 10-FEB-2018 from rgbif; these data are clearly very suspicious.
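For anyone who needs a local workaround in the meantime, the suggested exclusion can be sketched as a simple filter. This is a minimal sketch, assuming records are dictionaries with the Darwin Core fields `eventDate`, `decimalLatitude` and `decimalLongitude`. The helper names and the 0.05 degree tolerance are assumptions made for the example, since the coordinates quoted in this thread are rounded.

```python
import math


def is_suspect(rec, lat=6.4, lon=3.3, date="2018-02-10", tol=0.05):
    """True if the record falls on the suspect date and near the suspect point."""
    rec_lat = rec.get("decimalLatitude")
    rec_lon = rec.get("decimalLongitude")
    if rec_lat is None or rec_lon is None:
        return False  # records without coordinates cannot match the point
    event_date = rec.get("eventDate") or ""
    return (
        event_date.startswith(date)
        and math.isclose(rec_lat, lat, abs_tol=tol)
        and math.isclose(rec_lon, lon, abs_tol=tol)
    )


def drop_suspect(records, **kwargs):
    """Return only the records that do not match the suspect date and location."""
    return [rec for rec in records if not is_suspect(rec, **kwargs)]


# Invented toy records for illustration only.
sample = [
    {"species": "x", "eventDate": "2018-02-10", "decimalLatitude": 6.43, "decimalLongitude": 3.31},
    {"species": "y", "eventDate": "2018-02-11", "decimalLatitude": 6.4, "decimalLongitude": 3.3},
    {"species": "z", "eventDate": "2018-02-10", "decimalLatitude": None, "decimalLongitude": None},
]
kept = drop_suspect(sample)
```

Matching on date and location together keeps other records from the same wetland on other days, which may well be valid.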

jlegind commented 3 years ago

Dear tapirus07,

I see you are employing the rgbif package; may I ask if you are using the occ_data() function or the occ_download() function? For larger occurrence downloads, I would urge you to use the occ_download function (it creates a user download) because it has no record limit (unlike occ_data, which relies on paging through the API response). The occ_download function gives a fuller experience and supplies a convenient DOI for citation. You will need to register as a GBIF user for this, though. Such a download will persist in the GBIF system for six months, with the option of extending this period. For a large and complex query, a download will also execute much faster than the occ_data version.
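To illustrate why a record limit matters for this kind of species count, here is a small sketch of page-by-page retrieval against a capped source. It does not call GBIF or rgbif. `fetch_page` is a hypothetical stand-in for one search request, and the page size of 300 and cap of 100,000 are assumed defaults for the example that should be checked against the current API documentation.

```python
def fetch_all_paged(fetch_page, page_size=300, hard_cap=100_000):
    """Collect records page by page until the source is exhausted or the cap is hit.

    Returns (records, hit_cap). hit_cap is True when retrieval stopped at the
    cap, meaning further records may exist that were never fetched.
    """
    results = []
    offset = 0
    while offset < hard_cap:
        limit = min(page_size, hard_cap - offset)
        page = fetch_page(offset, limit)
        results.extend(page)
        if len(page) < limit:
            break  # a short page means the source has no more records
        offset += limit
    hit_cap = offset >= hard_cap
    return results, hit_cap


def make_fake_source(total):
    """Hypothetical in-memory source holding `total` numbered records."""
    data = list(range(total))

    def fetch_page(offset, limit):
        return data[offset:offset + limit]

    return fetch_page


# 1000 records behind a cap of 700: the result is silently incomplete.
capped_records, capped_flag = fetch_all_paged(make_fake_source(1000), page_size=300, hard_cap=700)
# 500 records behind the same cap: everything is retrieved.
full_records, full_flag = fetch_all_paged(make_fake_source(500), page_size=300, hard_cap=700)
```

A species count computed from a truncated result would be wrong without any error being raised, which is one reason an asynchronous download with no record limit is the safer choice for large queries.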

jlegind commented 3 years ago

Dear Tapirus07,

The dataset has been taken down for now. We are eagerly awaiting the publisher's investigation of this issue.

Regards,

Jan K. Legind Data manager, GBIF