EOL / ContentImport

A placeholder for DATA tickets whenever Jira is unavailable.

review GBIF services for checklists and maps #24

Open jhammock opened 4 days ago

jhammock commented 4 days ago

A user has suggested we look into this service: https://techdocs.gbif.org/en/data-use/data-cubes

but feel free to review any other options that strike your fancy. Both checklists and map data can, I think, now benefit from new data filters that were not previously available. Some of these, using GBIF standard data quality fields, are mentioned in the documentation, but our correspondent suggests that others, e.g. one using the basisofrecord field, could prevent cases like the North American points displayed in the map on this page. I'm not sure it's as simple as that, but this new feature might make it easier to follow our customary practice of slapping a filter on a resource when a pattern of misleading data becomes evident.
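As a rough illustration of "slapping a filter on a resource", here is a minimal sketch of a per-resource exclusion rule keyed on basisOfRecord. The resource ID, the excluded values, and the sample records are all hypothetical, chosen only to show the shape of the idea; whether captive or fossil records actually explain the stray map points is an open question.

```python
# Hypothetical per-resource rules. The resource ID and the excluded
# basisOfRecord values are illustrative assumptions, not EOL's actual filters.
EXCLUDED_BASIS_OF_RECORD = {
    "example-map-resource": {"LIVING_SPECIMEN", "FOSSIL_SPECIMEN"},
}


def apply_resource_filter(resource_id, records):
    """Drop records whose basisOfRecord is excluded for this resource."""
    excluded = EXCLUDED_BASIS_OF_RECORD.get(resource_id, set())
    return [r for r in records if r.get("basisOfRecord") not in excluded]


# Made-up sample records: one field observation, one captive animal.
records = [
    {"species": "Panthera onca", "basisOfRecord": "HUMAN_OBSERVATION", "countryCode": "BR"},
    {"species": "Panthera onca", "basisOfRecord": "LIVING_SPECIMEN", "countryCode": "US"},
]

kept = apply_resource_filter("example-map-resource", records)
```

A resource with no entry in the rules table passes through unchanged, so a rule only needs to be added once a misleading pattern has been spotted.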

If any other reorganization of the checklist resources would be useful, this is probably a good time to consider that too. We may decide to eliminate or more severely filter checklists for which alternate data sources are plentiful. Small checklists could be aggregated if that facilitates harvesting.

eliagbayani commented 4 days ago

@jhammock Regarding the National Checklist resources we have in EOL: we actually have two datasets:

  1. https://opendata.eol.org/dataset/nationalchecklists
  2. https://opendata.eol.org/dataset/national-checklists-2019

I explored how we came up with these resources. I found an interesting source, but I still need to double-check it. It seems the resources in no. 1 came from the Fresh Data project (http://gimmefreshdata.github.io/about.html). These DwCA files (.zip) were uploaded to CKAN (opendata.eol.org). The resources in no. 2 were generated from the resources in no. 1, with some filtering, remapping, and cleaning applied. These DwCA files (.tar.gz) were generated using a connector script. The resources in no. 1 were generated in 2017; those in no. 2 were generated in 2019.

eliagbayani commented 4 days ago

The same goes for the Water Body Checklists; we have two datasets:

  1. https://opendata.eol.org/dataset/water-body-checklists
  2. https://opendata.eol.org/dataset/water-body-checklists-2019

Same as above: the resources in no. 1 came from the Fresh Data project, and the resources in no. 2 were generated from the resources in no. 1, with some filtering, remapping, and cleaning applied. The resources in no. 1 were generated in 2018; those in no. 2 were generated in 2019.

So, to update these resources now that FreshData is no longer available, we may want to use GBIF's country checklists, e.g. https://www.gbif.org/country/BR/summary (Brazil) and https://www.gbif.org/country/PH/summary (Philippines), and/or the new filtering schemes offered by the species occurrence cubes, which look promising.

jhammock commented 4 days ago

Ooops. Well, for starters, that means we've marked the newer category of checklist "deprecated" in Zenodo; I'll address that in one of the Zenodo threads. For now, yes, it's time to switch to a GBIF service. I thought maybe we already had, but I must be thinking of the GBIF maps.

I'm not inclined to use the country checklists unfiltered. Poking around the Brazil example, it appears to be a simple query that applies not even the basic quality control filters and does not specify occurrence status = present (as opposed to absent).
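For comparison, here is a sketch of what a filtered country query might look like against the GBIF occurrence search API. The parameter names (country, occurrenceStatus, hasCoordinate, hasGeospatialIssue, facet, limit) reflect my understanding of that API and should be checked against the GBIF documentation; which filters count as "basic quality control" is an assumption on my part. The code only builds the URL and does not call the service.

```python
from urllib.parse import urlencode

GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def filtered_country_query(country_code):
    """Build a search URL restricted to present, georeferenced records."""
    params = [
        ("country", country_code),          # ISO 3166-1 alpha-2 code
        ("occurrenceStatus", "PRESENT"),    # exclude absence records
        ("hasCoordinate", "true"),          # assumed "basic" quality filter
        ("hasGeospatialIssue", "false"),    # assumed "basic" quality filter
        ("facet", "speciesKey"),            # count records per species
        ("limit", "0"),                     # facet counts only, no records
    ]
    return f"{GBIF_OCCURRENCE_SEARCH}?{urlencode(params)}"


url = filtered_country_query("BR")
print(url)
```

Faceting on speciesKey with limit=0 asks for per-species counts rather than individual records. The API's facet paging limits would still need checking before relying on this for a full checklist.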

I'd say that we could either leverage the filters in the data cube service or probably get much the same effect by filtering on our chosen fields and values downstream. @eliagbayani, why don't you choose whichever method seems more practical, and let us know any constraints there may be on what we can filter on. Katja and I can get together and come up with a first draft of filters. If you can describe what we're already doing in the 2019 dataset, that's probably worth doing again.
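To make the downstream option concrete, here is a minimal sketch that reads tab-delimited occurrence rows, keeps only those matching a table of required field values, and collapses them into one species list per country. The column names follow Darwin Core terms, but the sample rows, the REQUIRED_VALUES table, and the function name are made up for illustration; the real filters would come from the first draft mentioned above.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows standing in for a tab-delimited occurrence export.
SAMPLE_TSV = (
    "species\tcountryCode\toccurrenceStatus\tbasisOfRecord\n"
    "Panthera onca\tBR\tPRESENT\tHUMAN_OBSERVATION\n"
    "Panthera onca\tBR\tPRESENT\tPRESERVED_SPECIMEN\n"
    "Tapirus terrestris\tBR\tABSENT\tHUMAN_OBSERVATION\n"
    "Pithecophaga jefferyi\tPH\tPRESENT\tHUMAN_OBSERVATION\n"
)

# Hypothetical filter table: field name -> set of accepted values.
REQUIRED_VALUES = {"occurrenceStatus": {"PRESENT"}}


def build_checklists(tsv_text, required=REQUIRED_VALUES):
    """Return {countryCode: sorted species names} for rows passing all filters."""
    checklists = defaultdict(set)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        if all(row.get(field) in allowed for field, allowed in required.items()):
            checklists[row["countryCode"]].add(row["species"])
    return {country: sorted(names) for country, names in checklists.items()}


checklists = build_checklists(SAMPLE_TSV)
```

Keeping the filters in a field-to-values table means a new constraint is a one-line change, and any field present in the export can be filtered on.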

I wouldn't interrupt anything already in flight for this. I'll put it in the queue after the TreatmentBank business.

Thanks!