gbif / portal-feedback

User feedback for the GBIF API, website and published data. You can ask questions here. 🗨❓

Filtering out records with issues #2672

Open ManonGros opened 4 years ago

ManonGros commented 4 years ago

I know that in the long term we want to rethink how we present the issues and flags in our occurrence search interface. We will have to prioritise the issues and rearrange our filters, which is a lot of work.

That being said, in the meantime we could start by offering the option of filtering out records with issues without necessarily changing the interface much. A lot of users already seem to think that this is what our filters do anyway. See, for example, the paper associated with this download:

Species-specific geographical information was based on occurrence data available through the Global Biodiversity Information Facility (GBIF; http://www.gbif.org/). To this end, more than 113 million occurrence data points available in the Plantae kingdom were downloaded (20 October 2016) and processed locally, using only angiosperm data. These occurrence data were subject to filtering criteria before their download (for example, certain invalid, unlikely and mismatch issues as defined by GBIF; a full list of the issues omitted can be found in Supplementary Text 3).

  1. Could the API start supporting searches for records without issues? [Edit: the download API now has a "not" predicate, which makes it possible to select records without a given set of issues; see this example]
  2. Could the Issues and flags menu offer the option to filter out records with issues (instead of selecting the records with the issues)?
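As a rough illustration of point 1, the sketch below builds a download request body that wraps an "in" predicate on the ISSUE key inside a "not" predicate. The creator name, e-mail address, taxon key and the selection of issue flags are placeholders chosen for illustration, not values taken from this thread; the exact request fields should be checked against the download API documentation.

```python
import json

# Issue flags to exclude. These names are assumed to match GBIF's
# occurrence issue vocabulary; adjust the list to your own needs.
EXCLUDED_ISSUES = [
    "ZERO_COORDINATE",
    "COORDINATE_INVALID",
    "COUNTRY_COORDINATE_MISMATCH",
]


def build_download_request(creator, email, taxon_key, excluded_issues):
    """Build a request body that excludes records carrying any listed issue."""
    return {
        "creator": creator,                # placeholder user name
        "notificationAddresses": [email],  # placeholder address
        "format": "SIMPLE_CSV",
        "predicate": {
            "type": "and",
            "predicates": [
                {"type": "equals", "key": "TAXON_KEY", "value": str(taxon_key)},
                {
                    # "not" inverts the inner predicate: keep only records
                    # that have none of the listed issues.
                    "type": "not",
                    "predicate": {
                        "type": "in",
                        "key": "ISSUE",
                        "values": list(excluded_issues),
                    },
                },
            ],
        },
    }


request_body = build_download_request(
    "example_user", "user@example.org", 6, EXCLUDED_ISSUES
)
print(json.dumps(request_body, indent=2))
```

The resulting JSON would then be sent, with authentication, to the occurrence download request endpoint. Note that this excludes only the listed issues; records carrying other flags are still returned.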

For now (before the issue and flag rethinking), the only change needed for (2) could be to reword our labels to "exclude ...". For example: "exclude zero coordinates", "exclude coordinate invalid", etc. I know that some publishers want to select records that have issues (in order to correct them). I guess that in this scenario we would remove that option (at least in the simple interface). This would be a short-term solution, but unless we can make time to reorganise the flags and rethink the interface soon, it would be better than what we have.

Related issue: https://github.com/gbif/portal-feedback/issues/49

albenson-usgs commented 4 years ago

As someone on both sides of this issue, I actually think I prefer being able to select the occurrences with issues (although it would be better if selecting them actually showed you where the issue is instead of having to download the records to find out). Filtering out the records with issues is a simple step in the analysis process: dat <- GBIFSpeciesData %>% occ_issues(-cucdmis, -rdatm, -zerocd, -iddatunl)

and I'm going to be working further on the data in code anyway. I don't think it saves the user that much time or effort. However, having to download a dataset I've published back out of GBIF to look for issues and flags would be more cumbersome than just doing a quick check on the dataset page.

MattBlissett commented 4 years ago

On an individual occurrence page, you can click the ⇄ button at the top of the table to compare verbatim and interpreted values, although there's nothing that allows this to be done in bulk.

When the user made this download, they would have seen the huge block of yellow showing how many records had issues (all of them!)

Until the API supports filtering to exclude issues, I don't think there's much we can do with the filter interface. Possibly the issues-and-flags section could be moved to "Advanced".

dagendresen commented 3 years ago

Maybe it would be possible to add a tick-box for all records without any issue, so that users can select this and then add on any issues they "accept" (but yes, this would probably need a bit of developer time).
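The logic behind such a tick-box can be sketched on the client side: start from records with no issues at all, then let the user widen the selection with an accept-list. The record layout (an "issues" list per record) and the sample data below are made up for illustration; this is not how the portal implements its filters.

```python
def without_unaccepted_issues(records, accepted=frozenset()):
    """Keep records whose issue flags all belong to `accepted`.

    With the default empty accept-list, only issue-free records pass.
    """
    accepted = set(accepted)
    return [r for r in records if set(r.get("issues", [])) <= accepted]


# Hypothetical sample records, each with a list of issue flags.
records = [
    {"key": 1, "issues": []},
    {"key": 2, "issues": ["COORDINATE_ROUNDED"]},
    {"key": 3, "issues": ["ZERO_COORDINATE", "COORDINATE_ROUNDED"]},
]

# Tick-box only: records without any issue.
clean = without_unaccepted_issues(records)
print([r["key"] for r in clean])  # [1]

# Tick-box plus one "accepted" issue.
tolerant = without_unaccepted_issues(records, accepted={"COORDINATE_ROUNDED"})
print([r["key"] for r in tolerant])  # [1, 2]
```

Record 3 stays excluded in both cases because it carries a flag that is not on the accept-list.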