Open dnoesgaard opened 3 years ago
I think it is a good idea to add explanations and examples to the various fields and filters in the UI. We decided against it when writing the current UI, so adding it now requires some more work. The UI for the hosted portals always has an "about section", but again, someone needs to write it.
More than that, someone needs to read it. That will only happen when someone is diligent and doesn't get what they expect. If they have no idea what to expect, they won't question the result or how they got it. A user will rarely realize that adding more information (a filter that isn't actually needed) works against them. Saying something like "Build queries with only the filters that distinguish the desired results and no additional filters" isn't going to help much. It would also need examples, and then it starts to get long and you lose a whole other set of users.
As @dnoesgaard pointed out, you have the capacity to avoid this kind of issue in geographic queries, which are one of the two most common kinds of query. I think it would be great if you took advantage of it. The question is how. Assigning standardized geography is not always trivial, because errors at one level can propagate (e.g., when someone naïvely maps an administrative level whose boundaries or names have changed). Doing it really well even requires dates. That aside, I still think there is value for the user in pursuing interpreted geography.
Here's another example: https://www.gbif.org/occurrence/download/0017395-200221144449610
Applying the continent=EUROPE filter yields 11 million records, whereas a rough polygon of Europe yields roughly five times that.
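A rough way to reproduce this kind of comparison is to ask the GBIF occurrence search API for counts only (`limit=0`) under each filter. This is a sketch, assuming the public v1 endpoint with its `continent` and `geometry` (WKT) parameters; `ROUGH_EUROPE_WKT` is a hypothetical hand-drawn box, not the polygon behind the linked download, so the numbers will not match it exactly.

```python
import json
import urllib.parse
import urllib.request

# Public GBIF occurrence search endpoint (API v1).
GBIF_SEARCH = "https://api.gbif.org/v1/occurrence/search"

# Hypothetical, deliberately rough box around Europe as WKT ("lon lat" pairs).
# The ring runs counter-clockwise, which is the orientation GBIF expects.
ROUGH_EUROPE_WKT = "POLYGON((-25 34,45 34,45 72,-25 72,-25 34))"


def count_url(**params):
    """Build a search URL that returns no records, only the total count."""
    query = dict(params, limit=0)
    # Percent-encode spaces as %20 rather than '+', to be safe with WKT.
    return GBIF_SEARCH + "?" + urllib.parse.urlencode(query, quote_via=urllib.parse.quote)


def fetch_count(url):
    """Query the live API and return the 'count' field of the JSON response."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)["count"]


if __name__ == "__main__":
    by_continent = count_url(continent="EUROPE")
    by_polygon = count_url(geometry=ROUGH_EUROPE_WKT)
    print(by_continent)
    print(by_polygon)
    # Uncomment to hit the live API (network access required):
    # print(fetch_count(by_continent), fetch_count(by_polygon))
```

The network call is left commented out so the two URLs can be inspected offline. A box like this also catches parts of North Africa and western Asia, so it is a rough comparison rather than a precise one.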
See also #2387 and https://github.com/gbif/parsers/issues/26
We could start parsing and filling in dwc:continent, which is probably a better outcome than explaining why the term isn't much use in a filter.
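One way to picture "parsing / filling in" dwc:continent is a fallback: use the verbatim value when it normalises to a known continent, otherwise derive it from the country code. This is only a minimal sketch under stated assumptions, not GBIF's actual parser: `COUNTRY_TO_CONTINENT` is a hypothetical, illustrative subset, and the continent names are assumed to follow GBIF's continent vocabulary.

```python
from typing import Optional

# Hypothetical, illustrative subset of an ISO 3166-1 alpha-2 -> continent table.
# Transcontinental countries (e.g. RU, TR) are deliberately absent: the country
# alone cannot tell which continent a record falls on.
COUNTRY_TO_CONTINENT = {
    "DK": "EUROPE",
    "FR": "EUROPE",
    "US": "NORTH_AMERICA",
    "BR": "SOUTH_AMERICA",
    "KE": "AFRICA",
    "AU": "OCEANIA",
}

# Assumed controlled vocabulary for the interpreted value.
VALID_CONTINENTS = {
    "AFRICA", "ANTARCTICA", "ASIA", "EUROPE",
    "NORTH_AMERICA", "OCEANIA", "SOUTH_AMERICA",
}


def interpret_continent(verbatim: Optional[str], country_code: Optional[str]) -> Optional[str]:
    """Return an interpreted continent, or None if it cannot be decided."""
    if verbatim:
        # Normalise free text such as "North America" -> "NORTH_AMERICA".
        normalised = verbatim.strip().upper().replace(" ", "_")
        if normalised in VALID_CONTINENTS:
            return normalised
    if country_code:
        # Fall back to the country; unknown or transcontinental codes give None.
        return COUNTRY_TO_CONTINENT.get(country_code.strip().upper())
    return None
```

Returning `None` for ambiguous countries, rather than guessing, mirrors the concern above about propagating errors from one geographic level to another.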
I'll comment further on that issue.
Is it possible to flag the discrepancy between the number of records returned by the continent filter and the number one would get by filtering with a bounding box or spatial polygon? I would argue that the ability to filter by continent implies that the filter is meant to capture all records on that continent. I recently had to redo an analysis because I realised late in the day that filtering by continent had caused me to miss many records.
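The flag asked for here could start as a simple comparison of two counts. This is a minimal sketch, assuming both counts have already been obtained (for example from the occurrence search API) and using a hypothetical threshold of 0.8. The names `CoverageCheck` and `check_coverage` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class CoverageCheck:
    attribute_count: int  # records matched by an attribute filter (e.g. continent)
    spatial_count: int    # records matched by a polygon or bounding box
    ratio: float          # attribute_count / spatial_count (NaN if undefined)
    flagged: bool         # True when the attribute filter looks lossy


def check_coverage(attribute_count: int, spatial_count: int, threshold: float = 0.8) -> CoverageCheck:
    """Flag an attribute filter that returns far fewer records than a spatial one.

    The threshold is an arbitrary, hypothetical cut-off.
    """
    if spatial_count <= 0:
        # Nothing to compare against, so do not flag.
        return CoverageCheck(attribute_count, spatial_count, float("nan"), False)
    ratio = attribute_count / spatial_count
    return CoverageCheck(attribute_count, spatial_count, ratio, ratio < threshold)


if __name__ == "__main__":
    # Illustrative figures in the spirit of the Europe example (11 M vs ~5x that).
    print(check_coverage(11_000_000, 55_000_000))
```

This is a heuristic: a rough polygon can also over-count by including neighbouring areas, so a flag should prompt a closer look rather than be read as an exact measure of missing records.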
Continent is now interpreted, but state/province remains a potential problem.
We often see filters used incorrectly or unintentionally in papers citing GBIF data. A common example is filtering occurrences by Continent or State/province, which often excludes many records that have no value in these fields. Using a derived field instead, such as GADM, produces a more accurate result.
Example: https://doi.org/10.3897/jhr.81.62634 cites this download https://doi.org/10.15468/dl.wghcks that has <1000 records of bees in the US state of Pennsylvania. Removing the Continent filter alone changes this number to >14,000 records.
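To see why an equality filter on a sparsely filled field loses records, consider a toy set of five entirely hypothetical occurrences. The sketch assumes each record carries a verbatim `state_province` (often empty or inconsistently written) and a `gadm_level1` value derived from its coordinates; these field names are illustrative, not GBIF's schema.

```python
from typing import List, Optional, TypedDict


class Occurrence(TypedDict):
    id: int
    state_province: Optional[str]  # verbatim, as supplied by the publisher
    gadm_level1: Optional[str]     # derived from coordinates (hypothetical field)


# Hypothetical toy records, not real GBIF data.
RECORDS: List[Occurrence] = [
    {"id": 1, "state_province": "Pennsylvania", "gadm_level1": "Pennsylvania"},
    {"id": 2, "state_province": None, "gadm_level1": "Pennsylvania"},
    {"id": 3, "state_province": "PA", "gadm_level1": "Pennsylvania"},
    {"id": 4, "state_province": None, "gadm_level1": "New York"},
    {"id": 5, "state_province": "Pennsylvania", "gadm_level1": "Pennsylvania"},
]


def filter_by(records: List[Occurrence], field: str, value: str) -> List[int]:
    """Return the ids of records whose field exactly equals value."""
    return [r["id"] for r in records if r[field] == value]


if __name__ == "__main__":
    print(filter_by(RECORDS, "state_province", "Pennsylvania"))
    print(filter_by(RECORDS, "gadm_level1", "Pennsylvania"))
```

The verbatim filter misses record 2 (no value) and record 3 (a variant spelling), while the derived field keeps both. Each extra filter on a sparse field compounds the loss, which is the effect seen when the Continent filter is removed in the example above.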
Please add your thoughts :-)