Closed sat01a closed 8 months ago
Added to the board - cc @peggynewman
I believe this is in part related to this bug: https://github.com/gbif/pipelines/issues/793
This is fixed in this release of pipelines: https://github.com/gbif/pipelines/releases/tag/pipelines-parent-2.11.12.3
Thanks @djtfmartin , is the above version deployed to prod?
Only in databox and databox dev atm @sat01a
Databox/test Solr in data has been refreshed recently using pipelines 2.11.12.3. You might want to test if this bug still occurs in test.
See https://github.com/AtlasOfLivingAustralia/preingestion/issues/71 for more details about pipelines versions.
Only dr368 in the test environment contains countryConservation: Not Listed. Unsure why this is the case as it was processed and loaded after the species list dr656 was updated on lists prod and test. Investigating.
A comparison between test and production:
test | prod | |
---|---|---|
data resources with any countryConservation value | 213 | 1 (dr368) |
taxonConceptID with any countryConservation value | 9177 | 11948 |
taxonConceptID with any countryConservation value other than "Not Listed" | 1930 | 514 |
Do we have a raw_ and processed value? I guess if Bionet (dr368) or any data resource is passing us that value I'd prefer that it only contained values from the EPBC list (ie dr656)
Putting this on hold until after production pipelines is upgraded since it may already be fixed.
Status and counts for the facet countryConservation. This is populated from the status value in the species lists, e.g.:
Status | Count |
---|---|
Not Listed | 6920694 |
Migratory | 1892737 |
Vulnerable | 1051252 |
Endangered | 731931 |
Critically Endangered | 289599 |
Priority 4: Rare, Near Threatened | 197764 |
Priority 1: Poorly-known species | 150067 |
Priority 2: Poorly-known species | 125983 |
Priority 3: Poorly-known species | 103300 |
Conservation Dependent | 98169 |
Other Specially Protected | 56762 |
Extinct | 7916 |
Extinct in the wild | 144 |
raw_countryConservation appears to be empty. It should be populated from the sourceStatus value in the species lists.
raw_stateConservation appears to be correctly fetching the values in the sourceStatus column. e.g. https://biocache.ala.org.au/ws/occurrences/search?q=species_list_uid:dr653&facet=true&facets=raw_stateConservation&pageSize=0
stateConservation also appears to be working. https://biocache.ala.org.au/ws/occurrences/search?q=species_list_uid:dr653&facet=true&facets=stateConservation&pageSize=0
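The facet checks above follow one URL pattern, so they can be scripted per list and per field. The following is a minimal sketch: `facet_url` reproduces the query string used in the links above, while `facet_counts` assumes the JSON response exposes `facetResults` entries holding `fieldResult` items with `label` and `count` keys. That response shape is an assumption and should be checked against a real response before relying on it. Both function names are hypothetical.

```python
from urllib.parse import urlencode

BIOCACHE_SEARCH = "https://biocache.ala.org.au/ws/occurrences/search"


def facet_url(list_uid: str, facet: str) -> str:
    """Build a facet-only search URL for one species list and one field."""
    params = {
        "q": f"species_list_uid:{list_uid}",
        "facet": "true",
        "facets": facet,
        "pageSize": 0,  # no occurrence records needed, only facet counts
    }
    # Keep ":" unescaped so the URL matches the form used in this thread.
    return f"{BIOCACHE_SEARCH}?{urlencode(params, safe=':')}"


def facet_counts(response: dict) -> dict:
    """Flatten facet results into {label: count}.

    ASSUMPTION: the response looks like
    {"facetResults": [{"fieldResult": [{"label": ..., "count": ...}]}]}.
    """
    counts = {}
    for facet in response.get("facetResults", []):
        for item in facet.get("fieldResult", []):
            counts[item["label"]] = item["count"]
    return counts
```

An empty result from `facet_counts` for `raw_countryConservation` would correspond to the "appears to be empty" observation above.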
I suspect that the Priority* values in the countryConservation field are from https://lists.ala.org.au/speciesListItem/list/dr2201 which was last updated 14/6. It now contains the region Western Australia and I hope that the pipeline was run on a previous version of the list.
It may be necessary to update the status field of the conservation lists. status should contain a processed value.
I can confirm that raw_countryConservation is not populated by the pipeline. Is this required?
The conservation lists used to have a status and a sourceStatus value because we attempted to map values onto a vocabulary approximating the IUCN Red List, so retained the mapping in these fields. This is no longer required because most states have now worked that into their lists themselves. The field sourceStatus might still be in the lists but I think we should remove the field State conservation (unprocessed) from the facets.
Note that these are not Darwin Core terms, they are in ALA namespace.
Having said all this, BioNet provide us with stateConservation and countryConservation in their data, and provide this 'Not Listed' value, which is true for NSW, but misleading in our data. My suggestion is to retain any provided value as raw_stateConservation and raw_countryConservation, and the values shouldn't appear on the UI.
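To make the suggestion concrete, here is a minimal sketch of the proposed rule: whatever the provider supplies is kept only under the raw_ field, and the processed field is filled only from the authoritative list. The function `conservation_fields` and its arguments are hypothetical and are not taken from the pipelines code.

```python
from typing import Optional


def conservation_fields(supplied: Optional[str], list_status: Optional[str]) -> dict:
    """Sketch of the proposed handling for countryConservation.

    supplied    -- value sent by the data provider (e.g. BioNet's "Not Listed")
    list_status -- status from the authoritative list (e.g. EPBC, dr656), if any
    """
    record = {}
    if supplied:
        # Provider value is retained, but only as the unprocessed field.
        record["raw_countryConservation"] = supplied
    if list_status:
        # Processed value comes from the species list, never from the provider.
        record["countryConservation"] = list_status
    return record
```

Under this rule a BioNet record carrying 'Not Listed' for a taxon that is absent from the list would end up with raw_countryConservation only, so the value would not reach the processed facet.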
@adam-collins doh, sorry I just realised the issue around countryConservation. Really the only values in this should be coming from the EPBC list https://lists.ala.org.au/speciesListItem/list/dr656 which I thought was:
@peggynewman dr2201 was matching to Australia because it had 2 spaces between Western and Australia. Fixed on test and prod. Future matches will correctly match to the state instead of the country.
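A small sketch of how whitespace normalisation would guard against this class of mismatch. The matching rule shown here (exact state name, otherwise fall back to country) is an assumption for illustration only, not the actual pipelines logic, and `STATES` is an illustrative subset.

```python
# Illustrative subset of state names; the real list is not given in this thread.
STATES = {"Western Australia", "New South Wales", "Queensland"}


def normalise_region(region: str) -> str:
    """Trim the ends and collapse internal runs of whitespace to one space."""
    return " ".join(region.split())


def conservation_field(region: str, normalise: bool = True) -> str:
    """Pick the target field for a list's region.

    ASSUMED rule: an exact state-name match goes to stateConservation,
    anything else falls through to countryConservation.
    """
    name = normalise_region(region) if normalise else region
    return "stateConservation" if name in STATES else "countryConservation"
```

With `normalise=False`, a region of "Western", two spaces, "Australia" falls through to countryConservation under this assumed rule; with normalisation it resolves to stateConservation.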
Moved the issue to https://github.com/gbif/pipelines/issues/920
Holy cow, good catch @adam-collins ping @rosemaryjoconnor
@adam-collins oh wow, excellent catch Adam. Thank you.
@adam-collins this is still a problem in production - ie WA values are still appearing in the countryConservation field. A full ingest ran last night and I would have expected that would have resolved the problem. What do you think?
I want to close this issue and reopen another.
@adam-collins See here, a query on the Qld Conservation Statuses list (species_list_uid:dr652) is picking up a raw_conservationStatus of Representative record. Should I raise this in pipelines?
Also weird: Near threatened values aren't getting through to biocache from the list at all.
An update to the requirements of this issue.
Pull request https://github.com/gbif/pipelines/pull/929
Merged into 2.18.0-SNAPSHOT
raw_stateConservation is now gone - I suggest we remove it from the facet list. stateConservation and countryConservation are looking great.
Removed raw_stateConservation in pull request https://github.com/AtlasOfLivingAustralia/ala-install/pull/774
Testing can be done by having a data resource with the default values for stateConservation and countryConservation. After reingesting, SOLR should have these values in the raw_stateConservation and raw_countryConservation fields. Currently there are none in test, or it is not working.
All good. Please remove raw_stateConservation from the facet list.
I would have thought that these were the lists that informed the population of the stateConservation and countryConservation fields (via pipelines indexing). I'm told we had the metadata wrong and that the lists need to be flagged isAuthoritative and isThreatened and would appreciate confirmation. Does the region need to be set as well?
Currently the values in countryConservation (for all occurrences) are:
whereas the distinct values in the EPBC list are:
We're about to go to a conference and do a talk on how we index threatened species, so it would be good if we could get this working.