javier-molina closed this issue 3 years ago
From @RobinaSanderson:
Hi @nickdos @javier-molina @brucehyslop
I think the issue Nick has found above is that the assertions without links are either new or don't have an equivalent in the pipelines. The filter mappings to the wiki pages also appear to be broken. Using the ALA General profile example below, I've gone through each filter without a link.
I think the link mapping is held in the messages properties file in GitHub, as per https://github.com/AtlasOfLivingAustralia/DataQuality/issues/116. I'm sorry, but I'm not sure how to update this.
The new spatiallyValid filter description could point to the new KB article on spatial validity when it's published.
The following filters are based on assertions using camel case so I assume they do not map to anything in pipelines and will be removed:
- nameNotRecognised
- homonymIssue
- occCultivatedEscapee
- zeroLatitude
- zeroLongitude
There are also the following filters which could link to pages:
- duplicateStatus:"ASSOCIATED" - https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/duplicate_status
- coordinateUncertaintyInMeters:[10001 TO ] - https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/coordinate_uncertainty
- userAssertions:50001 - https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/user_assertions
- userAssertions:50005 - same as above
- outlierLayerCount:[3 TO ] - https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/outlier_layer_count
- occurrenceStatus:ABSENT - https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/occurrence_status
- year:[* TO 1700] - https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/year
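To make the proposed mapping concrete, here is a minimal Python sketch of how a filter query could be resolved to a wiki page by its field name. The helper names (`filter_field`, `wiki_link`) and the regular expression are hypothetical and are not taken from biocache-hubs or biocache-service. The page slugs are the ones suggested in this comment.

```python
import re
from typing import Optional

WIKI_BASE = "https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/"

# Field name -> wiki page slug, as suggested in this comment (not read from any real config).
WIKI_PAGES = {
    "duplicateStatus": "duplicate_status",
    "coordinateUncertaintyInMeters": "coordinate_uncertainty",
    "userAssertions": "user_assertions",
    "outlierLayerCount": "outlier_layer_count",
    "occurrenceStatus": "occurrence_status",
    "year": "year",
    "basisOfRecord": "basis_of_record",
}

# Hypothetical pattern: optional leading "-" and "(", then the field name before the first ":".
_FIELD_RE = re.compile(r'^\s*-?\(?\s*([A-Za-z_][A-Za-z0-9_]*)\s*:')


def filter_field(fq: str) -> Optional[str]:
    """Return the first field name in a filter query, or None if none is found."""
    match = _FIELD_RE.match(fq)
    return match.group(1) if match else None


def wiki_link(fq: str) -> Optional[str]:
    """Return the suggested wiki URL for a filter query, or None if the field is unmapped."""
    slug = WIKI_PAGES.get(filter_field(fq))
    return WIKI_BASE + slug if slug else None
```

A filter whose field has no entry, such as one based on zeroLatitude, simply resolves to no link in this sketch.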
Not sure if basisOfRecord is still being used? If so, these filters map to https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/basis_of_record
Maybe this should be moved to its own issue and I can go through the other profiles?
From @alexhuang091:
The wiki links should come from the messages.properties file, and it seems Bruce has updated them:
wiki.assertions=https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/assertions
wiki.basisOfRecord=https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/basis_of_record
wiki.coordinateUncertaintyInMeters=https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/coordinate_uncertainty
wiki.decade=https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/occurrence_decade_i
wiki.duplicateStatus=https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/duplicate_status
wiki.isDuplicateOf=https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/duplicate_record
wiki.license=https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/license
wiki.occurrenceStatus=https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/occurrence_status
wiki.outlierLayerCount=https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/outlier_layer_count
wiki.spatiallyValid=https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/geospatial_kosher
wiki.taxonomicIssues=https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/taxonomic_issue
wiki.userAssertions=https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/user_assertions
wiki.year=https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/year
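For anyone checking these entries by hand, the following is a rough Python sketch that reads `wiki.*` keys out of properties-style text. It is a simplified parser (one `key=value` per line, no line continuations or `:` separators), not how biocache-service actually loads messages.properties, and the function name `parse_wiki_properties` is made up for illustration.

```python
def parse_wiki_properties(text: str) -> dict:
    """Collect wiki.<field>=<url> entries into a {field: url} dict.

    Simplified: assumes one key=value pair per line and skips blank
    lines and comment lines starting with '#' or '!'.
    """
    links = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep and key.startswith("wiki."):
            links[key[len("wiki."):]] = value.strip()
    return links


sample = """
# two of the entries quoted above
wiki.year=https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/year
wiki.license=https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/license
"""
links = parse_wiki_properties(sample)
```

With the sample above, `links["year"]` holds the wiki URL for the year field, and a field such as duplicateType that has no `wiki.` entry is absent from the dict.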
infoUrl is set and returned when a user calls
ws/index/fields?fl=occurrenceStatus
to get the details of a field. But this function seems to have been removed in the pipeline code, so
https://biocache-test.ala.org.au/ws/index/fields?fl=occurrenceStatus
doesn't return an infoUrl field. hubs uses infoUrl as the target of the link, so without infoUrl there is no link.
I'll keep an eye on it and when I finish the index issue I can work on this.
I mean the link comes from infoUrl in ws/index/fields?fl=occurrenceStatus, but in pipeline this field is not returned.
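A quick way to spot affected fields is to scan the response for entries with no infoUrl. This Python sketch works on a response body that has already been fetched. The response shape (a JSON array of objects with a `name` key and an optional `infoUrl` key) is an assumption for illustration, as are the function name and the sample payload.

```python
import json


def fields_missing_info_url(payload: str) -> list:
    """Return the names of index fields whose entry has no (or an empty) infoUrl.

    Assumes the body is a JSON array of objects keyed by 'name' and
    optionally 'infoUrl'; this shape is not confirmed by the thread.
    """
    fields = json.loads(payload)
    return [field.get("name") for field in fields if not field.get("infoUrl")]


# Hypothetical response body, not captured from biocache-test.
sample_payload = json.dumps([
    {"name": "occurrenceStatus",
     "infoUrl": "https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/occurrence_status"},
    {"name": "outlierLayerCount"},
])
missing = fields_missing_info_url(sample_payload)
```

Any field listed in `missing` would show up in hubs without a link.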
It will be fixed with https://github.com/AtlasOfLivingAustralia/biocache-service/pull/632. Need to test after it's deployed.
It seems the code in pipeline was copied from an old version of develop, and subsequent changes on develop were not taken into pipeline. There seem to be more changes that need to be merged into pipeline. I'll check it later.
Some links are back, but there are still some issues.
nameNotRecognized link is incorrect.
Hi @alexhuang091 - I think nameNotRecognised is no longer supported. Instead, we'll need to change the filter to use TAXON_MATCH_NONE - I'm going through the filters to see what I think needs changing, and I should finish by lunchtime Tuesday.
Changed nameNotRecognised to TAXON_MATCH_NONE and homonymIssue to TAXON_HOMONYM, and created new wiki pages for these assertion values.
The assertion code to assertion name mappings from the Data Quality Checks Google sheet are used to create the links to the wiki pages. These are likely to get out of sync if there are changes to the assertions. See #426
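If other profiles still carry the old names, the rename could be applied mechanically. This is a minimal Python sketch, assuming filters are plain strings of the form `-assertions:"NAME"`. The function name `migrate_filter` is hypothetical, and only the two renames mentioned in this comment are included.

```python
import re

# Legacy assertion name -> pipelines assertion name (the two renames described here).
RENAMES = {
    "nameNotRecognised": "TAXON_MATCH_NONE",
    "homonymIssue": "TAXON_HOMONYM",
}

# Match any legacy name as a whole word so longer identifiers are left alone.
_LEGACY_RE = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")


def migrate_filter(fq: str) -> str:
    """Replace legacy assertion names in a filter query with their pipelines names."""
    return _LEGACY_RE.sub(lambda match: RENAMES[match.group(1)], fq)
```

Filters that use neither legacy name pass through unchanged.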
@RobinaSanderson, @M-Nicholls
There is no assertion occCultivatedEscapee in pipelines.
Should this filter be removed, or is there another filter that can be applied to perform the same check?
Hi @brucehyslop and @M-Nicholls, I think we'd get a similar result if we filtered out Establishment Means = Managed. Miles, do you have a preference for whether Introduced should also be filtered out?
I think we should capture the discussion about changed filters in #420
The message properties have been updated on the test environment.
Some links are still missing however for the profiles:
spatial quality issues
These links aren't displaying due to a bug with retrieving the index fields (#481).
I'm still finding some unexpected results in the links.
Tested by:
All the queries based on assertions are linked to the general assertion wiki page, except for -assertions:"PRESUMED_SWAPPED_COORDINATE" which tries to link to PRESUMED_SWAPPED_COORDINATES (note the extra S) and tries to create a new wiki page.
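A mismatch like this could be caught before it reaches the UI by comparing generated link targets against the list of wiki pages that exist. The Python sketch below does that with `difflib` from the standard library. The function name and the sample page set are hypothetical, and which of the two spellings actually exists as a wiki page is not confirmed here.

```python
import difflib


def missing_pages(link_targets, existing_pages):
    """Map each link target with no wiki page to its closest existing page name (or None)."""
    candidates = sorted(existing_pages)
    report = {}
    for target in link_targets:
        if target in existing_pages:
            continue
        close = difflib.get_close_matches(target, candidates, n=1)
        report[target] = close[0] if close else None
    return report


# Hypothetical data: assumes a page exists under the singular spelling.
pages = {"PRESUMED_SWAPPED_COORDINATE", "TAXON_HOMONYM"}
report = missing_pages(["PRESUMED_SWAPPED_COORDINATES", "TAXON_HOMONYM"], pages)
```

Under that assumption, the report flags the plural target and suggests the singular page as the likely intended one.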
The missing links for establishmentMeans and decimalLongitude are documented separately, but the following also have no links displayed:
Also, in the metadata for "Exclude records with additional spatial quality issues" the descriptions are a bit scrambled.
Also the metadata for "Exclude records based on record type" needs to be updated. The current filter value -(basisOfRecord:"MATERIAL_SAMPLE" AND contentTypes should read -(basisOfRecord:"MATERIAL_SAMPLE" AND contentTypes:"EnvironmentalDNA"
This is probably a separate issue but the description on the metadata for "Exclude duplicate records" currently reads: Identifies suspected duplicates (D), and what appears to be the best representation of suspected duplicate records (R)
It should now read something like: Identifies suspected duplicates (ASSOCIATED), and what appears to be the best representation of suspected duplicate records (REPRESENTATIVE)
Exclude all records where the duplicate status is "Associated"
@brucehyslop it seems that this and #445 need some more TLC.
See https://github.com/AtlasOfLivingAustralia/la-pipelines/issues/445#issuecomment-901622642
The missing links for userAssertions and outlierLayerCount only occur in test because there are no (or very few) records that have these fields populated.
The links for these fields do work in prod.
Could someone who has previously tested this issue give it one last pass so we can move it to done?
Hi @nickdos and @brucehyslop
The links are working as expected.
There is a separate issue in that the duplicate filter has been changed to use the duplicateType field. This does not have a wiki entry and therefore no links can be made. Do you want me to raise a separate issue for this? If so, should it be in la-pipelines or in biocache-hub?
Duplicate DQ filters are a work-in-progress still. But it would be good to have a page for that field as I think it might stay in there.
Hi @nickdos - I've created a wiki page for duplicateType: https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/duplicateType
I've given it a description based on the values that populate the field and what I think it means, could you please take a look and check what I have written is correct?
Based on this workflow: https://confluence.csiro.au/display/ALASD/Field+and+assertions+metadata+architecture+and+editing, the messages properties file needs to be updated for the description to be available in the biocache interface and for the link to the wiki to be available for the data quality filters. I haven't done this before. Would it be possible for a dev to do it?
This looks very close to prod. The only aspect that is different is that the test site is missing the link to the GH wiki page for detailed info on the assertion.
Maybe this can be dealt with in a separate issue for wiki content (one might already exist?) for a subsequent release?
@javier-molina feel free to move to done, if the above is OK with you.
Originally posted by @nickdos in https://github.com/AtlasOfLivingAustralia/la-pipelines/issues/392#issuecomment-857299751