AtlasOfLivingAustralia / la-pipelines

Living Atlas Pipelines extensions

Data Quality profile filters have missing Links #419

Closed javier-molina closed 3 years ago

javier-molina commented 3 years ago

This looks very close to prod. The only difference is that the test site is missing the link to the GH wiki page with detailed info on each assertion.

Maybe this can be dealt with in a separate issue for wiki content (one might already exist?) in a subsequent release?

@javier-molina feel free to move to done, if above is OK with you.

Originally posted by @nickdos in https://github.com/AtlasOfLivingAustralia/la-pipelines/issues/392#issuecomment-857299751

javier-molina commented 3 years ago

From @RobinaSanderson:

Hi @nickdos @javier-molina @brucehyslop

I think the issue Nick has found above is that the assertions without links are either new or don't have an equivalent in the pipelines. Also, the filters mapping to the wiki pages appear broken. Using the ALA General profile as an example, I've looked at each filter without a link.

I think the link mapping is held in the messages properties file in GitHub, as per https://github.com/AtlasOfLivingAustralia/DataQuality/issues/116. I'm sorry, but I'm not sure how to update this.

The new spatiallyValid filter description could point to the new KB article on spatial validity when it's published.

The following filters are based on assertions using camel case so I assume they do not map to anything in pipelines and will be removed:

  • nameNotRecognised
  • homonymIssue
  • occCultivatedEscapee
  • zeroLatitude
  • zeroLongitude

There are also the following filters which could link to pages:

  • duplicateStatus:"ASSOCIATED" - https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/duplicate_status
  • coordinateUncertaintyInMeters:[10001 TO ] - https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/coordinate_uncertainty
  • userAssertions:50001 - https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/user_assertions
  • userAssertions:50005 - same as above
  • outlierLayerCount:[3 TO ] - https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/outlier_layer_count
  • occurrenceStatus:ABSENT - https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/occurrence_status
  • year:[* TO 1700] - https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/year

Not sure if basisOfRecord is still being used? If so, these filters map to https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/basis_of_record

Maybe this should be moved to its own issue and I can go through the other profiles?

javier-molina commented 3 years ago

From @alexhuang091:

The wiki links should come from the messages.properties file, and it seems Bruce has updated them:

wiki.assertions=https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/assertions
wiki.basisOfRecord=https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/basis_of_record
wiki.coordinateUncertaintyInMeters=https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/coordinate_uncertainty
wiki.decade=https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/occurrence_decade_i
wiki.duplicateStatus=https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/duplicate_status
wiki.isDuplicateOf=https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/duplicate_record
wiki.license=https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/license
wiki.occurrenceStatus=https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/occurrence_status
wiki.outlierLayerCount=https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/outlier_layer_count
wiki.spatiallyValid=https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/geospatial_kosher
wiki.taxonomicIssues=https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/taxonomic_issue
wiki.userAssertions=https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/user_assertions
wiki.year=https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/year
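The entries above can be read as a lookup from field name to wiki page. Here is a minimal sketch of that lookup, assuming every entry follows the `wiki.<field>=<url>` pattern shown. `parse_wiki_links` is a hypothetical helper written for illustration, not code from biocache-service.

```python
def parse_wiki_links(properties_text):
    """Return {field: url} for every `wiki.<field>=<url>` line (hypothetical helper)."""
    links = {}
    for raw in properties_text.splitlines():
        line = raw.strip()
        # Skip blank lines and .properties comment lines.
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep and key.startswith("wiki."):
            links[key[len("wiki."):]] = value.strip()
    return links


# Two entries copied from the list above.
SAMPLE = """\
wiki.occurrenceStatus=https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/occurrence_status
wiki.year=https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/year
"""

links = parse_wiki_links(SAMPLE)
print(links.get("year"))
print(links.get("duplicateType"))  # no entry, so no link can be made
```

A field with no `wiki.*` entry has no URL to link to, which is the symptom reported in this issue.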

On the develop branch (https://github.com/AtlasOfLivingAustralia/biocache-service/blob/cf41784f1f57e39f7af7232ebd360ee966c34f6d/src/main/java/au/org/ala/biocache/dao/SearchDAOImpl.java#L3669), infoUrl is set and returned when a user calls ws/index/fields?fl=occurrenceStatus to get the details of a field.

But this function seems to have been removed in the pipeline code, so https://biocache-test.ala.org.au/ws/index/fields?fl=occurrenceStatus doesn't return an infoUrl field.

Hubs uses infoUrl as the target of the link. Without infoUrl, there is no link.

I'll keep an eye on it and when I finish the index issue I can work on this.

To clarify: the link comes from infoUrl in ws/index/fields?fl=occurrenceStatus, but in the pipeline version this field is not returned.
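One way to spot the affected fields is to scan a saved ws/index/fields response for entries without infoUrl. This is a rough sketch that assumes the response is a JSON array of objects, each with a `name` key and an optional `infoUrl` key. Only `infoUrl` is confirmed in this thread; the `name` key and the overall shape are assumptions, and `fields_missing_info_url` is a hypothetical helper.

```python
import json


def fields_missing_info_url(fields_json):
    """List the names of index fields whose entry has no (or an empty) infoUrl."""
    fields = json.loads(fields_json)
    return [f.get("name") for f in fields if not f.get("infoUrl")]


# Hand-written sample in the assumed shape, not real service output.
sample = json.dumps([
    {"name": "occurrenceStatus"},
    {"name": "year",
     "infoUrl": "https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/year"},
])

missing = fields_missing_info_url(sample)
print(missing)
```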

alexhuang091 commented 3 years ago

It will be fixed with https://github.com/AtlasOfLivingAustralia/biocache-service/pull/632. Need to test after it's deployed.

It seems the code in pipeline was copied from an old version of develop, and subsequent changes on develop were not carried over into pipeline. There seem to be more changes that need to be merged into pipeline. I'll check later.

alexhuang091 commented 3 years ago

Some links are back, but there are still some issues.

The nameNotRecognised link is incorrect.

RobinaSanderson commented 3 years ago

Hi @alexhuang091 - I think nameNotRecognised is no longer supported; instead we'll need to change the filter to use TAXON_MATCH_NONE. I'm going through the filters to see what I think needs changing, and I should finish by lunchtime Tuesday.

brucehyslop commented 3 years ago

Changed nameNotRecognised to TAXON_MATCH_NONE and homonymIssue to TAXON_HOMONYM, and created new wiki pages for these assertion values.
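For anyone updating other profiles, the two renames above amount to a small lookup applied to assertion filters. This sketch assumes filters have the `assertions:<value>` or `-assertions:<value>` form seen in this thread. `RENAMED_ASSERTIONS` and `migrate_filter` are hypothetical names for illustration, not part of any ALA codebase.

```python
# Legacy camelCase assertion names and their pipelines equivalents,
# limited to the two renames stated in this thread.
RENAMED_ASSERTIONS = {
    "nameNotRecognised": "TAXON_MATCH_NONE",
    "homonymIssue": "TAXON_HOMONYM",
}


def migrate_filter(filter_query):
    """Rewrite a legacy assertion filter; return any other filter unchanged."""
    prefix, sep, value = filter_query.partition(":")
    if sep and prefix.lstrip("-") == "assertions":
        bare = value.strip('"')
        if bare in RENAMED_ASSERTIONS:
            # Replace only the name so surrounding quotes are preserved.
            return f"{prefix}:{value.replace(bare, RENAMED_ASSERTIONS[bare])}"
    return filter_query


print(migrate_filter("-assertions:nameNotRecognised"))
print(migrate_filter("occurrenceStatus:ABSENT"))
```

Filters with no pipelines equivalent, such as occCultivatedEscapee, are deliberately left untouched here because their replacement is still under discussion.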

brucehyslop commented 3 years ago

The mappings of assertion codes to assertion names from the Data Quality Checks Google sheet are used to create the links to the wiki pages. These are likely to get out of sync if there are changes to the assertions. See #426.

brucehyslop commented 3 years ago

@RobinaSanderson, @M-Nicholls

There is no assertion occCultivatedEscapee in pipelines. Should this filter be removed, or is there another filter that can be applied to perform the same check?

RobinaSanderson commented 3 years ago

Hi @brucehyslop and @M-Nicholls - I think we'd get a similar result if we filtered out Establishment Means = Managed. Miles, do you have a preference for whether Introduced should also be filtered out?

javier-molina commented 3 years ago

I think we should capture the discussion about changed filters in #420

brucehyslop commented 3 years ago

The message properties have been updated on the test environment.

Some links are still missing, however, for the profiles:

  • spatial quality issues

These links aren't displaying due to a bug with retrieving the index fields (#481).

RobinaSanderson commented 3 years ago

I'm still finding some unexpected results in the links.

Tested by:

  1. Search for all records. The ALA General data profile is applied. https://biocache-dq-test.ala.org.au/occurrences/search?taxa=
  2. Click on the information icon next to the ALA General profile name. The metadata for the general profile opens as expected.
  3. Click on the link for -assertions:TAXON_MATCH_NONE - this opens the wiki page for assertions in general (https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/assertions) not the page for the TAXON_MATCH_NONE assertion (https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/TAXON_MATCH_NONE)

All the queries based on assertions are linked to the general assertion wiki page, except for -assertions:"PRESUMED_SWAPPED_COORDINATE" which tries to link to PRESUMED_SWAPPED_COORDINATES (note the extra S) and tries to create a new wiki page.
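The expected behaviour described above can be sketched as follows. The sketch assumes each assertion has its own wiki page named after the assertion (as with TAXON_MATCH_NONE), and that other fields use a field-to-URL lookup such as the `wiki.*` entries. `expected_wiki_link` is a hypothetical helper, not the hub's implementation, and it does not handle grouped filters like `-(a AND b)`.

```python
WIKI_BASE = "https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki"


def expected_wiki_link(filter_query, field_links):
    """Return the wiki URL a filter should link to, or None if no page is known."""
    # Drop a leading "-" (exclusion) and split `field:value` at the first colon.
    field, _, value = filter_query.lstrip("-").partition(":")
    value = value.strip('"')
    if field == "assertions" and value:
        # Use the assertion name exactly as written in the filter, so no
        # extra "S" or other variation can creep into the page name.
        return f"{WIKI_BASE}/{value}"
    return field_links.get(field)


field_links = {"occurrenceStatus": f"{WIKI_BASE}/occurrence_status"}
print(expected_wiki_link("-assertions:TAXON_MATCH_NONE", field_links))
print(expected_wiki_link('-assertions:"PRESUMED_SWAPPED_COORDINATE"', field_links))
print(expected_wiki_link("occurrenceStatus:ABSENT", field_links))
```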

The missing links for establishmentMeans and decimalLongitude are documented separately, but the following also have no links displayed:

Also, in the metadata for "Exclude records with additional spatial quality issues" the descriptions are a bit scrambled.

Also the metadata for "Exclude records based on record type" needs to be updated. The current filter value -(basisOfRecord:"MATERIAL_SAMPLE" AND contentTypes should read -(basisOfRecord:"MATERIAL_SAMPLE" AND contentTypes:"EnvironmentalDNA"

RobinaSanderson commented 3 years ago

This is probably a separate issue but the description on the metadata for "Exclude duplicate records" currently reads: Identifies suspected duplicates (D), and what appears to be the best representation of suspected duplicate records (R)

It should now read something like: Identifies suspected duplicates (ASSOCIATED), and what appears to be the best representation of suspected duplicate records (REPRESENTATIVE)

Exclude all records where the duplicate status is "Associated"

javier-molina commented 3 years ago

@brucehyslop it seems that this and #445 need some more TLC.

brucehyslop commented 3 years ago

See https://github.com/AtlasOfLivingAustralia/la-pipelines/issues/445#issuecomment-901622642

The missing links for userAssertions and outlierLayerCount only occur in test because there are no (or very few) records that have these fields populated.

The links for these fields do work in prod.

nickdos commented 3 years ago

Could someone who has previously tested this issue give it one last pass so we can move it to done?

RobinaSanderson commented 3 years ago

Hi @nickdos and @brucehyslop

The links are working as expected.

There is a separate issue in that the duplicate filter has been changed to use the duplicateType field. This does not have a wiki entry and therefore no links can be made. Do you want me to raise a separate issue for this? If so, should it be in la-pipelines or in biocache-hub?

nickdos commented 3 years ago

Duplicate DQ filters are a work-in-progress still. But it would be good to have a page for that field as I think it might stay in there.

RobinaSanderson commented 3 years ago

Hi @nickdos - I've created a wiki page for duplicateType: https://github.com/AtlasOfLivingAustralia/ala-dataquality/wiki/duplicateType

I've given it a description based on the values that populate the field and what I think it means. Could you please take a look and check that what I have written is correct?

Based on this workflow (https://confluence.csiro.au/display/ALASD/Field+and+assertions+metadata+architecture+and+editing), the messages properties file needs to be updated for the description to be available in the biocache interface and for the wiki link to be available for the data quality filters. I haven't done this before. Would it be possible for a dev to do it?

nickdos commented 3 years ago

Thanks @RobinaSanderson - looks good to me. @peggynewman would you mind also taking a look at the page Robina created for duplicateType?

I'm not sure which terminology we want to stick with WRT duplicate vs associated record.