AtlasOfLivingAustralia / DataQuality

Data Quality

Query for excluded records for record type returns no results #262

Open M-Nicholls opened 2 years ago

M-Nicholls commented 2 years ago

When a search includes records that are Material Samples, which are excluded by default, the link to the excluded records finds nothing. Amanita roseolamellata is an example: https://biocache.ala.org.au/occurrences/search?q=lsid:966799e0-b36c-445f-9e46-69a7085ed00d

If you click on the "744 records excluded" link to show those records, that search finds no records: https://biocache.ala.org.au/occurrence/search?q=lsid%3A966799e0-b36c-445f-9e46-69a7085[…]ord%3AMATERIAL_SAMPLE+%2BcontentTypes%3AEnvironmentalDNA%29

This does not seem to affect searches for any other type of excluded record, including cases where records are excluded due to record type but are not Material Samples. For example, these 5 Osphranter rufus records are excluded because they are Fossil Specimens: https://biocache.ala.org.au/occurrence/search?q=lsid%3Aurn%3Alsid%3Abiodiversity.org.a[…]ord%3AMATERIAL_SAMPLE+%2BcontentTypes%3AEnvironmentalDNA%29

Other species that the user has identified as having this problem are: Hypholoma fasciculare, Lentinus arcularius, Amanita xanthocephala

There is something odd going on with the bracketing in the "show excluded records" query. For example, this works fine: https://biocache.ala.org.au/occurrence/search?q=lsid%3A966799e0-b36c-445f-9e46-69a7085[…]basisOfRecord:MATERIAL_SAMPLE+contentTypes:EnvironmentalDNA

But the query used in the interface is: https://biocache.ala.org.au/occurrence/search?q=lsid%3A966799e0-b36c-445f-9e46-69a7085[…]ord%3AMATERIAL_SAMPLE+%2BcontentTypes%3AEnvironmentalDNA%29

It has brackets around the second set of parameters: +(%2BbasisOfRecord%3AMATERIAL_SAMPLE+%2BcontentTypes%3AEnvironmentalDNA). This is hard to see because the characters are percent-encoded.
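To make the bracketing easier to see, the percent-encoded fragment quoted above can be decoded. This is a minimal sketch using Python's standard library; it only decodes the fragment from this comment and does not call biocache.

```python
from urllib.parse import unquote_plus

# Fragment copied from the interface-generated URL in this comment.
encoded = "+(%2BbasisOfRecord%3AMATERIAL_SAMPLE+%2BcontentTypes%3AEnvironmentalDNA)"

# unquote_plus turns "+" into a space and "%2B"/"%3A" into "+" and ":".
decoded = unquote_plus(encoded)
print(decoded)
```

Decoded, the fragment is a bracketed group in which both terms carry a leading `+`, and neither value is quoted.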

timhicks-ala commented 2 years ago

Initially reported in https://support.ehelp.edu.au/a/tickets/101299

alexhuang091 commented 2 years ago

I ran a quick test. The filters are fq=-basisOfRecord:"FOSSIL_SPECIMEN" AND -(basisOfRecord:"MATERIAL_SAMPLE" AND contentTypes:"EnvironmentalDNA")

We invert this filter and run a query to get all the excluded records. The query is https://biocache.ala.org.au/occurrence/search?q=lsid:966799e0-b36c-445f-9e46-69a7085ed00d&qualityProfile=ALA&disableAllQualityFilters=true&fq=basisOfRecord:"FOSSIL_SPECIMEN"+(+basisOfRecord:MATERIAL_SAMPLE++contentTypes:EnvironmentalDNA) which I think is actually incorrect.

alexhuang091 commented 2 years ago

I see. The record-type filter fq=-basisOfRecord:"FOSSIL_SPECIMEN" AND -(basisOfRecord:"MATERIAL_SAMPLE" AND contentTypes:"EnvironmentalDNA") was constructed with the latest dq-service, which allows users to input arbitrary filters (see the -( AND ) part).

The backend probably has a problem inverting it.
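For reference, the inversion being discussed can be sketched as follows. This is not the dq-service code: `invert_exclusions` is a hypothetical helper, and the sketch assumes Solr's default OR operator, so that space-separated clauses act as alternatives. Each negated clause of the filter becomes a positive clause, and a multi-term clause is wrapped in brackets with `+` on every term.

```python
def invert_exclusions(clauses):
    """Build a filter that matches the records the exclusion filter removes.

    `clauses` is a list of exclusion clauses; each clause is a list of
    field:"value" terms that were ANDed together and then negated.
    Hypothetical helper for illustration only.
    """
    parts = []
    for terms in clauses:
        if len(terms) == 1:
            # A single negated term simply loses its leading "-".
            parts.append(terms[0])
        else:
            # A negated AND group becomes a bracketed group of required terms.
            parts.append("(" + " ".join("+" + term for term in terms) + ")")
    # Clauses are joined with spaces (OR under the assumed default operator).
    return " ".join(parts)


# The two exclusion clauses of the record-type filter quoted above.
exclusions = [
    ['basisOfRecord:"FOSSIL_SPECIMEN"'],
    ['basisOfRecord:"MATERIAL_SAMPLE"', 'contentTypes:"EnvironmentalDNA"'],
]
inverse = invert_exclusions(exclusions)
print(inverse)
```

The sketch passes each term through untouched, so the quotes around the values survive the inversion.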

alexhuang091 commented 2 years ago

I ran biocache-service locally with an ssh connection to prod solr. When I ran http://localhost:8078/occurrences/search?q=lsid:966799e0-b36c-445f-9e46-69a7085ed00d&qualityProfile=ALA&disableAllQualityFilters=true&fq=basisOfRecord:"FOSSIL_SPECIMEN" (+basisOfRecord:"MATERIAL_SAMPLE" +contentTypes:"EnvironmentalDNA") the correct result (744 records) was returned.

alexhuang091 commented 2 years ago

The excluded records URL generated by ala-hub is http://dev.ala.org.au:8081/ala-hub/occurrence/search?q=lsid:966799e0-b36c-445f-9e46-69a7085ed00d&qualityProfile=ALA&disableAllQualityFilters=true&fq=basisOfRecord:"FOSSIL_SPECIMEN"+(+basisOfRecord:MATERIAL_SAMPLE++contentTypes:EnvironmentalDNA) which returns 0 records.

Notice that there are no "" around MATERIAL_SAMPLE and EnvironmentalDNA.

Adding the "" back changes the URL to http://dev.ala.org.au:8081/ala-hub/occurrence/search?q=lsid:966799e0-b36c-445f-9e46-69a7085ed00d&qualityProfile=ALA&disableAllQualityFilters=true&fq=basisOfRecord:"FOSSIL_SPECIMEN"+(+basisOfRecord:"MATERIAL_SAMPLE"++contentTypes:"EnvironmentalDNA") and it returns 744 records, which is just what we want.

alexhuang091 commented 2 years ago

If you call https://data-quality-service.ala.org.au/api/v1/quality/getAllInverseCategoryFiltersForProfile?qualityProfileId=92 you can see that the inverse filter is returned without the quotes:

"record-type": "basisOfRecord:\"FOSSIL_SPECIMEN\" (+basisOfRecord:MATERIAL_SAMPLE +contentTypes:EnvironmentalDNA)",
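A quick way to spot this kind of regression is to compare the quoted values in the original filter with those in the inverse filter. The sketch below is only an illustration: `quoted_values` is a hypothetical helper, and the two strings are copied from the comments in this thread rather than fetched from the service.

```python
import re


def quoted_values(filter_text):
    """Return the set of values that appear as field:"value" in a filter."""
    return set(re.findall(r'\w+:"([^"]*)"', filter_text))


# Filter as configured, and inverse filter as returned by the service (from this thread).
original = '-basisOfRecord:"FOSSIL_SPECIMEN" AND -(basisOfRecord:"MATERIAL_SAMPLE" AND contentTypes:"EnvironmentalDNA")'
inverse = 'basisOfRecord:"FOSSIL_SPECIMEN" (+basisOfRecord:MATERIAL_SAMPLE +contentTypes:EnvironmentalDNA)'

# Values quoted in the original but no longer quoted in the inverse.
lost = quoted_values(original) - quoted_values(inverse)
print(sorted(lost))
```

Any value reported in `lost` is one whose quotes were dropped during inversion.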