NERC-CEH / npms-app

📱NPMS mobile application
http://www.npms.org.uk/
GNU General Public License v3.0
0 stars 0 forks source link

Suggest photo when ID difficulty = or > 2 #54

Closed sacrevert closed 3 years ago

sacrevert commented 4 years ago

Display the "please add a photo" alert for all taxa, across all survey levels, when the iRecord "ID difficulty" flag is at least level 2 (e.g. most grasses, like Poa annua).

Note that a photo is not mandatory, just encouraged

kazlauskis commented 3 years ago

@sacrevert is this is based on the species informal group type? In the app we have these types I can pick: fern flower. plant conifer stonewort clubmoss horsetail ginkgo quillwort

sacrevert commented 3 years ago

@kazlauskis No, this is based on a specific numeric ID difficulty field, this is already used for some of the NPMS record autochecks (and elsewhere in indicia), however, I don't know where it is stored. I expect one of @DavidRoy @andrewvanbreda and/or @kitenetter will know though

kitenetter commented 3 years ago

@kazlauskis this is the query I use to list the ID difficulty scores for plants: SELECT vrd.id, tg.title, t.taxon, vrd.key, vrd.value, vr.title, vr.description, vr.test_type FROM verification_rule_data vrd JOIN verification_rules vr ON (vrd.verification_rule_id = vr.id AND vr.deleted IS NOT true) JOIN taxa t ON (vrd.key = t.search_code AND t.deleted IS NOT true AND t.taxon_group_id IN (78,81,87,89,94,99,137,148)) JOIN taxon_groups tg ON t.taxon_group_id = tg.id JOIN taxa_taxon_lists ttl ON (t.id = ttl.taxon_id AND ttl.deleted IS NOT true AND ttl.taxon_list_id = 15 AND ttl.preferred = true) WHERE vrd.deleted IS NOT true

Note that a very few taxa have more than one ID difficulty score assigned.

sacrevert commented 3 years ago

Thanks @kitenetter !

kazlauskis commented 3 years ago

Thanks, unfortunately, I don't have the access to the database to run the query. Would it be possible for someone to add the scores next to species names if I provided an excel sheet?

sacrevert commented 3 years ago

Query results: data-1617800813873.zip

kitenetter commented 3 years ago

@sacrevert I don't understand why some species have multiple entries, with different scores for the same species. This may not matter for this context, where you are only interested in whether plants are above or below the threshold of 2.

But it may be that some rules exist in more than one version, and if you need to know about that we will need to investigate further.

andrewvanbreda commented 3 years ago

@sacrevert Hi Oli, Just to note, as far as I am aware the main website has never had a requirement like this.

sacrevert commented 3 years ago

@andrewvanbreda not for photos, no. The scores are used in the auto-verification rulesets though. I'll check the NPMS github, if it's not there i will add the photo check thing so that, eventually, it is the same between app and website.

sacrevert commented 3 years ago

@kitenetter Sorry, don't know. @DavidRoy do you know where the actual code is that implements the NPMS verification rulesets?

DavidRoy commented 3 years ago

@sacrevert I'm not where the code exists but @andrewvanbreda built the auto-verification functionality I believe

kazlauskis commented 3 years ago

@sacrevert thanks for providing the species difficulty data. In the app we currently only have species names and their warehouse IDs but using these alone might be error-prone if I tried to link with the difficulty data sheet. It would be much better if I had the species TVK keys part of the species data in the app - would you be able to also run this query for me please? (this is an old query and I hope this will work)

I am not sure where in this query we should add the TVK keys but that might be simply adding a t.search_code to the SELECT statement.

-- CHANGE taxon_list_id to match the list

-- Get all english taxon names (taxa_taxon_list + taxa)
WITH joined AS (
    SELECT * FROM indicia.taxa_taxon_lists as ttl 
    JOIN indicia.taxa as ta
    ON ta.id = ttl.taxon_id and ta.language_id = 1 
    WHERE ttl."taxon_list_id" = 167 and ta.deleted = false
),
-- Get only unique taxon groups
unique_taxon_groups AS (
    SELECT (array_agg(id))[1] as id, title 
    FROM indicia.taxon_groups
    WHERE deleted = false
    GROUP BY title
)

-- START  - !!add a TVK key here somewhere!!
SELECT preferred_taxa_taxon_list_id as id, tg.id as taxon_group, preferred_taxon as taxon, default_common_name as common_name, (array_agg(joined.taxon))[1] as synonym

-- get preferred_taxon and default_common_name from cacheed taxon terms
FROM indicia.cache_taxon_searchterms as t 

-- we need to add synonyms
LEFT JOIN joined ON t.taxon_meaning_id = joined.taxon_meaning_id and default_common_name != joined.taxon -- get synonyms that match taxon_meaning_id
LEFT JOIN unique_taxon_groups as tg on t.taxon_group = tg.title
WHERE t."taxon_list_id" = 167 --wildflower=167 inventory=168 indicator=15  Test warehouse: 34, 158, 156
AND (t."taxon_group" IN ('clubmoss','conifer','fern','horsetail','flowering plant','ginkgo','quillwort','stonewort')) 
AND t.simplified = false
AND t.preferred = true 
AND t.name_type = 'L' 

-- there are multiple synonyms and other stuff for each latin taxa so we need to aggregate
GROUP BY t.id, tg.id, preferred_taxon, default_common_name 
ORDER by "preferred_taxon" ASC 
andrewvanbreda commented 3 years ago

@DavidRoy @sacrevert @kitenetter Just in reply to David's comment. Yes you are right, I did do the code for this a while ago. Essentially the auto verifier sets the verification status based on the state of the occurrence (so checks its identification difficulty, record_status etc), this logic is held in a some SQL which can be found in this file (see around line 120)

https://github.com/Indicia-Team/warehouse/blob/master/modules/auto_verify/plugins/auto_verify.php

I think it is the "Data Cleaner" that actually uses something called a ruleset (although don't quote me on this last bit as not certain, as never worked much on that area)

sacrevert commented 3 years ago

@kazlauskis @kitenetter @DavidRoy I presume this is right. Maybe someone more knowledgeable than me on this topic can confirm. Attachment below.

-- CHANGE taxon_list_id to match the list

-- Get all english taxon names (taxa_taxon_list + taxa)
WITH joined AS (
    SELECT * FROM indicia.taxa_taxon_lists as ttl 
    JOIN indicia.taxa as ta
    ON ta.id = ttl.taxon_id and ta.language_id = 1 
    WHERE ttl."taxon_list_id" = 167 and ta.deleted = false
),
-- Get only unique taxon groups
unique_taxon_groups AS (
    SELECT (array_agg(id))[1] as id, title 
    FROM indicia.taxon_groups
    WHERE deleted = false
    GROUP BY title
)

-- START  - !!add a TVK key here somewhere!!
--SELECT * FROM indicia.cache_taxon_searchterms as t 
SELECT preferred_taxa_taxon_list_id as id, tg.id as taxon_group, preferred_taxon as taxon, default_common_name as common_name, t.external_key as TVK, (array_agg(joined.taxon))[1] as synonym

-- get preferred_taxon and default_common_name from cacheed taxon terms
FROM indicia.cache_taxon_searchterms as t 

-- we need to add synonyms
LEFT JOIN joined ON t.taxon_meaning_id = joined.taxon_meaning_id and default_common_name != joined.taxon -- get synonyms that match taxon_meaning_id
LEFT JOIN unique_taxon_groups as tg on t.taxon_group = tg.title
WHERE t."taxon_list_id" = 167 --wildflower=167 inventory=168 indicator=15  Test warehouse: 34, 158, 156
AND (t."taxon_group" IN ('clubmoss','conifer','fern','horsetail','flowering plant','ginkgo','quillwort','stonewort')) 
AND t.simplified = false
AND t.preferred = true 
AND t.name_type = 'L' 

-- there are multiple synonyms and other stuff for each latin taxa so we need to aggregate
GROUP BY t.id, tg.id, preferred_taxon, t.external_key, default_common_name 
ORDER by "preferred_taxon" ASC

data-1617810811711.zip

sacrevert commented 3 years ago

And for t."taxon_list_id" = 168 data-1617811131247.zip And for t."taxon_list_id" = 15 data-1617811185398.zip