NERC-CEH / irecord-butterflies-app

Repository for the code for the iRecord Butterflies App

Image classifier not assigning warehouse ids. #160

Closed by Vilius-Stankaitis 2 years ago

Vilius-Stankaitis commented 2 years ago

Some image-classified species do not have assigned warehouse ids even if they have a high probability score.

Chalk Hill Blue

        "classifier_id": "728",
        "classifier_name": "Lysandra coridon",
        "probability": 0.9999998699274708,
        "group": 4

Clouded Yellow

        "classifier_id": "726",
        "classifier_name": "Colias crocea",
        "probability": 0.9434420723435081,
        "group": 4

A few more species are affected, such as "Adonis Blue" and "Cryptic Wood White".

kazlauskis commented 2 years ago

@JimBacon this is coming from the iRecord classifier proxy module. The module is configured to use butterflies+moths, and the UK Master List has these species. Is this something you could have a look at?

JimBacon commented 2 years ago

The module is looking to match the scientific name returned by the classifier against a preferred latin name in the UK Master List.

For Chalk Hill Blue and Adonis Blue, the classifier returns Lysandra coridon and Lysandra bellargus, which are synonyms, not preferred names, on our list, hence no result. I can't think of a reason not to open this up and match synonyms, especially since our response includes the preferred taxon name. I might want to add the preferred taxa_taxon_list_id to the response.
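The difference between matching preferred names only and matching synonyms as well can be sketched as follows. This is a minimal illustration, not the proxy module's actual code: the `SPECIES_LIST` rows, the `match_name` function and its `include_synonyms` flag are all hypothetical, and the preferred names given for the two Lysandra synonyms are assumptions made for the example.

```python
from typing import List, Optional, Tuple

# Hypothetical extract of a species list as (name, preferred_name) rows.
# A row is a preferred name when both values are equal; otherwise it is a synonym.
# The Polyommatus preferred names are assumed for illustration only.
SPECIES_LIST: List[Tuple[str, str]] = [
    ("Polyommatus coridon", "Polyommatus coridon"),
    ("Lysandra coridon", "Polyommatus coridon"),
    ("Polyommatus bellargus", "Polyommatus bellargus"),
    ("Lysandra bellargus", "Polyommatus bellargus"),
    ("Colias croceus", "Colias croceus"),
]


def match_name(
    name: str,
    rows: List[Tuple[str, str]],
    include_synonyms: bool = True,
) -> Optional[str]:
    """Return the preferred name for a classifier name, or None if unmatched."""
    wanted = name.strip().lower()
    for taxon, preferred in rows:
        if taxon.lower() != wanted:
            continue
        # Preferred names always match; synonyms match only when allowed.
        if taxon == preferred or include_synonyms:
            return preferred
    return None


# Preferred names only: the synonym returned by the classifier finds nothing.
print(match_name("Lysandra coridon", SPECIES_LIST, include_synonyms=False))
# Synonyms allowed: the same name resolves to its preferred name.
print(match_name("Lysandra coridon", SPECIES_LIST))
# A name that is absent from the list stays unmatched either way.
print(match_name("Colias crocea", SPECIES_LIST))
```

Returning the preferred name rather than the matched synonym mirrors the point above that the response already carries the preferred taxon name.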

For Clouded Yellow the classifier returns Colias crocea which is not on our list, even as a synonym. In the UK Colias croceus is the accepted name. I don't know what we do about this sort of disagreement.

For Cryptic Wood White, the result may depend on your photograph. I haven't yet had the classifier successfully identify a record of Leptidea juvernica. I've had it suggest the almost identical Wood White, Leptidea sinapis, or it has given a genus-level result of Leptidea spec. That is not a name format we recognise, so it won't be matched. If we stripped off the 'spec.' we could get a match for the genus, but I don't know if that is useful to the app. Some additional intelligence, using the known distribution of the species, would be interesting to consider.
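Stripping the 'spec.' suffix before matching could look something like the sketch below. It assumes the classifier's genus-level results always take the form of a single genus word followed by `spec.`, which is inferred only from the one example above; `normalise_classifier_name` is a hypothetical helper, not part of the proxy module.

```python
import re
from typing import Tuple

# Assumed format of a genus-level result: one word followed by "spec."
GENUS_LEVEL = re.compile(r"\s*([A-Za-z]+)\s+spec\.\s*")


def normalise_classifier_name(name: str) -> Tuple[str, bool]:
    """Return (name_to_match, is_genus_level) for a classifier name."""
    found = GENUS_LEVEL.fullmatch(name)
    if found:
        # Keep only the genus so it can be matched against the species list.
        return found.group(1), True
    return name.strip(), False


print(normalise_classifier_name("Leptidea spec."))
print(normalise_classifier_name("Leptidea sinapis"))
```

Returning a flag alongside the cleaned name would let the caller decide whether a genus-only match is worth passing on to the app.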

kazlauskis commented 2 years ago

> I can't think of a reason not to open this up and match synonyms, especially since our response includes the preferred taxon name.

I agree, the proxy can look for matching synonyms too.

> For Clouded Yellow the classifier returns Colias crocea which is not on our list, even as a synonym. In the UK Colias croceus is the accepted name. I don't know what we do about this sort of disagreement.

Adding Colias crocea as a synonym of Colias croceus would then solve the issue. I don't think we need to fix this one right now but having the capability to extend the synonyms list would be good. Maybe the proxy could allow selecting multiple species lists in the config. In this case, we could pick the UK master list and another for extra synonyms.
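As a rough sketch of the multiple-list idea, the lookup could try each configured list in order and return the first hit. Everything here is hypothetical: the two dictionaries stand in for the UK master list and an extra synonyms list, and `match_across_lists` is an illustrative name rather than an existing proxy function.

```python
from typing import Dict, List, Optional

# Hypothetical stand-ins for two configured species lists.
# Keys are lower-cased names; values are the preferred names.
UK_MASTER_LIST: Dict[str, str] = {
    "colias croceus": "Colias croceus",
}
EXTRA_SYNONYMS: Dict[str, str] = {
    "colias crocea": "Colias croceus",
}


def match_across_lists(name: str, lists: List[Dict[str, str]]) -> Optional[str]:
    """Try each list in the configured order and return the first match."""
    key = name.strip().lower()
    for lookup in lists:
        if key in lookup:
            return lookup[key]
    return None


# With only the master list configured, the classifier's spelling is unmatched.
print(match_across_lists("Colias crocea", [UK_MASTER_LIST]))
# Adding the extra synonyms list resolves it to the accepted UK name.
print(match_across_lists("Colias crocea", [UK_MASTER_LIST, EXTRA_SYNONYMS]))
```

Keeping the master list first in the order means an extra list can only fill gaps and never override a name the master list already recognises.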

DavidRoy commented 2 years ago

Agreed, we will need to deal with the synonymy.

JimBacon commented 2 years ago

@DavidRoy, is this a recognised synonym? If so, the best resolution would be to add it to the UKSI. That needs someone with sufficient taxonomic understanding to provide the information as described at https://forums.nbn.org.uk/viewtopic.php?id=3805. It would then appear in our list eventually. If it is an error with the classifier, it would benefit everyone if that could be corrected.

JimBacon commented 2 years ago

Just noticed that the European butterfly list contains Colias crocea, https://warehouse1.indicia.org.uk/index.php/taxa_taxon_list/edit/432207

JimBacon commented 2 years ago

I have deployed an update to the code so that it will now match the taxon names returned by the classifier against synonyms in the Indicia species list. This fixes the issue on one level but depends upon the synonyms being present in the Indicia species list.

kazlauskis commented 2 years ago

Yep, I can confirm it is matching the synonyms now 🦋 thanks!