AMI-system / species_classifier

This repository contains the code to create on-device machine learning models for species classification.
MIT License

Provide list of UK moths for building regional classifier #4

Closed by DavidRoy 1 year ago

DavidRoy commented 1 year ago

Citation: https://www.gbif.org/dataset/dbaa27eb-29e7-4cbb-8eab-3f689cfce116

File for all moths: uksi-moths.csv

DavidRoy commented 1 year ago

@KatrionaGoldmann please close this issue if this gives you what you need to build a UK classifier for testing the edge model. Also, should the data file go somewhere else in this repo?

DavidRoy commented 1 year ago

This is a shorter list of just the macro-moths, which are typically larger and likely to have more images on GBIF: uksi-macro-moths.csv

KatrionaGoldmann commented 1 year ago

Thanks David! Looks good to me, but I will explore further and get back to you if there are issues.

KatrionaGoldmann commented 1 year ago

Errors for:

| family_taxon | taxon | preferred_authority | common_name | name_tvk | rtvk | organism_key |
| --- | --- | --- | --- | --- | --- | --- |
| Noctuidae | Orthosia cruda | ([Denis & Schiffermüller], 1775) | Small Quaker | NHMSYS0021144790 | NHMSYS0021144790 | NBNORG0000058639 |
| Geometridae | Macaria notata | (Linnaeus, 1758) | Peacock Moth | NBNSYS0100003857 | NBNSYS0100003857 | NBNORG0000042369 |
| Geometridae | Xanthorhoe ferrugata | (Clerck, 1759) | Dark-barred Twin-spot Carpet | NBNSYS0000005800 | NBNSYS0000005800 | NBNORG0000009107 |
| Noctuidae | Agrotis spinifera | (Hobner, 1808) | Gregson's Dart | NHMSYS0000501183 | NHMSYS0000501183 | NBNORG0000057417 |

I think this is because some sub-species are creeping through. I will explore the GBIF API call further.

KatrionaGoldmann commented 1 year ago

Even with `data = species_api.name_backbone(name=name, strict=True, rank="SPECIES")`, the GBIF API still returns some results with `'rank': 'GENUS'`. For now I have added a catch to skip these entries, but it would be good to know why this happens and to make sure we are not dropping valid species from our data. I will open this as a new issue.
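For anyone following along, here is a minimal sketch of what such a catch could look like. It is not the code in the repository. The names `gbif_lookup` and `split_by_rank` are hypothetical, `species_api` is assumed to be `pygbif.species`, and the only response field relied on is `rank`, as quoted above. Instead of silently dropping entries, it records every skipped name with the rank that came back, so the list can be reviewed later.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple


def gbif_lookup(name: str) -> dict:
    """Query the GBIF backbone using the call quoted in this thread.

    Assumption: species_api is pygbif.species. Needs pygbif and network access.
    """
    # Imported lazily so the filtering logic below can be exercised offline.
    from pygbif import species as species_api
    return species_api.name_backbone(name=name, strict=True, rank="SPECIES")


def split_by_rank(
    names: Iterable[str],
    lookup: Callable[[str], dict] = gbif_lookup,
) -> Tuple[Dict[str, dict], Dict[str, Optional[str]]]:
    """Separate species-rank results from everything else.

    Returns (matched, skipped): matched maps each name to its GBIF response,
    skipped maps each name to the non-species rank that was returned
    (None if the response carried no rank at all).
    """
    matched: Dict[str, dict] = {}
    skipped: Dict[str, Optional[str]] = {}
    for name in names:
        data = lookup(name)
        rank = data.get("rank")
        if rank == "SPECIES":
            matched[name] = data
        else:
            # Keep a record rather than dropping the entry without a trace.
            skipped[name] = rank
    return matched, skipped
```

Calling `matched, skipped = split_by_rank(taxon_names)` on the `taxon` column of the CSV would then leave `skipped` as a ready-made list of names to investigate in the follow-up issue.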