BU-Spark / ml-herbarium

Herbaria ML

ML-Herbarium: Feature - label transcription dictionaries & code clean up #53

Closed by eamonniknafs 2 years ago

eamonniknafs commented 2 years ago

geography acc: 1/120 = 0.8333333333333334%
geography no match: 45/120 = 37.5%
geography wrong: 74/120 = 61.66666666666667%


The accuracy is poor: only 1 of the 120 geography labels was matched correctly. This could be for a number of reasons, listed in order of likelihood:
1. The corpus and ground truth files do not have text that accurately matches the labels
2. The matching algorithm needs to be fine-tuned or replaced
3. We should use segmentation to de-noise
4. Our OCR model needs to be retrained or replaced
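
To help separate reasons 1 and 2, here is a minimal sketch of how a dictionary match and the acc / no match / wrong tally could be computed. It is not the code in this repo: it assumes a fuzzy match built on the standard library's `difflib.get_close_matches`, and the names `match_label`, `tally`, `report`, and the `cutoff` value are hypothetical.

```python
from difflib import get_close_matches


def match_label(ocr_text, corpus, cutoff=0.8):
    """Return the corpus entry closest to ocr_text, or None if nothing clears cutoff.

    Hypothetical helper: the cutoff of 0.8 is an assumption, not a tuned value.
    """
    # Compare case-insensitively, but return the entry in its original casing.
    lowered = {entry.lower(): entry for entry in corpus}
    hits = get_close_matches(ocr_text.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[hits[0]] if hits else None


def tally(predictions, ground_truth):
    """Count correct, unmatched, and wrong predictions, keyed by image id."""
    counts = {"acc": 0, "no match": 0, "wrong": 0}
    for image_id, truth in ground_truth.items():
        pred = predictions.get(image_id)
        if pred is None:
            counts["no match"] += 1
        elif pred.lower() == truth.lower():
            counts["acc"] += 1
        else:
            counts["wrong"] += 1
    return counts


def report(field, counts):
    """Format the counts as one line per category, with percentages."""
    total = sum(counts.values())
    return [
        f"{field} {name}: {n}/{total} = {100 * n / total:.2f}%"
        for name, n in counts.items()
    ]


if __name__ == "__main__":
    corpus = ["Massachusetts", "Maine", "Vermont"]  # illustrative corpus only
    ocr_output = {"img1": "Massachusets", "img2": "xyzzy", "img3": "Mane"}
    truth = {"img1": "Massachusetts", "img2": "Vermont", "img3": "Vermont"}
    preds = {k: match_label(v, corpus) for k, v in ocr_output.items()}
    print("\n".join(report("geography", tally(preds, truth))))
```

Keeping "no match" separate from "wrong" is deliberate: a high "no match" count points at the cutoff or a corpus that lacks the label text, while a high "wrong" count points at the matcher picking the wrong entry or at ground truth that disagrees with the label.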