AtlasOfLivingAustralia / biocache-store

Occurrence processing, indexing and batch processing

Taxon not matched when accent character present in scientific name #203

Closed nickdos closed 6 years ago

nickdos commented 7 years ago

Some SAM records are only being matched to genus even though the records provide a valid scientificName as well as valid genus and specificEpithet fields. E.g. this record is correctly matched:

http://biocache.ala.org.au/occurrences/97ceb83a-b2a2-4cb9-914b-369d76eac47d

but this one is not:

http://biocache.ala.org.au/occurrences/6f15ef99-9f1c-47fa-aeed-fb311780b490

The first record was provided with genus and specificEpithet but no scientificName. The second record was provided with all three, but the scientificName field contains an unrecognised character:

Litoria ewingii (Dum�ril & Bibron, 1841)

A possible fix is to tolerate the non-UTF-8 character (perhaps by ignoring it) and to check the genus and specificEpithet fields in order to confirm the match.
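
To make the suggestion concrete, here is a minimal sketch in Python (not the biocache-store code, which is not shown in this thread). It assumes the undecodable byte has already been turned into the Unicode replacement character U+FFFD by the time the name is processed. The function name `candidate_name` is hypothetical, and the result would still need to be passed to whatever name matcher is actually in use.

```python
REPLACEMENT_CHAR = "\ufffd"  # assumption: bad bytes were decoded to U+FFFD


def candidate_name(scientific_name, genus, specific_epithet):
    """Pick the name string to send to a (hypothetical) name matcher."""
    name = (scientific_name or "").strip()
    genus = (genus or "").strip()
    epithet = (specific_epithet or "").strip()
    # Only build a binomial when both parts were supplied.
    binomial = f"{genus} {epithet}" if genus and epithet else ""

    if not name:
        # No scientificName supplied: fall back to the atomised fields.
        return binomial or genus

    if REPLACEMENT_CHAR in name:
        # The name is damaged. If genus + specificEpithet agree with the
        # start of the name, use them and leave out the damaged authorship.
        if binomial and name.lower().startswith(binomial.lower()):
            return binomial
        # Otherwise just drop the unrecognised characters.
        name = name.replace(REPLACEMENT_CHAR, "")

    # Collapse any repeated whitespace.
    return " ".join(name.split())
```

With the second record above, `candidate_name("Litoria ewingii (Dum\ufffdril & Bibron, 1841)", "Litoria", "ewingii")` yields `Litoria ewingii`, so the damaged authorship no longer takes part in the match.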

djtfmartin commented 6 years ago

looks to be fixed