Standard Energy Efficiency Data (SEED) Platform™ is a web-based application that helps organizations easily manage data on the energy performance of large groups of buildings.
Support unicode characters by replacing unidecode with new normalize method #4484
Back in 2016 we added the unidecode library to fix Unicode issues with the data. That worked well until now, but we now need to keep diacritics/accent marks and also support the Arabic character set.
What's this PR do?
Remove unidecode
Create a new method that normalizes the set of characters that would otherwise prevent reasonable matches (e.g., em dashes, curly quotes, etc.).
Use the unicodedata.normalize method with the NFC form (Normalization Form C, canonical composition) so that a letter and its combining diacritic are composed into a single code point wherever a precomposed form exists.
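The approach described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name `normalize_text` and the replacement table are hypothetical, and the actual set of characters handled by the new method may differ. Only `str.maketrans`, `str.translate`, and `unicodedata.normalize` from the Python standard library are used.

```python
import unicodedata

# Hypothetical replacement table (an assumption, not taken from the PR):
# typographic punctuation that commonly prevents otherwise-equal values
# from matching is mapped to plain ASCII equivalents.
_REPLACEMENTS = {
    "\u2014": "-",   # em dash
    "\u2013": "-",   # en dash
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
}
_TRANSLATION = str.maketrans(_REPLACEMENTS)


def normalize_text(value: str) -> str:
    """Replace typographic punctuation, then compose the result to NFC.

    Unlike unidecode, this keeps diacritics and non-Latin scripts intact;
    it only unifies how they are encoded.
    """
    replaced = value.translate(_TRANSLATION)
    return unicodedata.normalize("NFC", replaced)
```

With this sketch, a decomposed "e" followed by U+0301 (combining acute accent) becomes the single code point U+00E9, curly quotes and dashes become ASCII, and Arabic text passes through unchanged.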
How should this be manually tested?
unit tests
import unicode data (new test file forthcoming)
UI testing by inserting unicode characters. A great test is to edit a matching field to insert a unicode character, then import a new dataset with that unicode character in the matching field.
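A unit test for the matching scenario might look like the sketch below. It is an assumption about how such a test could be shaped, not one of the tests in this PR: the class and method names are hypothetical, and it calls `unicodedata.normalize` directly rather than the project's new method.

```python
import unicodedata
import unittest


class UnicodeMatchingSketch(unittest.TestCase):
    """Hypothetical test: the same name encoded two ways should match."""

    def test_composed_and_decomposed_forms_match_after_nfc(self):
        composed = "Jos\u00e9"      # "e with acute" as one code point
        decomposed = "Jose\u0301"   # "e" followed by a combining acute accent

        # Without normalization the raw strings differ, so a match would fail.
        self.assertNotEqual(composed, decomposed)

        # After NFC normalization both values are identical.
        self.assertEqual(
            unicodedata.normalize("NFC", composed),
            unicodedata.normalize("NFC", decomposed),
        )

    def test_arabic_text_is_preserved(self):
        arabic = "\u0645\u0628\u0646\u0649"
        # NFC leaves the Arabic letters as they are instead of transliterating.
        self.assertEqual(unicodedata.normalize("NFC", arabic), arabic)
```

The first case mirrors the manual UI test: an existing record and a newly imported row can carry the same visible character in different encodings, and they should still match.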
What are the relevant tickets?
4479