Open msaby opened 4 years ago
Isn't that already available as a fingerprint function? If not it could potentially be added as such since it is possible to call clustering functions from GREL.
I was thinking of something less aggressive than fingerprint: "L'école et les ecoles" -> "L'ecole et les ecoles"
This seems fairly easy to do now if we simply use Apache Commons Lang's StringUtils.stripAccents.
For labeling simplicity (translations), I suggest giving the new GREL function the same name, stripAccents().
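To make the proposal concrete, here is a minimal JDK-only sketch of the idea behind accent stripping: decompose to NFD, then drop the combining marks. The class and method names are hypothetical, and this is not claimed to match the Commons Lang implementation character for character (it does nothing for letters such as "Ł" that have no canonical decomposition).

```java
import java.text.Normalizer;

// Hypothetical helper, for illustration only; not OpenRefine or Commons Lang code.
final class StripAccentsSketch {

    // Decompose to NFD so that an accented letter becomes a base letter
    // followed by combining marks, then remove the combining marks.
    static String stripAccents(String input) {
        if (input == null) {
            return null;
        }
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        // \u00e9 is "é"; the escape keeps the example independent of file encoding.
        System.out.println(stripAccents("L'\u00e9cole et les ecoles"));
        // prints: L'ecole et les ecoles
    }
}
```

Unlike fingerprint, this leaves case, punctuation, and word order untouched, which is the less aggressive behavior requested above.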
I'd like to see a more general approach to text normalization than just removing diacritics. We also need to deal with normalizing the various composed vs decomposed forms. Other related issues include #409 and #650.
I'm removing the "good second issue" label until we have the design nailed down. One possible approach would be to create a normalize function with different "strengths" of normalization to apply (decomposition, diacritic removal, case folding, etc).
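As a rough illustration of the "strengths" idea, the sketch below uses only the JDK. The enum, its level names, and the choice to make the levels cumulative are all assumptions for discussion, not a settled design. Note that toLowerCase(Locale.ROOT) only approximates real Unicode case folding.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical design sketch; names and levels are placeholders.
final class NormalizeSketch {

    // Ordered from weakest to strongest; each level includes the ones before it.
    enum Strength { COMPOSE, STRIP_DIACRITICS, FOLD_CASE }

    static String normalize(String input, Strength strength) {
        if (input == null) {
            return null;
        }
        // Weakest level: unify composed vs decomposed forms (NFC).
        String result = Normalizer.normalize(input, Normalizer.Form.NFC);

        if (strength.compareTo(Strength.STRIP_DIACRITICS) >= 0) {
            // Decompose, drop all combining marks (category M), then recompose.
            String decomposed = Normalizer.normalize(result, Normalizer.Form.NFD);
            result = Normalizer.normalize(
                    decomposed.replaceAll("\\p{M}+", ""), Normalizer.Form.NFC);
        }
        if (strength.compareTo(Strength.FOLD_CASE) >= 0) {
            // Approximation of case folding; locale-independent lowercasing.
            result = result.toLowerCase(Locale.ROOT);
        }
        return result;
    }

    public static void main(String[] args) {
        // \u00c9 is "É".
        System.out.println(normalize("\u00c9cole", Strength.FOLD_CASE)); // prints: ecole
    }
}
```

A GREL wrapper could then expose the strength as a string or numeric argument, but how that should look is exactly the design question that is still open here.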
@tfmorris Sounds good Tom. I would always trust you for expertise with localization and international support anyways :-)
Is your feature request related to a problem or area of OpenRefine? Please describe.
It could be useful to have a menu item and a GREL function to remove diacritics from strings.
Ex :
"école" -> "ecole"