desperado1992 / google-refine

Automatically exported from code.google.com/p/google-refine

Enhancement FingerprintKeyer - normalize German ß to ss #409

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
Could you please extend the FingerprintKeyer to normalize the German 
Eszett "ß" to "ss"? This would help in common cases such as 
Strasse = Straße.

The code would then look something like this:

case '\u00DF':
  return "ss"; // two characters, so this must be a String, not a char literal
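
To make the request concrete, here is a minimal, self-contained sketch of a fingerprint-style key that expands "ß" to "ss". It is not the actual FingerprintKeyer source: the class name `EszettFingerprint` and the methods `translate` and `key` are hypothetical, and the normalization steps (trim, lowercase, strip ASCII punctuation, sort and de-duplicate tokens) are an assumption about what a fingerprint key typically does.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical sketch, not the real FingerprintKeyer.
class EszettFingerprint {

    // Maps a single character to its replacement. Returning a String
    // (rather than a char) is what allows one character to become two.
    static String translate(char c) {
        switch (c) {
            case '\u00DF': // German Eszett
                return "ss";
            default:
                return String.valueOf(c);
        }
    }

    // Assumed fingerprint steps: trim, lowercase, translate characters,
    // drop ASCII punctuation and control characters, then sort and
    // de-duplicate the whitespace-separated tokens.
    static String key(String s) {
        String lowered = s.trim().toLowerCase(Locale.ROOT);
        StringBuilder sb = new StringBuilder();
        for (char c : lowered.toCharArray()) {
            sb.append(translate(c));
        }
        String cleaned = sb.toString().replaceAll("[\\p{Punct}\\p{Cntrl}]", "");
        TreeSet<String> tokens = new TreeSet<>(Arrays.asList(cleaned.split("\\s+")));
        return String.join(" ", tokens);
    }

    public static void main(String[] args) {
        // Both spellings should end up with the same key.
        System.out.println(key("Stra\u00DFe").equals(key("Strasse")));
    }
}
```

With this sketch, "Straße" and "Strasse" both produce the key "strasse", so the two spellings would fall into the same cluster.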

Original issue reported on code.google.com by bun...@gmx.net on 14 Jun 2011 at 8:43

GoogleCodeExporter commented 8 years ago
I'm not sure we can use them directly, but we probably want something 
equivalent to the Java CollationKeys 
(http://download.oracle.com/javase/1.4.2/docs/api/java/text/CollationKey.html) 
that we use for sorting.

By allowing the user to specify the "strength", we could let them control 
whether letter case, accents, or language-specific things like the Eszett are 
normalized.

This would be a general solution for all languages rather than something 
specific to a single character or language.
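
As a rough illustration of the strength idea, the sketch below compares two strings through `java.text.CollationKey` at a caller-chosen strength. The class name `CollationStrengthSketch` and the method `sameKey` are hypothetical, and using `Locale.ROOT` is an assumption. Whether "ß" collates as equal to "ss" at a given strength depends on the collation rules of the JVM and locale, so that case would need to be verified rather than assumed.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Locale;

// Hypothetical sketch of strength-controlled comparison via collation keys.
class CollationStrengthSketch {

    // Returns true when both strings produce equivalent collation keys
    // at the given strength (Collator.PRIMARY, SECONDARY or TERTIARY).
    static boolean sameKey(String a, String b, int strength) {
        Collator collator = Collator.getInstance(Locale.ROOT);
        collator.setStrength(strength);
        // Keys are only comparable when they come from the same Collator.
        CollationKey ka = collator.getCollationKey(a);
        CollationKey kb = collator.getCollationKey(b);
        return ka.compareTo(kb) == 0;
    }

    public static void main(String[] args) {
        // Case is a tertiary difference; accents are a secondary one.
        System.out.println(sameKey("Strasse", "strasse", Collator.PRIMARY));
        System.out.println(sameKey("Strasse", "strasse", Collator.TERTIARY));
        System.out.println(sameKey("cafe", "caf\u00E9", Collator.PRIMARY));
        System.out.println(sameKey("cafe", "caf\u00E9", Collator.SECONDARY));
    }
}
```

At primary strength, differences in case and accents are ignored, while secondary strength distinguishes accents and tertiary strength also distinguishes case. A keyer built this way would expose the strength as a user option instead of hard-coding per-character rules.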

Original comment by tfmorris on 14 Jun 2011 at 4:58