aristus / accent-folding

Example code for the Accent Folding article on ALA

Duplicate key/value pairs #3

Open patbenatar opened 12 years ago

patbenatar commented 12 years ago

There appear to be a lot of duplicate key/value pairs in this hash. For example, Ł is in there as a key twice. Some of the duplicates have the same key with different values. I noticed this because ExecJS complains about duplicate keys in hashes when it compiles Coffee to JS (I ported the hash to Coffee real quick).
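For anyone who wants to check this without going through a compiler, here is a minimal sketch of one way to find the duplicates. It assumes the map has been rewritten as a list of `[key, value]` pairs first, since a plain JavaScript object literal only keeps the last value for a repeated key. The helper name `findDuplicateKeys` and the sample data are hypothetical, not part of the repo. The sample keys use Unicode escapes, and `\u0308` is my reading of the bare combining mark as it shows up in this thread.

```javascript
// Hypothetical helper: report every key that appears more than once in a
// list of [key, value] pairs, together with all values seen for that key.
function findDuplicateKeys(pairs) {
  const seen = new Map();
  for (const [key, value] of pairs) {
    if (!seen.has(key)) seen.set(key, []);
    seen.get(key).push(value);
  }
  const dupes = [];
  for (const [key, values] of seen) {
    if (values.length > 1) {
      // "conflicting" is true when the repeated key maps to different values.
      const conflicting = new Set(values).size > 1;
      dupes.push({ key, values, conflicting });
    }
  }
  return dupes;
}

// Sample entries (an illustrative subset, not the full map).
const samplePairs = [
  ['\u0141', 'l'], // Ł
  ['\u0142', 'l'], // ł
  ['\u0141', 'l'], // Ł again, same value
  ['\u0308', 't'], // combining diaeresis
  ['\u0308', 'n'], // same key, different value
];
const duplicateReport = findDuplicateKeys(samplePairs);
```

Harmless repeats (same key, same value) and real conflicts (same key, different values) end up in the same report, with the `conflicting` flag separating the two cases.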

aristus commented 12 years ago

Good catch! I have a patch for both JavelinJS and this branch, in my mythical spare time.

patbenatar commented 12 years ago

I was working on a fixed hash but then ran into the duplicates that have different values and wasn't sure what to do with them.

aristus commented 12 years ago

Hmm. I won't have time to finish my patch for a while. Maybe send / post the dupes? We can probably work out which values are "right".

patbenatar commented 12 years ago

Here's what I found:

dupe keys (removed):

    'Ł':'l',
    'ł':'l',
    '̣':'l',
    '̈':'n',
    '̃':'p',
    '̩':'s',

dupe keys with different values:

    '̈':'t', # removed this one for now
    '̈':'n',

    '̊':'w',
    '̊':'y', # removed this one for now

Here's the updated hash with above changes: https://gist.github.com/2772409

patbenatar commented 12 years ago

Also noticed some missing ones that are relatively common:

    Ææ => should be replaced with "ae"
    Ðð => should be replaced with "th", even though the shape is more of a "d"
    Œœ => should be replaced with "oe"

And some inaccurate ones:

    Þþ => should be replaced with "th"
    ß => should be replaced with either "ss" or "sz"

I'll go ahead and add the missing ones. What do you think about the inaccurate ones?
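To make the multi-character case concrete, here is a rough sketch of folding with replacements longer than one character. The `foldMap` subset and the `foldAccents` name are hypothetical, not the repo's actual code. The values are just the ones proposed in this comment, with "ss" picked arbitrarily over "sz" for ß.

```javascript
// Hypothetical subset of the map, using the replacements proposed above.
const foldMap = {
  '\u00C6': 'ae', '\u00E6': 'ae', // Æ æ
  '\u00D0': 'th', '\u00F0': 'th', // Ð ð
  '\u0152': 'oe', '\u0153': 'oe', // Œ œ
  '\u00DE': 'th', '\u00FE': 'th', // Þ þ
  '\u00DF': 'ss',                 // ß ("sz" is the other option)
};

// Replace each character that has an entry in foldMap; leave the rest alone.
function foldAccents(text) {
  let folded = '';
  for (const ch of text) {
    folded += Object.prototype.hasOwnProperty.call(foldMap, ch)
      ? foldMap[ch]
      : ch;
  }
  return folded;
}

const foldedStreet = foldAccents('Stra\u00DFe'); // input is "Straße"
const foldedHeart = foldAccents('c\u0153ur');    // input is "cœur"
```

One thing to keep in mind: with entries like these, the folded string can be longer than the input, so any code that maps match positions back to the original text would need to account for the shift.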

patbenatar commented 12 years ago

Here's an updated hash with the missing ones added and inaccurate ones fixed: https://gist.github.com/2772435