String.prototype.normalize

andrewrk / node-diacritics

remove diacritics from strings ("ascii folding") - Node.js module

MIT License

263 stars 32 forks source link

String.prototype.normalize #27

Closed stevenvachon closed 7 years ago

stevenvachon commented 7 years ago

'Sîne klâwen Johan Ã–bert'.normalize('NFKD').replace(/[\u0080-\uF8FF]/g, '');
//-> 'Sine klawen Johan Abert'

thejoshwolfe commented 7 years ago

does this library behave incorrectly without calling normalize()? the table of replacements is supposed to include all the variants regardless of normalization so that the replacement effectively normalizes automatically.

can you give an example string that does not behave correctly? preferably the \uNNNN escaped form of the string so that web browsers don't normalize it in transit.

stevenvachon commented 7 years ago

Not necessarily incorrect, unless the ES2015 spec is what should be considered correct.

There're some differences, for example:

const input = "Iлｔèｒｎåｔïｏｎɑｌíƶａｔï߀ԉ";

input.normalize('NFKD').replace(/[\u0080-\uF8FF]/g, '');
//-> 'Iternationliati'

diacritics(input);
//-> 'Internationalizati0n'

thejoshwolfe commented 7 years ago

i must admit, i have no idea what this issue is about. please make a claim about how this library is deficient or how it should be changed or something. I can't figure out what you're trying to say with your spare code fragments.

stevenvachon commented 7 years ago

I've updated my code snippet above. This issue was created to bring to your attention ES2015's normalize() as a possible improvement to the source code which has followed its current design since 2012.

andrewrk commented 7 years ago

if I understand correctly this is the issue:

Subject: This module is no longer necessary due to new ECMAScript Version

Body:

Check it out, you can do this:

'Sîne klâwen Johan Ã–bert'.normalize('NFKD').replace(/[\u0080-\uF8FF]/g, '');
//-> 'Sine klawen Johan Abert'

So looks like we don't need this module anymore. Options include:

release new version where entire module code just uses String.prototype.normalize as demonstrated
stop maintaining the module and let this issue show people the way

That's my understanding of this issue. I'm gonna choose the second option, obviously. Have fun writing JavaScript everybody.

thejoshwolfe commented 7 years ago

i mean, the proposed algorithm doesn't have the same behavior, so it would just be an alternative way of doing some of what this module does. there may be performance improvements, but without any evidence i don't think the proposal is worthwhile. even with a performance improvement, i'm not sure it'd be worth changing anything.

stevenvachon commented 7 years ago

@andrewrk sorry that you're offended by simple advice. If you don't want inspiration to take the module forward, then it's backwards thinking which doesn't belong anywhere.

@thejoshwolfe fair enough.

andrewrk commented 7 years ago

sorry that you're offended by simple advice. If you don't want inspiration to take the module forward, then it's backwards thinking which doesn't belong anywhere.

I was more joking that I've been spending time working on another language and neglecting js modules lately. Was going for humor rather than offense. Apologies for sounding angry.

I mean it's good to know that this works. I imagine most people can just use this normalize thing instead of this module now. ¯\_(ツ)_/¯

stevenvachon commented 7 years ago

@andrewrk ah, fuggin' text to blame :)

I don't know what situations normalize() would be a limitation. For example, which languages comprise Iлｔèｒｎåｔïｏｎɑｌíƶａｔï߀ԉ and would such ever be a user input string that should be interpreted as Internationalizati0n?

Would it be enough to use normalize() and perhaps TR46-to-ASCII (for chinese/japanese/etc)?