DiegoZoracKy opened this issue 8 years ago.
What about just exporting `replacementList`? Then you can do this search in your own code, or any other search you might want to do.
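Here is a minimal sketch of what that search could look like in user code. It assumes `replacementList` were exported as an array of `{ base, chars }` objects, which is the shape `charToRegexPattern` in this thread iterates over. `sampleReplacementList` is a hypothetical, heavily trimmed stand-in for the real list, and `findReplacement` is an illustrative name, not part of the package:

```javascript
// Hypothetical, heavily trimmed stand-in for an exported replacementList:
// each entry pairs a base string with the diacritic characters mapping to it.
const sampleReplacementList = [
  { base: 'A', chars: '\u00C0\u00C1\u00C2\u00C3\u00C4\u00C5' }, // ÀÁÂÃÄÅ
  { base: 'AE', chars: '\u00C6\u01FC\u01E2' }, // ÆǼǢ
  { base: 'c', chars: '\u00E7\u0107\u0109\u010B\u010D' }, // çćĉċč
];

// Return the first entry whose base equals chr or whose chars contain chr,
// or undefined when the list does not cover chr.
function findReplacement(list, chr) {
  return list.find((entry) => entry.base === chr || entry.chars.includes(chr));
}

console.log(findReplacement(sampleReplacementList, '\u00E7').base); // c
console.log(findReplacement(sampleReplacementList, '\u01FC').base); // AE
console.log(findReplacement(sampleReplacementList, 'x')); // undefined
```

Matching on `base` as well as `chars` lets the same lookup serve plain letters like `'c'` and decorated ones like `'ç'`.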
Exporting `replacementList` would be good too. But with just the list, other developers working on a similar case and I would each have to write this same code ourselves.
It's the same goal as the remove method: instead of only having the list, you created the method to help. So I thought it would be good to have this helper in the package as well. But it's OK if you don't agree. Do you think you will update the package to export `replacementList`?
I'll have to defer to @andrewrk on this, but in my own opinion, I have to admit I don't really understand what the function is supposed to be used for. In particular, you lose some information when you concatenate `'AE'` with `"\u00C6\u01FC\u01E2"`. What are you going to do with the "group of all possible diacritics" once you have it? If I were going to write documentation for this function, I'd be at a loss to describe what it really does without just describing the code.
Can you give more information on the use case for this function?
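To make the information loss concrete: once `'AE'` and `"\u00C6\u01FC\u01E2"` are joined into one string and placed in a character class, each character stands alone. A small sketch (the variable names are illustrative):

```javascript
// Joining the base 'AE' with its diacritics gives one flat string: "AEÆǼǢ".
const flat = 'AE' + '\u00C6\u01FC\u01E2';

// Inside a character class every character is an independent alternative,
// so the fact that 'A' and 'E' belong together is lost.
const aeClass = new RegExp('^[' + flat + ']$');

console.log(aeClass.test('\u00C6')); // true  ("Æ" matches)
console.log(aeClass.test('A')); // true  (a lone "A" matches too)
console.log(aeClass.test('AE')); // false (the two-letter base itself does not)
```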
To make a diacritic-insensitive `RegExp`. For example, I have a text that contains the word 'ação'. Suppose we are building some kind of search engine, where the input could be written correctly as 'ação' but could also contain a typo like 'açao', 'acão', etc.
With the group of diacritics I can easily create a `RegExp` like: `/a[ccćĉċčçḉƈȼↄ][aaẚàáâầấẫẩãāăằắẵẳȧǡäǟảåǻǎȁȃạậặḁąⱥɐɑ]o/i`
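A minimal sketch of that idea, assuming a hypothetical `groups` table in the same `{ base, chars }` shape, trimmed to just the letters needed for 'ação'. Each character of the query is expanded into a class covering its base letter and variants; anything not in the table is escaped and kept literal. The helper names are illustrative:

```javascript
// Hypothetical groups, trimmed to the letters needed for 'ação'.
const groups = [
  { base: 'a', chars: 'àáâãäå' },
  { base: 'c', chars: 'çćĉċč' },
  { base: 'o', chars: 'òóôõöø' },
];

// Escape characters that have a special meaning in a RegExp pattern.
function escapeForRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Expand one character into a class with its base letter and variants;
// characters outside the table are escaped and matched literally.
function expandChar(chr) {
  const lower = chr.toLowerCase();
  const group = groups.find((g) => g.base === lower || g.chars.includes(lower));
  return group ? '[' + group.base + group.chars + ']' : escapeForRegExp(chr);
}

// Case-insensitive pattern that ignores which variant of each letter is used.
function diacriticInsensitive(word) {
  return new RegExp(Array.from(word, expandChar).join(''), 'i');
}

const re = diacriticInsensitive('ação');
console.log(re.source); // [aàáâãäå][cçćĉċč][aàáâãäå][oòóôõöø]
console.log(re.test('acao'), re.test('açao'), re.test('acão')); // true true true
```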
Did you mean `/[aaẚàáâầấẫẩãāăằắẵẳȧǡäǟảåǻǎȁȃạậặḁąⱥɐɑ][ccćĉċčçḉƈȼↄ][aaẚàáâầấẫẩãāăằắẵẳȧǡäǟảåǻǎȁȃạậặḁąⱥɐɑ][oⓞoòóôồốỗổõṍȭṏōṑṓŏȯȱöȫỏőǒȍȏơờớỡởợọộǫǭøǿꝋꝍɵɔᴑ]/i`? It looks like the function is prepared to look up plain ASCII characters as well (`o.base == chr`).
Isn't there a problem with multi-char diacritics like `'Æ'`? Wouldn't the regex for "Cæsar" fail to match the string "Caesar"?
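A quick check of that concern, assuming `'æ'` expands to its base "ae" plus its variants inside a single character class, as a per-character expansion would produce (the other letters are left literal for brevity, and `caesarPattern` is an illustrative name):

```javascript
// Per-character expansion: 'æ' becomes one class holding its base "ae" and
// its variants; the other letters stay literal to keep the example short.
const caesarPattern = new RegExp('^C' + '[ae\u00E6\u01FD\u01E3]' + 'sar$', 'i');

console.log(caesarPattern.test('C\u00E6sar')); // true  ("Cæsar")
console.log(caesarPattern.test('Casar')); // true  (a lone "a" satisfies the class)
console.log(caesarPattern.test('Caesar')); // false (the class consumes one char only)
```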
How about this function:
```javascript
function charToRegexPattern(chr) {
  for (var i = 0; i < replacementList.length; i++) {
    var replacement = replacementList[i];
    if (replacement.chars.indexOf(chr) === -1) continue;
    if (replacement.base.length > 1) {
      // allow the complete multi-char sequence or a literal diacritic character
      return '(?:' + replacement.base + '|[' + replacement.chars + '])';
    } else {
      // allow the ascii char or a literal diacritic character
      return '[' + replacement.base + replacement.chars + ']';
    }
  }
  // either already ascii or not a diacritic char
  return chr;
}
```
It's arguably less "general purpose", since it returns strings formatted for regex, but I think it's the only way to make it actually work for multi-char sequences like "ae".
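For illustration, assume the real list's entry for 'æ' is `{ base: 'ae', chars: 'æǽǣ' }` (an assumption, not checked against the package). Then `charToRegexPattern('æ')` would return `(?:ae|[æǽǣ])`, while `'C'`, `'s'`, `'a'` and `'r'` should come back unchanged, since plain ASCII letters are bases rather than entries in `chars`. Joining those pieces by hand gives a pattern that accepts both spellings (`pieces` and `caesarAnyForm` are illustrative names):

```javascript
// Pieces charToRegexPattern would return for each character of "Cæsar",
// assuming the list entry { base: 'ae', chars: 'æǽǣ' } (hypothetical excerpt).
const pieces = ['C', '(?:ae|[\u00E6\u01FD\u01E3])', 's', 'a', 'r'];
const caesarAnyForm = new RegExp('^' + pieces.join('') + '$');

console.log(caesarAnyForm.test('Caesar')); // true
console.log(caesarAnyForm.test('C\u00E6sar')); // true
console.log(caesarAnyForm.test('Casar')); // false
```

Unlike a single character class, the alternation no longer accepts a lone "a" in place of "ae".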
Yes @thejoshwolfe, I meant exactly what you wrote in the first `RegExp`. I just kept it short to give a simple example.
With the version I wrote, I would use it in a case like this:
```javascript
function toRegExp(str) {
  return RegExp(str.split('').map(chr => `[${diacritics.find(chr) || chr}]`).join(''), 'gi');
}

let str = 'acaoae1ae';
let strDiacritic = 'açãoae1æ';

// RegExp will be: /[aⓐaẚàáâầấẫẩãāăằắẵẳȧǡäǟảåǻǎȁȃạậặḁąⱥɐɑ][ccⓒćĉċčçḉƈȼꜿↄ][aⓐaẚàáâầấẫẩãāăằắẵẳȧǡäǟảåǻǎȁȃạậặḁąⱥɐɑ][oⓞoòóôồốỗổõṍȭṏōṑṓŏȯȱöȫỏőǒȍȏơờớỡởợọộǫǭøǿꝋꝍɵɔᴑ][aⓐaẚàáâầấẫẩãāăằắẵẳȧǡäǟảåǻǎȁȃạậặḁąⱥɐɑ][eⓔeèéêềếễểẽēḕḗĕėëẻěȅȇẹệȩḝęḙḛɇǝ][1][aeæǽǣ]/gi
// so the pattern built from strDiacritic matches str
str.match(toRegExp(strDiacritic));
```
Note that the expected input can be either a diacritic or a `base` char, while your `charToRegexPattern` expects only a diacritic. A `base` char would never be "expanded", so it won't work in my example, where the input 'acao' should match 'ação': I wouldn't be able to know which diacritics are possible for a `base` char.
And yes, this version doesn't handle diacritics whose base is longer than one character.
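A sketch that combines the two approaches, under the same kind of assumptions: a hypothetical trimmed `entries` list in the `{ base, chars }` shape and illustrative helper names. The lookup accepts either a base char or a diacritic (as in `toRegExp`), and a multi-letter base becomes an alternation (as in `charToRegexPattern`):

```javascript
// Hypothetical excerpt in the { base, chars } shape of replacementList.
const entries = [
  { base: 'a', chars: 'àáâãäå' },
  { base: 'ae', chars: 'æǽǣ' },
  { base: 'c', chars: 'çćĉċč' },
  { base: 'o', chars: 'òóôõöø' },
];

// Escape characters that have a special meaning in a RegExp pattern.
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Expand one input character, whether it is a base char or a diacritic.
// A single-letter base becomes one character class; a multi-letter base
// such as "ae" becomes an alternation so the whole sequence can match.
function expandAny(chr) {
  const hit = entries.find((e) => e.base === chr || e.chars.includes(chr));
  if (!hit) return escapeRegex(chr);
  if (hit.base.length > 1) {
    return '(?:' + escapeRegex(hit.base) + '|[' + hit.chars + '])';
  }
  return '[' + hit.base + hit.chars + ']';
}

// Build a case-insensitive RegExp from user input, one character at a time.
function toInsensitiveRegExp(input) {
  return new RegExp(Array.from(input, expandAny).join(''), 'i');
}

console.log(toInsensitiveRegExp('acao').test('ação')); // true
console.log(toInsensitiveRegExp('Cæsar').test('Caesar')); // true
console.log(toInsensitiveRegExp('Caesar').test('Cæsar')); // false
```

One limitation remains: a plain "ae" in the input is still expanded letter by letter, so `toInsensitiveRegExp('Caesar')` does not match "Cæsar". Handling that direction would need a lookup on multi-character sequences rather than single characters.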
Hi @andrewrk,
What do you think about this? Right now I'm facing a case where I need the group of all possible diacritics for a specific char. I remembered your great list of diacritics, and that your package is named 'diacritics' and not something like 'remove-diacritics', so I thought it would be better to extend it with one more method instead of creating another package.
I already created the new method:
If you think it's OK, I can send you a pull request.