Woundorf / foxreplace

Replace text in webpages
https://addons.mozilla.org/firefox/addon/foxreplace/
GNU General Public License v3.0
88 stars, 21 forks

Regex for the asian languages does not work, e.g. (?<!\p{Han}) or (?!\p{Lo}) #373

Open iG8R opened 5 months ago

iG8R commented 5 months ago

Sometimes I need to replace Asian characters only when they are surrounded by Western ones. To do this, I'm trying to use lookbehind and lookahead constructs, e.g. (?<!\p{Han}) or (?!\p{Lo}), but they don't work in FoxReplace at all, although everything works fine when I test them on, for example, https://regex101.com.

Woundorf commented 5 months ago

This is because I don't use any of the Unicode flags (u or v) when creating the RegExp object, and they are needed to support these \p{...} character classes (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Unicode_character_class_escape). Maybe they should always be used, or be enabled with an option (like case sensitivity), but I only learned about these flags relatively recently.

Could you please create another issue asking for Unicode support, with a link to this one as an example?

In the meantime, as a workaround, you could replace with a function that tests whether the found text actually matches the correct regexp, returning the replacement string if it does and the unmodified text otherwise.

iG8R commented 5 months ago

Thanks a lot for your attention and advice. I've already tried it, but using a function is too cumbersome in this case.

iG8R commented 5 months ago

Maybe there is also a flag that makes it possible to use case-changing escapes in the replacement string, like the following from https://stackoverflow.com/a/33351224/6773436:

  1. Capitalize words

    Find: (\s)([a-z]) (\s also matches new lines, i.e. "venuS" => "VenuS") Replace: $1\u$2

  2. Uncapitalize words

    Find: (\s)([A-Z]) Replace: $1\l$2

  3. Remove camel case (e.g. cAmelCAse => camelcAse => camelcase)

    Find: ([a-z])([A-Z]) Replace: $1\l$2

  4. Lowercase letters within words (e.g. LowerCASe => Lowercase)

    Find: (\w)([A-Z]+) Replace: $1\L$2 Alternate Replace: \L$0

  5. Uppercase letters within words (e.g. upperCASe => uPPERCASE)

    Find: (\w)([A-Z]+) Replace: $1\U$2

  6. Uppercase previous (e.g. upperCase => UPPERCase)

    Find: (\w+)([A-Z]) Replace: \U$1$2

  7. Lowercase previous (e.g. LOWERCase => lowerCase)

    Find: (\w+)([A-Z]) Replace: \L$1$2

  8. Uppercase the rest (e.g. upperCase => upperCASE)

    Find: ([A-Z])(\w+) Replace: $1\U$2

  9. Lowercase the rest (e.g. lOWERCASE => lOwercase)

    Find: ([A-Z])(\w+) Replace: $1\L$2

  10. Shift-right-uppercase (e.g. Case => cAse => caSe => casE)

    Find: ([a-z\s])([A-Z])(\w) Replace: $1\l$2\u$3

  11. Shift-left-uppercase (e.g. CasE => CaSe => CAse => Case)

    Find: (\w)([A-Z])([a-z\s]) Replace: \u$1\l$2$3

Woundorf commented 5 months ago

This is not possible in JavaScript without using a custom function. The only special strings JavaScript recognizes in a replacement are $$, $&, $`, $', $n and $<Name>.

The examples listed in the Stack Overflow answer are for Sublime Text, which according to another comment relies on Boost, which, following the links, seems to support the same replacement syntax as Perl.
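For reference, the custom-function route could look roughly like this in plain JavaScript. This is a sketch of items 1 and 4 from the list above using the standard String.prototype.replace callback; the function names are made up for the example and are not part of FoxReplace.

```javascript
// Item 1, "Capitalize words": emulates Find (\s)([a-z]) with Replace $1\u$2.
function capitalizeWords(text) {
  return text.replace(/(\s)([a-z])/g, (match, space, letter) =>
    space + letter.toUpperCase()
  );
}

// Item 4, "Lowercase letters within words": emulates Find (\w)([A-Z]+) with Replace $1\L$2.
function lowercaseWithinWords(text) {
  return text.replace(/(\w)([A-Z]+)/g, (match, first, rest) =>
    first + rest.toLowerCase()
  );
}

console.log(capitalizeWords("hello big world")); // hello Big World
console.log(lowercaseWithinWords("LowerCASe"));  // Lowercase
```

As in the original Find pattern, the first word is not capitalized because it is not preceded by whitespace.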

iG8R commented 5 months ago

Thanks a lot for the clarification. That's a pity.