get-woke / woke

Detect non-inclusive language in your source code.
https://docs.getwoke.tech
MIT License

Fix maskInlineIgnore to handle non-ascii input #279

Open kian99 opened 6 months ago

kian99 commented 6 months ago


What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)
Bug fix

What is the current behavior? (You can also link to an open issue here)
woke crashes when run against non-ASCII input; see https://github.com/get-woke/woke/issues/278.

What is the new behavior (if this is a feature change)?
woke no longer crashes on non-ASCII input.

Does this PR introduce a breaking change? (What changes might users need to make due to this PR?)
No

Other information: maskInlineIgnore converted the line to a slice of runes, but the length of a rune slice differs from the length of the string whenever non-ASCII characters are present. Each element of a []rune is one 32-bit Unicode code point, whereas len on a string returns the number of bytes, and a non-ASCII character occupies two to four bytes in UTF-8. Code that uses one length (or offsets derived from it) to index the other can therefore run out of bounds, which is what caused the crash. A good reference is https://go.dev/blog/strings
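The mismatch can be reproduced in a few lines of standalone Go. This is an illustrative sketch, not code from the PR; the sample string and the helper byteAndRuneLen are arbitrary choices for the demonstration.

```go
package main

import (
	"fmt"
	"strings"
)

// byteAndRuneLen returns the length of s in bytes (len) and in
// Unicode code points (length of the []rune conversion).
func byteAndRuneLen(s string) (int, int) {
	return len(s), len([]rune(s))
}

func main() {
	// "naïve whitelist": 'ï' (U+00EF) is two bytes in UTF-8.
	s := "na\u00efve whitelist"

	b, r := byteAndRuneLen(s)
	fmt.Println(b, r) // 16 15

	// strings.Index returns a byte offset into s, not a rune index.
	start := strings.Index(s, "whitelist")
	end := start + len("whitelist")
	fmt.Println(start, end) // 7 16

	// Reusing those byte offsets on the rune slice is the bug class
	// described above: end (16) is past len(runes) (15), so runes[start:end]
	// would panic with "slice bounds out of range".
	runes := []rune(s)
	fmt.Println(end > len(runes)) // true

	// Slicing the string itself with byte offsets is safe.
	fmt.Println(s[start:end]) // whitelist
}
```

Keeping all indexing in one unit (bytes on the string, or runes on the []rune) avoids the problem.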

I've opted to use the existing ignoreRuleRegex to replace the characters that were previously overwritten with null characters. I've also added tests for this change.