Anders429 / word_filter

A Word Filter for filtering text.
Apache License 2.0

`CensorMode::ReplaceAllWith` should replace graphemes #13

Closed Anders429 closed 3 years ago

Anders429 commented 3 years ago

Consider the following example:

use word_filter::WordFilterBuilder;

let filter = WordFilterBuilder::new().words("ã").build();

assert_eq!(filter.censor("ã"), "*");

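This fails because "ã", when written as "a" followed by a combining tilde (U+0303), is two Unicode characters that render as a single grapheme, and censoring currently replaces each character separately. The default behavior should be changed to censor by grapheme instead of by character.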
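To make the mismatch concrete, here is a minimal standard-library sketch. `censor_per_char` is a hypothetical helper written for illustration, not the crate's actual implementation; it only assumes that replacement happens once per `char`.

```rust
/// Hypothetical helper (not part of word_filter): replaces every `char`
/// (Unicode scalar value) in `input` with `replacement`.
fn censor_per_char(input: &str, replacement: char) -> String {
    input.chars().map(|_| replacement).collect()
}

/// Counts the `char`s in `input`, which is what a per-character censor sees.
fn char_count(input: &str) -> usize {
    input.chars().count()
}
```

With the decomposed spelling `"a\u{0303}"`, `char_count` reports two characters and `censor_per_char` emits two replacement characters, while the precomposed `"\u{00E3}"` yields one, even though both render identically.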

It may be easiest to handle this together with #12, since expanding the configurability of censoring would let users do this themselves. Alternatively, a separate CensorMode variant could be added, something like ReplaceGraphemesWith.

The downside of supporting this is having to pull in a dependency like unicode_segmentation, which increases the size of the dependency tree. It could work better as an optional feature or, pending #12, as a separate crate providing a Censor that supports grapheme replacement.
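For a rough idea of what grapheme-aware replacement might look like without the extra dependency, here is a deliberately simplified sketch. It is an assumption-laden approximation: it only folds code points from the Combining Diacritical Marks block (U+0300..=U+036F) into the preceding character, and it does not implement full Unicode grapheme segmentation. Both function names are hypothetical.

```rust
/// Simplified check (assumption): treats only the Combining Diacritical
/// Marks block, U+0300..=U+036F, as combining. Real grapheme segmentation
/// covers far more cases.
fn is_combining_mark(c: char) -> bool {
    ('\u{0300}'..='\u{036F}').contains(&c)
}

/// Hypothetical helper: emits one `replacement` per approximate cluster,
/// where a cluster is a character plus any combining marks that follow it.
fn censor_per_cluster(input: &str, replacement: char) -> String {
    let mut out = String::new();
    let mut at_start = true;
    for c in input.chars() {
        // A combining mark extends the previous cluster, unless it is the
        // very first character, in which case it forms its own cluster.
        if at_start || !is_combining_mark(c) {
            out.push(replacement);
        }
        at_start = false;
    }
    out
}
```

A real implementation would more likely iterate with `UnicodeSegmentation::graphemes(input, true)` from the unicode-segmentation crate, which also handles emoji sequences, Hangul, and other extended grapheme clusters that this sketch ignores.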