Anders429 / word_filter

A Word Filter for filtering text.
Apache License 2.0

Customizable Censors and Better Repeated Character Matching #15

Closed Anders429 closed 3 years ago

Anders429 commented 3 years ago

Fixes #12, #13, and #14.

This changes censors from an enum to a function pointer. It also fixes #14 by making repeated-character matching take the previous character into account.
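As a rough illustration of what moving from an enum to a function pointer allows, here is a minimal sketch. Every name in it (`Censor`, `stars`, `remove`, `Filter`, `apply`) is hypothetical and chosen for the example; none of it is taken from the crate's actual API, and the literal substring search merely stands in for the real matching.

```rust
/// Hypothetical censor signature: maps the matched text to its replacement.
type Censor = fn(&str) -> String;

/// One possible censor: one '*' per character of the match.
fn stars(matched: &str) -> String {
    "*".repeat(matched.chars().count())
}

/// Another possible censor: drop the match entirely.
fn remove(_matched: &str) -> String {
    String::new()
}

/// Stand-in for a filter that stores whichever censor the caller supplies.
struct Filter {
    word: &'static str,
    censor: Censor,
}

impl Filter {
    /// Replace each literal occurrence of `word` using the stored censor.
    fn apply(&self, input: &str) -> String {
        // An empty word would match everywhere, so leave the input untouched.
        if self.word.is_empty() {
            return input.to_string();
        }
        let mut out = String::new();
        let mut rest = input;
        while let Some(pos) = rest.find(self.word) {
            out.push_str(&rest[..pos]);
            out.push_str(&(self.censor)(self.word));
            rest = &rest[pos + self.word.len()..];
        }
        out.push_str(rest);
        out
    }
}
```

With an enum, each new censoring behaviour needs a new variant inside the library. With a function pointer, a caller can pass any function of the right signature, for example `Filter { word: "foo", censor: stars }`.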

Anders429 commented 3 years ago

This might not be a sufficient bug fix for repeated characters. It makes the following fail, while it passed previously:

use word_filter::WordFilterBuilder;

let filter = WordFilterBuilder::new().words(&["foo", "bar"]).aliases(&[("a", "A")]).build();

assert_eq!(filter.censor("fbAaaAaAar"), "f*********");

The matching needs to handle aliases that replace a single character (it should probably be aliases that replace a single grapheme, but that is a separate issue entirely). Instead of storing the matched character, the Walker should store the target Node. Then, when the Walker knows it is in a subgraph, it should return only if the return Node equals the target Node.

That will require keeping track of just a bit more state in the Walker struct, but it should completely resolve this issue (aside from the graphemes thing, which will be dealt with later).
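The idea of tracking the target node rather than the matched character can be sketched as follows. This is a deliberately simplified model and not the crate's real Walker: `Graph`, `NodeId`, `last_step`, `bar_graph`, and `matches` are all assumptions made for the example. The alias is modelled as an extra edge to the same node instead of a true subgraph with a return node, and the walk is greedy and deterministic.

```rust
use std::collections::HashMap;

/// Hypothetical node id in a simplified word graph.
type NodeId = usize;

/// Each node maps an input char to the node it leads to.
struct Graph {
    edges: Vec<HashMap<char, NodeId>>,
    terminal: NodeId,
}

struct Walker<'a> {
    graph: &'a Graph,
    current: NodeId,
    /// Last step taken: (node we left, node we targeted).
    last_step: Option<(NodeId, NodeId)>,
}

impl<'a> Walker<'a> {
    fn new(graph: &'a Graph) -> Self {
        Walker { graph, current: 0, last_step: None }
    }

    /// Consume one char; returns false if no continuation exists.
    fn step(&mut self, c: char) -> bool {
        // Normal advance along an edge from the current node.
        if let Some(&next) = self.graph.edges[self.current].get(&c) {
            self.last_step = Some((self.current, next));
            self.current = next;
            return true;
        }
        // Repetition: accept `c` only if, taken from the previous node,
        // it lands on the same target node as the last step did.
        if let Some((from, target)) = self.last_step {
            if self.graph.edges[from].get(&c) == Some(&target) {
                return true;
            }
        }
        false
    }
}

/// Graph for the word "bar", with 'A' as an alias for 'a'.
fn bar_graph() -> Graph {
    let mut edges: Vec<HashMap<char, NodeId>> = vec![HashMap::new(); 4];
    edges[0].insert('b', 1);
    edges[1].insert('a', 2);
    edges[1].insert('A', 2); // alias edge reaching the same target node
    edges[2].insert('r', 3);
    Graph { edges, terminal: 3 }
}

/// True if the whole input walks the graph to its terminal node.
fn matches(graph: &Graph, input: &str) -> bool {
    let mut walker = Walker::new(graph);
    let consumed = input.chars().all(|c| walker.step(c));
    consumed && walker.current == graph.terminal
}
```

In this model `matches(&bar_graph(), "bAaaAaAar")` succeeds because every repeated `a` or `A` leads from node 1 to the same target, node 2. A walker that remembered only the matched character `A` would reject the following `a`.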

codecov-commenter commented 3 years ago

Codecov Report

Merging #15 (ebcdec8) into master (faca6c0) will decrease coverage by 1.10%. The diff coverage is 89.10%.

@@            Coverage Diff             @@
##           master      #15      +/-   ##
==========================================
- Coverage   90.12%   89.02%   -1.11%     
==========================================
  Files           4        5       +1     
  Lines         486      829     +343     
==========================================
+ Hits          438      738     +300     
- Misses         48       91      +43     
Impacted Files   Coverage Δ
src/lib.rs       85.86% <83.80%> (-1.17%)
src/walker.rs    90.95% <89.94%> (-7.48%)
src/censor.rs    100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data.

Anders429 commented 3 years ago

The above issue is resolved. I rewrote Walker so that it stores callback nodes and target nodes, which lets repeated-character matching work with aliases. I also introduced a WalkerBuilder, which fixes #16.

Anders429 commented 3 years ago

This is good to go. It fixes several bugs and significantly improves the internals, making the code more maintainable.