Anders429 closed this 3 years ago.
This might not be a sufficient fix for the repeated-characters bug. It makes the following test fail, although it passed previously:
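```rust
use word_filter::WordFilterBuilder;

fn main() {
    // "a" may also be written as "A".
    let filter = WordFilterBuilder::new()
        .words(&["foo", "bar"])
        .aliases(&[("a", "A")])
        .build();
    // "bAaaAaAar" is "bar" with a repeated "a"/"A", so all 9 of those characters are censored.
    assert_eq!(filter.censor("fbAaaAaAar"), "f*********");
}
```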
The matching needs to handle aliases that replace a single character (ideally, aliases that replace a single grapheme, but that is a completely separate issue). Instead of storing the matched character, the `Walker` should store the target `Node`. Then, when we know we are in a subgraph, it should only return if the return `Node` is equal to the target `Node`.
That will require tracking a bit more state in the `Walker` struct, but it should completely resolve this issue (aside from the grapheme issue, which will be dealt with later).
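A minimal sketch of the target-node idea described above, under simplifying assumptions: a plain trie instead of word_filter's graph, single-character aliases only, and one greedy walker per start position instead of many walkers explored in parallel. `Filter`, `Node`, `step`, and `match_at` are hypothetical names, not word_filter's API. The key part is the `None if ...` arm: a character counts as a repeat when following it from the parent node leads back to the node the walker is already on, so `a` and its alias `A` are treated as the same repeated character.

```rust
use std::collections::HashMap;

/// Hypothetical trie node; not word_filter's real `Node`.
#[derive(Default)]
struct Node {
    children: HashMap<char, usize>,
    is_match: bool,
}

/// Hypothetical simplified filter supporting single-character aliases only.
struct Filter {
    nodes: Vec<Node>,
    /// Maps an alias character to the character it stands in for, e.g. 'A' -> 'a'.
    aliases: HashMap<char, char>,
}

impl Filter {
    /// `aliases` holds (original, alias) pairs, mirroring `("a", "A")` in the test above.
    fn new(words: &[&str], aliases: &[(char, char)]) -> Self {
        let mut nodes = vec![Node::default()]; // index 0 is the root
        for word in words {
            let mut current = 0;
            for c in word.chars() {
                let existing = nodes[current].children.get(&c).copied();
                current = match existing {
                    Some(next) => next,
                    None => {
                        nodes.push(Node::default());
                        let next = nodes.len() - 1;
                        nodes[current].children.insert(c, next);
                        next
                    }
                };
            }
            nodes[current].is_match = true;
        }
        let aliases: HashMap<char, char> =
            aliases.iter().map(|&(original, alias)| (alias, original)).collect();
        Filter { nodes, aliases }
    }

    /// Follows `c` from `node`, either literally or through an alias.
    fn step(&self, node: usize, c: char) -> Option<usize> {
        let children = &self.nodes[node].children;
        children
            .get(&c)
            .or_else(|| self.aliases.get(&c).and_then(|original| children.get(original)))
            .copied()
    }

    /// Returns the exclusive end of the longest match starting at `start`.
    fn match_at(&self, input: &[char], start: usize) -> Option<usize> {
        let mut parent: Option<usize> = None;
        let mut target = 0; // the node we are on; starts at the root
        let mut longest = None;
        for (i, &c) in input.iter().enumerate().skip(start) {
            match self.step(target, c) {
                Some(next) => {
                    parent = Some(target);
                    target = next;
                }
                // Repeat: from the parent, `c` (or its alias) leads back to the
                // current target node. Comparing nodes, not characters, is what
                // lets "a" repeat an earlier "A".
                None if parent.map_or(false, |p| self.step(p, c) == Some(target)) => {}
                None => break,
            }
            if self.nodes[target].is_match {
                longest = Some(i + 1);
            }
        }
        longest
    }

    /// Replaces every matched character with '*'.
    fn censor(&self, text: &str) -> String {
        let input: Vec<char> = text.chars().collect();
        let mut output = String::with_capacity(text.len());
        let mut i = 0;
        while i < input.len() {
            match self.match_at(&input, i) {
                Some(end) => {
                    output.push_str(&"*".repeat(end - i));
                    i = end;
                }
                None => {
                    output.push(input[i]);
                    i += 1;
                }
            }
        }
        output
    }
}

fn main() {
    let filter = Filter::new(&["foo", "bar"], &[('a', 'A')]);
    assert_eq!(filter.censor("fbAaaAaAar"), "f*********");
}
```

A single greedy walker can miss matches that an implementation exploring every branch would find; the sketch only illustrates why storing the target node, rather than the last matched character, handles the `fbAaaAaAar` case.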
Merging #15 (ebcdec8) into master (faca6c0) will decrease coverage by 1.10%. The diff coverage is 89.10%.
```diff
@@            Coverage Diff             @@
##           master      #15      +/-   ##
==========================================
- Coverage   90.12%   89.02%   -1.11%
==========================================
  Files           4        5       +1
  Lines         486      829     +343
==========================================
+ Hits          438      738     +300
- Misses         48       91      +43
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| `src/lib.rs` | `85.86% <83.80%> (-1.17%)` | :arrow_down: |
| `src/walker.rs` | `90.95% <89.94%> (-7.48%)` | :arrow_down: |
| `src/censor.rs` | `100.00% <100.00%> (ø)` | |
Legend: `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
Powered by Codecov. Last update faca6c0...ebcdec8.
The above issue is resolved. I had to rewrite `Walker` so that it stores callback nodes and target nodes, which lets repeated characters work with aliases. I also introduced a `WalkerBuilder`, which fixes #16.
This is good to go. It fixes many bugs and includes significant internal improvements that make the code more maintainable.
Fixes #12, #13, and #14.
This changes censors from an enum to a function pointer. It also fixes #14 by making repeated-character matching take the previous character into account.
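A minimal sketch of what a function-pointer censor could look like. The `Censor` alias, the `fn(&str) -> String` signature, and the `replace_all`, `keep_first`, and `censor_match` functions are all hypothetical assumptions for illustration, not word_filter's actual types.

```rust
/// Hypothetical: a censor is a plain function pointer that maps the matched
/// text to its replacement, instead of being a variant of a fixed enum.
type Censor = fn(&str) -> String;

/// Hypothetical censor: replaces every character with '*'.
fn replace_all(matched: &str) -> String {
    "*".repeat(matched.chars().count())
}

/// Hypothetical censor: keeps the first character and stars out the rest.
fn keep_first(matched: &str) -> String {
    let mut chars = matched.chars();
    match chars.next() {
        Some(first) => format!("{}{}", first, "*".repeat(chars.count())),
        None => String::new(),
    }
}

/// Applies whichever censor the caller supplied to a matched word.
fn censor_match(censor: Censor, matched: &str) -> String {
    censor(matched)
}

fn main() {
    // Function items coerce to the `Censor` function-pointer type.
    let censors: [Censor; 2] = [replace_all, keep_first];
    assert_eq!(censor_match(censors[0], "bar"), "***");
    assert_eq!(censor_match(censors[1], "bar"), "b**");
}
```

One plausible motivation for the change: an enum fixes the set of censoring strategies inside the library, while a function pointer lets callers pass any function with the right signature. Unlike a closure, a plain `fn` pointer cannot capture state.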