Anders429 / word_filter

A Word Filter for filtering text.
Apache License 2.0

AllowRepeatedCharacters does not work with separators in between #6

Closed redzic closed 3 years ago

redzic commented 3 years ago

Hi, if you set RepeatedCharacterMatchMode::AllowRepeatedCharacters it does not seem to catch repeated characters with separators in between. For example:

WordFilter::new(
    &[
        "test",
    ],
    &[],
    &[
        ",", "_", ".", ",", "-", "+", "&", "[", "]", ":", ";", "!", "@", "#", "$", "%",
        "*", "(", ")", "<", ">",
    ],
    &[],
    Options {
        repeated_character_match_mode: RepeatedCharacterMatchMode::AllowRepeatedCharacters,
        ..Default::default()
    },
)

This filter does not match t e e s t, but it does match both teest and t,_e@s%t, which seems inconsistent.

Thanks for this project btw!

Anders429 commented 3 years ago

Hey, thanks for the issue! You're right, this does seem to be a bug that I missed in testing.

Your example doesn't actually include a " " in the separators list, but the same issue occurs even when it is included. The following test doesn't pass, and I would expect it to:

#[test]
fn issue() {
    let filter = WordFilter::new(
        &[
            "test",
        ],
        &[],
        &[
            " ",
        ],
        &[],
        Options {
            repeated_character_match_mode: RepeatedCharacterMatchMode::AllowRepeatedCharacters,
            ..Default::default()
        },
    );

    assert!(filter.check("t e e s t"));
}

The issue seems to be that when the traversal handles a separator, it only pushes the pointer's current_node to the return_nodes Vec. It should also branch off another Pointer that pushes the previous pointer's current_node to return_nodes, so that a repeated character can still be matched after the separator.

Looks like a pretty simple fix. Thanks again for finding it :)
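
For readers following along, the behaviour can be reproduced with a small standalone sketch. This is not the crate's implementation: `contains_word`, its state tuples, and the `repeat_across_separators` flag are all hypothetical names invented for illustration. It assumes a simplified matcher that tracks a set of candidate states, loosely analogous to the pointers mentioned above. With the flag set to `false`, skipping a separator drops the option to match a repeated character, which mirrors the reported bug. With `true`, that option is kept.

```rust
use std::collections::HashSet;

/// Toy matcher (hypothetical, NOT word_filter's real traversal).
/// Returns true if `input` contains `word`, allowing repeated characters
/// and separators between the characters of the word.
fn contains_word(
    word: &str,
    separators: &[char],
    input: &str,
    repeat_across_separators: bool,
) -> bool {
    let word: Vec<char> = word.chars().collect();
    if word.is_empty() {
        return false;
    }
    // Each state is (characters of `word` matched so far,
    // whether the last matched character may still be repeated).
    let mut states: HashSet<(usize, bool)> = HashSet::new();
    for c in input.chars() {
        // A match may begin at any position in the input.
        states.insert((0, false));
        let mut next: HashSet<(usize, bool)> = HashSet::new();
        for &(matched, can_repeat) in &states {
            // Advance to the next character of the word.
            if c == word[matched] {
                if matched + 1 == word.len() {
                    return true;
                }
                next.insert((matched + 1, true));
            }
            // Consume a repeat of the previously matched character.
            if matched > 0 && can_repeat && c == word[matched - 1] {
                next.insert((matched, true));
            }
            // Skip a separator. The flag decides whether the option to
            // repeat the previous character survives the separator.
            if matched > 0 && separators.contains(&c) {
                next.insert((matched, can_repeat && repeat_across_separators));
            }
        }
        states = next;
    }
    false
}
```

Under these assumptions, `contains_word("test", &[' '], "t e e s t", false)` fails to match while `contains_word("test", &[' '], "teest", false)` succeeds, matching the report. Passing `true` makes both inputs match.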

Anders429 commented 3 years ago

Should be fixed now. I have published the change as version 0.2.0 on crates.io.

Feel free to reopen if this doesn't resolve the issue. Thanks again!