WojciechMula / pyahocorasick

Python module (C extension and plain Python) implementing the Aho-Corasick algorithm
BSD 3-Clause "New" or "Revised" License

Rework pattern matching to use glob-style patterns #171

Open melsabagh opened 2 years ago

melsabagh commented 2 years ago

BREAKING CHANGE: This PR changes the signature and behavior of the keys, values, and items methods of the Automaton class so that the optional pattern parameter is interpreted as a Unix shell glob-style pattern.

Currently only the '?' (match any single character) and '*' (match zero or more characters) wildcards are supported, so neither character classes nor ranges are available. The '\' character is reserved as the wildcard escape character, so any literal '\' in a pattern must itself be escaped, written as '\\\\' in a regular Python string literal or r'\\' in a raw string.
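To make these wildcard rules concrete, here is a minimal pure-Python sketch of a full-string matcher that supports only '?', '*', and '\' escapes. It illustrates the rules as stated in this PR and is not the PR's implementation; the function name glob_match is hypothetical, and rejecting a pattern that ends in a lone '\' is an assumption.

```python
def glob_match(pattern: str, text: str) -> bool:
    """Return True if `text` fully matches `pattern` (hypothetical helper).

    '?' matches any single character, '*' matches zero or more characters,
    and '\\' escapes the next character so it is matched literally.
    """
    # Resolve escapes once by turning the pattern into (kind, char) tokens.
    tokens = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "\\":
            # Assumption: a trailing, unpaired escape is an error.
            if i + 1 >= len(pattern):
                raise ValueError("pattern ends with a dangling escape")
            tokens.append(("lit", pattern[i + 1]))
            i += 2
            continue
        if c == "*":
            tokens.append(("star", c))
        elif c == "?":
            tokens.append(("any", c))
        else:
            tokens.append(("lit", c))
        i += 1

    # Two-pointer match that backtracks to the most recent '*' on mismatch.
    t = p = 0
    star_p = -1  # token index just after the last '*' seen (-1: none yet)
    star_t = 0   # text position where that '*' currently stops absorbing
    while t < len(text):
        if p < len(tokens) and (
            tokens[p][0] == "any"
            or (tokens[p][0] == "lit" and tokens[p][1] == text[t])
        ):
            p += 1
            t += 1
        elif p < len(tokens) and tokens[p][0] == "star":
            star_p = p + 1
            star_t = t
            p += 1
        elif star_p != -1:
            # Let the last '*' absorb one more character and retry.
            star_t += 1
            t = star_t
            p = star_p
        else:
            return False

    # Leftover tokens are fine only if they are all '*' (matching empty).
    while p < len(tokens) and tokens[p][0] == "star":
        p += 1
    return p == len(tokens)
```

For example, glob_match("a*b*c", "aXbYc") is true, while glob_match("a\\*", "ab") is false because the escaped '*' only matches a literal asterisk.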

The optional wildcard parameter is removed. The optional how parameter accepts one of MATCH_PREFIX (the default; prefix match on keys) or MATCH_WHOLE (full match on keys).
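The following sketch shows how the two matching modes might differ when filtering keys, assuming that MATCH_PREFIX means the pattern only needs to match some prefix of a key while MATCH_WHOLE requires it to match the entire key. The constants below are local stand-ins rather than the library's own, glob_to_regex and filter_keys are hypothetical helpers, and the glob is translated into a Python re expression instead of being evaluated against the automaton's trie.

```python
import re

# Local stand-ins for the two modes named in the PR (not the library's constants).
MATCH_PREFIX = "prefix"
MATCH_WHOLE = "whole"


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Compile a '?'/'*' glob with '\\' escapes into a regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "\\":
            if i + 1 >= len(pattern):
                raise ValueError("pattern ends with a dangling escape")
            parts.append(re.escape(pattern[i + 1]))  # escaped char is literal
            i += 2
            continue
        if c == "*":
            parts.append(".*")
        elif c == "?":
            parts.append(".")
        else:
            parts.append(re.escape(c))
        i += 1
    # DOTALL so '?' and '*' also match newline characters inside keys.
    return re.compile("".join(parts), re.DOTALL)


def filter_keys(keys, pattern, how=MATCH_PREFIX):
    """Return the keys selected by `pattern` under the given mode."""
    regex = glob_to_regex(pattern)
    if how == MATCH_PREFIX:
        # re.match anchors at the start only, so a matching prefix suffices.
        return [k for k in keys if regex.match(k) is not None]
    if how == MATCH_WHOLE:
        # re.fullmatch requires the pattern to consume the whole key.
        return [k for k in keys if regex.fullmatch(k) is not None]
    raise ValueError(f"unknown match mode: {how!r}")
```

With keys he, her, hers, and she, the pattern 'he?' selects her and hers under prefix matching but only her under whole matching.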

pombredanne commented 1 year ago

@melsabagh-kw Let me review this in detail... but I will not merge this into 2.0.0. It will be considered afterwards, since we already have betas out and are about to release the final 2.0.0.