This work mainly involves modifying the implementation of like in tsfile to distinguish it from regexp and optimize it.
This implementation adds two core matching classes, LikePattern and LikeMatcher, as well as three matching methods, FjsMatcher, NFA, and DFA.
Matcher Overview
FjsMatcher: This uses KMP string search, which also uses BM's bad character rules to cut off some cases and speed up the string search process
NFA: Construct the corresponding NFA state machine according to the pattern, use the NFA state machine to match the input string, and there is backtracking
DFA: After constructing the NFA according to the pattern, simplify it into DFA, and use the DFA state machine to match the input string, and there is no backtracking
The main process is as follows:
Pattern parsing:
Escape character processing
Statistical information calculation, such as min and max, suffix and prefix
Fuzzy suffix judgment
Select matchers based on data features: only contains % and characters, use string search FjsMatcher; when it contains _, use the state machine, NFA is the default state, and DFA is the optimized state
Matching process
Filter out strings that are not within the length range according to the statistical information min, max
Filter out strings that do not meet the prefix and suffix information according to the statistical information suffix, prefix
This work mainly involves modifying the implementation of like in tsfile to distinguish it from regexp and optimize it.
This implementation adds two core matching classes, LikePattern and LikeMatcher, as well as three matching methods, FjsMatcher, NFA, and DFA.
Matcher Overview
The main process is as follows: