QKekos opened this issue 5 months ago
This issue also affects the new rules search.
_My proposed fix is to store the rule sequence in the trie payload only for rules that are exact suffixes of other rules, and use it to do the following:
This solution is obviously non-trivial. I do not intend to implement it immediately, but I want to propose it for comment.
Performance of this algorithm is not an issue. And I don't think the ROM space requirements would be seriously affected either, because few rules are likely to be proper suffixes of other rules, and those rules would tend to be very short. The memory cost of duplicating the buffer is also probably not a huge deal. The `longest_matching_chain` function would need to be run once for each key in the sequence, but a sequence that is a suffix of another sequence is likely to be very short, and that function is very fast. Complexity is the problem._
Hopefully a simpler solution is found.
ETA: Oooh! Here is a solution! If the transform is from a rule whose sequence is a suffix of one or more other rule sequences, we embed a child trie that continues the search to find the longer sequences. This child trie would need no payload and would usually be extremely small. This solution is still complex, but it is much less kludgy, and the search remains a single-pass linear search.
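As a rough illustration of what that child trie would answer (a hypothetical linear-scan stand-in, not the actual trie code — `has_longer_conflict` and its arguments are invented for this sketch):

```python
def has_longer_conflict(rule_seqs, matched_seq, buffer):
    """True if some other rule's sequence ends with matched_seq and also
    matches the end of the typed buffer -- i.e. the longer rule would have
    fired instead, so reporting matched_seq would be a false positive."""
    for seq in rule_seqs:
        if seq != matched_seq and seq.endswith(matched_seq) and buffer.endswith(seq):
            return True
    return False
```

The embedded child trie achieves the same answer in a single continued walk instead of scanning all rules.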
I'm doing a second comment, because the first is becoming a running stream of my thoughts.
I also propose that the temporary fix be to disable missing rules detection on any rule whose sequence is a proper suffix of another rule.
> I also propose that the temporary fix be to disable missing rules detection on any rule whose sequence is a proper suffix of another rule.

That's very much not what should be done at all; we have to check whether that rule superset is in the buffer too.
> > I also propose that the temporary fix be to disable missing rules detection on any rule whose sequence is a proper suffix of another rule.
>
> That's very much not what should be done at all; we have to check whether that rule superset is in the buffer too.

Understood, but that proper solution is HARD. I was just proposing that hiding a rule from the rules search is better than having false positives in the meantime.
I have a potential fix for this issue with our current algo in the `rule_search_max_seq_len` branch.
From testing with example rules and my own rules, I'm no longer seeing any false positive (partial) matches.
Please test with your rules when you get a chance, and report here which rules are failing and whether you're still getting false positives.
Here's what I changed: `st_check_rule_match` now stores the sequence length in `search_max_seq_len` whenever the visited sequence (stored on the stack) matches the search window. A full match can now only be considered valid if its sequence length is >= `search_max_seq_len` (which is reset for each search window). This prevents reporting of rules that do match the buffer, but that could not actually have been used with the current total search buffer (because another rule exists whose sequence shares a suffix with the rule under consideration).
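A minimal sketch of that invariant (illustrative Python, not the firmware code; it assumes we have already collected each sequence visited for the current search window, flagged with whether it ended in a full rule match):

```python
def filter_full_matches(visited):
    """visited: list of (seq, is_full_match) pairs for one search window.
    A full match is valid only if its sequence is at least as long as the
    longest sequence seen matching this window (search_max_seq_len)."""
    search_max_seq_len = max((len(seq) for seq, _ in visited), default=0)
    return [seq for seq, full in visited
            if full and len(seq) >= search_max_seq_len]
```

With the thread's example, a full match on `i*` is discarded when the longer `_i*` was also seen matching the window, since the longer rule would have fired instead.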
I have implemented a fix in my proof of concept code for RS_new. The solution is to link the match node of a transform in the transform trie to the corresponding match node in the original trie. Then, we check that original trie match node to see if it is a leaf (no branch bit set). If it is, there is no prefix, and the match is valid. If the branch bit is set, we step forward and attempt to find a match in the original trie using the remaining keys in the cursor. If we do find a match, then we know there was a conflicting prefix that invalidates our transform match, because the matching rule would not have been fired.
I don't expect anyone to really understand the previous paragraph without having seen the code. Just know that the problem should be resolved.
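For the curious, here is a toy model of that check (a hypothetical node layout invented for this sketch; the real code operates on the packed trie):

```python
# Hypothetical node shape: {"is_rule": bool, "children": {key: node}}
def transform_match_is_valid(orig_match_node, remaining_keys):
    """orig_match_node: the original-trie node linked from the transform-trie
    match node. remaining_keys: the keys still ahead of the cursor, in search
    order. If the node is a leaf, no longer rule shares this suffix and the
    match stands; otherwise we keep walking the original trie -- reaching
    another rule means a longer rule would have fired, which invalidates the
    transform match."""
    if not orig_match_node["children"]:
        return True
    node = orig_match_node
    for key in remaining_keys:
        node = node["children"].get(key)
        if node is None:
            return True
        if node["is_rule"]:
            return False
    return True
```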
Core issue:

`⎵i☆ ⇒ I`
`i☆ ⇒ ing`

A buffer of `buffer⎵ing` triggers the rule search; it says the word could have been typed with `i☆`, which is impossible because of the first rule.

So a while ago we had the same issue for `ex* -> example` and `x* -> xt`, where `bufferext` was triggering the second rule, which is wrong — but that fix ignores the wordbreak character for some reason.

^ On second thought, we don't trigger the rule search on a sequence token press at all (we should!), so that case might still be there.