QKekos opened this issue 5 months ago
This issue also affects the new rules search.
_My proposed fix is to store the rule sequence in the trie payload only for rules that are exact suffixes of other rules, and use it to do the following:
This solution is obviously non-trivial. I do not intend to implement it immediately, but I want to propose it for comment.
Performance of this algorithm is not an issue. And I don't think the ROM space requirements would be seriously affected either, because few rules are likely to be proper suffixes of other rules, and those rules would tend to be very short. The memory cost of duplicating the buffer is also probably not a huge deal. The `longest_matching_chain` function would need to be run once for each key in the sequence, but a sequence that is a suffix of another sequence is likely to be very short, and that function is very fast. Complexity is the problem._
Hopefully a simpler solution is found.
ETA: Oooh! Here is a solution! If the transform is from a rule whose sequence is a suffix of one or more other rule sequences, we embed a child trie that continues the search to find the longer sequences. This child trie would need no payload and would usually be extremely small. This solution is still complex, but it is much less kludgy, and the search remains a single-pass linear search.
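As a rough illustration of what that child trie would answer (a hypothetical linear-scan stand-in, not the actual trie code — `has_longer_conflict` and its arguments are invented for this sketch):

```python
def has_longer_conflict(rule_seqs, matched_seq, buffer):
    """True if some other rule's sequence ends with matched_seq and also
    matches the end of the typed buffer -- i.e. the longer rule would have
    fired instead, so reporting matched_seq would be a false positive."""
    for seq in rule_seqs:
        if seq != matched_seq and seq.endswith(matched_seq) and buffer.endswith(seq):
            return True
    return False
```

The embedded child trie achieves the same answer in a single continued walk instead of scanning all rules.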
I'm doing a second comment, because the first is becoming a running stream of my thoughts.
I also propose that the temporary fix be to disable missing rules detection on any rule whose sequence is a proper suffix of another rule.
> I also propose that the temporary fix be to disable missing rules detection on any rule whose sequence is a proper suffix of another rule.

That's very much not what should be done at all; we have to check whether that rule superset is in the buffer too.
> > I also propose that the temporary fix be to disable missing rules detection on any rule whose sequence is a proper suffix of another rule.
>
> That's very much not what should be done at all; we have to check whether that rule superset is in the buffer too.

Understood, but that proper solution is HARD. I was just proposing that hiding a rule from the rules search is better than having false positives in the meantime.
I have a potential fix for this issue with our current algo in the `rule_search_max_seq_len` branch.
From testing with example rules and my own rules, I'm no longer seeing any false positive (partial) matches.
Please test with your rules when you get a chance, and report here which rules are failing and whether you're still getting false positives.
Here's what I changed: `st_check_rule_match` now stores the sequence length in `search_max_seq_len` whenever the visited sequence (stored on the stack) matches the search window. A full match can now only be considered valid if its sequence length is >= `search_max_seq_len` (which is reset for each search window). This prevents reporting of rules that do match the buffer, but that could not actually have been used with the current total search buffer (because another rule exists whose sequence shares a suffix with the rule under consideration).
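A minimal sketch of that invariant (illustrative Python, not the firmware code; it assumes we have already collected each sequence visited for the current search window, flagged with whether it ended in a full rule match):

```python
def filter_full_matches(visited):
    """visited: list of (seq, is_full_match) pairs for one search window.
    A full match is valid only if its sequence is at least as long as the
    longest sequence seen matching this window (search_max_seq_len)."""
    search_max_seq_len = max((len(seq) for seq, _ in visited), default=0)
    return [seq for seq, full in visited
            if full and len(seq) >= search_max_seq_len]
```

With the thread's example, a full match on `i*` is discarded when the longer `_i*` was also seen matching the window, since the longer rule would have fired instead.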
I have implemented a fix in my proof of concept code for RS_new. The solution is to link the match node of a transform in the transform trie to the corresponding match node in the original trie. Then, we check that original trie match node to see if it is a leaf (no branch bit set). If it is, there is no prefix, and the match is valid. If the branch bit is set, we step forward and attempt to find a match in the original trie using the remaining keys in the cursor. If we do find a match, then we know there was a conflicting prefix that invalidates our transform match, because the matching rule would not have been fired.
I don't expect anyone to really understand the previous paragraph without having seen the code. Just know that the problem should be resolved.
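For the curious, here is a toy model of that check (a hypothetical node layout invented for this sketch; the real code operates on the packed trie):

```python
# Hypothetical node shape: {"is_rule": bool, "children": {key: node}}
def transform_match_is_valid(orig_match_node, remaining_keys):
    """orig_match_node: the original-trie node linked from the transform-trie
    match node. remaining_keys: the keys still ahead of the cursor, in search
    order. If the node is a leaf, no longer rule shares this suffix and the
    match stands; otherwise we keep walking the original trie -- reaching
    another rule means a longer rule would have fired, which invalidates the
    transform match."""
    if not orig_match_node["children"]:
        return True
    node = orig_match_node
    for key in remaining_keys:
        node = node["children"].get(key)
        if node is None:
            return True
        if node["is_rule"]:
            return False
    return True
```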
Core issue:

`⎵i☆ ⇒ I`
`i☆ ⇒ ing`

A buffer of `buffer⎵ing` triggers the rule search; it says the word could have been typed with `i☆`, which is impossible because of the first rule.

So a while ago we had the same issue for `ex* -> example` and `x* -> xt`, where `bufferext` was triggering the second rule, which is wrong — but that fix ignores the wordbreak character for some reason.

^ On second thought, we don't trigger the rule search on a sequence token press at all (we should!), so that case might still be there.