Modify how pattern search works (support variables and regexes)

david-a-wheeler commented 1 year ago

When I do a "search" I can enter a pattern (great!). In addition, when I use patterns like 4 = or T. <-> I get very sensible results.

However, when I do some pattern searches I don't get the results I expect. Examples of surprises are ph <-> and class. In the first case, I get results that don't include the phrase ph <->. In the second, I seem to get everything that has a class type result in it... not everything with the constant term "class".

I think it'd be great to explain the pattern language where it's available. E.g., as a tooltip, or maybe a circled "i" for information that displays a longer explanation).

Also, it might be better if the pattern language metasymbols (like class) were unlikely to be math symbols in use. One way is to use "&" as a prefix, e.g., &class.

Also, it might be nice to support regular expressions as an option (add a checkbox "use regex"). Regexes are a widely-used pattern language that many people already know. Regular expressions don't know anything about the types of data (while this pattern search system does appear to), so perhaps they shouldn't be the default, but it'd be nice to have the option to use regexes for search. Since I don't fully understand the pattern language, I don't know what (or if) I'm missing :-).

expln commented 1 year ago

The pattern language is very simple, I didn't spend a lot of time for its implementation. So definitely this is the area for possibly many improvements.

A pattern should consist of constants only and it describes what subsequence you want to find in assertions (currently it doesn't search over hypotheses, only the conclusion part of axioms and theorems). Let's say your pattern consists of 5 constants c1 c2 c3 c4 c5 and the algorithm has already found a theorem having c1 c2 as a subsequence in it, now the algorithm is trying to check if c3 is also included into this theorem, so it works as follows (with some simplifications):

c3_matched = false
for theorem_symbol in "all-remaining-unmatched-symbols-to-the-right" and while !c3_matched {
    if (theorem_symbol is constant) {
        c3_matched = (theorem_symbol === c3)
    } else {
        c3_matched = ( typeOf(theorem_symbol) === c3 )
    }
}
if (!c3_matched) {
    "conclude the theorem doesn't match the pattern"
} else {
    "proceed with the same for c4"
}

As you can see it doesn't (currently) use metasymbols.

In the case when for ph <-> you get results that don't include the phrase ph <-> I suspect you saw the results of the previous search and you didn't notice the error message "'ph' - is not a constant." :) If this is the case then yeah, that automatically becomes one subarea for improvement for the error notification :) (I will erase results of the previous search)

I provided an explanation of current implementation of the pattern search. I will provide more thoughts on improvements a bit later.

Meanwhile could you please provide few real examples of what you might want to find but it is not possible with current implementation? Obviously there are already two cases: 1) search over hypotheses; 2) for example, the pattern class -> class will not work well if you wanted to find cases when both class variables are actually the same variable. What could be other real use cases? This will help me to come up with more useful ideas.

david-a-wheeler commented 1 year ago

Interesting, that algorithm is different from what I expected. The constant matching in sequence, while allowing intervening symbols, seems very reasonable. However, I didn't expect it to only work with constants. If I enter "ph" I would expect the search to do something useful, not to produce an error.

I suggest tweaking this search algorithm a little bit. If a variable is seen, determine its type, and match a symbol with the same type. E.g., ph will match ph or ps, but not C. That means it is USING the search query it gets. It should be easy to do lookups on each symbol type, so that should be pretty quick.

I don't know if enforced mapping is important. I doubt it, but sounds like experimentation is needed. If it can match on types that is probably good enough.

I still think supporting alternative search syntaxes would be useful, at least the regex syntax.

david-a-wheeler commented 1 year ago

I don't normally search for a specific hypothesis. I don't think you need to worry about that.

david-a-wheeler commented 1 year ago

Okay. I'll modify the user guide to explain how the algorithm works, completing the "clarify" part. I'll change this issue to focus on changing how pattern search works. In particular, I'd like to see:

Allow non-constants in the pattern. They should match any symbol of the same type.
Allow the use of regexes instead (e.g., as a checkbox). JavaScript already supports regex searching, so it'd be easy to implement.

expln commented 1 year ago

Sure, I will add variables and regex.

david-a-wheeler commented 1 year ago

Awesome!

I had a further thought regarding variables. When matching variables, it would be really snappy if patterns supported this construct:

If a pattern repeats a variable, a matching statement must repeat the matching variable. For example, if the pattern is "ph -> ph", it will match "ph -> ph" and "ps -> ps" but not "ph -> ps". Note that there can still be intervening symbols between the symbols in the pattern.

This construct should be easy to implement. While processing the pattern, if there's a match on a variable, record the mapping and require that mapping when processing the rest of the statement. Once it's been determined if a statement matches (or not), throw away the mappings.

If a user wants arbitrary matches on variables, the user can simply specify different variable names. So "A + B" would match more statements than "A + A".

expln commented 1 year ago

That's exactly what I am going to implement.

expln commented 10 months ago

Support of variables in pattern search is implemented on dev. It works as described in the comment above. Regex search remains to be implemented.

expln / metamath-lamp

Modify how pattern search works (support variables and regexes) #13