HOL-Theorem-Prover / HOL

Canonical sources for HOL4 theorem-proving system. Branch develop is where “mainline development” occurs; when develop passes our regression tests, master is merged forward to catch up.
https://hol-theorem-prover.org
Other
621 stars 140 forks source link

Implement basic regex search functionality for DB.find #1246

Closed mrichards30 closed 2 months ago

mrichards30 commented 4 months ago

Hi, I've implemented some regex search support for DB.find based on issue #297 with a simple precedence parser.

The grammar supports the previous two operators, in their same order of operations:

  1. Searching by intersection with ~ -- e.g. DB.find "FORALL~THM"
  2. Searching for the union of two queries with | -- e.g. DB.find "FORALL|EVERY"

And newly implements the following

  1. Brackets/precedence -- e.g. DB.find "(LEFT|RIGHT)_FORALL_OR_THM"
  2. An optional element -- e.g. DB.find "LEFT_AND_FOR(_?)ALL"

I guess this shouldn't have any backwards compatibility impacts since the new characters would all be illegal in a theorem name.

Examples:

image
mn200 commented 3 months ago

I don't understand how uexi(s~s)_?s(i|e) matches UEXISTS_SIMP. Does the translation of ~ insert .* on the front and back of the pattern? The code that translates Twiddle a b seems to just put Any between the a and b not before and after...

mn200 commented 3 months ago

The DBSearchParser should also export the entry-point that turns the string into a regular expression.

mrichards30 commented 3 months ago

Hi Michael, From the linked ticket I thought we wanted to match the definition given by re1 ~~ re2 == re1.*re2 | re2.*re1 There is an any put on the start and back of every pattern just so every search acts as a 'contains' check rather than requiring a total match

In the example, the (s~s) part is just consuming the "sts" of "exists" by taking the t as the middle any.

Sorry if I'm misunderstanding

mrichards30 commented 3 months ago

The DBSearchParser should also export the entry-point that turns the string into a regular expression.

Oh ok thanks, just checking do you mean it should export my regexp type (i.e., parse_regexp), or the existing regexp type from regexpMatch, i.e., translate_regexp o parse_regexp

mn200 commented 3 months ago

Exporting both translate_regexp and parse_regexp along with your own type seems perfectly reasonable to me.

(And you're quite right about your ~ example; brain fade on my part.)

mrichards30 commented 3 months ago

Ok thanks! Only other change was to rename my regexp type, since it apparently didn't like me exporting another type named regexp.

mn200 commented 3 months ago

Could you put the datatype declaration for search_regexp into the signature please? Having it be completely opaque with just a type declaration makes it difficult/impossible to do anything with it.

mrichards30 commented 3 months ago

Oops I didn't realise that's how that would work -- I've put it in so the same declaration is in the sig and sml files now, would that be right?

mn200 commented 3 months ago

Yes, that's right. You can avoid the "code smell" by declaring the datatype in a structure (a .sml file conventionally), and then referring to that declaration in both the signature and the broader structure, but just repeating it is fine. (See various _dtype.sml files and their uses scattered around the HOL sources.)

mrichards30 commented 2 months ago

Ohh ok thanks, I remember seeing dtype files around but didn't realise what they were for

mn200 commented 2 months ago

Thanks for your work on this!

mn200 commented 2 months ago

DB.find "and~cl" now goes into an infinite loop; do you have time to look into this?