Open templarundead opened 6 years ago
Hmm… So let’s sort the input words by length ([...word].length) and then lexicographically?
That happens to work for ghost frost pos, but with my set of 100+ junk words, sorting from longest to shortest and then Z-A happens to generate the shorter RX (though not by much: 594 chars vs. 608 with ascending length and A-Z). I also tried reversing the characters of each word before sorting A-Z and Z-A, but this made no difference.
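For anyone who wants to try the two orderings, here is a rough sketch of the pre-sort step only. The helper names (codePointLength, compareText, sortShortestFirst, sortLongestFirst) are made up for illustration and are not part of regexgen; the comparison is a plain code-unit comparison, which is an assumption about what "lexicographically" should mean here.

```javascript
// Count code points rather than UTF-16 code units, as in [...word].length.
const codePointLength = (word) => [...word].length;

// Plain code-unit comparison (assumption); avoids locale-dependent ordering.
const compareText = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

// Hypothetical helper: ascending length, then A-Z.
function sortShortestFirst(words) {
  return [...words].sort(
    (a, b) => codePointLength(a) - codePointLength(b) || compareText(a, b)
  );
}

// Hypothetical helper: descending length, then Z-A.
function sortLongestFirst(words) {
  return [...words].sort(
    (a, b) => codePointLength(b) - codePointLength(a) || compareText(b, a)
  );
}

const words = ["ghost", "frost", "pos"];
console.log(sortShortestFirst(words)); // [ 'pos', 'frost', 'ghost' ]
console.log(sortLongestFirst(words)); // [ 'ghost', 'frost', 'pos' ]
```

The sorted array would then be handed to the generator in place of the raw input. Which ordering gives the shorter pattern seems to depend on the word set, so it may be worth generating both and keeping the shorter result.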
An interesting optimization problem. It would help if I actually looked at what regexgen was doing under the hood 😛
Example 1
input: ghost frost pos
output: ghost|frost|pos
expected output: (?:(?:fr|gh)ost|pos)
Example 2
input: pos ghost frost
output: (?:gh|fr)ost|pos
expected output: (?:(?:fr|gh)ost|pos)
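As a sanity check, the three patterns quoted in these examples can be confirmed to accept the same words. This is only a sketch using the built-in RegExp; matchesExactly is a hypothetical helper, and the extra non-word "ost" is an arbitrary negative case chosen for illustration.

```javascript
const words = ["ghost", "frost", "pos"];

// The actual and expected outputs quoted in the two examples.
const patterns = [
  "ghost|frost|pos",
  "(?:gh|fr)ost|pos",
  "(?:(?:fr|gh)ost|pos)",
];

// Hypothetical helper: anchor the pattern so it must match the whole word.
function matchesExactly(source, word) {
  return new RegExp(`^(?:${source})$`).test(word);
}

// Every pattern should accept all input words and reject a non-member.
const allEquivalent = patterns.every(
  (source) =>
    words.every((word) => matchesExactly(source, word)) &&
    !matchesExactly(source, "ost")
);

console.log(allEquivalent);
```

Since all three accept the same set, the difference between the actual and expected outputs is in how the alternation is factored and ordered, not in what is matched.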