devongovett / regexgen

Generate regular expressions that match a set of strings
https://runkit.com/npm/regexgen
3.35k stars 100 forks

Different output result with different word order #21

Open templarundead opened 6 years ago

templarundead commented 6 years ago

Example 1

input: ghost frost pos
output: ghost|frost|pos
expected output: (?:(?:fr|gh)ost|pos)

Example 2

input: pos ghost frost
output: (?:gh|fr)ost|pos
expected output: (?:(?:fr|gh)ost|pos)

mathiasbynens commented 6 years ago

Hmm… So let’s sort the input words by length ([...word].length) and then lexicographically?
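A minimal sketch of that suggestion, assuming the sort is applied to the word list before it is handed to the generator. `sortWords` is a hypothetical helper name, not part of regexgen. Length is counted in code points via `[...word].length`, so astral symbols count as one character.

```javascript
// Hypothetical helper (not part of regexgen): put the words into a canonical
// order so the generated pattern no longer depends on the caller's ordering.
function sortWords(words) {
  // Copy first so the caller's array is left untouched.
  return [...words].sort((a, b) => {
    // Primary key: length in code points, shortest first.
    const byLength = [...a].length - [...b].length;
    if (byLength !== 0) return byLength;
    // Secondary key: plain lexicographic comparison.
    return a < b ? -1 : a > b ? 1 : 0;
  });
}

console.log(sortWords(['ghost', 'frost', 'pos'])); // [ 'pos', 'frost', 'ghost' ]
console.log(sortWords(['pos', 'ghost', 'frost'])); // [ 'pos', 'frost', 'ghost' ]
```

Both inputs from the report end up in the same order, so whatever pattern is generated would at least be the same for both. Whether that pattern is the shortest one is a separate question.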

vegeta897 commented 6 years ago

That happens to work for ghost frost pos, but with my set of 100+ junk words, sorting from longest to shortest and then Z-A happens to generate the shorter regex, though not by much: 594 chars vs. 608 with ascending length and A-Z. I also tried reversing the characters of each word when sorting A-Z and Z-A, but this made no difference.

An interesting optimization problem. It would help if I actually looked at what regexgen was doing under the hood 😛
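The experiment described above (try several orderings, keep whichever yields the shortest pattern) could be sketched roughly as follows. `shortestPattern` and `joinWithPipes` are hypothetical names, and `generate` is a stand-in for any function that turns a word list into a pattern string. With regexgen that would presumably be something like `words => regexgen(words).source`, which is an assumption here and is not exercised by the code.

```javascript
// Hypothetical helper: run a generator over several input orderings and keep
// the shortest resulting pattern. `generate` maps a word list to a string.
function shortestPattern(words, generate) {
  const byCodePoints = (a, b) => [...a].length - [...b].length;
  const alpha = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

  // The two orderings compared in the thread.
  const orderings = {
    'length asc, A-Z': (a, b) => byCodePoints(a, b) || alpha(a, b),
    'length desc, Z-A': (a, b) => byCodePoints(b, a) || alpha(b, a),
  };

  let best = null;
  for (const [name, compare] of Object.entries(orderings)) {
    const pattern = generate([...words].sort(compare));
    // On a tie, the ordering listed first is kept.
    if (best === null || pattern.length < best.pattern.length) {
      best = { name, pattern };
    }
  }
  return best;
}

// Stand-in generator for demonstration only: plain alternation, no
// prefix/suffix merging and no escaping of special characters.
const joinWithPipes = words => words.join('|');
console.log(shortestPattern(['ghost', 'frost', 'pos'], joinWithPipes));
```

This is brute force over a fixed handful of orderings, so it only picks the best of the candidates it is given. It says nothing about whether a shorter pattern exists for some other ordering.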