Sjord closed this 6 years ago.
This works during tests but not at runtime, because `.*` doesn't match newlines. It only works in the tests because newlines are stripped from the test file there, which I changed in #58.
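Here is a quick sketch of the difference. It assumes the fingerprints are evaluated with a Python-style `re` engine; the project's actual engine isn't stated here, and the sample text is hypothetical:

```python
import re

# Hypothetical page fragment where the two markers sit on different lines.
page = "Roxen CMS\nversion 5.4.98"

# By default "." matches any character except "\n", so ".*" stops at the line end.
dot_star = re.compile(r"Roxen CMS.*version ([\d.]+)")
print(dot_star.search(page))  # None

# Either let "." match newlines with re.DOTALL...
dotall = re.compile(r"Roxen CMS.*version ([\d.]+)", re.DOTALL)
print(dotall.search(page).group(1))  # 5.4.98

# ...or use a character class that matches everything, such as [\S\s].
any_char = re.compile(r"Roxen CMS[\S\s]*version ([\d.]+)")
print(any_char.search(page).group(1))  # 5.4.98
```

Stripping newlines from the test input hides exactly this difference, so the test passes while real pages fail.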
Fixed. It now matches at most three lines (two line endings). I rebased it so that it correctly uses line endings in the test.
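One hypothetical way to express "at most three lines" is to allow a single line plus at most two further line endings. This illustrative sketch is not necessarily the regex in the PR. The `WITHIN_THREE_LINES` name and the sample strings are made up:

```python
import re

# [^\r\n]* stays on one line; (?:\r?\n[^\r\n]*){0,2} allows at most two line endings.
WITHIN_THREE_LINES = r"[^\r\n]*(?:\r?\n[^\r\n]*){0,2}"

pattern = re.compile(r"Roxen CMS" + WITHIN_THREE_LINES + r"version ([\d.]+)")

near = "Roxen CMS\n|\nversion 5.4.98"      # markers span three lines
far = "Roxen CMS\n\n\n\nversion 5.4.98"    # four line endings between markers

print(pattern.search(near).group(1))  # 5.4.98
print(pattern.search(far))            # None
```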
Oh, this still isn't correct. I added another regex group but forgot to increase the group number for the version number.
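The following hypothetical patterns illustrate the pitfall. Adding a capturing group in front of the version shifts its group number. A non-capturing `(?:...)` group or a named group leaves the numbering alone:

```python
import re

text = "Roxen CMS | version 5.4.98"

# Original pattern: the version is capture group 1.
before = re.compile(r"Roxen CMS.*version ([\d.]+)")
print(before.search(text).group(1))  # 5.4.98

# A new capturing group in front pushes the version to group 2,
# so code that still reads group 1 gets the separator instead.
shifted = re.compile(r"Roxen CMS(\s*\|\s*)version ([\d.]+)")
print(repr(shifted.search(text).group(1)))  # ' | '
print(shifted.search(text).group(2))        # 5.4.98

# A non-capturing group keeps the version at group 1.
non_capturing = re.compile(r"Roxen CMS(?:\s*\|\s*)version ([\d.]+)")
print(non_capturing.search(text).group(1))  # 5.4.98

# A named group avoids depending on the group's position at all.
named = re.compile(r"Roxen CMS(?:\s*\|\s*)version (?P<version>[\d.]+)")
print(named.search(text).group("version"))  # 5.4.98
```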
Looking at both examples you provided, it seems the main markers of a version string are "Roxen CMS", "|", and "version 123.4.5.6". Here is a regex that matches both test URLs:
`Roxen CMS[\S\s]*\|[\S\s]*version ([\d.]+)`
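In most regex dialects an unescaped `|` means alternation rather than a literal pipe, which changes what the whole pattern means. Here is a sketch with Python's `re`; the sample strings are hypothetical:

```python
import re

text = "Roxen CMS | version 5.4.98"

# Unescaped "|" splits the pattern into two alternatives:
# "Roxen CMS[\S\s]*" OR "[\S\s]*version ([\d.]+)". The first alternative
# matches here, and it contains no version group, so group(1) is None.
alternation = re.compile(r"Roxen CMS[\S\s]*|[\S\s]*version ([\d.]+)")
print(alternation.search(text).group(1))  # None

# Escaping the pipe makes it a literal character required between the markers.
literal = re.compile(r"Roxen CMS[\S\s]*\|[\S\s]*version ([\d.]+)")
print(literal.search(text).group(1))  # 5.4.98

# With the alternation, any page that mentions "version" matches.
other = "Some other CMS, version 1.0"
print(alternation.search(other).group(1))  # 1.0
print(literal.search(other))               # None
```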
Is this version number always found within HTML text? To reduce false positives, you could add the `>` and `<` for some additional context. Something like:
`Roxen CMS<[\S\s]*\|[\S\s]*version ([\d.]+)\s?<`
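This sketch checks the `<`-anchored variant against a hypothetical HTML fragment, not one of the actual test pages. It assumes the pipe is meant literally and that both gaps are `[\S\s]*`:

```python
import re

pattern = re.compile(r"Roxen CMS<[\S\s]*\|[\S\s]*version ([\d.]+)\s?<")

# Hypothetical footer markup with each piece wrapped in its own tag.
html = """<span>Roxen CMS</span>
<span class="sep">|</span>
<span>version 5.4.98 </span>"""
print(pattern.search(html).group(1))  # 5.4.98

# Plain prose that mentions the same words, but without tags, is rejected.
prose = "We migrated away from Roxen CMS | it was version 5.4.98 at the time."
print(pattern.search(prose))  # None
```

The `\s?` before the final `<` tolerates one trailing space inside the tag, as in the `html` sample.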
Examples:
The text and the pipe character don't always have the same styling. I want to keep this regex generic, but I also worry that `.*` may be too loose. Any thoughts on this? Maybe just `.{1,200}`, or some other reasonable bound?
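This small comparison pits an unbounded gap against a bounded one such as `{1,200}`, using hypothetical inputs. It uses `[\S\s]` rather than `.` so the gap can still span newlines. The limit of 200 is only the number floated above, not a tested value:

```python
import re

UNBOUNDED = re.compile(r"Roxen CMS[\S\s]*\|[\S\s]*version ([\d.]+)")
# Each gap may hold between 1 and 200 characters of arbitrary markup.
BOUNDED = re.compile(r"Roxen CMS[\S\s]{1,200}\|[\S\s]{1,200}version ([\d.]+)")

close = "Roxen CMS</b> | <i>version 5.4.98</i>"
# Markers far apart: "Roxen CMS" mentioned somewhere, then an unrelated
# "|" and "version" thousands of characters later.
far = "Roxen CMS" + "x" * 5000 + " | some library version 2.1"

for name, rx in [("unbounded", UNBOUNDED), ("bounded", BOUNDED)]:
    m_close = rx.search(close)
    m_far = rx.search(far)
    print(name,
          m_close.group(1) if m_close else None,
          m_far.group(1) if m_far else None)
# unbounded 5.4.98 2.1
# bounded 5.4.98 None
```

A bound trades some recall (heavily styled pages with long markup between the pieces) for fewer false positives on unrelated pages. It also caps how far the engine backtracks on large documents.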