Closed p5pRT closed 15 years ago
It appears that regexps aren't syntax-checked until run time:
% perl -E 'if (0) { /[$]/ }' % perl -E 'if (1) { /[$]/ }' Unmatched [ in regex; marked by \<-- HERE in m/[ \<-- HERE 5.010000/ at -e line 1.
It would be more useful to check them at compile time.
On Wed May 06 06:29:09 2009\, eda@waniasset.com wrote:
It appears that regexps aren't syntax-checked until run time:
% perl -E 'if (0) { /[$]/ }' % perl -E 'if (1) { /[$]/ }' Unmatched [ in regex; marked by \<-- HERE in m/[ \<-- HERE 5.010000/ at -e line 1.
It would be more useful to check them at compile time.
They are checked at compile time if your pattern is constant\, but this pattern includes a variable.
If you meant to report that regexps containing read-only magic variables should be compiled at compile time that would require big changes to the compiler\, and I don't see why it would be worth the effort honestly.
The RT System itself - Status changed from 'new' to 'open'
2009/5/6 Ed Avis \perlbug\-followup@​perl\.org:
It appears that regexps aren't syntax-checked until run time:
% perl -E 'if (0) { /[$]/ }' % perl -E 'if (1) { /[$]/ }' Unmatched [ in regex; marked by \<-- HERE in m/[ \<-- HERE 5.010000/ at -e line 1.
It would be more useful to check them at compile time.
Since regexps can embed variables\, like $] here\, that's not possible.
@rgs - Status changed from 'open' to 'rejected'
On Thu\, May 07\, 2009 at 10:59:22AM +0200\, Rafael Garcia-Suarez wrote:
2009/5/6 Ed Avis \perlbug\-followup@​perl\.org:
It appears that regexps aren't syntax-checked until run time:
% perl -E 'if (0) { /[$]/ }' % perl -E 'if (1) { /[$]/ }' Unmatched [ in regex; marked by \<-- HERE in m/[ \<-- HERE 5.010000/ at -e line 1.
It would be more useful to check them at compile time.
Since regexps can embed variables\, like $] here\, that's not possible.
And to make the point more clear\, if there are no variables\, the regexp is compiled at program compile time:
$ perl -cE '/[$]/' -e syntax OK $ perl -cE '/[5.010/' Unmatched [ in regex; marked by \<-- HERE in m/[ \<-- HERE 5.010/ at -e line 1.
Abigail
It appears that regexps aren't syntax-checked until run time:
% perl -E 'if (0) { /[$]/ }'
They are checked at compile time if your pattern is constant\, but this pattern includes a variable.
Duh\, of course. I had intended to escape the $ character by putting it in a character class\, but it doesn't work like that.
It would be useful\, IMHO\, to have a warning when magic variables such as $] are used in a regular expression\, because most magic characters (including $ itself) already have a meaning in regexps and it is easy to get confused. If $] or $. or whatever is interpolated into a regexp\, that is probably a mistake. But adding such a warning would be a separate bug report.
-- Ed Avis \eda@​waniasset\.com
______________________________________________________________________ This email has been scanned by the MessageLabs Email Security System. For more information please visit http://www.messagelabs.com/email ______________________________________________________________________
On Thu\, May 7\, 2009 at 4:27 AM\, Ed Avis \eda@​waniasset\.com wrote:
[the wishlist item to add] such a warning would be a separate bug report.
nonsense. The subject here is "REs should be checked" not "REs should be compiled" and the problem you had is a perfect example of why warning on unusual variables appearing in RE bodies would be a good thing.
I agree that it sounds like a good thing.
Ed Avis wrote:
It appears that regexps aren't syntax-checked until run time:
% perl -E 'if (0) { /[$]/ }'
They are checked at compile time if your pattern is constant\, but this pattern includes a variable.
Duh\, of course. I had intended to escape the $ character by putting it in a character class\, but it doesn't work like that.
It would be useful\, IMHO\, to have a warning when magic variables such as $] are used in a regular expression\, because most magic characters (including $ itself) already have a meaning in regexps and it is easy to get confused. If $] or $. or whatever is interpolated into a regexp\, that is probably a mistake. But adding such a warning would be a separate bug report.
-1
Yes\, I might want to check if the current version of Perl or the current line number is in a string and I don't want to have to stick it into a temp variable before I do it.
-- 29. The Irish MPs are not after "Me frosted lucky charms". -- The 213 Things Skippy Is No Longer Allowed To Do In The U.S. Army http://skippyslist.com/list/
On Thu\, May 7\, 2009 at 8:43 PM\, Michael G Schwern \schwern@​pobox\.com wrote:
Ed Avis wrote:
It would be useful\, IMHO\, to have a warning when magic variables such as $] are used in a regular expression\, because most magic characters (including $ itself) already have a meaning in regexps and it is easy to get confused. If $] or $. or whatever is interpolated into a regexp\, that is probably a mistake. But adding such a warning would be a separate bug report.
-1
Yes\, I might want to check if the current version of Perl or the current line number is in a string and I don't want to have to stick it into a temp variable before I do it.
You wouldn't want to use $] without using \Q. Maybe the warning should be conditional on not using \Q?
I don't see much value in this warning. The situation where it occurs is pretty rare and pretty easy to debug.
Eric Brine wrote:
On Thu\, May 7\, 2009 at 8:43 PM\, Michael G Schwern \<schwern@pobox.com \mailto​:schwern@​pobox\.com> wrote:
Ed Avis wrote​: > It would be useful\, IMHO\, to have a warning when magic variables such as $\] are > used in a regular expression\, because most magic characters \(including $ itself\) > already have a meaning in regexps and it is easy to get confused\. If $\] or $\. or > whatever is interpolated into a regexp\, that is probably a mistake\. But adding > such a warning would be a separate bug report\. \-1 Yes\, I might want to check if the current version of Perl or the current line number is in a string and I don't want to have to stick it into a temp variable before I do it\.
You wouldn't want to use $] without using \Q. Maybe the warning should be conditional on not using \Q?
I don't see much value in this warning. The situation where it occurs is pretty rare and pretty easy to debug.
On the contrary\, I DO see the value in the proposed warning. It's easy to screw up [$] and can be confusing for a newbie to figure out what the hell is going on. Trouble is\, I also see value in using $] in a regex.
Maybe what we need is a clearer error message.
$ perl -wle '/[$]/' Unmatched [ in regex; marked by \<-- HERE in m/[ \<-- HERE 5.010000/ at -e line 1.
First off\, it's hard to see what's the regex in all that\, especially with the double "\<-- HERE" Simply quoting the regex for emphasis helps a little.
Unmatched [ in regex; marked by \<-- HERE in 'm/[ \<-- HERE 5.010000/' at -e line 1.
Maybe the HERE marker can be made more obvious so we can remove the distracting explanation.
Unmatched [ in regex; 'm/[' \<==HERE '5.010000/' at -e line 1.
How's that?
-- 151. The proper way to report to my Commander is "Specialist Schwarz\, reporting as ordered\, Sir" not "You can't prove a thing!" -- The 213 Things Skippy Is No Longer Allowed To Do In The U.S. Army http://skippyslist.com/list/
On Fri\, May 8\, 2009 at 1:37 PM\, Michael G Schwern \schwern@​pobox\.com wrote:
Maybe the HERE marker can be made more obvious so we can remove the distracting explanation.
Unmatched [ in regex; 'm/[' \<==HERE '5.010000/' at -e line 1.
How's that?
How strong is the taboo against warnings taking two lines? Or even needing the HERE notifier?
Unmatched [ in regex m/[5.010000/
seems clearer\, as would
Unmatched [ in regex m/[5.010000/ ^
David Nicol wrote:
On Fri\, May 8\, 2009 at 1:37 PM\, Michael G Schwern \schwern@​pobox\.com wrote:
Maybe the HERE marker can be made more obvious so we can remove the distracting explanation.
Unmatched [ in regex; 'm/[' \<==HERE '5.010000/' at -e line 1.
How's that?
How strong is the taboo against warnings taking two lines?
I'm fine with that.
Unmatched [ in regex at -e line 1. m/[ \<==HERE 5.010000/
People who scrape /at (.*) line (\d+)\.$/ out of the warning might have a little trouble\, but its trivially and compatibly fixed with a /m. All warning scrapers are vulnerable to formatting changes anyway.
Or even needing the HERE notifier?
Unmatched [ in regex m/[5.010000/
seems clearer\, as would
Unmatched [ in regex m/[5.010000/
(If there's a difference between those two I don't see it)
Which [ was that again?
Unmatched [ in regex /[5.01000[xyz]/
I'm sure you can extrapolate that out into some ginormous regex.
-- 170. Not allowed to "defect" to OPFOR during training missions. -- The 213 Things Skippy Is No Longer Allowed To Do In The U.S. Army http://skippyslist.com/list/
Quoth schwern@pobox.com (Michael G Schwern):
David Nicol wrote:
On Fri\, May 8\, 2009 at 1:37 PM\, Michael G Schwern \schwern@​pobox\.com wrote:
Maybe the HERE marker can be made more obvious so we can remove the distracting explanation.
Unmatched [ in regex; 'm/[' \<==HERE '5.010000/' at -e line 1.
How's that?
How strong is the taboo against warnings taking two lines?
I'm fine with that.
Unmatched [ in regex at -e line 1. m/[ \<==HERE 5.010000/
I've always likes the TeX methof of identifying an error:
Unmatched [ in regex at -e line 1. m/[ 5.010000/
It clearly shows where the error is without introducing extra words that are easily confused with the rest of the regex. Of course\, it's less clear when /x is involved.
People who scrape /at (.*) line (\d+)\.$/ out of the warning might have a little trouble\, but its trivially and compatibly fixed with a /m. All warning scrapers are vulnerable to formatting changes anyway.
Yes.
Ben
On Fri\, May 8\, 2009 at 6:28 PM\, Michael G Schwern \schwern@​pobox\.com wrote:
(If there's a difference between those two I don't see it)
the ^ on the next line was the suggestion.
On Fri\, May 8\, 2009 at 7:04 PM\, Ben Morrow \ben@​morrow\.me\.uk wrote:
it's less clear when /x is involved.
presumably the warning text would stringify the compiled qr rather than quote the source.
$ perl -le '$z = qr/this #this and #and that/x; print $z' (?x-ism:this #this and #and that)
oh. I'm surprised that the /x stuff survived.
David Nicol wrote:
How strong is the taboo against warnings taking two lines? Or even needing the HERE notifier?
It appears to me that that `splain` reads and explains on a line-by-line basis; this would need to be tweaked to accommodate multi-line warnings.
I have no idea if multi-line warnings would affect the diagnostics pragma. At the very least it may affect how it tries to look up the appropriate warnings in perldiag.
Cheerio\,
Paul
-- Paul Fenwick \pjf@​perltraining\.com\.au | http://perltraining.com.au/ Director of Training | Ph: +61 3 9354 6001 Perl Training Australia | Fax: +61 3 9354 2681
On Fri\, May 08\, 2009 at 11:17:09PM -0500\, David Nicol wrote:
On Fri\, May 8\, 2009 at 6:28 PM\, Michael G Schwern \schwern@​pobox\.com wrote:
(If there's a difference between those two I don't see it)
the ^ on the next line was the suggestion.
^ on the next line is fine as long as the first part doesn't wrap. But as soon as the length of the introductionary text and the regexp itself is longer than the width of the terminal\, it wraps\, and it'll be very hard to place a ^ under the right spot.
Abigail
David Nicol wrote:
On Fri\, May 8\, 2009 at 7:04 PM\, Ben Morrow \ben@​morrow\.me\.uk wrote:
it's less clear when /x is involved.
presumably the warning text would stringify the compiled qr rather than quote the source.
The opposite. The message is about helping the coder find the mistake *in their code* not in a normalized view of their code. The two might look rather different\, though qr// does a rather good job of keeping them the same.
Also since this is a parse error there's not necessarily a compiled qr// yet to display.
-- But there's no sense crying over every mistake. You just keep on trying till you run out of cake. -- Jonathan Coulton\, "Still Alive"
Quoth schwern@pobox.com (Michael G Schwern):
David Nicol wrote:
On Fri\, May 8\, 2009 at 7:04 PM\, Ben Morrow \ben@​morrow\.me\.uk wrote:
it's less clear when /x is involved.
presumably the warning text would stringify the compiled qr rather than quote the source.
The opposite. The message is about helping the coder find the mistake *in their code* not in a normalized view of their code. The two might look rather different\, though qr// does a rather good job of keeping them the same.
Currently the stringification of a qr// is exactly the text that was originally supplied\, modulo the lexer's qqish processing and a possible upgrade to utf8\, with "(?$flags:" on the beginning and ")" on the end. There is no conversion back from a compiled regex to a string.
Ben
My two cents is that while you might conceivably want to use $] in a regexp\, this is very rare and certainly dwarfed by the number of occasions when the programmer made a mistake and intended something else. If I really did want that\, it would be no hardship to put $] into a named variable first.
The same applies for every magic variable with the possible exception of $_.
So I think it a good idea to add a warning (not a fatal error breaking compatibility; just a warning) when it appears that magic punctuation vars like $$\, $\<\, $/ and so on and so on are being used in a regular expression.
The enhancements discussed to error reporting would also be a good idea.
-- Ed Avis \eda@​waniasset\.com
On Mon\, May 11\, 2009 at 10:38:43AM +0000\, Ed Avis wrote:
My two cents is that while you might conceivably want to use $] in a regexp\, this is very rare and certainly dwarfed by the number of occasions when the programmer made a mistake and intended something else. If I really did want that\, it would be no hardship to put $] into a named variable first.
The same applies for every magic variable with the possible exception of $_.
So I think it a good idea to add a warning (not a fatal error breaking compatibility; just a warning) when it appears that magic punctuation vars like $$\, $\<\, $/ and so on and so on are being used in a regular expression.
IMO\, since the majority of such cases can be determined by inspecting the source of the program\, such a warning seems to be an excellent candidate for Perl::Critic.
Abigail
On Mon\, May 11\, 2009 at 12:45:35PM +0200\, Abigail wrote:
On Mon\, May 11\, 2009 at 10:38:43AM +0000\, Ed Avis wrote:
So I think it a good idea to add a warning (not a fatal error breaking compatibility; just a warning) when it appears that magic punctuation vars like $$\, $\<\, $/ and so on and so on are being used in a regular expression.
IMO\, since the majority of such cases can be determined by inspecting the source of the program\, such a warning seems to be an excellent candidate for Perl::Critic.
The minority being run-time generated code? I can't conceive of a situation where it would be possible to get one in undetected by static analysis\, given that there is only one stage of variable interpolation into regexps.
Nicholas Clark
Abigail wrote:
So I think it a good idea to add a warning (not a fatal error breaking compatibility; just a warning) when it appears that magic punctuation vars like $$\, $\<\, $/ and so on and so on are being used in a regular expression.
IMO\, since the majority of such cases can be determined by inspecting the source of the program\, such a warning seems to be an excellent candidate for Perl::Critic.
I thought about that but considering when such a warning would be useful to the programmer\, it makes more sense to report it before the error from running the regex. So the programmer would see
Possible unintended interpolation of special variable $[ in regex at prog line 3. Unmatched [ in regex; marked by \<-- HERE in m/[ \<-- HERE 5.010000/ at prog line 3.
At the moment only the second error is shown and it's not obvious that the next step is 'run perlcritic over your code to give a more helpful message'.
I think this is analogous to the 'Possible unintended interpolation of @whatever in string' warning\, which is also mostly decidable by looking at the program source\, but is simply more helpful when given by perl itself rather than left to Perl::Critic.
-- Ed Avis \eda@​waniasset\.com
______________________________________________________________________ This email has been scanned by the MessageLabs Email Security System. For more information please visit http://www.messagelabs.com/email ______________________________________________________________________
Ed Avis wrote:
I think this is analogous to the 'Possible unintended interpolation of @whatever in string' warning\, which is also mostly decidable by looking at the program source\, but is simply more helpful when given by perl itself rather than left to Perl::Critic.
Now that brings up an interesting point. That warning doesn't fire every time you interpolate @foo\, just the ones that are most likely to be mistakes\, like "foo@bar.com". This keeps it from rendering a useful feature useless and elevates it to a useful warning that's actually catching a mistake.
So perhaps a warning about $] can only fire when its inside an unclosed [. There's little legit use for that. That is what we're trying to defend against. All the rest are fine.
As for the other special variables\, come up with a common mistake to warn about and a heuristic to catch it. That will be far more effective than a blanket warning and help the noobs without annoying the power users.
-- Reality is that which\, when you stop believing in it\, doesn't go away. -- Phillip K. Dick
So perhaps a warning about $] can only fire when its inside an unclosed [.
That seems fine to me.
-- Ed Avis \eda@​waniasset\.com
______________________________________________________________________ This email has been scanned by the MessageLabs Email Security System. For more information please visit http://www.messagelabs.com/email ______________________________________________________________________
Migrated from rt.perl.org#65438 (status was 'rejected')
Searchable as RT65438$