Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

please make new command of regerexpression. #13575

Open p5pRT opened 10 years ago

p5pRT commented 10 years ago

Migrated from rt.perl.org#121160 (status was 'open')

Searchable as RT121160$

p5pRT commented 10 years ago

From endohiro@he.catv-yokohama.ne.jp

$_ = "123";
@test = $_ =~ m/1|2(*LAST)|3/g;
print "@test\n";    # desired result: 1 2

(*LAST) would act as a "break" out of the loop. Please add a (*LAST) command.

I am Japanese and do not understand English well, sorry.

p5pRT commented 10 years ago

From @rjbs

On Mon Feb 03 17:19:14 2014, endohiro@he.catv-yokohama.ne.jp wrote:

> $_ = "123"; @test = $_ =~ m/1|2(*LAST)|3/g; print "@test\n";    # desired result: 1 2
>
> (*LAST) would act as a "break" out of the loop. Please add a (*LAST) command.
>
> I am Japanese and do not understand English well, sorry.

This ticket was sent to the security reports queue. I have moved it to the non-security queue.

-- rjbs

p5pRT commented 10 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 10 years ago

From @demerphq

On 4 February 2014 21:24, Ricardo SIGNES via RT <perlbug-followup@perl.org> wrote:

> On Mon Feb 03 17:19:14 2014, endohiro@he.catv-yokohama.ne.jp wrote:
>
>> $_ = "123"; @test = $_ =~ m/1|2(*LAST)|3/g; print "@test\n";    # desired result: 1 2
>>
>> (*LAST) would act as a "break" out of the loop. Please add a (*LAST) command.
>>
>> I am Japanese and do not understand English well, sorry.
>
> This ticket was sent to the security reports queue. I have moved it to the non-security queue.

This is an interesting request.

If I understand it right then Endohiro-san is asking us to add a new verb (*LAST) which will allow one to break out of a /g loop.

I think this is doable, in that we could make the (*LAST) op set pos() to the end of the string after a successful match.

The open question is: would it *also* act like (*ACCEPT)?

Endohiro-san\, what do you expect this to do?

$_ = "1a2b3c";
@test = $_ =~ m/(1|2(*LAST)|3)[a-c]/g;
print "@test\n";

This? # 1a 2b

Or this: # 1a 2
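The first of the two outcomes can be approximated in current Perl, which may help when comparing the options. The following is only a sketch under stated assumptions: the named group `last` is a hypothetical stand-in for the branch that would carry the proposed verb, and the outer capture group is added so that the list holds whole matches such as "1a" rather than only the captured digit. It mimics the "set pos() to the end of the string after a successful match" idea by assigning to pos() between matches.

```perl
use strict;
use warnings;

# Hypothetical emulation of the proposed (*LAST): the named group "last"
# marks the alternation branch that would carry the verb.
my $s = "1a2b3c";
my @test;

while ( $s =~ m/((?:1|(?<last>2)|3)[a-c])/g ) {
    push @test, $1;    # $1 holds the whole match, e.g. "1a"

    # After a successful match through the marked branch, move pos() to
    # the end of the string so that the next //g attempt fails.
    pos($s) = length($s) if defined $+{last};
}

print "@test\n";    # prints "1a 2b"
```

Because pos() is only changed after the match has completed, the rest of the pattern (`[a-c]`) still has to match, which corresponds to the variant that does not behave like (*ACCEPT).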

Does anybody else have an opinion?

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 10 years ago

From perl5-porters@perl.org

Yves Orton asked:

> Does anybody else have an opinion?

If (*LAST) implicitly sets pos, then shouldn't (?{pos = ...}) also work? It seems odd to allow the former but not the latter, yet no doubt the latter is a can of worms.

I think what I'm trying to say is that it seems odd for a pattern to modify the behaviour of the op using it, especially considering that //g in list context is a kind of loop. So would while(...){ /(*LAST)/ } affect that loop?

I'm just having trouble wrapping my head around the ramifications of this change, and what it might imply elsewhere; i.e., what model we are following.

Does any of that make sense?

p5pRT commented 10 years ago

From @demerphq

On 5 February 2014 06:38, Father Chrysostomos <sprout@cpan.org> wrote:

> Yves Orton asked:
>
>> Does anybody else have an opinion?
>
> If (*LAST) implicitly sets pos, then shouldn't (?{pos = ...}) also work? It seems odd to allow the former but not the latter, yet no doubt the latter is a can of worms.

I don't agree there is an equivalence there.

Assuming (*LAST) doesn't imply (*ACCEPT), then (*LAST) is not far off the already legal (?s:.*\z(*SKIP)).
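To make the comparison concrete, here is a small sketch that applies that already legal construct to the string from the original report. The branch reset group `(?|...)` and the capture groups are additions for illustration only: without them the second match would also include the text consumed by `.*\z`, which is one way in which the construct is "not far off" rather than identical.

```perl
use strict;
use warnings;

my $s = "123";

# Branch reset (?|...) makes every alternative capture into $1.
# The second branch consumes the rest of the string after matching "2",
# so the match ends at the end of the string and no further match exists.
my @test = $s =~ m/(?|(1)|(2)(?s:.*\z(*SKIP))|(3))/g;

print "@test\n";    # prints "1 2"
```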

Also, I don't think there should be any problem with setting pos() inside of a match. I think that the regex engine should look at pos() only at the start of a match() and update pos() only at the end of the match.

Actually I struggle with deciding if this​:

$ perl -le'$s="abc"; my @got= $s=~/(.)(?{ print pos($s) })/g;'
1
2
3

is a bug or not. It is obviously useful, but implies that pos() is meaningful inside of a match which to me is far more disturbing than a regex verb changing pos at the end of a match. After all *every* match changes pos() at the end of a match. But IMO this is really an entirely separate discussion.

> I think what I'm trying to say is that it seems odd for a pattern to modify the behaviour of the op using it, especially considering that //g in list context is a kind of loop. So would while(...){ /(*LAST)/ } affect that loop?

I think you need to come up with a better example of what worries you; that snippet is too simple to be used as the basis of discussion.

As I said earlier, patterns modify the behaviour of the op using them every time they run. Patterns have side effects and those side effects change the op. After all, if the pattern didn't modify the behaviour of the op then you couldn't have a /g match at all.
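The side effect in question is the stored match position. As a minimal illustration (variable names are arbitrary), each successful scalar-context //g match records its end offset in pos(), and the next evaluation of the same op resumes from there:

```perl
use strict;
use warnings;

my $s = "abc";
my @positions;

# Each successful scalar-context //g match stores its end offset in
# pos($s); the next evaluation of the same op resumes from that offset.
while ( $s =~ m/(.)/g ) {
    push @positions, pos($s);
}

print "@positions\n";    # prints "1 2 3"

# After the final, failed attempt pos($s) is reset to undef.
my $after = defined pos($s) ? "set" : "undef";
print "$after\n";        # prints "undef"
```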

> I'm just having trouble wrapping my head around the ramifications of this change, and what it might imply elsewhere; i.e., what model we are following.

I don't see any reason for concern. This is fairly standard regexp backtracking control stuff which I see as fitting right into the (*VERB) framework.

> Does any of that make sense?

So far, not so much. But I'd like to see a more concrete example of what worries you before we dismiss it entirely. It's better to work through your concerns and be absolutely sure there isn't a problem here.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 10 years ago

From @iabyn

On Wed, Feb 05, 2014 at 10:28:12AM +0800, demerphq wrote:

> Also, I don't think there should be any problem with setting pos() inside of a match. I think that the regex engine should look at pos() only at the start of a match() and update pos() only at the end of the match.
>
> Actually I struggle with deciding if this:
>
> $ perl -le'$s="abc"; my @got= $s=~/(.)(?{ print pos($s) })/g;'
> 1 2 3
>
> is a bug or not. It is obviously useful, but implies that pos() is meaningful inside of a match which to me is far more disturbing than a regex verb changing pos at the end of a match. After all *every* match changes pos() at the end of a match. But IMO this is really an entirely separate discussion.

I feel twitchy about pos being changed mid-match in terms of how it might interact with \G (which is already in voodoo territory).

The pos($s) above is documented behaviour:

  Inside a C<(?{...})> block, C<$_> refers to the string the regular
  expression is matching against. You can also use C<pos()> to know what is
  the current position of matching within this string.

-- The Enterprise is captured by a vastly superior alien intelligence which does not put them on trial.   -- Things That Never Happen in "Star Trek" #10

p5pRT commented 10 years ago

From @demerphq

On 5 February 2014 20:19, Dave Mitchell <davem@iabyn.com> wrote:

> On Wed, Feb 05, 2014 at 10:28:12AM +0800, demerphq wrote:
>
>> Also, I don't think there should be any problem with setting pos() inside of a match. I think that the regex engine should look at pos() only at the start of a match() and update pos() only at the end of the match.
>>
>> Actually I struggle with deciding if this:
>>
>> $ perl -le'$s="abc"; my @got= $s=~/(.)(?{ print pos($s) })/g;'
>> 1 2 3
>>
>> is a bug or not. It is obviously useful, but implies that pos() is meaningful inside of a match which to me is far more disturbing than a regex verb changing pos at the end of a match. After all *every* match changes pos() at the end of a match. But IMO this is really an entirely separate discussion.
>
> I feel twitchy about pos being changed mid-match in terms of how it might interact with \G (which is already in voodoo territory).

Well, my thinking was it would be a "useless" modification. IOW, the regex engine should inspect pos() *once*, at the start of the match, and updates to pos() should have no effect.

> The pos($s) above is documented behaviour:
>
> Inside a C<(?{...})> block, C<$_> refers to the string the regular
> expression is matching against. You can also use C<pos()> to know what is
> the current position of matching within this string.

OK, that's fine. We also document that modifying pos() inside of a (?{...}) is ineffective:

perldoc -f pos

  "pos" directly accesses the location used by the regexp engine to store
  the offset, so assigning to "pos" will change that offset, and so will
  also influence the "\G" zero-width assertion in regular expressions. Both
  of these effects take place for the next match, so you can't affect the
  position with "pos" during the current match, such as in
  "(?{pos() = 5})" or "s//pos() = 5/e".

So I guess that is that then :-)
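The part of that documentation about the *next* match is easy to check. A minimal sketch (the string and offset are arbitrary): assigning to pos() between matches moves the starting point of the following //g match, and \G anchors at that offset.

```perl
use strict;
use warnings;

my $s = "abcdef";

# Assigning to pos() between matches moves the starting point of the
# next //g match, and \G anchors the match at exactly that offset.
pos($s) = 3;
my $matched = $s =~ m/\G(.)/g;
my $char    = $1;          # character found at offset 3
my $next    = pos($s);     # offset just past that character

print "$char $next\n";     # prints "d 4"
```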

Yves

p5pRT commented 10 years ago

From perl5-porters@perl.org

Yves Orton wrote:

> I think you need to come up with a better example of what worries you

I felt uncomfortable about the feature but did not know exactly why. Just ignore me.

p5pRT commented 10 years ago

From @demerphq

Just replying to get this into the ML record...

On 6 February 2014 08:52, endohiro <endohiro@he.catv-yokohama.ne.jp> wrote:

> Hello, Mr. Yves Orton.
>
> Thank you for listening to me.
>
>> Endohiro-san is asking us to add a new verb (*LAST) which will allow one to break out of a /g loop.
>
> Yes.
>
>> would it *also* act like (*ACCEPT)?
>
> No.
>
>> $_= "1a2b3c"; @test = $_ =~ m/(1|2(*LAST)|3)[a-c]/g; print "@test\n";
>> This? # 1a 2b
>> or this: # 1a 2
>
> I hope for "1a 2b". (*LAST) is nearly (*COMMIT).
>
> (*COMMIT) takes effect only on a failed match. (*LAST) takes effect on a successful match.
>
> Should (*LAST) do anything on a failed match? Sorry, I don't know; I can't judge. Please decide. I will agree with your idea.
>
>> while(...){ /(*LAST)/ } affect that loop?
>
> No. I hope that (*LAST) is nearly (*COMMIT). (*LAST) is not the "last;" of "while ( ... ){ last; }".
>
> Thank you.
>
> endohiro

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 10 years ago

From endohiro@he.catv-yokohama.ne.jp

Hello, everyone. I am sorry that my English is poor.

I think that Mr. Yves Orton understands my thinking about (*LAST) perfectly. I am glad to meet you. You are a great hacker, and I am honored to meet you.

(*LAST) is a confusing name because of the "last;" used in while loops. Please change the name of (*LAST). (I'm sorry, Mr. Father Chrysostomos.)

Thank you.


# (*LAST) stops the /g loop in list context.

$_ = "123";
@test = $_ =~ m/1|2(*LAST)|3/g;
print "@test\n";

[result] 1 2


# (*LAST) does not stop the search before the end of the regexp.

$_ = "1a2b3c";
@test = $_ =~ m/(1|2(*LAST)|3)[a-c]/g;
print "@test\n";

[result] 1a 2b


# while ( /g )

$html = "
<p>Apple</p>
<p>Orange</p>
<p>Strawberry</p>
";

while ( $html =~ m/
  <p>
  (
    Apple
   |Orange (*LAST)
   |Strawberry
  )
  <\/p>
/gx ){
  print "$1\n";
}

[result] Apple Orange

# (*LAST) stops the /g loop on a successful match.


# (*LAST) does nothing on a failed match.

$html = "
<p>Apple</p>
<p>Orange Juice</p>
<p>Strawberry</p>
";

[result] Apple Strawberry

# If you want to stop the /g loop in this case, you can use this regexp:
#   |Orange (*LAST)(*COMMIT)

# "(*LAST)(*COMMIT)" always stops the /g loop.
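For comparison, the two while-loop cases above (successful match stops the loop, failed match does nothing) can be approximated today with a named capture and an explicit `last`. This is only a sketch: `fruits_until_last` and the group name `last` are hypothetical names introduced here, and the `<p>` markup is assumed from the `<\/p>` in the pattern. It does not cover the "(*LAST)(*COMMIT)" case.

```perl
use strict;
use warnings;

# Hypothetical helper: collects the fruit names and leaves the //g loop
# after a successful match through the branch marked by (?<last>...).
sub fruits_until_last {
    my ($html) = @_;
    my @found;
    while ( $html =~ m/ <p> ( Apple | (?<last> Orange ) | Strawberry ) <\/p> /gx ) {
        push @found, $1;
        last if defined $+{last};    # only set when the whole match succeeded
    }
    return @found;
}

my @first  = fruits_until_last("<p>Apple</p>\n<p>Orange</p>\n<p>Strawberry</p>\n");
my @second = fruits_until_last("<p>Apple</p>\n<p>Orange Juice</p>\n<p>Strawberry</p>\n");

print "@first\n";     # prints "Apple Orange"
print "@second\n";    # prints "Apple Strawberry"
```

In the second call "Orange" is followed by " Juice", so the overall match fails at that position, the marked branch never takes part in a successful match, and the loop continues to "Strawberry".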


# while ( ... ) { m/(*LAST)/ }

while ( 1 ) { m/(*LAST)/; }

# The "while" loop is endless.
# (*LAST) does not stop the while loop.


# while ( ... ) { m/(*LAST)/ + /g }

while ( 1 ) { m/(*LAST)/g; }

# The "while" loop is endless too.

Hiroyuki ENDO