Perl / perl5

šŸŖ The Perl programming language
https://dev.perl.org/perl5/

magic open of ARGV #1566

Closed p5pRT closed 15 years ago

p5pRT commented 24 years ago

Migrated from rt.perl.org#2783 (status was 'rejected')

Searchable as RT2783$

p5pRT commented 24 years ago

From thospel@mail.dma.be

In article <E12V7Jw-0002PL-00@ursa.cus.cam.ac.uk>, "M.J.T. Guy" <mjtg@cus.cam.ac.uk> writes:

No, I'm *not* trying to restart this flame war. But it was a "security" issue, and security seems to be in fashion at the moment, and it *was* left in a somewhat unsatisfactory state.

The story so far, for the benefit of younger readers: [ with the usual IIRC caveats - go to the archives if you want the real facts ] There's a booby trap when magic open (i.e. initial/final special characters like < > |) is used in conjunction with <>. Suppose some devious person has left around a file such as "| rm -rf *;". Then root's cron job comes along and does

       my_scan_command *

and ... Boom! Here's a more innocent demonstration:

       $ cat >'| echo Bwahahahaha'
       hkgfjhgfhgf
       $ perl -wne '' *
       Bwahahahaha
       $

Note that the Perl script is obviously "so simple it can't have any security holes".

There were two proposals for fixing this: a maximal one which would have banned all magic in association with <>, and a minimal one (championed by Tom C) which would have made the open non-magic iff a file of that name existed. So the minimal proposal is essentially backwards compatible, and loses no functionality apart from active malice.

In fact, there was a little-known third proposal by yours truly (hi !): Turn off magic <> if the perl command line contains an explicit --. Otherwise you are still hacked. Observe:

       mkdir /tmp/a
       cd /tmp/a
       echo > '-e;print("Bwahaha\n")'
       echo foo > bar
       perl -wne '' *

Will also give you the dreaded:

       Bwahaha

So, since a security aware person has to do

       perl -wne '' -- *

anyways, let that remove the magicness

p5pRT commented 18 years ago

From @smpeters

[thospel@mail.dma.be - Tue Mar 28 03:56:10 2000]:

The flaw Ton has just above seems to have been fixed.

  steve@kirk:~/perl-current$ mkdir /tmp/a
  steve@kirk:~/perl-current$ cd /tmp/a
  steve@kirk:/tmp/a$ echo > '-e;print("Bwahaha\n")'
  steve@kirk:/tmp/a$ echo foo > bar
  steve@kirk:/tmp/a$ perl -wne '' *
  steve@kirk:/tmp/a$ ls -ltr
  total 8
  -rw-r--r-- 1 steve steve 1 2005-09-27 23:03 -e;print("Bwahaha\n")
  -rw-r--r-- 1 steve steve 4 2005-09-27 23:03 bar

Although the original flaw that started this ticket still exists.

p5pRT commented 18 years ago

From perl5-porters@ton.iguana.be

Just downloaded and tried a bleadperl. Still works for me. Nor do I think it CAN be solved without the user doing something like adding the -- (well, another way would be to not accept a second -e or even a first one after a non-option argument). It's the shell that expands the *, so perl never sees anything different from

  perl -wne '' '-e;print("Bwahaha\n")'

which is *supposed* to work.

Maybe it's your shell that refuses to expand the file with an option in the name ?

p5pRT commented 18 years ago

From prev a.r.ferreira@gmail.com

If I am not overlooking some point in the current discussion, the "magic open of ARGV" is not different from the magic of C<open>, which supports the tricky/powerful pipes (including '| print("Bwahaha\n")' or '| rm -rf *;'). So that's not a bug, but a feature. This is all documented in C<perldoc -f open> and C<perldoc perlopentut>. In the section "Dispelling the Dweomer" (perldoc perlopentut) we read

  If you want to use "<ARGV>" processing in a totally boring and non-magical way, you could do this first:

  # "Sam sat on the ground and put his head in his hands.
  # 'I wish I had never come here, and I don't want to see
  # no more magic,' he said, and fell silent."
  for (@ARGV) {
      s#^([^./])#./$1#;
      $_ .= "\0";
  }
  while (<>) {
      # now process $_
  }

So if you don't trust your script users, your code must be more robust than using a naked <>. At least some preprocessing of @ARGV applies before calling <>. Talking like Jarkko's hysterical raisins, too much code relies on the superpowers of C<open>, which can be used to process files prior to input with shell utilities (for example, ' | gunzip dat.gz') and the like. This can't be changed globally: but everyone is welcome to add code to make it safer in certain applications or even a reusable module which does this.

About the possibility to introduce an extra (potentially malicious) C<-e> option via

  perl -wne '' *

and files with weird names like '-e;print("Bwahaha\n")', this won't work when calling perl with a script:

  perl script.pl *

would get the -e as an argument. So don't use one-liners with arguments like * if you care about security.

Maybe the documentation could include one or two sentences on the potential for security breaches with open and <>, maybe not.

p5pRT commented 18 years ago

From @tamias

On Thu, Sep 29, 2005 at 09:02:27AM -0300, Adriano Ferreira wrote:

About the possibility to introduce an extra (potentially malicious) C<-e> option via

  perl -wne '' *

and files with weird names like '-e;print("Bwahaha\n")', this won't work when calling perl with a script:

  perl script.pl *

would get the -e as an argument. So don't use one-liners with arguments like * if you care about security.

Putting -- before the argument list should avoid that problem.

perl -wne '' -- *

Ronald

p5pRT commented 15 years ago

From @pjf

Raising this bug from the dead so we can lay it to rest at last.

The original bug report reads​:

There's a booby trap when magic open (i.e. initial/final special characters like < > |) is used in conjunction with <>. Suppose some devious person has left around a file such as "| rm -rf *;".

Yes, <> using 2-argument open does contain a nasty surprise. I don't like it either. However I believe it's considered a feature, and I've certainly seen a few tutorials, as well as working code that delights in the ability to write:

  myprog.pl log.0 log.1 'gunzip -c log.2.gz |'

and have <> work its magic.

This means I don't think we'll see <> changing to using 3-argument open any time soon. Even if it did, all the existing code out there using older Perls would still be vulnerable _anyway_, as well as the potential for some existing code that uses this "feature" to break when Perl is upgraded.

Luckily, there's a reasonably good work-around, and that's to use taint mode. Because command-line arguments are always tainted, but Perl doesn't check for taint when opening a file for *reading* (but it does for writing and for pipes), starting Perl in taint mode practically eliminates the problem of code injection attacks via command-line arguments and <>.

If the program didn't intend to execute external commands to begin with, then there should be no changes when the program uses taint. If it *did* intend to execute external commands, but we're in an environment where the filesystem itself may be considered hostile, then we definitely want to be using taint anyway. ;)

One can still potentially use the arcane invocation '<&=0' to dup STDIN (or another filehandle) without taint checks, but that's much less serious than executing arbitrary code.

As such, I'm resolving this ticket and marking it as not-a-bug.

Cheerio,

  Paul

--
Paul Fenwick <pjf@perltraining.com.au> | http://perltraining.com.au/
Director of Training                   | Ph:  +61 3 9354 6001
Perl Training Australia                | Fax: +61 3 9354 2681

p5pRT commented 15 years ago

@pjf - Status changed from 'open' to 'rejected'

p5pRT commented 15 years ago

From @jbenjore

On Wed, Jul 2, 2008 at 12:33 AM, Paul Fenwick via RT <perlbug-followup@perl.org> wrote:

...

This means I don't think we'll see <> changing to using 3-argument open any time soon. Even if it did, all the existing code out there using older Perls would still be vulnerable _anyway_, as well as the potential for some existing code that uses this "feature" to break when Perl is upgraded.

I'm of the opinion that working code should break in new versions of perl if it is hitting this and people should thank us for it. Anyone desiring to not have this break should continue to use old perl. Any tutorial teaching this has always been broken. This has never been a good feature and just because some people use it doesn't contradict that.

I find it incredibly aggravating that my one-liners do the wrong thing when they attempt to read my files <.st and >.st. Or rather - that I must be careful to never let any one-liners touch some files.

As such, I'm resolving this ticket and marking it as not-a-bug.

No.

Cheerio,

No.

Josh

p5pRT commented 15 years ago

From @ysth

On Mon, July 14, 2008 3:55 pm, Joshua ben Jore wrote:

On Wed, Jul 2, 2008 at 12:33 AM, Paul Fenwick via RT wrote:

This means I don't think we'll see <> changing to using 3-argument open any time soon. Even if it did, all the existing code out there using older Perls would still be vulnerable _anyway_, as well as the potential for some existing code that uses this "feature" to break when Perl is upgraded.

I'm of the opinion that working code should break in new versions of perl if it is hitting this and people should thank us for it. Anyone desiring to not have this break should continue to use old perl.

No.

Any tutorial teaching this has always been broken. This has never been a good feature and just because some people use it doesn't contradict that.

I'd be fine with breaking the feature in 5.12 by default, but would like a pragma to re-enable it.

Note that changing from 2-arg to 3-arg open breaks the following idiom (command line args are files of filenames to read):

  @ARGV = <>;
  while (<>) { ... }

because trailing newlines on the filenames are no longer ignored.

p5pRT commented 15 years ago

From @ap

* Yitzchak Scott-Thoennes <sthoenna@efn.org> [2008-07-15 03:10]:

Note that changing from 2-arg to 3-arg open breaks the following idiom (command line args are files of filenames to read):

  @ARGV = <>;
  while (<>) { ... }

because trailing newlines on the filenames are no longer ignored.

Writing

  chomp(@ARGV = <>);

is no great hardship.

(Note that at this time I am not taking either side in the greater debate; the above is not an argument either way.)

--
*AUTOLOAD=*_;sub _{s/(.*)::(.*)/print$2,(",$\/"," ")[defined wantarray]/e;$1}
&Just->another->Perl->hack;
#Aristotle Pagaltzis // <http://plasmasturm.org/>

p5pRT commented 15 years ago

From ben@morrow.me.uk

Quoth twists@gmail.com ("Joshua ben Jore"):

On Wed, Jul 2, 2008 at 12:33 AM, Paul Fenwick via RT <perlbug-followup@perl.org> wrote:

...

This means I don't think we'll see <> changing to using 3-argument open any time soon. Even if it did, all the existing code out there using older Perls would still be vulnerable _anyway_, as well as the potential for some existing code that uses this "feature" to break when Perl is upgraded.

I'm of the opinion that working code should break in new versions of perl if it is hitting this and people should thank us for it. Anyone desiring to not have this break should continue to use old perl. Any tutorial teaching this has always been broken. This has never been a good feature and just because some people use it doesn't contradict that.

I find it incredibly aggravating that my one-liners do the wrong thing when they attempt to read my files <.st and >.st. Or rather - that I must be careful to never let any one-liners touch some files.

How about making the implicit open done by <> use either main::open (if defined) or CORE::GLOBAL::open (if defined), so that it's possible to write a SafeOpen.pm that overrides one of these to map

  open my $FH, '<foo';

to

  open my $FH, '<', '<foo';

?

Ben

-- Like all men in Babylon I have been a proconsul; like all, a slave ... During one lunar year, I have been declared invisible; I shrieked and was not heard, I stole my bread and was not decapitated. ~ ben@morrow.me.uk ~ Jorge Luis Borges, 'The Babylon Lottery'

p5pRT commented 15 years ago

From @jbenjore

On Tue, Jul 15, 2008 at 6:10 AM, Ben Morrow <ben@morrow.me.uk> wrote:

How about making the implicit open done by <> use either main::open (if defined) or CORE::GLOBAL::open (if defined), so that it's possible to write a SafeOpen.pm that overrides one of these to map

Ok, but name it UnsafeOpen.pm because the default should work properly. That is, 5.12 should out of the box not do anything weird when given a file with any of the <, >, or | characters anywhere in it. It should just read it. I'm ok with writing 5.10 off. I didn't want to but if that's what it takes to get this change, ok.

Josh

p5pRT commented 15 years ago

From @epa

Joshua ben Jore <twists@gmail.com> writes:

How about making the implicit open done by <> use either main::open (if defined) or CORE::GLOBAL::open (if defined), so that it's possible to write a SafeOpen.pm that overrides one of these to map

Ok, but name it UnsafeOpen.pm because the default should work properly. That is, 5.12 should out of the box not do anything weird when given a file with any of the <, >, or | characters anywhere in it. It should just read it.

FWIW, I completely agree with this. In my opinion it is much, much too dangerous to have a common construct - one which is taught to beginners in every Perl tutorial and looks innocuous - be tripped up so easily as by a file called '|x' or anything else containing magic characters.

Yes, taint mode does prevent this, but unless taint mode is on by default for 5.12 it doesn't address the problem. The simple, default code should be 100% safe.

Perl's motto is that easy things should be easy: surely reading some files specified on the command line, without barfing or worse on special characters, is one of those easy things. Hard things should be possible, and magical open() is certainly a useful feature in some situations, but magic can be dangerous. By all means have it if you ask for it but the default, simplest code must be safe for all situations.

Please could I ask the perl5 core team to have another look at this bug report.

-- Ed Avis <eda@waniasset.com>

p5pRT commented 15 years ago

From tchrist@perl.com

I'm of the opinion that working code should break in new versions of perl if it is hitting this and people should thank us for it. Anyone desiring to not have this break should continue to use old perl. Any tutorial teaching this has always been broken. This has never been a good feature and just because some people use it doesn't contradict that.

It's not obvious where in your text you've made the transition from mere opinion to trying to make people believe that such is fact; but that you've sneakily done so\, there's no doubt.

I find it incredibly aggravating that my one-liners do the wrong thing when they attempt to read my files <.st and >.st. Or rather - that I must be careful to never let any one-liners touch some files.

People who create files with aggravating names are aggravating people; just say no. Don't you remember

  chdir "/tmp";
  mkdir "etc\n";
  chdir("/etc\n");
  (naughty games with passwd I shan't detail)

Then there's the protection measure of "touch ./-i"; think about it.

If it takes a syswide find to locate unpleasant filenames and send whingeing or abusive or threatening mail (interpretation rests on the receiver) about the inadvisability of files with whitespace or brackets, or question-marks or stars, (and for perl, also minuses, ampersands, equals, or pipes), then that's a local administration issue. Look to the plank in thine own eye...

And the day that I can no longer rely upon the overlying system to automatically understand that "-" is STDIN for input or STDOUT for output is the day that I will fork a parallel copy of Perl that maintains traditional and expected behavior. However, I trust that will never need to occur, for no pumpking has ever been so blindly cavalier--which can probably be read as "foolish" if you're of that bent.

What you don't seem to understand is that taking a homogeneous approach to argument processing is not a bug, but a feature--a BIG FEATURE. Perhaps you're too much of a youngster to remember the days that shells didn't glob command-line arguments for you, and each program had to parse and process its own command line string. But I am not. Those were dark days of unpredictability. It's the wrong way to (not) go about things.

When you make every program encode the logic of treating "-" as special, you are making a very big mistake. Some programs shall, others shan't. And therein lies the PITA.

Those ignorant of Unix are doomed to reinvent it--poorly.

Don't go there.

*DON'T!*

You would ask this sort of idiocy​:

  % perl -le 'open(IT, ">", "-")||die; print IT "stuff"; close(IT)||die'
  % cat ./-
  stuff

Just say no. I could put it more strongly, but if you haven't figured out the serious flaw in your "thinking" by now, strong talk won't help that.

As such, I'm resolving this ticket and marking it as not-a-bug.

No.

Not merely wrong, but exceptionally so.

--tom

p5pRT commented 15 years ago

From mark@mark.mielke.cc

Tom Christiansen wrote:

[ in response to a no doubt interesting thread ... ] And the day that I can no longer rely upon the overlying system to automatically understand that "-" is STDIN for input or STDOUT for output is the day that I will fork a parallel copy of Perl that maintains traditional and expected behavior. However, I trust that will never need to occur, for no pumpking has ever been so blindly cavalier--which can probably be read as "foolish" if you're of that bent.

What you don't seem to understand is that taking a homogeneous approach to argument processing is not a bug, but a feature--a BIG FEATURE. Perhaps you're too much of a youngster to remember the days that shells didn't glob command-line arguments for you, and each program had to parse and process its own command line string. But I am not. Those were dark days of unpredictability. It's the wrong way to (not) go about things.

That open(handle, arg) doesn't translate 1:1 as a system call open() request *is* a feature - a feature that could provide more value than it does today.

I often find myself wishing Perl accepted many MORE magical open syntaxes than present. I think the following replacement for simple 'wget -o-' would be very cool:

  perl -pe 1 http://www.cpan.org/index.html

For applications with security concerns, the argument to open() still needs to be checked whether 2-arg or 3-arg. I don't see the logic behind an argument that would suggest the existing behaviour should be changed. If a person doesn't like 2-arg - don't use it?

Cheers, mark

-- Mark Mielke <mark@mielke.cc>

p5pRT commented 15 years ago

From @arc

Tom Christiansen writes:

taking a homogeneous approach to argument processing is not a bug, but a feature--a BIG FEATURE. Perhaps you're too much of a youngster to remember the days that shells didn't glob command-line arguments for you, and each program had to parse and process its own command line string. But I am not. Those were dark days of unpredictability. It's the wrong way to (not) go about things.

When you make every program encode the logic of treating "-" as special, you are making a very big mistake. Some programs shall, others shan't.

And yet, in the context of Unix as a whole, every program _already does_ have to treat "-" as special. In 7th Edition, cat, cmp, comm, diff, join, sort, and split all handle an argument of "-" to mean standard input, while egrep, fgrep, grep, od, sum, tail, and uniq don't. Perl programs using two-argument C<open> handle "-". Those using three-argument C<open>, or written in other languages, don't, unless they include specific code to accomplish that. Furthermore, while this situation is clearly suboptimal, it doesn't seem to be intolerable: we might be annoyed when program behaviour is at odds with our expectations, but we can work around it.

My strong suspicion is that a large part of what to consider correct behaviour in this area comes down to personal preference. I always want my command-line programs to treat "-" as a reference to standard input, but I never want leading or trailing pointies or pipes to mean anything other than the files with the names as given. You clearly differ on that, and I don't think your preferences are invalid.

I'm certainly not suggesting that Perl's long-standing behaviour here should be changed, but I don't think the situation is as simple as is implied by declaring that "taking a homogeneous approach to argument processing is [...] a feature".

I also note that some of the most interesting uses of magical two-argument C<open> -- things like

  perl -lne '...' 'command1 | filter1 |' 'command2 | filter2 |'

-- can in modern shells be handled with process substitution:

  perl -lne '...' <(command1 | filter1) <(command2 | filter2)

I'm not convinced that that really constitutes an argument one way or the other, though.

-- Aaron Crane ** http://aaroncrane.co.uk/

p5pRT commented 15 years ago

From zefram@fysh.org

Aaron Crane wrote:

I always want my command-line programs to treat "-" as a reference to standard input,

I think this convention is obsolete. If you want to refer to standard input in a filename context, you can use /dev/stdin. It's a true filename, and is available to all programs without any of them having to do anything special.

-zefram

p5pRT commented 15 years ago

From @arc

Zefram writes:

Aaron Crane wrote:

"-" as a reference to standard input

I think this convention is obsolete. If you want to refer to standard input in a filename context, you can use /dev/stdin. It's a true filename, and is available to all programs without any of them having to do anything special.

True, but it's a lot more work to type.

Also, it doesn't behave quite the same way on all operating systems.

  $ uname
  Linux
  $ head -1 /usr/share/dict/words

  $ tail -c +52 /usr/share/dict/words | head -1
  Aaron
  $ cat stdin.pl
  seek STDIN, 51, 0 or die "seek: $!\n";
  open my $fh, $ARGV[0] or die "open: $!\n";
  print tell $fh, "\n";
  print scalar <$fh>;
  $ perl -w stdin.pl /dev/stdin < /usr/share/dict/words
  0

  $ perl -w stdin.pl - < /usr/share/dict/words
  51
  Aaron

On some OSes (including BSD), opening /dev/stdin is equivalent to calling dup(2) on fd 0, so that (for seekable file descriptors) the new file descriptor shares a file offset pointer with stdin.

Linux differs; opening /dev/stdin gives you a file descriptor open on the same file as fd 0, but the two descriptors are distinct, and do not share a file offset pointer. (And since /dev/stdin is actually a symlink to /proc/self/fd/0, an equivalent statement applies to opening anything under /proc/*/fd.)

As it happens, Perl's magical open of "-" is neither dup(0) nor open("/dev/stdin", O_RDONLY). Instead, it gives you a new stdio-ish filehandle open on file descriptor 0; that's why the behaviour under Linux differs from opening /dev/stdin.

-- Aaron Crane ** http://aaroncrane.co.uk/

p5pRT commented 15 years ago

From @epa

Tom Christiansen <tchrist@perl.com> writes:

I find it incredibly aggravating that my one-liners do the wrong thing when they attempt to read my files <.st and >.st.

People who create files with aggravating names are aggravating people; just say no.

If it takes a syswide find to locate unpleasant filenames and send whingeing or abusive or threatening mail (interpretation rests on the receiver) about the inadvisability of files with whitespace or brackets, or question-marks or stars, (and for perl, also minuses, ampersands, equals, or pipes), then that's a local administration issue.

You seem to have in mind the comfortable Unix environment most of us remember from university days (or elsewhere), when your local bearded sysadmin would keep a watchful eye on the students and make sure that the local collection of eclectic, flaky but nonetheless lovable administrative shell scripts kept working smoothly. I am not saying that is a bad ideal at all. But it's not the issue here.

You can't run a system-wide check for 'unpleasant' filenames before every invocation of your perl program. And although someone deliberately making a file '| rm -rf /' is the example we all use (and a good enough example in its own right to make this worth fixing, IMHO), in practice a successful attack to get control of a computer often uses several stages. The first stage might be to trick a slightly dopey CGI script into making a filename containing a '|' character in its temporary directory - not a security hole in itself, right? and then wait for an administrative Perl script run by root to 'while (<>)' in that directory. Or indeed the same CGI script might use the 'while (<>)' construct - which is surely just reading some files and should be safe, right?

Remember that here we're not talking about some obscure construct that only experts know, where people can be expected to read about the security risks and gotchas before using it. This is pretty much in the first chapter of every beginner's Perl tutorial - yet few of them have any warning that it might do unexpected things if a file in the current directory is 'unpleasant'. Surely we have a duty to make sure that the code recommended to novices is totally safe to use.

But security holes are just one specific class of bugs. In principle, the issue is that asking perl to read some files should work 100% of the time. Not some of the time depending on what filenames exist.

And the day that I can no longer rely upon the overlying system to automatically understand that "-" is STDIN for input or STDOUT for output is the day that I will fork a parallel copy of Perl that maintains traditional and expected behavior.

If I might go rhetorical for a moment: to rely upon the overlying system? That seems like a very good idea. So the C library itself should interpret '-' to mean stdin or stdout, it shouldn't be implemented afresh in every application like perl, grep, tar and so on.

So why doesn't the C library automatically treat '-' as stdin or stdout?

What you don't seem to understand is that taking a homogeneous approach to argument processing is not a bug, but a feature--a BIG FEATURE. Perhaps you're too much of a youngster to remember the days that shells didn't glob command-line arguments for you, and each program had to parse and process its own command line string. But I am not. Those were dark days of unpredictability. It's the wrong way to (not) go about things.

Absolutely. It would be crazy for perl or any other program to start doing argument processing that better belongs in the shell. Imagine if shells didn't support commands like

  % cat <(echo hello; echo goodbye)

and cat, and every other program, had to have special code to recognize '<' in its arguments. It would be a terrible mess - and you'd need another layer of mess to handle the case when '<' really is in a filename. Much better to let the shell do it.

When you make every program encode the logic of treating "-" as special, you are making a very big mistake. Some programs shall, others shan't.

You would ask this sort of idiocy:

  % perl -le 'open(IT, ">", "-")||die; print IT "stuff"; close(IT)||die'
  % cat ./-
  stuff

That's exactly the behaviour I see with perl 5.10. What result do you get?

-- Ed Avis <eda@waniasset.com>

p5pRT commented 15 years ago

From @epa

Mark Mielke \<mark \ mark.mielke.cc> writes​:

For applications with security concerns\, the argument to open() still needs to be checked whether 2-arg or 3-arg. I don't see the logic behind an argument that would suggest the existing behaviour should be changed. If a person doesn't like 2-arg - don't use it?

The issue here is that

  while (\<>) { ... }

is using the magical 2-argument open\, with all the implications of running external commands depending on what filenames happen to be in the current directory\, yet this code is taught to everyone as the standard way to open the files in the command line.

The un-magic alternative requires a lot more code. Surely this is backwards​: the simple short code should be safe to use under all circumstances\, and if you want the more dangerous (though certainly very useful) behaviour you should have to ask for it.

As I see it\, then\, the choices are

- Change while (<>) and while (<ARGV>) to use 3-argument open, perhaps with an exception for '-' to mean stdin, since reading from stdin is not normally dangerous.

- Or change every Perl tutorial starting with Perl's own documentation to note that while (\<>) can do funny things and is not to be used unless you trust everyone and every program that could have created a file in the current directory.

- Or introduce a new language construct 'the safe way to read command line arguments' and change all the tutorials to recommend that instead.
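The "un-magic alternative" mentioned above might look roughly like the following. This is only a sketch: the helper name `read_args_literally` is hypothetical, and it assumes the one exception suggested in the first choice, namely that a bare '-' still means standard input.

```perl
use strict;
use warnings;

# Hypothetical helper: read every line from the named files, treating each
# name literally. Only a bare '-' is special and means standard input.
sub read_args_literally {
    my @names = @_;
    my @lines;
    for my $name (@names) {
        my $fh;
        if ($name eq '-') {
            $fh = \*STDIN;
        }
        elsif (!open $fh, '<', $name) {
            # Three-argument open never runs a command or truncates a file.
            warn "Can't open $name: $!\n";
            next;
        }
        push @lines, <$fh>;
        close $fh unless $name eq '-';
    }
    return @lines;
}
```

A script would then call something like `print read_args_literally(@ARGV);`, which is noticeably longer than `while (<>) { print }`.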

-- Ed Avis <eda@waniasset.com>

p5pRT commented 15 years ago

From @abigail

On Fri\, Jul 25\, 2008 at 03​:13​:02PM +0000\, Ed Avis wrote​:

Mark Mielke \<mark \ mark.mielke.cc> writes​:

For applications with security concerns\, the argument to open() still needs to be checked whether 2-arg or 3-arg. I don't see the logic behind an argument that would suggest the existing behaviour should be changed. If a person doesn't like 2-arg - don't use it?

The issue here is that

while \(\<>\) \{ \.\.\. \}

is using the magical 2-argument open\, with all the implications of running external commands depending on what filenames happen to be in the current directory\, yet this code is taught to everyone as the standard way to open the files in the command line.

The un-magic alternative requires a lot more code. Surely this is backwards​: the simple short code should be safe to use under all circumstances\, and if you want the more dangerous (though certainly very useful) behaviour you should have to ask for it.

As I see it\, then\, the choices are

- Change while (<>) and while (<ARGV>) to use 3-argument open, perhaps with an exception for '-' to mean stdin, since reading from stdin is not normally dangerous.

- Or change every Perl tutorial starting with Perl's own documentation to note that while (\<>) can do funny things and is not to be used unless you trust everyone and every program that could have created a file in the current directory.

What does the current directory have to do with while (\<>)?

while (<>) reads from the filenames in @ARGV, not from the current directory.

- Or introduce a new language construct 'the safe way to read command line arguments' and change all the tutorials to recommend that instead.

- Teach people not to use "funny" characters in the filenames lightly, since whatever one may or may not do to Perl, it *will* bite them if they treat them carelessly; after all, the set of characters that are special to Perl is (with the exception of -) a subset of the characters that are special to most shells anyway.

Abigail

p5pRT commented 15 years ago

From @abigail

On Fri\, Jul 25\, 2008 at 03​:05​:30PM +0000\, Ed Avis wrote​:

Tom Christiansen \<tchrist \ perl.com> writes​:

I find it incredibly aggravating that my one-liners do the wrong thing when they attempt to read my files \<.st and >.st.

People who create files with aggravating names are aggravating people; just say no.

If it takes a syswide find to locate unpleasant filenames and send whingeing or abusive or threatening mail (interpretation rests on the receiver) about the inadvisability of files with whitespace or brackets\, or question-marks or stars\, (and for perl\, also minuses\, ampersands\, equals\, or pipes)\, then that's a local administration issue.

You seem to have in mind the comfortable Unix environment most of us remember from university days (or elsewhere), when your local bearded sysadmin would keep a watchful eye on the students and make sure that the local collection of eclectic, flaky but nonetheless lovable administrative shell scripts kept working smoothly. I am not saying that is a bad ideal at all. But it's not the issue here.

You can't run a system-wide check for 'unpleasant' filenames before every invocation of your perl program. And although someone deliberately making a file '| rm -rf /' is the example we all use (and a good enough example in its own right to make this worth fixing, IMHO), in practice a successful attack to get control of a computer often uses several stages. The first stage might be to trick a slightly dopey CGI script into making a filename containing a '|' character in its temporary directory - not a security hole in itself, right? - and then wait for an administrative Perl script run by root to 'while (<>)' in that directory. Or indeed the same CGI script might use the 'while (<>)' construct which is surely just reading some files and should be safe, right?

Oh\, come on. This is a solved problem. The answer is -T​:

  $ perl -wTE 'while (<>) {print}' '> foo'
  Insecure dependency in open while running with -T switch at -e line 1.
  $

-T is recommended both for CGI programs\, and programs running as root.

Abigail

p5pRT commented 15 years ago

From @abigail

On Fri\, Jul 25\, 2008 at 04​:36​:56PM +0100\, Ed Avis wrote​:

Abigail asked​:

What does the current directory have to do with while (\<>)?

Sorry\, I was thinking of the common usage of

% my_program *

- Teach people not to use "funny" characters in the filenames lightly, since whatever one may or may not do to Perl, it *will* bite them if they treat them carelessly; after all, the set of characters that are special to Perl is (with the exception of -) a subset of the characters that are special to most shells anyway.

I agree that using funny characters deliberately is not a good idea. However\,

If you are running "my_program *" as root in a directory where black hat people could have created files\, you have a problem anyway.

For instance\, 'rm -i *' won't ask for each file whether it should be deleted or not if someone creates a file '-f' in said directory.

The answer here is to teach people to not blindly assume their insecure environment is safe. And Perl can help you there: it's called -T. Which not only prevents the "while (<>) {}" problem, but a host of other problems as well.

Abigail

p5pRT commented 15 years ago

From @epa

Oh\, come on. This is a solved problem. The answer is -T​:

$ perl -wTE 'while (<>) {print}' '> foo'
Insecure dependency in open while running with -T switch at -e line 1.

That deals with a lot of it\, but the -T flag is not the default. By default if you write the innocuous-looking perl program

  use warnings;
  use strict;
  use 5.010;
  while (<>) { print }

you will get a program with all the magical\, dangerous behaviour discussed earlier in this thread. It would be better if safety were the default\, with some special command line flag or 'use' to turn on the unsafe behaviour.

To get some real-world data: if you pick the first page of results from <http://www.google.com/codesearch?as_q=while+(%3C%3E&btnG=Search+Code&hl=en&as_lang=perl&as_license_restrict=i&as_license=&as_package=&as_filename=&as_case=>, which should be close to a random sample, you see that none of these programs uses the -T flag, nor does any of them document that it will do odd things if passed a filename that contains funny characters, and so is unsafe (in the general case) to use with shell wildcards. And why should they? They were just writing the normal perl code to read some files specified on the command line.

I suppose that 'use warnings' could print out a warning on 'while (\<>)' saying 'this construct is not safe unless you use -T'\, but that seems daft to me. Just make it safe to use\, no ifs and no buts.

Even with -T\, the program will abort when the '>foo' filename is found\, which is not great for something that's purportedly meant to just read some files.

The issue is not that expert programmers should be able to turn on some particular flag or use some particular incantation to read files given on the command line in a safe way. The issue is that the simple\, small\, innocuous-looking code is dangerous. IMHO\, the simple code 'while (\<>)' should be safe for all uses\, and the flags\, bells and whistles can be added to turn on the magical\, risky behaviour if wanted.

-- Ed Avis <eda@waniasset.com>

______________________________________________________________________ This email has been scanned by the MessageLabs Email Security System. For more information please visit http​://www.messagelabs.com/email ______________________________________________________________________

p5pRT commented 15 years ago

From @epa

Abigail asked​:

What does the current directory have to do with while (\<>)?

Sorry\, I was thinking of the common usage of

% my_program *

- Teach people not to use "funny" characters in the filenames lightly, since whatever one may or may not do to Perl, it *will* bite them if they treat them carelessly; after all, the set of characters that are special to Perl is (with the exception of -) a subset of the characters that are special to most shells anyway.

I agree that using funny characters deliberately is not a good idea. However\, they do sometimes appear by accident\, and they can be made to appear by anyone who has write access to the directory where 'my_program *' is run. If my_program uses 'while (\<>)'\, then if 'my_program *' is run by root in a certain directory\, you are effectively granting root command access to anyone who can create a file in that directory.

Even in less drastic cases than the above (which is the worst case\, but certainly not impossible) a similar bug can be used as part of a multi-step exploit; perhaps the web server has a bug that means an attacker can cause it to write a zero-length file in a certain logfile directory; then a log grepper script might get caught. Again this is just an example.

Even if there are no security implications because only one person uses the computer and there are no external-facing daemons to be compromised\, it is still a peculiar behaviour for the program to go off and try to run the command 'x' when the filename '>x' was given on its command line. No standard Unix utility does this​:

% grep hello '>x'
grep: >x: No such file or directory

although many standard Unix programs take an argument of '-' to mean read from standard input\, which is not usually dangerous.
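To make the contrast with grep concrete, here is a rough sketch of how the magical two-argument open reads a name. The function `magic_open_kind` is hypothetical and deliberately simplified (it ignores forms such as '>&' and '>-'); it only classifies a string and never opens anything.

```perl
use strict;
use warnings;

# Simplified, illustrative classification of how two-argument open would
# interpret a string. An approximation only, not Perl's real parser.
sub magic_open_kind {
    my ($name) = @_;
    (my $s = $name) =~ s/^\s+|\s+$//g;   # surrounding whitespace is ignored
    return 'stdin'             if $s eq '-';
    return 'pipe to command'   if $s =~ /^\|/;
    return 'pipe from command' if $s =~ /\|$/;
    return 'append to file'    if $s =~ /^>>/;
    return 'write to file'     if $s =~ /^>/;
    return 'read/write file'   if $s =~ /^\+[<>]/;
    return 'read file';
}
```

Under this sketch, `magic_open_kind('>x')` reports a write and `magic_open_kind('| rm -rf *;')` reports a pipe to a command, whereas a three-argument open with '<' would treat both strings as plain filenames.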

-- Ed Avis <eda@waniasset.com>

p5pRT commented 15 years ago

From @epa

Abigail wrote​:

If you are running "my_program *" as root in a directory where black hat people could have created files\, you have a problem anyway.

For instance\, 'rm -i *' won't ask for each file whether it should be deleted or not if someone creates a file '-f' in said directory.

That is true. But I wouldn't expect 'grep *' to go off running random external programs. Not even if grep were implemented in perl.

The answer here is to teach people to not blindly assume their insecure environment is safe.

That is a good thing to teach\, and the first lesson would be 'do not use while (\<>) unless you also use -T'. However that is not mentioned in any perl tutorial I know of.

If you have a tool which is potentially dangerous\, one alternative to giving this kind of warning is to provide beginners with a safer (if somewhat blunter) tool for everyday use. They can graduate to the more dangerous one when they are ready for it\, and understand the risks.

That's why I think that 'while (\<>)' would better be the safe kind of 'read all the files'\, and the more dangerous kind should have a syntax that marks it as such and cautions you not to blindly assume it will be safe.

-- Ed Avis <eda@waniasset.com>

p5pRT commented 15 years ago

From @davidnicol

On Fri, Jul 25, 2008 at 11:14 AM, Ed Avis <eda@waniasset.com> wrote:

That's why I think that 'while (\<>)' would better be the safe kind of 'read all the files'\, and the more dangerous kind should have a syntax that marks it as such and cautions you not to blindly assume it will be safe.

perhaps\, 'while(\<\<>>)' could be the current 2-arg semantics from 5.11 on\, and while(\<>) would do three-arg opens?

-- Cheer up\, sad person

p5pRT commented 15 years ago

From @abigail

On Fri\, Jul 25\, 2008 at 02​:39​:16PM -0500\, David Nicol wrote​:

On Fri, Jul 25, 2008 at 11:14 AM, Ed Avis <eda@waniasset.com> wrote:

That's why I think that 'while (\<>)' would better be the safe kind of 'read all the files'\, and the more dangerous kind should have a syntax that marks it as such and cautions you not to blindly assume it will be safe.

perhaps\, 'while(\<\<>>)' could be the current 2-arg semantics from 5.11 on\, and while(\<>) would do three-arg opens?

So\, you are willing to break programs that currently use the fact \<> is 2-arg open and work as intended in order that some dimwit that isn't using -T doesn't run into problems this morning\, but this afternoon?

The problem Ed is painting here isn't 2-arg open; it's people not considering that file names may have characters that are special. And if they won't get into trouble by <>, they'll get into problems by the shell. Or some other program. Or because they use 2-arg open in their programs.

(You know\, not all Perl tutorials rewrote themselves the moment 3-arg open became available. Nor did all Perl programs. Perhaps we should make it that using 2-arg is a compile time error. Of course\, the people that Ed is going to save won't be saved until they upgrade their perl.)

Now\, the idea of having both '\<>' and '\<\<>>'\, and have one of them do 2-arg open\, and the other 3-arg open is interesting. But I'd prefer not to break existing programs\, and would rather see 'while (\<\<>>)' do 3-arg open\, while leaving while (\<>) as is.
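For readers on newer perls: an operator of this shape does exist. From Perl 5.22 onward, `<<>>` behaves like `<>` but opens each name with the three-argument form, so every argument is taken as a literal filename. A minimal sketch follows; the wrapper name `read_with_double_diamond` is hypothetical and exists only to make the behaviour easy to exercise.

```perl
use v5.22;
use warnings;

# Hypothetical helper: read all lines from the named files via <<>>,
# which treats every name as a literal filename.
sub read_with_double_diamond {
    my @names = @_;
    return () unless @names;      # with an empty @ARGV, <<>> would read STDIN
    local @ARGV = @names;         # <<>> takes its file list from @ARGV
    my @lines;
    while (defined(my $line = <<>>)) {
        push @lines, $line;
    }
    return @lines;
}
```

In a one-liner the same thing is simply `perl -e 'while (<<>>) { print }' -- *`, with `while (<>)` keeping its magical behaviour unchanged.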

Abigail

p5pRT commented 15 years ago

From perl@nevcal.com

On approximately 7/25/2008 12​:53 PM\, came the following characters from the keyboard of Abigail​:

On Fri\, Jul 25\, 2008 at 02​:39​:16PM -0500\, David Nicol wrote​:

On Fri, Jul 25, 2008 at 11:14 AM, Ed Avis <eda@waniasset.com> wrote:

That's why I think that 'while (<>)' would better be the safe kind of 'read all the files', and the more dangerous kind should have a syntax that marks it as such and cautions you not to blindly assume it will be safe.

perhaps, 'while(<<>>)' could be the current 2-arg semantics from 5.11 on, and while(<>) would do three-arg opens?

So\, you are willing to break programs that currently use the fact \<> is 2-arg open and work as intended in order that some dimwit that isn't using -T doesn't run into problems this morning\, but this afternoon?

One could speculate about "while(\<>)" using 2-arg open if -T is set and 3-arg open otherwise\, with "while(\<\<>>)" or "use magical while;" causing 2-arg open to be used even without -T.

But to answer your question\, yes.

How else can we encourage the dimwits to continue using perl\, after they get burned by stuff like this\, if we don't improve the language?

So open docs do say​:

One should conscientiously choose between the magic and 3-arguments form of open():

    open IN, $ARGV[0];

will allow the user to specify an argument of the form "rsh cat file |", but will not work on a filename which happens to have a trailing space, while

    open IN, '<', $ARGV[0];

will have exactly the opposite restrictions.
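The trailing-space half of that documentation excerpt can be checked with a small experiment. This is a sketch under assumptions: the function name `trailing_space_demo` and the file name "report " are made up for illustration, and it presumes a Unix-like filesystem where names may end in a space.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical demonstration: reports whether each form of open can read
# a file whose name ends in a space. Returns (two_arg_ok, three_arg_ok).
sub trailing_space_demo {
    my $dir  = tempdir(CLEANUP => 1);
    my $path = "$dir/report ";                 # note the trailing space
    open my $out, '>', $path or die "create failed: $!";
    print {$out} "data\n";
    close $out or die "close failed: $!";

    # Three-argument open uses the name exactly as given.
    my $three_arg = open(my $in3, '<', $path) ? 1 : 0;
    # Two-argument open trims the whitespace and looks for "report".
    my $two_arg   = open(my $in2, $path) ? 1 : 0;
    return ($two_arg, $three_arg);
}
```

On such a system the two-argument form should fail to find the file while the three-argument form succeeds, which is the restriction the documentation describes.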

But there is no warning of all that under control flow statements in perlsyn where while(\<>) is discussed\, nor in perlintro where while(\<>) is discussed.

I've long since learned not to use certain characters in filenames\, and especially not at the beginning.

I, myself, have had to agree to use Python for a current project, because Perl doesn't seem to have a Unicode-supporting, cross-platform, cross-platform-printing-capable GUI environment (the last point being the sticking point). That's bad enough, having quite esoteric requirements, but for simple things to have such complex gotchas buried within is really bad.

-- Glenn -- http​://nevcal.com/

A protocol is complete when there is nothing left to remove. -- Stuart Cheshire\, Apple Computer\, regarding Zero Configuration Networking

p5pRT commented 15 years ago

From mark@mark.mielke.cc

Glenn Linderman wrote​:

One could speculate about "while(\<>)" using 2-arg open if -T is set and 3-arg open otherwise\, with "while(\<\<>>)" or "use magical while;" causing 2-arg open to be used even without -T.

How else can we encourage the dimwits to continue using perl\, after they get burned by stuff like this\, if we don't improve the language?

Responding to Glenn's message, although not specifically to him but to all with this opinion:

How many people have been "burned" by this "problem"? Is the new class of dimwits exceedingly more dimwitted than the previous? Why was this not a problem 10 years ago?

I don't see the point. There is no value in restricting Perl's functionality, so that some theoretical dimwit will have one less theoretical security hole in one theoretical scenario. Where is the proof that this "security problem" is causing problems, and why aren't these dimwits having their hands cut off to prevent them from programming?

Cheers\, mark

-- Mark Mielke <mark@mielke.cc>

p5pRT commented 15 years ago

From @epa

Abigail \<abigail \ abigail.be> writes​:

perhaps\, 'while(\<\<>>)' could be the current 2-arg semantics from 5.11 on\, and while(\<>) would do three-arg opens?

So\, you are willing to break programs that currently use the fact \<> is 2-arg open and work as intended

How many such programs are there? Is the behaviour really intended?

Have a look at the first few hits on Google Code Search​: not one of them mentions in its documentation that if passed a filename beginning with > it will overwrite some file rather than reading the file given. Yet surely this is just the sort of thing you would take care to mention in the documentation if you knew of it. This leads me to suspect that for the vast majority of cases\, the magical behaviour is not what the author intended.

To answer your question\, yes\, I do suggest breaking programs which were designed and documented to have magical behaviour depending on the filenames given (and which\, at the moment\, are unable to handle files that really do begin with > unless you resort to special tricks). I just don't think there are many of them.

in order that some dimwit

I could agree with your characterization of 'dimwit' if this were some obscure feature of the language\, marked clearly in the documentation as dangerous and for experts only. But here we are talking about while (\<>)\, supposedly the simplest way to do line-based processing in perl.

If someone is a dimwit for naively using while (\<>) and accidentally writing a program that is tripped up by special characters in filenames\, then I am a dimwit and so are thousands of other perl developers (again\, see Google Code Search).

Perl isn't a language for the elite. Easy things should be easy. Reading some files given on the command line - without introducing strange bugs or security holes depending on the characters contained in the filename - is one of those easy things.

The problem Ed is painting here isn't 2-arg open; it's people not considering that file names may have characters that are special. And if they won't get into trouble by <>, they'll get into problems by the shell.

Not so much: modern shells handle cases like 'ls *' or 'for i in *' without trouble. I suppose it should be 'ls -- *'. Avoiding gotchas like this is one of the advantages perl claims over shell scripting: that it is safer because it relies less on string interpolation.

Or some other program.

I don't think any other program not written in perl or shell script is likely to have the same problem. I would be interested to know of examples.

Or because they use 2-arg open in their programs.

With 2-arg open you can see that string interpolation is happening. To me\, that sets off alarm bells so I know to be careful and validate the filename first (in the rare cases where I still use 2-arg open). Someone further up the dimwit scale than me might use 2-arg without thinking\, but at least tutorial material has been updated to recommend the safe form and the safe form is not a lot more difficult to type.

Now\, the idea of having both '\<>' and '\<\<>>'\, and have one of them do 2-arg open\, and the other 3-arg open is interesting. But I'd prefer not to break existing programs\, and would rather see 'while (\<\<>>)' do 3-arg open\, while leaving while (\<>) as is.

That would be okay\, though I still prefer that the simplest code be the safest.

-- Ed Avis <eda@waniasset.com>

p5pRT commented 15 years ago

From zefram@fysh.org

Mark Mielke wrote​:

How many people have been "burned" by this "problem"?

I don't get burned by it​: this kind of issue is why I avoid using such DWIMish heuristic features. I only use \<> in throwaway programs that I'm using on known inputs\, and if I want to read arbitrary files then I use three-argument open or equivalent. It's a pity that such a short operator isn't available to do something of wider utility.

Same issues apply with several regexp features\, such as /$/ (funny treatment of newline at end of string) and /\s/ (utf8 flag dependence). I see people get burned by the unexpectedly complex behaviour of these operators all the time. I use the more verbose\, more explicit\, non-obvious constructions that actually do what I mean. I don't hugely object to the extra typing\, but I wish that the operators with plainer semantics were shorter and more accessible so that less nitpicky programmers would use them by default.
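The /$/ point can be illustrated with a short comparison against `\z`, which matches only at the very end of the string. The function name `dollar_vs_z` and the pattern `foo` are arbitrary choices for this sketch.

```perl
use strict;
use warnings;

# Returns a pair of flags: does /^foo$/ match, and does /^foo\z/ match.
sub dollar_vs_z {
    my ($str) = @_;
    my $dollar = $str =~ /^foo$/  ? 1 : 0;   # $ also matches before a final newline
    my $z      = $str =~ /^foo\z/ ? 1 : 0;   # \z matches only at the true end
    return ($dollar, $z);
}
```

For the input "foo\n" the first flag is 1 and the second is 0, so validation written with `$` accepts a trailing newline that `\z` rejects.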

I don't see the point. There is no value in restricting Perl's functionality,

I don't think anyone's arguing for the functionality to not be there. Just give it a longer name\, that indicates that there's more going on than meets the eye. If we were starting from scratch\, I'd say that a plain honest-to-Ritchie file access should be called "open"\, and the one that can run commands\, expand shell variables\, hack root\, and whatnot should be called "magic_open". See\, nice descriptive names\, with added Huffman value.

-zefram

p5pRT commented 15 years ago

From perl@nevcal.com

On approximately 7/25/2008 2​:13 PM\, came the following characters from the keyboard of Mark Mielke​:

Glenn Linderman wrote​:

One could speculate about "while(\<>)" using 2-arg open if -T is set and 3-arg open otherwise\, with "while(\<\<>>)" or "use magical while;" causing 2-arg open to be used even without -T.

How else can we encourage the dimwits to continue using perl\, after they get burned by stuff like this\, if we don't improve the language?

Responding to Glenn's message, although not specifically to him but to all with this opinion:

How many people have been "burned" by this "problem"? Is the new class of dimwits exceedingly more dimwitted than the previous? Why was this not a problem 10 years ago?

I probably shouldn't have used the word "dimwit"\, but... it seemed like the argument for not breaking while(\<>) is very much "elitist"... the "I'm smart enough not to get trapped\, so why aren't you?". So to try to talk to that class of people\, I used the term.

But yes\, the new class of dimwits (I might as well keep using the term\, since I've started) are more dimwitted than the previous class\, because they were raised on Windows\, instead of Unix. In Unix class\, shell script handling of special characters\, and the "rm *" with the file named -f to kill you\, and the file name -i to protect you\, were regularly taught to newbies. In Windows\, no one teaches you anything\, you get to learn by the seat of your pants. If you take classes\, someone might think to mention stuff like that\, but if you read documentation\, particularly our Perl documentation\, it is easy to overlook such an "esoteric" topic\, even if it is noticed\, because really\, they are looking for how to open files\, not how to learn defensive programming.

I don't see the point. There is no value in restricting Perl's functionality, so that some theoretical dimwit will have one less theoretical security hole in one theoretical scenario. Where is the proof that this "security problem" is causing problems, and why aren't these dimwits having their hands cut off to prevent them from programming?

Nope\, not suggesting restricting it\, just making easy things safer\, and unsafe things harder.

Fortunately\, a large majority of dimwits put spaces in their filenames\, and have enough problems with that\, that they never think to put in > \< | and other line noise. I think it is purely that that has kept the problem from getting out of hand.

Cheers\, mark

-- Glenn -- http​://nevcal.com/

A protocol is complete when there is nothing left to remove. -- Stuart Cheshire\, Apple Computer\, regarding Zero Configuration Networking

p5pRT commented 15 years ago

From @epa

Zefram \<zefram \ fysh.org> writes​:

[magic behaviour of \<> depending on filenames given]

I don't get burned by it​: this kind of issue is why I avoid using such DWIMish heuristic features.

Choking when you run 'my_program *' and a file called \<x happens to exist is hardly DWIM. Surely when you write while (\<>) what you mean is 'read the files given on the command line'. Which perl doesn't manage\, at the moment :-(.

I don't think anyone's arguing for the functionality to not be there. Just give it a longer name\, that indicates that there's more going on than meets the eye.

That's absolutely it.

-- Ed Avis <eda@waniasset.com>

p5pRT commented 15 years ago

From mark@mark.mielke.cc

Ed Avis wrote​:

How many such programs are there? Is the behaviour really intended?

Have a look at the first few hits on Google Code Search​: not one of them mentions in its documentation that if passed a filename beginning with > it will overwrite some file rather than reading the file given. Yet surely this is just the sort of thing you would take care to mention in the documentation if you knew of it. This leads me to suspect that for the vast majority of cases\, the magical behaviour is not what the author intended.

Where you are mistaken is in the assumption that regular users would use special characters like ">" or "|"\, or that if they did\, they would actually be quoted properly to indicate literal characters as their intent. Remember - the shell ALREADY treats these as special. Anybody who "accidentally" used these in a file name would have to be especially ill informed.

Do you have a real-life scenario where a problem has occurred or are you making a mountain of a mole hill?

Cheers\, mark

-- Mark Mielke <mark@mielke.cc>

p5pRT commented 15 years ago

From mark@mark.mielke.cc

Ed Avis wrote​:

Choking when you run 'my_program *' and a file called \<x happens to exist is hardly DWIM. Surely when you write while (\<>) what you mean is 'read the files given on the command line'. Which perl doesn't manage\, at the moment :-(.

What legitimate scenarios use files with names "\<x"?

I don't think anyone's arguing for the functionality to not be there. Just give it a longer name\, that indicates that there's more going on than meets the eye.

That's absolutely it.

I strongly disagree. You are turning an existing feature with existing syntax into a non-feature, because you think there MIGHT be a theoretical somebody incompetent enough to hang themselves with a bit of rope. Your assumption is that, with Perl blocking this supposed security hole, this theoretical incompetent somebody will then be safe. I think you are completely wrong. This theoretical incompetent somebody has so many ways that they can screw up. If any company hires such an incompetent person to write their security software, they deserve what they get.

Cheers\, mark

-- Mark Mielke <mark@mielke.cc>

p5pRT commented 15 years ago

From mark@mark.mielke.cc

Zefram wrote​:

Mark Mielke wrote​:

How many people have been "burned" by this "problem"?

I don't get burned by it​: this kind of issue is why I avoid using such DWIMish heuristic features. I only use \<> in throwaway programs that I'm using on known inputs\, and if I want to read arbitrary files then I use three-argument open or equivalent. It's a pity that such a short operator isn't available to do something of wider utility.

Same issues apply with several regexp features\, such as /$/ (funny treatment of newline at end of string) and /\s/ (utf8 flag dependence). I see people get burned by the unexpectedly complex behaviour of these operators all the time. I use the more verbose\, more explicit\, non-obvious constructions that actually do what I mean. I don't hugely object to the extra typing\, but I wish that the operators with plainer semantics were shorter and more accessible so that less nitpicky programmers would use them by default.

I don't understand. What is /$/ reasonably supposed to do? Do these theoretical people really get burned? Please point out a real-life program that uses /$/ that isn't mere theory.

I don't see the point. There is no value in restricting Perl's functionality,

I don't think anyone's arguing for the functionality to not be there. Just give it a longer name\, that indicates that there's more going on than meets the eye. If we were starting from scratch\, I'd say that a plain honest-to-Ritchie file access should be called "open"\, and the one that can run commands\, expand shell variables\, hack root\, and whatnot should be called "magic_open". See\, nice descriptive names\, with added Huffman value

Plain honest-to-Ritchie file access is obsolete. I want to see http​:// automatically parsed. We're decades past the requirements you are raising\, and in the last decade that open(2 args) has existed\, I don't recall a single problem such as you describe.

Cheers\, mark

-- Mark Mielke <mark@mielke.cc>

p5pRT commented 15 years ago

From mark@mark.mielke.cc

Glenn Linderman wrote​:

On approximately 7/25/2008 2​:13 PM\, came the following characters from the keyboard of Mark Mielke​:

Glenn Linderman wrote​:

One could speculate about "while(\<>)" using 2-arg open if -T is set and 3-arg open otherwise\, with "while(\<\<>>)" or "use magical while;" causing 2-arg open to be used even without -T.

How else can we encourage the dimwits to continue using perl\, after they get burned by stuff like this\, if we don't improve the language?

Responding to Glenn's message, although not specifically to him but to all with this opinion:

How many people have been "burned" by this "problem"? Is the new class of dimwits exceedingly more dimwitted than the previous? Why was this not a problem 10 years ago?

I probably shouldn't have used the word "dimwit"\, but... it seemed like the argument for not breaking while(\<>) is very much "elitist"... the "I'm smart enough not to get trapped\, so why aren't you?". So to try to talk to that class of people\, I used the term.

Forget what term you used. Do you have an ACTUAL CASE of failure\, or is this only theory?

But yes\, the new class of dimwits (I might as well keep using the term\, since I've started) are more dimwitted than the previous class\, because they were raised on Windows\, instead of Unix. In Unix class\, shell script handling of special characters\, and the "rm *" with the file named -f to kill you\, and the file name -i to protect you\, were regularly taught to newbies. In Windows\, no one teaches you anything\, you get to learn by the seat of your pants. If you take classes\, someone might think to mention stuff like that\, but if you read documentation\, particularly our Perl documentation\, it is easy to overlook such an "esoteric" topic\, even if it is noticed\, because really\, they are looking for how to open files\, not how to learn defensive programming.

I disagree. Windows and UNIX have been around for about as long as Perl 5. Again\, do you have an ACTUAL CASE of failure\, or is this only theory?

I don't see the point. There is no value in restricting Perl's functionality\, so that some theoretical dimwit will have one less theoretical security hole in one theoretical scenario. Where is the proof that this "security problem" is causing problems\, and why aren't these dimwits having their hands cut off to prevent them from programming?

Nope\, not suggesting restricting it\, just making easy things safer\, and unsafe things harder.

Things could be made "safe" by chmod ugo-x /usr/bin/perl. You are not answering the question. What is the proof that this "security problem" is causing problems? Where is this outbreak you are trying to prevent?

Fortunately\, a large majority of dimwits put spaces in their filenames\, and have enough problems with that\, that they never think to put in > \< | and other line noise. I think it is purely that that has kept the problem from getting out of hand.

More guessing. Do you have any evidence?

Cheers\, mark

-- Mark Mielke \<mark@mielke\.cc>

p5pRT commented 15 years ago

From @epa

Mark Mielke \<mark \<at> mark.mielke.cc> writes:

What legitimate scenarios use files with names "\<x"?

Very few. But if such a filename does exist\, you'd want the program to handle it in a sane way.

Unfortunately the world is not always legitimate. Every program using while (\<>) has to come with a health warning that it is unsafe to use unless you know that no >\<| characters are contained in the filenames.

Remember gets() in the standard C library? What's the problem with it? No legitimate use would ever enter a million characters of text when given a prompt to enter 'yes' or 'no'. And if users are so stupid as to enter too much text and overflow the fixed buffer\, that is a problem with the users and they should be educated to take more care.

Yet today\, gets() is deprecated and nobody uses it - whether writing 'security code' or not. It is just too dangerous.

I strongly disagree. You are turning an existing feature with existing syntax into a non-feature\, because you think there MIGHT be a theoretical somebody incompetent enough to hang themselves with a bit of rope.

First I would like to say that this is best not treated as a moral issue. Those bad programmers\, deserve what they get\, we should teach them a lesson\, etc.

If the measure of incompetence is using 'while (\<>)' in a program without realizing that it is dangerous\, then I have been incompetent many times and so have lots of others. Again\, have a look at the examples you can find on Google Code Search. All of these people have been incompetent and written programs that are unsafe to use on arbitrary lists of files. All the programs either need to be patched to use a safe form of open\, or else need to mention the gotcha in their documentation.

The bar for incompetence or being a dimwit is set remarkably low IMHO. Surely you should not need to be one of the expert 10% (or even the top 90%) to read some files and count the number of lines. Yet I bet that if you get a sample of perl programmers and ask them to implement a simple 'wc -l' in perl\, more than nine out of ten will write one that breaks if given a filename beginning with >\, and most of those without even thinking there might be a problem.
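For concreteness, here is a minimal sketch of a line counter that takes every argument as a literal filename. It assumes you are willing to give up all magic\, including "-" for STDIN\, and the helper name count_lines is made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from this thread): count the lines in one file,
# treating $path strictly as a filename.
sub count_lines {
    my ($path) = @_;

    # 3-arg open: the mode is a separate argument, so a leading '>' or '<'
    # or a trailing '|' in $path is just part of the name.
    open(my $fh, '<', $path) or die "cannot open '$path': $!\n";

    my $n = 0;
    while (my $line = <$fh>) {
        $n++;
    }
    close $fh;
    return $n;
}

# Report a count for every file named on the command line.
for my $file (@ARGV) {
    printf "%8d %s\n", count_lines($file), $file;
}
```

Because the mode is passed separately\, an argument such as ">fun" is opened for reading under that exact name instead of truncating "fun".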

-- Ed Avis \<eda@waniasset\.com>

p5pRT commented 15 years ago

From zefram@fysh.org

Mark Mielke wrote​:

I don't understand. What is /$/ reasonably supposed to do?

It is very frequently used in an attempt to anchor to the end of the string. Such as

  die "invalid keyword" unless $keyword =~ /^(foo|bar|baz)$/;

In this context the programmer usually doesn't intend to accept "foo\n". Newline is a shell metacharacter\, of course\, and often significant in file formats\, so there's lots of scope for breakage.
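A small sketch of the difference\, using only core regexp features. The variable names are arbitrary; the point is that /$/ without /m also matches just before a final newline\, while \z matches only at the very end of the string:

```perl
use strict;
use warnings;

my $dollar = qr/^(foo|bar|baz)$/;     # $ also matches before a trailing "\n"
my $strict = qr/\A(foo|bar|baz)\z/;   # \z matches only at the true end

for my $keyword ("foo", "foo\n") {
    # Make the newline visible when printing the test value.
    (my $shown = $keyword) =~ s/\n/\\n/g;

    printf "%-8s  \$: %-8s  \\z: %s\n",
        "\"$shown\"",
        ($keyword =~ $dollar ? "match" : "no match"),
        ($keyword =~ $strict ? "match" : "no match");
}
```

Both patterns accept "foo"\, but only the /$/ version accepts "foo\n".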

Please point out a real-life program that uses /$/ that isn't mere theory.

I presume you mean a real-life program that misbehaves due to misuse of /$/. On a quick look through /usr/local/share/perl\, I found this in Carp​::Clan​:

  :unless ( /^-?\d+(?:\.\d+(?:[eE][+-]\d+)?)?$/
  :       ) # Looks numeric
  :{
  :    s/([\\\'])/\\$1/g; # Escape \ and '
  ...

This is used when displaying function arguments in a stack trace​: it's trying to show numeric values as unquoted numbers and any other defined value as a quoted string. So these are how it's meant to work​:

  $ perl -MCarp::Clan=confess -we 'sub foo { confess "a" } foo("abc")'
  Carp::Clan::__ANON__(): a at -e line 1
    main::foo('abc') called at -e line 1
  $ perl -MCarp::Clan=confess -we 'sub foo { confess "a" } foo("abc\n")'
  Carp::Clan::__ANON__(): a at -e line 1
    main::foo('abc\x0A') called at -e line 1
  $ perl -MCarp::Clan=confess -we 'sub foo { confess "a" } foo("123")'
  Carp::Clan::__ANON__(): a at -e line 1
    main::foo(123) called at -e line 1
  $

And this one goes wrong​:

  $ perl -MCarp::Clan=confess -we 'sub foo { confess "a" } foo("123\n")'
  Carp::Clan::__ANON__(): a at -e line 1
    main::foo(123
  ) called at -e line 1
  $

OK\, this one's not likely to produce a security hole\, but it's just the first instance I set eyes on. It's a very common antipattern.

Plain honest-to-Ritchie file access is obsolete.

I respectfully disagree. I find the Unix filesystem is an excellent abstraction\, and I use it all the time.

I want to see http:// automatically parsed.

It's possible to map URIs into Unix filename space. (I had a go at this myself for Linux a while back\, but didn't develop it very far.) It's also possible to map Unix filenames into URI space. If you want access to both HTTP and local files in one context\, you can still get this with pure file operations or with pure URI operations\, at your option. If you insist on mixed parsing\, then you're a long way from providing a filename context\, and you have to document the rules in order for users to be able to make intentional use of the features.

We're decades past the requirements you are raising\,

The requirement to not surprise the user?

-zefram

p5pRT commented 15 years ago

From @sciurius

Mark Mielke \<mark@mark\.mielke\.cc> writes:

Why was this not a problem 10 years ago?

Long time ago filenames were meant for shell- and command line apps. Modern *ix applications (e.g.\, word processors) allow people to freely supply any filename they want\, spaces and whatever included. E.g.\, when saving a HTML page\, using the \<title> would be an acceptable default choice for a file name.

So whether it's wrong\, or stupid\, or whatever -- the chances of the problem happening will grow.

-- Johan

p5pRT commented 15 years ago

From mark@mark.mielke.cc

Johan Vromans wrote:

> Mark Mielke \<mark@mark\.mielke\.cc> writes:
>
> > Why was this not a problem 10 years ago?
>
> Long time ago filenames were meant for shell- and command line apps. Modern *ix applications (e.g.\, word processors) allow people to freely supply any filename they want\, spaces and whatever included. E.g.\, when saving a HTML page\, using the \<title> would be an acceptable default choice for a file name.
>
> So whether it's wrong\, or stupid\, or whatever -- the chances of the problem happening will grow.

My memory tells me that "word processors" and the like have *always* accepted weird characters. In 1990 or so\, I still remember the 'ls' output being messed up because one of the filenames had a BACKSPACE in it. Nothing has changed in this regard. Perhaps people use GUIs more - but even this I don't believe. I've used UNIX GUIs on HP-UX and Solaris since the late 1980s. Frame Maker didn't have file name restrictions that I recall?

You are mistaken to believe this is a new problem - my question is with the urgency. Why does the behaviour need to change? Why is it a problem today when it was not a problem 10 years ago?

Cheers\, mark

-- Mark Mielke \<mark@mielke\.cc>

p5pRT commented 15 years ago

From @sciurius

[Quoting Mark Mielke\, on July 26 2008\, 11:44\, in "Re: [perl #2783] Sec"]

> You are mistaken to believe this is a new problem - my question is with the urgency. Why does the behaviour need to change? Why is it a problem today when it was not a problem 10 years ago?

My point was that the problem is bound to get bigger now more people are using not-filename-limiting tools.

-- Johan

p5pRT commented 15 years ago

From mark@mark.mielke.cc

Ed Avis wrote:

> Mark Mielke \<mark \<at> mark.mielke.cc> writes:
>
> > What legitimate scenarios use files with names "\<x"?
>
> Very few. But if such a filename does exist\, you'd want the program to handle it in a sane way.
>
> Unfortunately the world is not always legitimate. Every program using while (\<>) has to come with a health warning that it is unsafe to use unless you know that no >\<| characters are contained in the filenames.

No\, it doesn't. At least not until you can prove this is a real issue hurting people every day.

> Remember gets() in the standard C library? What's the problem with it? No legitimate use would ever enter a million characters of text when given a prompt to enter 'yes' or 'no'. And if users are so stupid as to enter too much text and overflow the fixed buffer\, that is a problem with the users and they should be educated to take more care.
>
> Yet today\, gets() is deprecated and nobody uses it - whether writing 'security code' or not. It is just too dangerous.

You are exaggerating. If anybody truly allocated a buffer for gets() that was a million characters\, it wouldn't be a problem. Allocating a Mbyte for an input buffer "just in case"\, however\, is ridiculous. People didn't allocate a million characters - they allocated 64\, or 512\, or 1024. The chance that this is hurt is far\, far greater. You are correct that the interface for gets() is wrong\, but in your analogy you have ignored the fact that 2-arg open is a FEATURE not a LIMITATION. The situations are not the same.

In gets()\, there is a limitation which can cause the executing program to see data corruption\, arbitrary code execution\, or program termination.

In Perl \<>\, there is a feature that users can pass shell pipes instead of file names or "-" instead of STDIN. The worst that can happen is that the user requests a file they have permission to overwrite\, be overwritten. This might happen if the user is naive enough to use ">" as the first character in their file name. What cases of this problem exist? What cases prove that the value of the feature is far less than the cost of allowing the feature? Is this theory\, or do you have a legitimate complaint?

> > I strongly disagree. You are turning an existing feature with existing syntax into a non-feature\, because you think there MIGHT be a theoretical somebody incompetent enough to hang themselves with a bit of rope.
>
> First I would like to say that this is best not treated as a moral issue. Those bad programmers\, deserve what they get\, we should teach them a lesson\, etc.

You are incorrect. Programming languages can do baby sitting duty\, or they can provide plenty of rope. There is a need for both in this world. Both types of languages exist today. Perl has traditionally been the "plenty of rope" type\, and this is a major reason why Perl adoption was so high for many years. What other scripting languages allowed users to run arbitrary system calls? You are suggesting that this Perl tradition be turned over. Your reason for requesting it is a trivial theoretical case that has probably happened once in the life time of Perl.

> If the measure of incompetence is using 'while (\<>)' in a program without realizing that it is dangerous\, then I have been incompetent many times and so have lots of others. Again\, have a look at the examples you can find on Google Code Search. All of these people have been incompetent and written programs that are unsafe to use on arbitrary lists of files. All the programs either need to be patched to use a safe form of open\, or else need to mention the gotcha in their documentation.

Your definition of unsafe and mine are different. What is the worst that can happen with these theoretically unsafe programs?

Somebody has a file called ">fun". They pass "*" in. Their program overwrites "fun". So what? In order for there to be damage\, "fun" would need to exist and contain valuable information. What are the chances that both ">fun" and "fun" exist?

The only danger is in setuid usage\, where one uses special privileges to overwrite a file they should not have access to. This is precisely what taint-mode is for\, and I fully agree that \<> should raise issues in taint-mode.

I don't agree at all that non-taint-mode should warn or error or have alternate behaviour from \<> as it is today.

> The bar for incompetence or being a dimwit is set remarkably low IMHO. Surely you should not need to be one of the expert 10% (or even the top 90%) to read some files and count the number of lines. Yet I bet that if you get a sample of perl programmers and ask them to implement a simple 'wc -l' in perl\, more than nine out of ten will write one that breaks if given a filename beginning with >\, and most of those without even thinking there might be a problem.

By "break" you mean - fail to allow the use of ">" in the beginning of a file\, which is a pretty minimal case.

Again - show examples of this problem in the field.

Cheers\, mark

-- Mark Mielke \<mark@mielke\.cc>

p5pRT commented 15 years ago

From mark@mark.mielke.cc

Johan Vromans wrote:

> [Quoting Mark Mielke\, on July 26 2008\, 11:44\, in "Re: [perl #2783] Sec"]
>
> > You are mistaken to believe this is a new problem - my question is with the urgency. Why does the behaviour need to change? Why is it a problem today when it was not a problem 10 years ago?
>
> My point was that the problem is bound to get bigger now more people are using not-filename-limiting tools

Do you have evidence? Perhaps a list of filenames from 1988\, 1998\, and 2008\, to compare against?

Cheers\, mark

-- Mark Mielke \<mark@mielke\.cc>

p5pRT commented 15 years ago

From mark@mark.mielke.cc

Zefram wrote:

> Mark Mielke wrote:
>
> > I don't understand. What is /$/ reasonably supposed to do?
>
> It is very frequently used in an attempt to anchor to the end of the string. Such as
>
>   die "invalid keyword" unless $keyword =~ /^(foo|bar|baz)$/;
>
> In this context the programmer usually doesn't intend to accept "foo\n". Newline is a shell metacharacter\, of course\, and often significant in file formats\, so there's lots of scope for breakage.

Ah. You didn't mean /$/ - you meant /...$/. Similar to other issues\, the real-life problems with accepting "foo" vs "foo\n" are pretty few. I expect the /...$/ cases are more than the 2-argument cases in terms of real-life problems\, however\, this is ANOTHER case\, where I disagree that anything should be "fixed". Users who intend to match end-of-string and only end-of-string should be using \z. It is not the job of Perl to redefine long-standing operators to become less functional and create compatibility problems\, just so that somebody who didn't understand what they were doing in the first place\, will not break in an obscure use case.

> > Please point out a real-life program that uses /$/ that isn't mere theory.
>
> I presume you mean a real-life program that misbehaves due to misuse of /$/. On a quick look through /usr/local/share/perl\, I found this in Carp::Clan:
>
>   :unless ( /^-?\d+(?:\.\d+(?:[eE][+-]\d+)?)?$/
>   :       ) # Looks numeric
>   :{
>   :    s/([\\\'])/\\$1/g; # Escape \ and '
>   ...
>
> This is used when displaying function arguments in a stack trace: it's trying to show numeric values as unquoted numbers and any other defined value as a quoted string. So these are how it's meant to work:
>
>   $ perl -MCarp::Clan=confess -we 'sub foo { confess "a" } foo("abc")'
>   Carp::Clan::__ANON__(): a at -e line 1
>     main::foo('abc') called at -e line 1
>   $ perl -MCarp::Clan=confess -we 'sub foo { confess "a" } foo("abc\n")'
>   Carp::Clan::__ANON__(): a at -e line 1
>     main::foo('abc\x0A') called at -e line 1
>   $ perl -MCarp::Clan=confess -we 'sub foo { confess "a" } foo("123")'
>   Carp::Clan::__ANON__(): a at -e line 1
>     main::foo(123) called at -e line 1
>   $
>
> And this one goes wrong:
>
>   $ perl -MCarp::Clan=confess -we 'sub foo { confess "a" } foo("123\n")'
>   Carp::Clan::__ANON__(): a at -e line 1
>     main::foo(123
>   ) called at -e line 1
>   $
>
> OK\, this one's not likely to produce a security hole\, but it's just the first instance I set eyes on. It's a very common antipattern.

It seems like a pretty harmless one.

> > Plain honest-to-Ritchie file access is obsolete.
>
> I respectfully disagree. I find the Unix filesystem is an excellent abstraction\, and I use it all the time.

More people use rich file system identifiers these days than UNIX file system. On Windows\, it will happily accept http:// vs ftp:// vs file:// vs \\host\share in many of the "open file" dialogs. This is a good thing\, and the trend should continue. PHP already supports this in their open functions and this is widely understood to be a major feature.

> > I want to see http:// automatically parsed.
>
> It's possible to map URIs into Unix filename space. (I had a go at this myself for Linux a while back\, but didn't develop it very far.) It's also possible to map Unix filenames into URI space. If you want access to both HTTP and local files in one context\, you can still get this with pure file operations or with pure URI operations\, at your option. If you insist on mixed parsing\, then you're a long way from providing a filename context\, and you have to document the rules in order for users to be able to make intentional use of the features.

The rules are pretty well recognized\, even by the non-technical. The non-technical would recognize http:// a long time before they would recognize some non-standard overlay of the UNIX file name space with URI.

> > We're decades past the requirements you are raising\,
>
> The requirement to not surprise the user?

Users who don't read the document will always be surprised. In standard UNIX\, passing a "-" in has traditionally meant STDIN. One could argue that an experienced user might be "surprised" that "-" was NOT interpreted as STDIN. This argument is entirely relative to the person's expectations.

In this case\, the expectations should be well instilled. Perl has done what it has done for over a decade\, and if somebody is truly surprised today - they should pick up the manual and give it another read.

Cheers\, mark

-- Mark Mielke \<mark@mielke\.cc>

p5pRT commented 15 years ago

From @sciurius

Mark Mielke \<mark@mark\.mielke\.cc> writes:

> Do you have evidence? Perhaps a list of filenames from 1988\, 1998\, and 2008\, to compare against?

Ah\, come on. There are many more people in 2008 using some Linux with Firefox\, Thunderbird and OpenOffice than there were in 1998 and 1988. There are many more people in 2008 using some Linux totally unaware of command line shells and special characters than there were in 1998 and 1988. The chances that files with special characters in their names are created is therefore bigger in 2008 than they were in 1998 and 1988.

-- Johan

p5pRT commented 15 years ago

From zefram@fysh.org

Mark Mielke wrote:

> Ah. You didn't mean /$/ - you meant /...$/.

Yes\, I meant the $ regexp operator in general\, not a pattern consisting only of that operator.

> this is ANOTHER case\, where I disagree that anything should be "fixed".

Oh\, I think /$/ is impossible to fix in perl5. There are a great many programs that run regexps on newline-terminated input lines and use /$/ intending for it to match before the newline. These would seriously break if /$/ changed to match only at end of string. I wasn't arguing for this to change\, but using it as an example of another case where a very short (and therefore attractive) operator tries to DWIM by complex magical behaviour and ends up surprising the programmer.

I think \<> could be changed\, however. I'm OK with it continuing to process "-" as stdin; you're right that this is a common Unix convention. But its handling of ">foo" and "rm -rf / |" are certainly not conventional. I think (unlike /$/) that the intentional uses of those features are sufficiently rare that it's worth breaking them to make the operator less surprising for everyone else.

> Users who intend to match end-of-string and only end-of-string should be using \z.

Yes\, indeed they need to. But most of them don't: they use the shorter and more familiar operator\, and end up with a different program from the one they intended to write. And I remember the days when /\z/ didn't exist: for a while I used /(?!.)/s as the only way to be sure.

After writing my earlier message\, I had a think about what start/end anchor operators should logically exist\, depending on how your string is structured into lines\, and which ones we actually have in Perl. I came up with these sets\, defining the behaviour explicitly in terms of the regexp operators with simplest behaviour:

  PURPOSE                               START                    END
  sensible pairs:
    single undelimited line             \A                       \z
    single line with \n terminator      \A                       (?=\n\z)
    zero or more \n-terminated lines    (?:\A|(?<=\n))(?!\z)     (?=\n)
    one or more \n-separated lines      (?:\A|(?<=\n))           (?=\n|\z)
  actual pairs:
    /^...$/, /\A...\Z/                  \A                       (?=\n?\z)
    /^...$/m                            (?:\A|(?<=\n)(?!\z))     (?=\n|\z)

I am mystified as to the circumstances under which one might actually want the behaviour of /$/ (without /m) or /^/m. Certainly they can be correctly used\, with a bit of care\, but as far as I can see they never completely match the actual semantics of what constitutes a line start or end.

> just so that somebody who didn't understand what they were doing in the first place\, will not break in an obscure use case.

I think these (\<> and some of the regexp things) are unreasonably difficult to understand. /^/m is so difficult to understand that its own implementors have trouble with it. It was documented incorrectly in perlre for years\, until I discovered the undocumented /(?!\z)/ bit of its behaviour and pointed it out (bug #27053\, resolved by a documentation change in 5.10).

I understand these operators. I will jump through the necessary hoops to write the program that I intend\, even if the hoop is using a 20-character regexp (as in the table above) instead of the single character that I know from grep. Not to put too fine a point on it\, I'm unburned because I know the language inside out and I'm anal about correctness. I'm in a small minority on all three points.

> Users who don't read the document will always be surprised.

Users who read the documentation for perl programs that use \<> (such as our hypothetical wc-in-perl) generally don't get told about the magic meaning of "rm -rf / |" as an argument. They will be surprised and burned by the present behaviour.

You have to read quite a long way into the \<> documentation (in perlop) before it mentions pipe magic. It starts off saying it "emulate[s] the behavior of sed and awk"\, which covers "-" and the no-arguments case. It then speaks about filenames\, and gives an example using glob() in a way that makes it (the example) vulnerable to the very problems we are discussing. It never mentions the magic interpretation of ">foo" or that it loses trailing spaces.

Magic behaviour of "-" is less surprising\, less dangerous\, and more likely to be documented than any of the others. It also has a well-known workaround\, ./*\, which is also good for avoiding interpretation as switches. The pipe thing can't be worked around that way: consider the file "./; rm -rf / |". Pipe magic here is very nasty.

> In this case\, the expectations should be well instilled.

I expect generally unpredictable behaviour if I give a filename containing unusual characters to a program written by anyone else. (Not specifically programs written in scripting languages\, or programs written by amateurs\, but just programs written by anyone who wasn't me. Hell\, I once encountered commercial *kernel* code that lost trailing spaces in filenames.) This expectation has been instilled by years of experience\, and reinforced by the psychological understanding that most programmers just don't think about the unusual cases.

-zefram

p5pRT commented 15 years ago

From @ap

* Tom Christiansen \<tchrist@perl\.com> [2008-07-20 07:30]:

> And the day that I can no longer rely upon the overlying system to automatically understand that "-" is STDIN for input or STDOUT for output is the day that I will fork a parallel copy of Perl that maintains traditional and expected behavior. However\, I trust that will never need to occur\, for no pumpking has ever been so blindly cavalier--which can probably be read as "foolish" if you're of that bent.

Go ahead. Being able to fork is what free software is all about.

I do agree that treating the filename `-` as special is a very nice feature. It cannot cause problems of the sort that filenames with 2-arg-open metacharacters can\, either. But then\, there is no reason that this feature would have to be chucked out along with the interpretation of other special characters.

> What you don't seem to understand is that taking a homogeneous approach to argument processing is not a bug\, but a feature--a BIG FEATURE. Perhaps you're too much of a youngster to remember the days that shells didn't glob command-line arguments for you\, and each program had to parse and process its own command line string. But I am not. Those were dark days of unpredictability. It's the wrong way to (not) go about things.

I fail to follow. First you say that a program doing its own magical interpretation of metacharacters in filenames (like pipes and angle brackets) is a BIG FEATURE.

Then you say that the days when programs did their own magical interpretation of metacharacters in filenames (like question marks and asterisks) were dark and ruled by unpredictability.

Which one is it?

Modern shells already have process substitution\, which can do anything a user could do with magical filenames and more besides. *That* is a homogeneous approach to argument processing\, and you are absolutely right: it's a BIG FAT FEATURE. So what is the value of retaining an API that can and does surprise people in nasty ways in exchange for no added functionality whatsoever?

The *only* reason to keep that API is backwards compatibility. It's a good reason to be sure\, but I would prefer a way forward that satisfies that demand without perpetuating the vestigial pragmatics of yesteryear for all eternity and a day beyond.

Regards\,
-- Aristotle Pagaltzis // \<http://plasmasturm.org/>

p5pRT commented 15 years ago

From @ap

* Mark Mielke \<mark@mark\.mielke\.cc> [2008-07-25 23:15]:

> How many people have been "burned" by this "problem"? Is the new class of dimwits exceedingly more dimwitted than the previous? Why was this not a problem 10 years ago?

Just because you didn't hear about it before doesn't mean no one else knew of it. I've been preaching the use of 3-arg open ever since it was introduced\, and I've known of the diamond operator's problem for just as long.

Regards\,
-- Aristotle Pagaltzis // \<http://plasmasturm.org/>