Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

readlink() returns result along with garbage #9331

Closed p5pRT closed 15 years ago

p5pRT commented 16 years ago

Migrated from rt.perl.org#54198 (status was 'resolved')

Searchable as RT54198$

p5pRT commented 16 years ago

From dmelnik@regent.ru

Created by dmelnik@regent.ru

Hi, there's a problem with readlink().

print readlink("/proc/13917/exe"); results in: /usr/sbin/squidr.pyo (deleted)

While it should be '/usr/sbin/squid'. Actually there are \0's in the resulting string:

    00000000  2f 75 73 72 2f 73 62 69  6e 2f 73 71 75 69 64 00  |/usr/sbin/squid.|
    00000010  72 2e 70 79 6f 00 00 00  00 00 00 00 00 00 00 00  |r.pyo...........|
    00000020  20 28 64 65 6c 65 74 65  64 29                    | (deleted)|

As far as I understand, C's readlink() and shell's readlink(1) work fine 'cause they see \0 termination, while Perl doesn't use it.

The problem has appeared today, yesterday the code worked fine.

Perl Info

```
Flags:
    category=core
    severity=medium

This perlbug was built using Perl v5.8.8 in the Red Hat build system.
It is being executed now by Perl v5.8.8 - Wed Jan 9 11:30:38 CST 2008.

Site configuration information for perl v5.8.8:

Configured by Red Hat, Inc. at Wed Jan 9 11:30:38 CST 2008.

Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
  Platform:
    osname=linux, osvers=2.6.18-8.1.15.el5, archname=x86_64-linux-thread-multi
    uname='linux linux55.fnal.gov 2.6.18-8.1.15.el5 #1 smp mon oct 22 09:47:50 edt 2007 x86_64 x86_64 x86_64 gnulinux '
    config_args='-des -Doptimize=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -Dversion=5.8.8 -Dmyhostname=localhost -Dperladmin=root@localhost -Dcc=gcc -Dcf_by=Red Hat, Inc. -Dinstallprefix=/usr -Dprefix=/usr -Dlibpth=/usr/local/lib64 /lib64 /usr/lib64 -Dprivlib=/usr/lib/perl5/5.8.8 -Dsitelib=/usr/lib/perl5/site_perl/5.8.8 -Dvendorlib=/usr/lib/perl5/vendor_perl/5.8.8 -Darchlib=/usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi -Dsitearch=/usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi -Dvendorarch=/usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi -Darchname=x86_64-linux -Dvendorprefix=/usr -Dsiteprefix=/usr -Duseshrplib -Dusethreads -Duseithreads -Duselargefiles -Dd_dosuid -Dd_semctl_semun -Di_db -Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio -Dinstallusrbinperl=n -Ubincompat5005 -Uversiononly -Dpager=/usr/bin/less -isr -Dd_gethostent_r_proto -Ud_endhostent_r_proto -Ud_sethostent_r_proto -Ud_endprotoent_r_proto -Ud_setprotoent_r_proto -Ud_endservent_r_proto -Ud_setservent_r_proto -Dinc_version_list=5.8.7 5.8.6 5.8.5 -Dscriptdir=/usr/bin'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=define use64bitall=define uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -Wdeclaration-after-statement -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
    optimize='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -Wdeclaration-after-statement -I/usr/local/include -I/usr/include/gdbm'
    ccversion='', gccversion='4.1.1 20070105 (Red Hat 4.1.1-52)', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =''
    libpth=/usr/local/lib64 /lib64 /usr/lib64
    libs=-lresolv -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc
    perllibs=-lresolv -lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
    libc=/lib/libc-2.5.so, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version='2.5'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi/CORE'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic'

Locally applied patches:

@INC for perl v5.8.8:
    /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi
    /usr/lib64/perl5/site_perl/5.8.7/x86_64-linux-thread-multi
    /usr/lib64/perl5/site_perl/5.8.6/x86_64-linux-thread-multi
    /usr/lib64/perl5/site_perl/5.8.5/x86_64-linux-thread-multi
    /usr/lib/perl5/site_perl/5.8.8
    /usr/lib/perl5/site_perl/5.8.7
    /usr/lib/perl5/site_perl/5.8.6
    /usr/lib/perl5/site_perl/5.8.5
    /usr/lib/perl5/site_perl
    /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi
    /usr/lib64/perl5/vendor_perl/5.8.7/x86_64-linux-thread-multi
    /usr/lib64/perl5/vendor_perl/5.8.6/x86_64-linux-thread-multi
    /usr/lib64/perl5/vendor_perl/5.8.5/x86_64-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.8.8
    /usr/lib/perl5/vendor_perl/5.8.7
    /usr/lib/perl5/vendor_perl/5.8.6
    /usr/lib/perl5/vendor_perl/5.8.5
    /usr/lib/perl5/vendor_perl
    /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi
    /usr/lib/perl5/5.8.8
    .

Environment for perl v5.8.8:
    HOME=/root
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin
    PERL_BADLANG (unset)
    SHELL=/bin/bash
```

p5pRT commented 16 years ago

From maddingue@free.fr

Hello,

Denis Melnikov wrote:

> There's a problem with readlink().

I was bitten by the same thing in a program I wrote at $work. AFAIK,
it's a Linux only thing.

> print readlink("/proc/13917/exe"); results in: /usr/sbin/squidr.pyo (deleted)
>
> While it should be '/usr/sbin/squid'. Actually there are \0's in the resulting string:
>
>     00000000  2f 75 73 72 2f 73 62 69  6e 2f 73 71 75 69 64 00  |/usr/sbin/squid.|
>     00000010  72 2e 70 79 6f 00 00 00  00 00 00 00 00 00 00 00  |r.pyo...........|
>     00000020  20 28 64 65 6c 65 74 65  64 29                    | (deleted)|
>
> As far as I understand, C's readlink() and shell's readlink(1) work fine 'cause they see \0 termination, while Perl doesn't use it.
>
> The problem has appeared today, yesterday the code worked fine.

Read the full string: it only appears when the target file has been
deleted. IIRC, the buffer contains the target path, a fixed number of
bytes (I don't know their meaning), then the string "(deleted)". I
think it's used by utilities like lsof.

Note that Perl simply uses the C readlink(2) function, so any C
program will show the same thing. Except you usually won't see it
because if you printf("%s\n", buf) (where buf has been filled by
readlink(2)), you'll only see the target path because printf(3) will
stop at the first \0. In Perl, you see everything because Perl
strings don't end at \0.
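
A minimal C sketch of that difference, built from the bytes in the hex dump of the original report. The names `sample`, `counted_length` and `c_string_length` are made up for illustration; nothing here is taken from the Perl source.

```c
#include <stddef.h>
#include <string.h>

/* The 42 bytes from the hex dump in the report. */
const char sample[] =
    "/usr/sbin/squid\0"              /* 16 bytes: target path and a NUL      */
    "r.pyo\0\0\0\0\0\0\0\0\0\0\0"    /* 16 bytes: other bytes, NUL padding   */
    " (deleted)";                    /* 10 bytes: the trailing annotation    */

/* Length of a counted string: every byte, NULs included.
 * This is the view a Perl scalar has of the data. */
size_t counted_length(void)
{
    return sizeof sample - 1;        /* minus the literal's own terminator */
}

/* Length of a C string: stops at the first NUL.
 * This is the view printf("%s") and strlen() have. */
size_t c_string_length(void)
{
    return strlen(sample);
}
```

With these bytes, `counted_length()` is 42 and `c_string_length()` is 15, the length of `/usr/sbin/squid`.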

It can be demonstrated (on Linux) with a short C program. I don't
have at hand the one I wrote when I discovered this, but I think it
was something like this:

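    $ cat myreadlink.c
    #include <stdio.h>
    #include <unistd.h>

    #define BUF_SIZE 128

    int main(int argc, char *argv[])
    {
        char buf[BUF_SIZE] = { 0 };   /* zeroed: readlink(2) adds no NUL */
        ssize_t n, i;

        if (argc < 2)
            return 1;

        /* keep the last byte as a terminator for the %s below */
        n = readlink(argv[1], buf, BUF_SIZE - 1);
        printf("readlink() returned %zd\n", n);
        if (n < 0)
            return 1;

        printf(" buf=\"%s\"\n", buf);
        printf(" buf: ");

        /* dump only the n bytes that readlink() wrote */
        for (i = 0; i < n; i++) {
            printf("%02hhx ", (unsigned char)buf[i]);
        }

        puts("");

        return 0;
    }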

Compile it, then call it with "myreadlink /proc/13917/exe". IIRC,
readlink(2) on Linux returns the total number of bytes it put in the
buffer, up to and including the "(deleted)" string, which Perl uses
as the length of the string.

If we consider this a bug, the following (untested) patch should
solve it:

Inline Patch

```diff
--- pp_sys.c.old 2008-04-30 13:51:55.000000000 +0200
+++ pp_sys.c 2008-05-17 02:25:59.000000000 +0200
@@ -3586,7 +3586,8 @@
     EXTEND(SP, 1);
     if (len < 0)
         RETPUSHUNDEF;
-    PUSHp(buf, len);
+    buf[len] = '\0';
+    PUSHp(buf, strlen(buf));
     RETURN;
 #else
     EXTEND(SP, 1);
--
```
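
What the patch does to the returned length can be sketched outside of pp_sys.c. The helper below is hypothetical (the name `target_length` is not from Perl or libc); it assumes the caller already has the buffer and the count that readlink(2) returned, and it uses `memchr` so it never has to write a terminator into the buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: given the len bytes filled in by readlink(2),
 * return how many of them precede the first NUL byte, or len if the
 * data contains no NUL at all. */
size_t target_length(const char *buf, size_t len)
{
    const char *nul = memchr(buf, '\0', len);

    return nul != NULL ? (size_t)(nul - buf) : len;
}
```

For an ordinary symlink the result equals `len`, so only results with an embedded NUL, like the one in this report, would be shortened.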

Sébastien Aperghis-Tramoni

Close the world, txEn eht nepO.

p5pRT commented 16 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 16 years ago

From @gbarr

On May 16, 2008, at 7:28 PM, Sébastien Aperghis-Tramoni wrote:

> Hello,
>
> Denis Melnikov wrote:
>
>> There's a problem with readlink().
>
> I was bitten by the same thing in a program I wrote at $work.
> AFAIK, it's a Linux only thing.

It is a "trick" that procfs uses which assumes anyone using readlink
on a link in procfs will stop at the NUL.

>> print readlink("/proc/13917/exe"); results in: /usr/sbin/squidr.pyo (deleted)
>>
>> While it should be '/usr/sbin/squid'. Actually there are \0's in the resulting string:
>>
>>     00000000  2f 75 73 72 2f 73 62 69  6e 2f 73 71 75 69 64 00  |/usr/sbin/squid.|
>>     00000010  72 2e 70 79 6f 00 00 00  00 00 00 00 00 00 00 00  |r.pyo...........|
>>     00000020  20 28 64 65 6c 65 74 65  64 29                    | (deleted)|
>>
>> As far as I understand, C's readlink() and shell's readlink(1) work fine 'cause they see \0 termination, while Perl doesn't use it.
>>
>> The problem has appeared today, yesterday the code worked fine.
>
> Read the full string: it only appears when the target file has been
> deleted. IIRC, the buffer contains the target path, a fixed number
> of bytes (I don't know their meaning), then the string "(deleted)".
> I think it's used by utilities like lsof.
>
> If we consider this a bug, the following (untested) patch should
> solve it:

I do not consider it a bug. With the patch below anyone attempting to
write utilities to use that information cannot.
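
As a rough illustration of the kind of utility code meant here, the following sketch checks whether a counted readlink(2) result carries the trailing " (deleted)" annotation seen in the report. The helper name `marked_deleted` is invented for this example, and the check is only a heuristic: a file whose real name ends in " (deleted)" would look the same.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if the len bytes returned by readlink(2)
 * end with the " (deleted)" annotation. Works on the counted buffer,
 * so embedded NUL bytes do not cut the comparison short. */
bool marked_deleted(const char *buf, size_t len)
{
    static const char suffix[] = " (deleted)";
    const size_t slen = sizeof suffix - 1;

    return len >= slen && memcmp(buf + len - slen, suffix, slen) == 0;
}
```

Because it needs the full byte count, this only works if the caller keeps everything readlink(2) returned rather than stopping at the first NUL.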

Graham.

>     --- pp_sys.c.old 2008-04-30 13:51:55.000000000 +0200
>     +++ pp_sys.c 2008-05-17 02:25:59.000000000 +0200
>     @@ -3586,7 +3586,8 @@
>          EXTEND(SP, 1);
>          if (len < 0)
>              RETPUSHUNDEF;
>     -    PUSHp(buf, len);
>     +    buf[len] = '\0';
>     +    PUSHp(buf, strlen(buf));
>          RETURN;
>      #else
>          EXTEND(SP, 1);
>
> -- Sébastien Aperghis-Tramoni
>
> Close the world, txEn eht nepO.

p5pRT commented 16 years ago

From @nwc10

On Sat, May 17, 2008 at 06:24:16AM -0500, Graham Barr wrote:

> On May 16, 2008, at 7:28 PM, Sébastien Aperghis-Tramoni wrote:
>
>> Read the full string: it only appears when the target file has been
>> deleted. IIRC, the buffer contains the target path, a fixed number
>> of bytes (I don't know their meaning), then the string "(deleted)".
>> I think it's used by utilities like lsof.
>>
>> If we consider this a bug, the following (untested) patch should
>> solve it:
>
> I do not consider it a bug. With the patch below anyone attempting to
> write utilities to use that information cannot.

My view too. readlink is working as documented.

I'm curious whether this feature of the Linux proc filing system is documented. :-)

Nicholas Clark

p5pRT commented 16 years ago

From maddingue@free.fr

Nicholas Clark wrote:

> On Sat, May 17, 2008 at 06:24:16AM -0500, Graham Barr wrote:
>
>> On May 16, 2008, at 7:28 PM, Sébastien Aperghis-Tramoni wrote:
>>
>>> Read the full string: it only appears when the target file has been deleted. IIRC, the buffer contains the target path, a fixed number of bytes (I don't know their meaning), then the string "(deleted)". I think it's used by utilities like lsof.
>>>
>>> If we consider this a bug, the following (untested) patch should solve it:
>>
>> I do not consider it a bug. With the patch below anyone attempting to write utilities to use that information cannot.
>
> My view too. readlink is working as documented.

If you allow me to be a little pedantic, it's not exactly working as
documented:

    readlink EXPR
    readlink
            Returns the value of a symbolic link, if symbolic links are
            implemented. If not, gives a fatal error. If there is some
            system error, returns the undefined value and sets $! (errno).
            If EXPR is omitted, uses $_.

i.e., it should return the target of the symbolic link, and that's what
it does on all systems, including Linux. It's only when the target
file doesn't exist that it returns this additional, undocumented
information, which doesn't exist on other systems. On OSX, readlink(2)
on a broken link just returns the content of the symbolic link.
So, one could argue that Perl could/should return a consistent value
across operating systems.

Personally, I can live with it as it is, given it's just a matter of
s/\0.*//s. Another solution is to add a POSIX::readlink() that DWIM and
only returns the target.

> I'm curious whether this feature of the Linux proc filing system is documented. :-)

IIRC, I had searched a little back then, but didn't find anything.
Googling a little more today didn't turn up anything more. The man
page for proc(5) doesn't mention this: http://www.kernel.org/doc/man-pages/online/pages/man5/proc.5.html

-- Sébastien Aperghis-Tramoni

Close the world, txEn eht nepO.

p5pRT commented 16 years ago

From @gbarr

On May 17, 2008, at 7:36 PM, Sébastien Aperghis-Tramoni wrote:

> Nicholas Clark wrote:
>
>> On Sat, May 17, 2008 at 06:24:16AM -0500, Graham Barr wrote:
>>
>>> On May 16, 2008, at 7:28 PM, Sébastien Aperghis-Tramoni wrote:
>>>
>>>> Read the full string: it only appears when the target file has been deleted. IIRC, the buffer contains the target path, a fixed number of bytes (I don't know their meaning), then the string "(deleted)". I think it's used by utilities like lsof.
>>>>
>>>> If we consider this a bug, the following (untested) patch should solve it:
>>>
>>> I do not consider it a bug. With the patch below anyone
>>> attempting to write utilities to use that information cannot.
>>
>> My view too. readlink is working as documented.
>
> If you allow me to be a little pedantic, it's not exactly working as
> documented:
>
>     readlink EXPR
>     readlink
>             Returns the value of a symbolic link, if symbolic links are
>             implemented. If not, gives a fatal error. If there is some
>             system error, returns the undefined value and sets $! (errno).
>             If EXPR is omitted, uses $_.
>
> i.e., it should return the target of the symbolic link, and it's
> what it does on all systems, including Linux. It's only when the
> target file doesn't exist that it returns this additional,
> undocumented, information, which doesn't exist on other systems. On
> OSX, readlink(2) on a broken link just returns the content of the
> symbolic link. So, one could argue that Perl could/should return
> consistent value across operating systems.

Allow me to be pedantic:

    SYNOPSIS
         #include <unistd.h>

         int
         readlink(const char *path, char *buf, int bufsiz);

    DESCRIPTION
         Readlink() places the contents of the symbolic link path in the
         buffer buf, which has size bufsiz. Readlink does not append a NUL
         character to buf.

    RETURN VALUES
         The call returns the count of characters placed in the buffer if
         it succeeds, or a -1 if an error occurs, placing the error code in
         the global variable errno.

As the man page states, the system call does not append a NUL
character, but returns the number of characters placed into the
buffer. If your code is treating any embedded NUL character as the
end of the link, then I would suggest that your program is broken.
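
The contract quoted from the man page can be sketched as a small wrapper that trusts the returned count and nothing else. The name `read_link_counted` and its error convention are assumptions made for this example, not an existing API.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper around readlink(2): fills out (capacity cap),
 * terminates it at the returned count, and returns that count.
 * Returns -1 on error; errno is left as set by readlink(2). */
ssize_t read_link_counted(const char *path, char *out, size_t cap)
{
    ssize_t n;

    if (cap < 2)
        return -1;                  /* no room for data plus a terminator */

    n = readlink(path, out, cap - 1);
    if (n < 0)
        return -1;

    out[n] = '\0';                  /* terminate at the count, not at a NUL */
    return n;
}
```

A caller that wants the whole result should work with the returned count, for example `fwrite(out, 1, (size_t)n, stdout)`, since `out` may contain NUL bytes before position `n`. A return value of `cap - 1` may mean the link contents were truncated, so a larger buffer would be needed to be sure.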

> Personally, I can live with it as it is, given it's just a matter
> of s/\0.*//s. Another solution is to add a POSIX::readlink() that DWIM
> and only returns the target.

It already does what it is supposed to do.

Graham.

p5pRT commented 15 years ago

@smpeters - Status changed from 'open' to 'resolved'