Perl / perl5


'close' does not report failure when buffer flush fails #5328

Closed p5pRT closed 20 years ago

p5pRT commented 22 years ago

Migrated from rt.perl.org#8938 (status was 'resolved')

Searchable as RT8938$

p5pRT commented 22 years ago

From @mjdominus

Created by @mjdominus

/dev/full is a character special device that is guaranteed to return ENOSPC on every attempt to write. The following program opens /dev/full, writes data into the stdio buffer, and then closes the file, flushing the buffer:

    open F, "> /dev/full" or die "Couldn't open /dev/full: $!";
    print F "I like pie.\n";
    my $return = close F;
    print "# return = $return; \$! = $!\n";
    print $return ? "not ok\n" : "ok\n"; # expect failure

The output should be

    # return = ; $! = No space left on device
    ok

indicating that the close() call failed because the buffer was not flushed successfully. However, the program instead produces this inconsistent result:

    # return = 1; $! = No space left on device
    not ok

$! was correctly set, but close() has incorrectly returned 1. Execution of the equivalent C program shows that stdio 'fclose' does return a failure code in these circumstances.
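The "equivalent C program" is not included in the report. A minimal sketch of what such a check might look like is given here; the function name `write_then_close` and its return convention are invented for illustration, and the behavior described assumes a Linux-style /dev/full.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper, not from the report: write one line through a
 * buffered stdio stream and report what fclose() says.
 * Returns -1 if the file cannot be opened, 1 if fclose() reported a
 * failure, 0 if fclose() reported success. On failure, *err receives
 * the errno value left behind by fclose(). */
int write_then_close(const char *path, int *err)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    /* The line is short, so it normally just sits in the stdio buffer. */
    fputs("I like pie.\n", fp);

    errno = 0;
    if (fclose(fp) == EOF) {   /* fclose() flushes the buffer first */
        *err = errno;
        return 1;
    }
    *err = 0;
    return 0;
}
```

Called with "/dev/full" on a system that provides that device, this sketch would be expected to return 1 with `*err` set to ENOSPC, which is the result the report argues Perl's close() should mirror.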

The culprit appears to be this code in PerlIOStdio_close, at perlio.c:2537:

    if (PerlIOUnix_refcnt_dec(fileno(stdio)) > 0) {
        /* Do not close it but do flush any buffers */
        PerlIO_flush(f);
        return 0;
    }

The 0 here is a success code. (-1 indicates failure.) PerlIOStdio_close seems to be returning success regardless of the indication returned by PerlIO_flush, which is correctly returning -1.

I did not understand what this block was doing, so I did not make the obvious change to:

    if (PerlIOUnix_refcnt_dec(fileno(stdio)) > 0) {
        /* Do not close it but do flush any buffers */
        return PerlIO_flush(f);
    }
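To make the intent of that change concrete, here is a toy model of a reference-counted close in plain C. It is not the perlio.c code: `shared_stream` and `shared_close` are invented names, and the reference count is a plain integer rather than PerlIO's per-descriptor table. It only illustrates that the "flush but do not close" branch can hand back the flush result instead of a hard-coded 0.

```c
#include <stdio.h>

/* Toy stand-in for a stream shared by several handles (hypothetical). */
struct shared_stream {
    FILE *fp;
    int   refcnt;   /* number of handles still using fp */
};

/* Returns 0 on success, -1 on failure. */
int shared_close(struct shared_stream *s)
{
    if (--s->refcnt > 0) {
        /* Not the last user: do not close, but do flush any buffers,
         * and report a failed flush to the caller. */
        return fflush(s->fp) == 0 ? 0 : -1;
    }
    /* Last user: really close; fclose() flushes and reports errors. */
    return fclose(s->fp) == 0 ? 0 : -1;
}
```

With this shape, a non-last close on a stream whose buffered data cannot be written returns -1, which is the behavior the proposed one-line change is after.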

Perl Info

```
Flags:
    category=core
    severity=high

Site configuration information for perl v5.6.1:

Configured by root at Sat Dec 29 11:54:59 EST 2001.

Summary of my perl5 (revision 5.0 version 6 subversion 1) configuration:
  Platform:
    osname=linux, osvers=2.4.2-2, archname=i586-linux
    uname='linux plover.com 2.4.2-2 #1 sun apr 8 19:37:14 edt 2001 i586 unknown '
    config_args='-des'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=undef d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
  Compiler:
    cc='cc', ccflags ='-fno-strict-aliasing -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2',
    cppflags='-fno-strict-aliasing'
    ccversion='', gccversion='2.96 20000731 (Red Hat Linux 7.1 2.96-81)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, usemymalloc=n, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lndbm -lgdbm -ldb -ldl -lm -lc -lcrypt -lutil
    perllibs=-lnsl -ldl -lm -lc -lcrypt -lutil
    libc=/lib/libc-2.2.2.so, so=so, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:

@INC for perl v5.6.1:
    /usr/local/lib/perl5/5.6.1/i586-linux
    /usr/local/lib/perl5/5.6.1
    /usr/local/lib/perl5/site_perl/5.6.1/i586-linux
    /usr/local/lib/perl5/site_perl/5.6.1
    /usr/local/lib/perl5/site_perl/5.6.0/i586-linux
    /usr/local/lib/perl5/site_perl/5.6.0
    /usr/local/lib/perl5/site_perl
    .

Environment for perl v5.6.1:
    HOME=/home/mjd
    LANG=C
    LANGUAGE (unset)
    LD_LIBRARY_PATH=/lib:/usr/lib:/usr/X11R6/lib
    LOGDIR (unset)
    PATH=/home/mjd/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/usr/games:/sbin:/usr/sbin:/usr/local/bin/X11R6:/usr/local/bin/mh:/data/mysql/bin:/usr/local/bin/pbm:/usr/local/bin/ezmlm:/home/mjd/TPI/bin:/usr/local/teTeX/bin:/usr/local/mysql/bin
    PERL_BADLANG (unset)
    SHELL=/bin/bash
```

p5pRT commented 22 years ago

From [Unknown Contact. See original ticket]

mjd@plover.com writes:

> This is a bug report for perl from mjd@plover.com, generated with the help of perlbug 1.33 running under perl v5.6.1.


> The culprit appears to be this code in PerlIOStdio_close, at perlio.c:2537:
>
>     if (PerlIOUnix_refcnt_dec(fileno(stdio)) > 0) {
>         /* Do not close it but do flush any buffers */
>         PerlIO_flush(f);
>         return 0;
>     }

That code should _not_ be active if that is the last thing to be closing the file. However, I agree that if PerlIO_flush() fails, so should the non-last close.

> The 0 here is a success code. (-1 indicates failure.) PerlIOStdio_close seems to be returning success regardless of the indication returned by PerlIO_flush, which is correctly returning -1.
>
> I did not understand what this block was doing, so I did not make the obvious change to:
>
>     if (PerlIOUnix_refcnt_dec(fileno(stdio)) > 0) {
>         /* Do not close it but do flush any buffers */
>         return PerlIO_flush(f);
>     }

I think that _is_ correct.

FWIW what does your test do for PERLIO=perlio ?

-- Nick Ing-Simmons http://www.ni-s.u-net.com/

p5pRT commented 22 years ago

From @mjdominus

Nick Ing-Simmons <nick.ing-simmons@elixent.com>:

> mjd@plover.com writes:
>
> > The culprit appears to be this code in PerlIOStdio_close, at perlio.c:2537:
> >
> >     if (PerlIOUnix_refcnt_dec(fileno(stdio)) > 0) {
> >         /* Do not close it but do flush any buffers */
> >         PerlIO_flush(f);
> >         return 0;
> >     }
>
> That code should _not_ be active if that is the last thing to be closing the file.

I'm not sure what it means to be the 'last thing to be closing the file'. Right now I'm looking at tests 19-24 of print.t, which I sent in a message a few hours ago. I've appended these to the bottom of this message for your convenience. After test 19, Perl is indeed at that point of the code; the backtrace is:

    #0  Perl_PerlIO_flush (f=0x815981c) at perlio.c:1434
    #1  0x081209f7 in PerlIOStdio_close (f=0x815981c) at perlio.c:2541
    #2  0x0811e8aa in Perl_PerlIO_close (f=0x815981c) at perlio.c:1172
    #3  0x08103dad in Perl_io_close (io=0x8161458, not_implicit=1 '\001') at doio.c:972
    #4  0x08103c90 in Perl_do_close (gv=0x8161434, not_implicit=1 '\001') at doio.c:941
    #5  0x080edc68 in Perl_pp_close () at pp_sys.c:573
    #6  0x0809b71d in Perl_runops_debug () at dump.c:1394
    #7  0x0806099c in S_run_body (oldscope=1) at perl.c:1645
    #8  0x08060540 in perl_run (my_perl=0x8153208) at perl.c:1566
    #9  0x0804b6e9 in main (argc=2, argv=0xbffff8ac, env=0xbffff8b8) at miniperlmain.c:85
    #10 0x40070e5e in __libc_start_main (main=0x804b650 <main>, argc=2,
        ubp_av=0xbffff8ac, init=0x804a778 <_init>, fini=0x812eb60 <_fini>,
        rtld_fini=0x4000d3c4 <_dl_fini>, stack_end=0xbffff89c)
        at ../sysdeps/generic/libc-start.c:129

> >     if (PerlIOUnix_refcnt_dec(fileno(stdio)) > 0) {
> >         /* Do not close it but do flush any buffers */
> >         return PerlIO_flush(f);
> >     }
>
> I think that _is_ correct.

Changing that fixes the problem with test 20, but not the problem with test 22. I'm still tracking down the test 22 problem. So far, the problem seems to be here:

    if (PerlIOBase(f)->flags & PERLIO_F_CANWRITE) {
        int i = PerlSIO_fflush(stdio);
        return i;
    }

(perlio.c:2622). PerlSIO_fflush appears to be calling the built-in C library fflush(). Whatever it is doing, the call here returns 0 (success) when I think it shouldn't. I wrote a trivial C program to check that the standard fflush() was doing what it was supposed to do, and it was.
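The trivial C program is not shown in the thread. One plausible shape for such a check is sketched here; `flush_errno` is a made-up name, and the expected result on /dev/full assumes the Linux semantics described at the top of the report.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical check: buffer one line, then call fflush() explicitly.
 * Returns 0 if fflush() succeeded, the errno value if it failed,
 * or -1 if the file could not be opened. */
int flush_errno(const char *path)
{
    FILE *fp = fopen(path, "w");
    int result = 0;

    if (fp == NULL)
        return -1;

    fputs("I like pie.\n", fp);

    errno = 0;
    if (fflush(fp) == EOF)
        result = errno;

    fclose(fp);   /* the result of the close is deliberately ignored here */
    return result;
}
```

On a system with /dev/full, `flush_errno("/dev/full")` would be expected to return ENOSPC, while `flush_errno("/dev/null")` returns 0.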

> FWIW what does your test do for PERLIO=perlio ?

It will take some time for me to rebuild Perl to find out. I will let you know.

The tests I'm using:

    if (-e "/dev/full" && open FULL, "> /dev/full") {
      print FULL "I like pie.\n" ? print "ok 19\n" : "not ok 19\n";
      # Should fail
      my $z = close(FULL);
      $z ? print "not ok 20 # z=$z; $!\n" : "ok 20\n";
      $! =~ /No space left on device/ ? print "ok 21\n" : "not ok 21\n";

      local $| = 1;
      if (open FULL, "> /dev/full") {
        # Should fail
        my $z = print FULL "I like pie.\n";
        $z ? print "not ok 22 # z=$z; $!\n" : "ok 22\n";
        $! =~ /No space left on device/ ? print "ok 23\n" : "not ok 23\n";
        close FULL ? print "ok 24\n" : "not ok 24\n";
      } else {
        print "# couldn't open /dev/full the second time
    not ok 22\nnot ok 23\nnot ok24\n";
      }
    } else {
      for (19..24) { print "ok $_ # skipped (no /dev/full)\n" }
    }

p5pRT commented 22 years ago

From @mjdominus

> FWIW what does your test do for PERLIO=perlio ?

It appears to do the same thing:

    plover% ./miniperl t/io/print.t
    1..24
    ok 1
    ok 2
    ok 3
    ok 4
    ok 5
    ok 6
    ok 7
    ok 8
    ok 9
    ok 10
    ok 11
    ok 12
    ok 13
    ok 14
    ok 15
    ok 16
    ok 17
    # null =>[

p5pRT commented 22 years ago

From @jhi

> $! =~ /No space left on device/ ? print "ok 21\n" : "not ok 21\n";

Nit: please use $!{ENOSPC} instead.
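The point of the nit is that the message text depends on the platform and locale, while the error number does not. The same distinction can be sketched in C, where Perl's `$!{ENOSPC}` corresponds to comparing the error number itself; both function names are invented for this illustration.

```c
#include <errno.h>
#include <string.h>

/* Fragile: depends on the exact wording of the system's error message. */
int looks_like_enospc_by_text(int err)
{
    return strstr(strerror(err), "No space left on device") != NULL;
}

/* Robust: compares the error number itself. */
int is_enospc(int err)
{
    return err == ENOSPC;
}
```

The first function can give the wrong answer wherever the C library words or translates the message differently; the second does not depend on the message at all.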

-- $jhi++; # http​://www.iki.fi/jhi/   # There is this special biologist word we use for 'stable'.   # It is 'dead'. -- Jack Cohen

p5pRT commented 22 years ago

From @mjdominus

> > $! =~ /No space left on device/ ? print "ok 21\n" : "not ok 21\n";
>
> Nit: please use $!{ENOSPC} instead.

OK. I wasn't sure what to do about that. It seemed peculiar to make print.t depend on Errno.pm, but I guess it's OK since Errno won't be loaded until after the first 18 tests have passed.

    --- t/io/print.t 2002/04/16 17:56:19 1.1
    +++ t/io/print.t 2002/04/16 18:21:38
    @@ -1,6 +1,6 @@
     #!./perl

    -print "1..18\n";
    +print "1..24\n";

     $foo = 'STDOUT';
     print $foo "ok 1\n";
    @@ -31,4 +31,27 @@
     {
       local $\ = "ok 17\n# null =>[\000]\nok 18\n";
       print "";
    +}
    +
    +$, = $\ = "";
    +if (-e "/dev/full" && open FULL, "> /dev/full") {
    +  print FULL "I like pie.\n" ? print "ok 19\n" : "not ok 19\n";
    +  # Should fail
    +  my $z = close(FULL);
    +  $z ? print "not ok 20 # z=$z; $!\n" : "ok 20\n";
    +  $!{ENOSPC} ? print "ok 21\n" : "not ok 21\n";
    +
    +  local $| = 1;
    +  if (open FULL, "> /dev/full") {
    +    # Should fail
    +    my $z = print FULL "I like pie.\n";
    +    $z ? print "not ok 22 # z=$z; $!\n" : "ok 22\n";
    +    $! =~ /No space left on device/ ? print "ok 23\n" : "not ok 23\n";
    +    close FULL ? print "ok 24\n" : "not ok 24\n";
    +  } else {
    +    print "# couldn't open /dev/full the second time
    +not ok 22\nnot ok 23\nnot ok24\n";
    +  }
    +} else {
    +  for (19..24) { print "ok $_ # skipped (no /dev/full)\n" }
     }

p5pRT commented 22 years ago

From @jhi

On Tue, Apr 16, 2002 at 02:24:58PM -0400, Mark-Jason Dominus wrote:

> > > $! =~ /No space left on device/ ? print "ok 21\n" : "not ok 21\n";
> >
> > Nit: please use $!{ENOSPC} instead.
>
> OK. I wasn't sure what to do about that. It seemed peculiar to make print.t depend on Errno.pm, but I guess it's OK since Errno won't be

Ahh, yes, there's that. Hmmm.

> loaded until after the first 18 tests have passed.

-- $jhi++; # http​://www.iki.fi/jhi/   # There is this special biologist word we use for 'stable'.   # It is 'dead'. -- Jack Cohen

p5pRT commented 22 years ago

From @mjdominus

> Ahh, yes, there's that. Hmmm.

I wasn't really happy about putting those tests into print.t in the first place, but I really didn't know where would be better.

What would you say to t/io/full.t?

p5pRT commented 22 years ago

From @jhi

On Tue, Apr 16, 2002 at 02:28:48PM -0400, Mark-Jason Dominus wrote:

> > Ahh, yes, there's that. Hmmm.
>
> I wasn't really happy about putting those tests into print.t in the first place, but I really didn't know where would be better.
>
> What would you say to t/io/full.t?

That might make sense. /dev/full isn't that widely supported.

-- $jhi++; # http​://www.iki.fi/jhi/   # There is this special biologist word we use for 'stable'.   # It is 'dead'. -- Jack Cohen

p5pRT commented 22 years ago

From @tux

On Tue 16 Apr 2002 20:24, Mark-Jason Dominus <mjd@plover.com> wrote:

> > > $! =~ /No space left on device/ ? print "ok 21\n" : "not ok 21\n";
> >
> > Nit: please use $!{ENOSPC} instead.
>
> OK. I wasn't sure what to do about that. It seemed peculiar to make print.t depend on Errno.pm, but I guess it's OK since Errno won't be loaded until after the first 18 tests have passed.
>
>     --- t/io/print.t 2002/04/16 17:56:19 1.1
>     +++ t/io/print.t 2002/04/16 18:21:38
>     @@ -1,6 +1,6 @@
>      #!./perl
>
>     -print "1..18\n";
>     +print "1..24\n";
>
>      $foo = 'STDOUT';
>      print $foo "ok 1\n";
>     @@ -31,4 +31,27 @@
>      {
>        local $\ = "ok 17\n# null =>[\000]\nok 18\n";
>        print "";
>     +}
>     +
>     +$, = $\ = "";
>     +if (-e "/dev/full" && open FULL, "> /dev/full") {

-e should be either -c or -b, don't you think? /If/ you think -e is safe enough (which I don't), at least use ">> ", so you won't destroy the current content.

I've no problem imagining a sysadmin keeping a plain file called "full" on /dev in which he/she administrates the full tapes. It wouldn't be my /advice/ to do so, but still, I can easily imagine
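For reference, the distinction between -e and -c comes down to checking the file type rather than mere existence. A minimal C sketch of that check follows; `is_char_device` is a name invented for this example, and it is only a rough stand-in for what Perl's -c file test reports.

```c
#include <sys/stat.h>

/* Returns 1 if path exists and is a character special device, else 0. */
int is_char_device(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return 0;               /* missing or inaccessible */
    return S_ISCHR(st.st_mode) ? 1 : 0;
}
```

A plain file named /dev/full would pass an existence test but fail this one, so a test script guarded this way would skip rather than overwrite it.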

>     +  print FULL "I like pie.\n" ? print "ok 19\n" : "not ok 19\n";
>     +  # Should fail
>     +  my $z = close(FULL);
>     +  $z ? print "not ok 20 # z=$z; $!\n" : "ok 20\n";
>     +  $!{ENOSPC} ? print "ok 21\n" : "not ok 21\n";
>     +
>     +  local $| = 1;
>     +  if (open FULL, "> /dev/full") {
>     +    # Should fail
>     +    my $z = print FULL "I like pie.\n";
>     +    $z ? print "not ok 22 # z=$z; $!\n" : "ok 22\n";
>     +    $! =~ /No space left on device/ ? print "ok 23\n" : "not ok 23\n";
>     +    close FULL ? print "ok 24\n" : "not ok 24\n";
>     +  } else {
>     +    print "# couldn't open /dev/full the second time
>     +not ok 22\nnot ok 23\nnot ok24\n";
>     +  }
>     +} else {
>     +  for (19..24) { print "ok $_ # skipped (no /dev/full)\n" }
>      }

-- H.Merijn Brand Amsterdam Perl Mongers (http​://amsterdam.pm.org/) using perl-5.6.1\, 5.7.3 & 631 on HP-UX 10.20 & 11.00\, AIX 4.2\, AIX 4.3\,   WinNT 4\, Win2K pro & WinCE 2.11. Smoking perl CORE​: smokers@​perl.org http​://archives.develooper.com/daily-build@​perl.org/ perl-qa@​perl.org send smoke reports to​: smokers-reports@​perl.org\, QA​: http​://qa.perl.org

p5pRT commented 22 years ago

From [Unknown Contact. See original ticket]

Mark-Jason Dominus <mjd@plover.com> writes:

> > FWIW what does your test do for PERLIO=perlio ?
>
> It will take some time for me to rebuild Perl to find out. I will let you know.

You don't need to rebuild perl. You are already using PerlIO; it is just defaulting to the system's stdio as the implementation. You have a few others in perlio.c to choose from at run time :-)

Just (assuming sh/bash style command line)

    PERLIO=perlio perl your/script

or

    env PERLIO=perlio perl your/script

-- Nick Ing-Simmons http://www.ni-s.u-net.com/

p5pRT commented 22 years ago

From @mjdominus

With PERLIO=perlio, the tests pass.

I had to correct several errors in the test file first. Here it is now:

    $, = $\ = "";
    if (-c "/dev/full" && open FULL, "> /dev/full") {
      print FULL "I like pie.\n" ? print "ok 19\n" : print "not ok 19\n";
      # Should fail
      my $z = close(FULL);
      $z ? print "not ok 20 # z=$z; $!\n" : print "ok 20\n";
      $!{ENOSPC} ? print "ok 21\n" : print "not ok 21\n";

      if (open FULL, "> /dev/full") {
        select FULL; $| = 1; select STDOUT;
        # Should fail
        my $z = print FULL "I like pie.\n";
        $z ? print "not ok 22 # z=$z; $!\n" : print "ok 22\n";
        $!{ENOSPC} ? print "ok 23\n" : print "not ok 23\n";
        my $z = close FULL;
        $z ? print "ok 24\n" : print "not ok 24 # z=$s; $!\n";
      } else {
        print "# couldn't open /dev/full the second time
    not ok 22\nnot ok 23\nnot ok24\n";
      }
    } else {
      for (19..24) { print "ok $_ # skipped (no /dev/full)\n" }
    }
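The two halves of the test differ in when the failure is expected to surface: with a buffered handle the print succeeds and the error shows up at close, while with autoflush on the print itself should fail. A rough C analogue is sketched here, using an unbuffered stdio stream in place of Perl's `$| = 1` (the two are similar in effect but not identical); `write_reports_failure` is a hypothetical name.

```c
#include <stdio.h>

/* Hypothetical helper. Returns -1 if the file cannot be opened,
 * 1 if the write itself reported failure, 0 if it reported success. */
int write_reports_failure(const char *path, int unbuffered)
{
    FILE *fp = fopen(path, "w");
    int failed;

    if (fp == NULL)
        return -1;

    if (unbuffered)
        setvbuf(fp, NULL, _IONBF, 0);   /* must precede any other I/O */

    /* Buffered: the data is only copied into memory here.
     * Unbuffered: the data goes to the device immediately. */
    failed = (fputs("I like pie.\n", fp) == EOF);

    fclose(fp);
    return failed;
}
```

On a system with /dev/full, the buffered call would be expected to return 0 and the unbuffered call 1, matching the expectations of tests 19 and 22 respectively.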

p5pRT commented 22 years ago

From [Unknown Contact. See original ticket]

NI-XS,

I just finished implementing fallbacks for Encode::XS. Now I have started working on PerlIO part but it is quite complicated and I am lost a little bit.

perldoc perlapi

sv_gets Get a line from the filehandle and store it into the SV, optionally appending to the currently-stored string.

    char*   sv_gets(SV* sv, PerlIO* fp, I32 append)

Can I really use this to emulate line-buffered input? What's the reason for fast_gets() stuff in the beginning?

Dan the Encode Maintainer

P.S. Your encode_s was quite sufficient to implement fallbacks a la ucm (though I needed to simplify encode_t->rep a little bit on enc2xs).
Carefully designed data structure definitely rules.

p5pRT commented 22 years ago

From [Unknown Contact. See original ticket]

Mark-Jason Dominus <mjd@plover.com> writes:

> With PERLIO=perlio, the tests pass.

Excellent :-)

Now - back to the PERLIO=stdio (default) case\, and with the fixed test what are the remaining symptoms?

-- Nick Ing-Simmons http://www.ni-s.u-net.com/

p5pRT commented 22 years ago

From [Unknown Contact. See original ticket]

Dan Kogai <dankogai@dan.co.jp> writes:

> NI-XS,
>
> I just finished implementing fallbacks for Encode::XS. Now I have started working on PerlIO part but it is quite complicated and I am lost a little bit.

I really think encoding layer should be unbundled from Encode. They do not really require same set of expertise. And having :encoding in Encode.xs is going to "couple" it closer to the core (perlio structure layout etc) than it would otherwise be.

Now Encode is part of perl distribution I think :encoding can go off and live in (say) ext/PerlIO/encoding. Where it can serve as an example of an XS PerlIO layer.

Question - is that too radical this close to release?

> perldoc perlapi
>
> sv_gets Get a line from the filehandle and store it into the SV, optionally appending to the currently-stored string.
>
>     char*   sv_gets(SV* sv, PerlIO* fp, I32 append)
>
> Can I really use this to emulate line-buffered input?

To be honest that never occurred to me before. I expect "one" could.

However it may not be a good idea and sv_gets() relies on perl's $/ and has various other oddities.

> What's the reason for fast_gets() stuff in the beginning?

sv_gets() pokes about in the IO system's buffer if it can.

> Dan the Encode Maintainer
>
> P.S. Your encode_s was quite sufficient to implement fallbacks a la ucm (though I needed to simplify encode_t->rep a little bit on enc2xs). Carefully designed data structure definitely rules.

-- Nick Ing-Simmons http://www.ni-s.u-net.com/

p5pRT commented 22 years ago

From [Unknown Contact. See original ticket]

On Wednesday, April 17, 2002, at 06:41, Nick Ing-Simmons wrote:

> I really think encoding layer should be unbundled from Encode. They do not really require same set of expertise. And having :encoding in Encode.xs is going to "couple" it closer to the core (perlio structure layout etc) than it would otherwise be.

Frankly, I agree. Encode.xs is now quite bloated (well, nothing compared to pp_*.c, though).

> Now Encode is part of perl distribution I think :encoding can go off and live in (say) ext/PerlIO/encoding. Where it can serve as an example of an XS PerlIO layer.

I wonder why this idea didn't come up sooner!

> Question - is that too radical this close to release?

Radical? No. Too close to release? I think so.... but wait! All I have to do is zap between #if and #endif, and all you have to do is paste that part to ext/PerlIO/encoding/.xs or somewhere. Maybe it is not as hard as it sounds. As for tests, only t/JP.t, t/encoding.t, and t/jperl.t appear to be PerlIO dependent.

Oh man! Why didn't we come up w/ this idea sooner!

I will go ahead w/ the plan. I will release the next version with the PerlIO part untouched to let us sync. Then the following version will detach the PerlIO part. How's that sound?

Dan

p5pRT commented 22 years ago

From @mjdominus

> Now - back to the PERLIO=stdio (default) case, and with the fixed test what are the remaining symptoms?

Test 20 still fails. But with the patch you approved, they all pass.

Here's the patch. I'm Cc'ing Jarkko so he can put it in. Jarkko, this patch also includes the only *correct* version of the tests.

    --- perlio.c 2002/04/16 22:43:57 1.1
    +++ perlio.c 2002/04/16 22:44:11
    @@ -2537,8 +2537,7 @@
         FILE *stdio = PerlIOSelf(f, PerlIOStdio)->stdio;
         if (PerlIOUnix_refcnt_dec(fileno(stdio)) > 0) {
             /* Do not close it but do flush any buffers */
    -        PerlIO_flush(f);
    -        return 0;
    +        return PerlIO_flush(f);
         }
         return (
     #ifdef SOCKS5_VERSION_NAME
    --- MANIFEST 2002/04/16 23:21:22 1.2
    +++ MANIFEST 2002/04/16 23:22:02
    @@ -2270,6 +2270,7 @@
     t/io/dup.t See if >& works right
     t/io/fflush.t See if auto-flush on fork/exec/system/qx works
     t/io/fs.t See if directory manipulations work
    +t/io/full.t See if 'disk full' errors are reported
     t/io/inplace.t See if inplace editing works
     t/io/iprefix.t See if inplace editing works with prefixes
     t/io/nargv.t See if nested ARGV stuff works

Inline Patch

```diff
--- /dev/null Fri Mar 23 23:37:44 2001
+++ t/io/full.t Tue Apr 16 18:52:47 2002
@@ -0,0 +1,31 @@
+#!./perl
+#
+# Test for 'disk full' errors, if possible
+# 20020416 mjd-perl-patch+@plover.com
+
+unless (-c "/dev/full" && open FULL, "> /dev/full") {
+  print "1..0\n"; exit;
+}
+
+my $z;
+print "1..6\n";
+
+print FULL "I like pie.\n" ? print "ok 1\n" : print "not ok 1\n";
+# Should fail
+$z = close(FULL);
+print $z ? "not ok 2 # z=$z; $!\n" : "ok 2\n";
+print $!{ENOSPC} ? "ok 3\n" : print "not ok 3\n";
+
+unless (open FULL, "> /dev/full") {
+  print "# couldn't open /dev/full the second time: $!\n";
+  print "not ok $_\n" for 4..6;
+}
+
+select FULL; $| = 1; select STDOUT;
+
+# Should fail
+$z = print FULL "I like pie.\n";
+print $z ? "not ok 4 # z=$z; $!\n" : "ok 4\n";
+print $!{ENOSPC} ? "ok 5\n" : "not ok 5\n";
+$z = close FULL;
+print $z ? "ok 6\n" : "not ok 6 # z=$s; $!\n";
```

p5pRT commented 22 years ago

From @jhi

On Wed, Apr 17, 2002 at 07:49:19AM +0900, Dan Kogai wrote:

> I will go ahead w/ the plan. I will release the next version with PerlIO part untouched to let us sync. Then the following version will detach the PerlIO part. How's that sound ?

Ok.

Dan

-- $jhi++; # http​://www.iki.fi/jhi/   # There is this special biologist word we use for 'stable'.   # It is 'dead'. -- Jack Cohen

p5pRT commented 22 years ago

From [Unknown Contact. See original ticket]

Folks\,

  I have released Encode ver.1.41 as follows.

Whole​:   http​://www.dan.co.jp/~dankogai/Encode-1.41.tar.gz   CPAN Diff​:   http​://www.dan.co.jp/~dankogai/current-1.41.diff.gz

=head1 CAUTION

This will be the last Encode module that has PerlIO "​:encoding()" bundled. From the next version and on\, It will be released as ext/PerlIO/encoding. So for those who use bleedperl for regular business (in spite of -Dusedevel)\, maybe you should wait while BOTH Encode 1.42 or later AND ext/PerlIO/encoding appear in the perl-current repository.

On Wednesday\, April 17\, 2002\, at 08​:25 \, Jarkko Hietaniemi wrote​:

On Wed\, Apr 17\, 2002 at 07​:49​:19AM +0900\, Dan Kogai wrote​: I will go ahead w/ the plan. I will release the next version with PerlIO part untouched to let us sync. Then the following version will detach the PerlIO part. How's that sound ?

Ok.

=head1 Notable Changes

Encode​::XS can now handle substitution characters. Encode​::Encoding noted that when $enc->encode($str\, 0)\, it should try its best to replace unmapped characters with substitution characters but that feature was not implemented; It always acted like $enc->encode($str\, 1). Now it behaves as documented.

I also added a special case for CHECK\, -1. When -1 is fed\, well...
please check for yourself. You can check it in action via

piconv -p -f foo -t bar

Try

piconv -p -f utf8 -t ascii

to see it clear.

And Changes right after the sig.

Dan the Encode Maintainer

    1.41 $Date: 2002/04/16 23:35:00 $
    ! encoding.pm
      binmode(STDIN|STDOUT ...) done iff PerlIO is available
    ! t/*.t
      Cleaned up PerlIO skip conditions to prepare for the upcoming
      Encode - PerlIO forking.
    ! Encode.pm
      exported functions are now prototyped.
    ! lib/Encode/CN/HZ.pm
    ! bin/enc2xs
    ! Encode.xs
      fallback implemented # was /* FIXME */
      affected programs revised to fit (only HZ was using the try-catch
      approach which needed to be fixed for API-compliance).
    ! Encode/Config.pm
    ! Encode/KR/2022_KR.pm
    ! Encode/KR/KR.pm
      can find =head1 NAME now, jhi
      Message-Id: <20020416083059.V30639@alpha.hut.fi>
    ! encoding.pm
      s/\{h\}/{$h}/g ;)
    ! Encode.xs
      now complies with less warnings with the pickest compilers.
      Suggested by Craig, fixed by Dan.
    ! Encode/Makefile_PL.e2x
    ! bin/enc2xs
      A bug that fails to find *.e2x in certain conditions fixed

p5pRT commented 22 years ago

From [Unknown Contact. See original ticket]

NI-XS\, jhi and porters\,

The surgical operation is finished. The PerlIO layer functions in Encode.xs have been successfully detached. The PerlIO part is now in PerlIO::encoding. The two are now more interdependent than dependent. You can get them via the URLs below:

    http://www.dan.co.jp/~dankogai/PerlIO-encoding-0.01.tar.gz
    http://www.dan.co.jp/~dankogai/Encode-1.42.tar.gz
    http://www.dan.co.jp/~dankogai/perl-dan.tar.bz2

The last one is the whole perl with interdependent versions of Encode and PerlIO. As a matter of fact, just replace Encode with 1.42 above, untargzip PerlIO-encoding-0.01 at ext/PerlIO/ and rename the thawed directory to "encoding", and fix toplevel MANIFEST and it will work perfectly. Configure file needed now modification.

Here is how Encode tests as a module.

    t/Aliases.....ok
    t/CN..........ok
    t/Encode......ok
    t/Encoder.....ok
    t/JP..........ok, 6/27 skipped: PerlIO Encoding Needed
    t/KR..........ok, 6/22 skipped: PerlIO Encoding Needed
    t/TW..........ok
    t/Unicode.....ok
    t/encoding....ok
    t/grow........ok
    t/jperl.......ok
    All tests successful, 12 subtests skipped.
    Files=11, Tests=4616, 11 wallclock secs ( 7.52 cusr + 0.50 csys = 8.02 CPU)

And with Whole perl and PerlIO

    ext/Encode/t/CN.....................ok
    ext/Encode/t/Encode.................ok
    ext/Encode/t/Encoder................ok
    ext/Encode/t/JP.....................ok
    ext/Encode/t/KR.....................ok
    ext/Encode/t/TW.....................ok
    ext/Encode/t/Unicode................ok
    ext/Encode/t/encoding...............ok
    ext/Encode/t/grow...................ok
    ext/Encode/t/jperl..................ok
    [....]
    ext/PerlIO/PerlIO...................ok
    ext/PerlIO/t/encoding...............ok
    ext/PerlIO/t/scalar.................ok
    ext/PerlIO/t/via....................ok

Note that ext/PerlIO/t/encoding.t was never modified, so it is 100% compatible with the prior version.

FYI those will not be uploaded to CPAN; I'll wait until perl-current catches up. And PerlIO​::encoding is not mine but NI-XS. So if it is to be CPANized\, it must be done by NI-XS (I pretty much doubt if he does\, however).

.....Man\, I'm exhausted. Autrijus\, Jungshik\, sorry for not responding soon. Please let me take a nap before I process your new READMEs.

Dan the Encode Maintainer.

p5pRT commented 22 years ago

From [Unknown Contact. See original ticket]

Mark-Jason Dominus wrote​: [snip]

    +unless (open FULL, "> /dev/full") {
    +    print "# couldn't open /dev/full the second time: $!\n";
    +    print "not ok $_\n" for 4..6;
    +}

You need an exit there just after those "not ok"s.

-- print reverse( "\,rekcah"\, " lreP"\, " rehtona"\, " tsuJ" )."\n";

p5pRT commented 22 years ago

From @abigail

On Tue\, Apr 16\, 2002 at 08​:27​:46PM +0200\, H.Merijn Brand wrote​:

I've no problem imagining a sysadmin keeping a plain file called "full" on /dev in which he/she administrates the full tapes. It wouldn't be my /advise/ to do so\, but still\, I can easily imagine

Well\, if he makes that world writeable\, or runs 'make test'\, he gets what he's asking for.... ;-)

Abigail

p5pRT commented 22 years ago

From @tux

On Wed 17 Apr 2002 22​:00\, abigail@​foad.org wrote​:

On Tue\, Apr 16\, 2002 at 08​:27​:46PM +0200\, H.Merijn Brand wrote​:

I've no problem imagining a sysadmin keeping a plain file called "full" on /dev in which he/she administrates the full tapes. It wouldn't be my /advise/ to do so\, but still\, I can easily imagine

Well\, if he makes that world writeable\, or runs 'make test'\, he gets what he's asking for.... ;-)

I bet you a beer that you have met people in your career that always work as root on their workstation

-- H.Merijn Brand Amsterdam Perl Mongers (http​://amsterdam.pm.org/) using perl-5.6.1\, 5.7.3 & 631 on HP-UX 10.20 & 11.00\, AIX 4.2\, AIX 4.3\,   WinNT 4\, Win2K pro & WinCE 2.11. Smoking perl CORE​: smokers@​perl.org http​://archives.develooper.com/daily-build@​perl.org/ perl-qa@​perl.org send smoke reports to​: smokers-reports@​perl.org\, QA​: http​://qa.perl.org

p5pRT commented 22 years ago

From @jhi

On Thu\, Apr 18\, 2002 at 01​:14​:14PM +0200\, H.Merijn Brand wrote​:

On Wed 17 Apr 2002 22​:00\, abigail@​foad.org wrote​:

On Tue\, Apr 16\, 2002 at 08​:27​:46PM +0200\, H.Merijn Brand wrote​:

I've no problem imagining a sysadmin keeping a plain file called "full" on /dev in which he/she administrates the full tapes. It wouldn't be my /advise/ to do so\, but still\, I can easily imagine

Well\, if he makes that world writeable\, or runs 'make test'\, he gets what he's asking for.... ;-)

I bet you a beer that you have met people in your career that always work as root on their workstation

That's a "sucker bet". Usually it gets even better: telnet and root logins are usually involved, too.

-- $jhi++; # http​://www.iki.fi/jhi/   # There is this special biologist word we use for 'stable'.   # It is 'dead'. -- Jack Cohen

p5pRT commented 22 years ago

From @mjdominus

Created by @mjdominus

/dev/full is a character special device that is guaranteed to return ENOSPC on every attempt to write. The following program opens /dev/full\, writes data into the stdio buffer\, and then closes the file\, flushing the buffer​:

    open F, "> /dev/full" or die "Couldn't open /dev/full: $!";
    print F "I like pie.\n";
    my $return = close F;
    print "# return = $return; \$! = $!\n";
    print $return ? "not ok\n" : "ok\n";  # expect failure

The output should be

    # return = ; $! = No space left on device
    ok

indicating that the close() call failed because the buffer was not flushed successfully. However, the program instead produces this inconsistent result:

    # return = 1; $! = No space left on device
    not ok

$! was correctly set\, but close() has incorrectly returned 1. Execution of the equivalent C program shows that stdio 'fclose' does return a failure code in these circumstances.
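The "equivalent C program" is not included in the report. A minimal sketch of what such a check might look like follows, assuming a Linux system where /dev/full exists; the function name `write_then_close` is made up for this illustration.

```c
#include <stdio.h>

/* Write a short line to `path` through a buffered stdio stream, then close it.
 * Returns  1 if fclose() reported a failure (expected for /dev/full),
 *          0 if fclose() reported success,
 *         -1 if the file could not be opened at all. */
int write_then_close(const char *path)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    /* For a regular buffered stream this only fills the stdio buffer. */
    fputs("I like pie.\n", fp);

    /* fclose() flushes the buffer; if that flush fails it returns EOF. */
    return fclose(fp) == EOF ? 1 : 0;
}
```

Calling `write_then_close("/dev/full")` on such a system is expected to report the failure at close time, which is the behaviour the Perl program above is checking for.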

The culprit appears to be this code in PerlIOStdio_close\, at perlio.c​:2537​:

    if (PerlIOUnix_refcnt_dec(fileno(stdio)) > 0) {
        /* Do not close it but do flush any buffers */
        PerlIO_flush(f);
        return 0;
    }

The 0 here is a success code. (-1 indicates failure.) PerlIOStdio_close seems to be returning success regardless of the indication returned by PerlIO_flush\, which is correctly returning -1.

I did not understand what this block was doing\, so I did not make the obvious change to​:

    if (PerlIOUnix_refcnt_dec(fileno(stdio)) > 0) {
        /* Do not close it but do flush any buffers */
        return PerlIO_flush(f);
    }
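The difference between the two variants can be shown with a small stand-alone model. This is only a sketch: `fake_handle`, `fake_flush` and both close functions are hypothetical stand-ins, not code from perlio.c, and the last-reference path is reduced to "whatever the final flush reports".

```c
/* Hypothetical stand-in for a handle that shares its descriptor. */
typedef struct {
    int refcnt;        /* how many handles still share the descriptor */
    int flush_result;  /* what flushing this handle reports: 0 or -1  */
} fake_handle;

static int fake_flush(fake_handle *h) { return h->flush_result; }

/* Models the reported behaviour: on the shared path the flush result
 * is discarded and success (0) is returned unconditionally. */
int close_ignoring_flush(fake_handle *h)
{
    if (--h->refcnt > 0) {
        fake_flush(h);
        return 0;
    }
    return fake_flush(h);  /* stands in for the real close */
}

/* Models the "obvious change": the flush result becomes the return value. */
int close_propagating_flush(fake_handle *h)
{
    if (--h->refcnt > 0)
        return fake_flush(h);
    return fake_flush(h);  /* stands in for the real close */
}
```

With a handle whose flush fails (`flush_result == -1`) and whose descriptor is still shared, the first function reports success and the second reports failure.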

Perl Info ``` Flags: category=core severity=high Site configuration information for perl v5.6.1: Configured by root at Sat Dec 29 11:54:59 EST 2001. Summary of my perl5 (revision 5.0 version 6 subversion 1) configuration: Platform: osname=linux, osvers=2.4.2-2, archname=i586-linux uname='linux plover.com 2.4.2-2 #1 sun apr 8 19:37:14 edt 2001 i586 unknown ' config_args='-des' hint=recommended, useposix=true, d_sigaction=define usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef useperlio=undef d_sfio=undef uselargefiles=define usesocks=undef use64bitint=undef use64bitall=undef uselongdouble=undef Compiler: cc='cc', ccflags ='-fno-strict-aliasing -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64', optimize='-O2', cppflags='-fno-strict-aliasing' ccversion='', gccversion='2.96 20000731 (Red Hat Linux 7.1 2.96-81)', gccosandvers='' intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234 d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12 ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8 alignbytes=4, usemymalloc=n, prototype=define Linker and Libraries: ld='cc', ldflags =' -L/usr/local/lib' libpth=/usr/local/lib /lib /usr/lib libs=-lnsl -lndbm -lgdbm -ldb -ldl -lm -lc -lcrypt -lutil perllibs=-lnsl -ldl -lm -lc -lcrypt -lutil libc=/lib/libc-2.2.2.so, so=so, useshrplib=false, libperl=libperl.a Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic' cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib' Locally applied patches: @INC for perl v5.6.1: /usr/local/lib/perl5/5.6.1/i586-linux /usr/local/lib/perl5/5.6.1 /usr/local/lib/perl5/site_perl/5.6.1/i586-linux /usr/local/lib/perl5/site_perl/5.6.1 /usr/local/lib/perl5/site_perl/5.6.0/i586-linux /usr/local/lib/perl5/site_perl/5.6.0 /usr/local/lib/perl5/site_perl . 
Environment for perl v5.6.1: HOME=/home/mjd LANG=C LANGUAGE (unset) LD_LIBRARY_PATH=/lib:/usr/lib:/usr/X11R6/lib LOGDIR (unset) PATH=/home/mjd/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/usr/games:/sbin:/usr/sbin:/usr/local/bin/X11R6:/usr/local/bin/mh:/data/mysql/bin:/usr/local/bin/pbm:/usr/local/bin/ezmlm:/home/mjd/TPI/bin:/usr/local/teTeX/bin:/usr/local/mysql/bin PERL_BADLANG (unset) SHELL=/bin/bash ```
p5pRT commented 22 years ago

From [Unknown Contact. See original ticket]

Dan Kogai <dankogai@dan.co.jp> writes:

Folks\,

I have released Encode ver.1.41 as follows.

Whole​: http​://www.dan.co.jp/~dankogai/Encode-1.41.tar.gz CPAN Diff​: http​://www.dan.co.jp/~dankogai/current-1.41.diff.gz

=head1 CAUTION

I am not sure when the change went in\, but current Encode.xs has broken Tk804.

With $encoding->decode($string\,1)

now croaks if character does not map. Croaking is fine as a default for checking but Tk would like a value of check which does not croak\, but just returns leaving $string starting with the failing character. I could do a G_EVAL but that is a lot of overhead\, and does not tell me which character position failed (unless $string is updated before the croak.) (Tk does 10\,000s of probes - found a character XXXX\, have font with encoding YYYY\, can YYYY encode XXXX ? I hope to reduce that number by refining the code but it will still do a lot)

With current Encode I don't get to try any interesting fonts because it croaks when Tk asks iso-8859-1 if it can do the interesting character :-(

Right now we have:

    check == 0,  fallback char (New and overdue - thanks!)
    check == -1, perlqq \X{xxxx} style croak
    otherwise    \N{U+XXXX} style croak

(Did \N{U+XXXX} get (back) in ? - I seem to recall it got removed once.)

You have established the principle of check values meaning something (which was always the plan).

Can I suggest though that we make it a bit mask - a stab at an initial set of bits:

    check == 0       - fallback
    (check & 3) == 1 - croak
    (check & 3) == 2 - warn
    (check & 3) == 3 - silent return
    (check & 4)      - \x{xxxx} vs \N{U+XXXX}

If you like, make $string adjustment optional:

    check & 8        - Update / Don't bother to update $string.

Thus

    check == 0  - fallbacks
    check == 1  - \N{U+XXXX} croak
    check == 2  - \x{XXXX} croak
    check == 3  - silent fail
    check == 4  - Uninteresting
    check == 5  - \N{U+XXXX} warn
    check == 6  - \x{XXXX} warn
    check == 11 - silent fail with $string updated (What Tk wants)
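As a rough illustration of the proposed bit layout, here is a sketch in C that follows the bit definitions as stated (low two bits select the action, bit 4 the escape style, bit 8 whether $string is updated). The enumerated values listed in the message do not all line up with those definitions, so this is one reading of the proposal; all names below are invented.

```c
/* Action selected by the low two bits of check. */
enum check_action {
    ACT_FALLBACK = 0,
    ACT_CROAK    = 1,
    ACT_WARN     = 2,
    ACT_SILENT   = 3
};

#define CHECK_ACTION_MASK 3  /* low two bits select the action           */
#define CHECK_HEX_STYLE   4  /* assumed: set = \x{xxxx}, clear = \N{U+XXXX} */
#define CHECK_UPDATE_SRC  8  /* assumed: set = update $string             */

enum check_action action_of(int check)
{
    return (enum check_action)(check & CHECK_ACTION_MASK);
}

int uses_hex_style(int check)  { return (check & CHECK_HEX_STYLE) != 0; }
int updates_source(int check)  { return (check & CHECK_UPDATE_SRC) != 0; }
```

Under this reading, `check == 11` decodes to a silent return with $string updated, which matches the case the message describes as what Tk wants.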

Better schemes welcome.  

Another alternative hinted at in old pods was passing check as an SV. Then if SV was a scalar ref\, then set $str to point at fail and return reason code in the scalar.

PS​:

To pick nits - Encode.xs's "layout" looks rather peculiar with perl source's default tab setting of 8 and expected indent of 4\, and many of files you have touched now have trailing whitespace on ends of lines.

-- Nick Ing-Simmons http​://www.ni-s.u-net.com/

p5pRT commented 22 years ago

From [Unknown Contact. See original ticket]

On Friday\, April 19\, 2002\, at 05​:01 \, Nick Ing-Simmons wrote​:

I am not sure when the change went in\, but current Encode.xs has broken Tk804.

Ouch.

With $encoding->decode($string\,1)

now croaks if character does not map. Croaking is fine as a default for checking but Tk would like a value of check which does not croak\, but just returns leaving $string starting with the failing character. I could do a G_EVAL but that is a lot of overhead\, and does not tell me which character position failed (unless $string is updated before the croak.)

Yikes. I DID fix the behavior as documented. But it was not just Encode::CN::HZ that was taking advantage of an UNDOCUMENTED feature after all :).

(Tk does 10\,000s of probes - found a character XXXX\, have font with encoding YYYY\, can YYYY encode XXXX ? I hope to reduce that number by refining the code but it will still do a lot)

With current Encode I don't get to try any interesting fonts because it croaks when Tk asks iso-8859-1 if it can do the interesting character :-(

~!@​#$%^&*()_+ (My feeling expressed in octet stream :)

Right now we have​: check == 0\, fallback char (New and overdue - thanks!) check == -1\, perlqq \X{xxxx} style croak

Ah\, it does not croak. It FALLS BACK that way.

otherwise \N{U+XXXX} style croak

(Did \N{U+XXXX} get (back) in ? - I seem to recall it got removed once.)

Didn't touch that part.

You have established the principle of check values meaning something (which was always the plan).

Can I suggest though that we make it a bit mask - a stab at an initial set of bits:

    check == 0       - fallback
    (check & 3) == 1 - croak
    (check & 3) == 2 - warn
    (check & 3) == 3 - silent return
    (check & 4)      - \x{xxxx} vs \N{U+XXXX}

If you like, make $string adjustment optional:

    check & 8        - Update / Don't bother to update $string.

Looks good to me. Maybe I should add constants for that. Maybe I would modify which bits mean what, however.

Thus

    check == 0  - fallbacks
    check == 1  - \N{U+XXXX} croak
    check == 2  - \x{XXXX} croak
    check == 3  - silent fail
    check == 4  - Uninteresting
    check == 5  - \N{U+XXXX} warn
    check == 6  - \x{XXXX} warn
    check == 11 - silent fail with $string updated (What Tk wants)

Better schemes welcome.

What good timing. I was about to release the next version. I'll take a shower, implement them, and possibly add test suites for them before the release.

Another alternative hinted at in old pods was passing check as an SV. Then if SV was a scalar ref\, then set $str to point at fail and return reason code in the scalar.

This one is very attractive but too attractive when code freeze is near. So let's go bit masks for the time being.

PS​:

To pick nits - Encode.xs's "layout" looks rather peculiar with perl source's default tab setting of 8 and expected indent of 4\, and many of files you have touched now have trailing whitespace on ends of lines.

I've noticed that. The trailing spaces must be due to patch after patch being applied (that happens when you paste directly). That has already been fixed in the upcoming version (I applied "indent-buffer" in Emacs :).

Dan the Encode Maintainer

p5pRT commented 22 years ago

From [Unknown Contact. See original ticket]

I am daydreaming that I am a caravan member\, driving a herd of disobedient camels on the never-ending desert to an oasis called 5.8.0 when I released new Encode and PerlIO​::encoding. You can get one as follows.

    Whole:
      Encode            http://www.dan.co.jp/~dankogai/Encode-1.50.tar.gz and CPAN
      PerlIO::encoding  http://www.dan.co.jp/~dankogai/PerlIO-encoding-0.02.tar.gz
    Diff:
      Encode            http://www.dan.co.jp/~dankogai/current-1.50.diff.gz
      PerlIO::encoding  [ none ]

Diff is pretty big (> 3000 lines) so you should get a whole thing instead.

The biggest and foremost change is the fallback API, which is greatly enhanced. NI-XS's request of

On Friday\, April 19\, 2002\, at 05​:01 \, Nick Ing-Simmons wrote​:

check == 11 - silent fail with $string updated (What Tk wants)

is implemented as FB_QUIET. See below:

==== Handling Malformed Data

    The CHECK argument is used as follows. When you omit it,
    it is identical to CHECK = 0.

    CHECK = Encode::FB_DEFAULT ( == 0)
        If CHECK is 0, (en|de)code will put a substitution character
        in place of the malformed character. For UCM-based
        encodings, <subchar> will be used. For Unicode, \xFFFD is
        used. If the data is supposed to be UTF-8, an optional
        lexical warning (category utf8) is given.

    CHECK = Encode::DIE_ON_ERROR (== 1)
        If CHECK is 1, methods will die immediately with an error
        message. So when CHECK is set, you should trap the fatal
        error with eval{} unless you really want to let it die on
        error.

    CHECK = Encode::FB_QUIET
        If CHECK is set to Encode::FB_QUIET, (en|de)code will
        immediately return the processed part on error, with the data
        passed via argument overwritten with the unprocessed part.
        This is handy when you have to call it repeatedly because the
        source data is chopped in the middle for some reason, such
        as a fixed-width buffer. Here is sample code that does
        just this.

            my $data = '';
            while (read $fh, $buffer, 256) {
                # buffer may end in partial character so we append
                $data .= $buffer;
                $utf8 .= decode($encoding, $data, Encode::FB_QUIET);
                # $data now contains unprocessed partial character
            }

    CHECK = Encode::FB_WARN
        This is the same as above, except it warns on error.
        Handy when you are debugging the mode above.

    perlqq mode (CHECK = Encode::FB_PERLQQ)
        For encodings that are implemented by Encode::XS,
        CHECK == Encode::FB_PERLQQ turns (en|de)code into
        "perlqq" fallback mode.

        When you decode, '\xXX' will be placed where XX is the
        hex representation of the octet that could not be
        decoded to utf8. And when you encode, '\x{xxxx}' will
        be placed where xxxx is the Unicode ID of the character
        that cannot be found in the character repertoire of the
        encoding.

    The bitmask
        These modes are actually set via bitmask. Here is how
        FB_XX are laid out. For FB_XX you can import via "use
        Encode qw(:fallbacks)"; for generic bitmask constants,
        you can import via "use Encode qw(:fallback_all)".

                         FB_DEFAULT FB_CROAK FB_QUIET FB_WARN FB_PERLQQ
    DIE_ON_ERR    0x0001               X
    WARN_ON_ERR   0x0002                                 X
    RETURN_ON_ERR 0x0004                        X        X
    LEAVE_SRC     0x0008
    PERLQQ        0x0100                                          X

  Unimplemented fallback schemes

  In future you will be able to use a code reference to a   callback function for the value of CHECK but its API is   still undecided.
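The bitmask layout can be written out as constants. The sketch below takes the bit values from the table and assumes this reading of the X marks: FB_CROAK sets DIE_ON_ERR, FB_QUIET sets RETURN_ON_ERR, FB_WARN sets both RETURN_ON_ERR and WARN_ON_ERR, and FB_PERLQQ sets PERLQQ. The helper `returns_on_error` is hypothetical and not part of Encode.

```c
/* Generic bits, values as listed in the table. */
#define DIE_ON_ERR    0x0001
#define WARN_ON_ERR   0x0002
#define RETURN_ON_ERR 0x0004
#define LEAVE_SRC     0x0008
#define PERLQQ        0x0100

/* Named modes built from those bits (column assignment is an interpretation). */
#define FB_DEFAULT 0x0000
#define FB_CROAK   DIE_ON_ERR
#define FB_QUIET   RETURN_ON_ERR
#define FB_WARN    (RETURN_ON_ERR | WARN_ON_ERR)
#define FB_PERLQQ  PERLQQ

/* Hypothetical helper: does this CHECK value stop at the first bad character? */
int returns_on_error(int check)
{
    return (check & RETURN_ON_ERR) != 0;
}
```

Composing modes from single-purpose bits is what lets FB_WARN be described as "the same as FB_QUIET, except it warns": it is the FB_QUIET bit plus one more.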

Since PerlIO::encoding was incapable of using this new feature, I have updated PerlIO::encoding as well. Instead of pushing &PL_sv_yes onto the stack, struct PerlIOEncode now has one more member, chk, that is initialized with Encode::FB_QUIET.

    typedef struct {
        PerlIOBuf base;   /* PerlIOBuf stuff */
        SV *bufsv;        /* buffer seen by layers above */
        SV *dataSV;       /* data we have read from layer below */
        SV *enc;          /* the encoding object */
        SV *chk;          /* CHECK in Encode methods */
    } PerlIOEncode;

Encode now checks the version of PerlIO::encoding and refuses to use an obsolete version. See t/perlio.t for details.

That way PerlIO::encoding has no trouble should Encode change the value of FB_QUIET. As for the partial character problem, I have found it is nearly impossible for escape-based encodings to support a fixed-length buffer. That is well documented in Encode::PerlIO, a new pod that has been added.
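To make the partial character problem concrete, here is a small sketch of the "consume what is complete, keep the rest" idea behind FB_QUIET, written for plain UTF-8 in C. It is not Encode's implementation: the function names are invented, only sequence lengths and continuation bytes are checked, and escape-based encodings (the hard case mentioned above) are not modelled at all.

```c
#include <stddef.h>
#include <string.h>

/* Length of the UTF-8 sequence announced by a lead byte, or 0 if the byte
 * cannot start a sequence. Overlong forms etc. are not checked here. */
static size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80)           return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 0;
}

/* Consume as many complete sequences as possible from buf[0 .. *len).
 * The unprocessed tail is moved to the front of buf and *len is updated,
 * so the caller can append the next chunk and call again.
 * Returns the number of bytes consumed. */
size_t consume_complete(unsigned char *buf, size_t *len)
{
    size_t done = 0;

    while (done < *len) {
        size_t need = utf8_seq_len(buf[done]);
        size_t i;

        if (need == 0 || done + need > *len)
            break;              /* bad lead byte, or sequence cut off by the chunk */
        for (i = 1; i < need; i++)
            if ((buf[done + i] & 0xC0) != 0x80)
                break;
        if (i < need)
            break;              /* bad continuation byte */
        done += need;
    }

    memmove(buf, buf + done, *len - done);
    *len -= done;
    return done;
}
```

A chunk that ends in the first two bytes of a three-byte character leaves exactly those two bytes in the buffer; once the third byte arrives with the next chunk, the character is consumed whole.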

As a workaround a new method perlio_ok() is added. You can check if the encoding in question works well with PerlIO. I know that's not a perfect solution but good enough for 5.8.0. The ultimate solution to this problem is for PerlIO to implement line buffer FOR BOTH DIRECTIONS. But that's out and beyond my realm as yet....

And I have no positive or negative proof if the recent Encode works with djgpp. I need more decent environment. iMac with Virtual PC is okay but it is not exactly mine. I can't use it when my family is awake.
And I found my thinkpad's C​: is (Ugh!) Windoze Me so you can't really run vanilla dos. I want testers on this. Is Laszlo still working on djgpp? Error logs?

I am exhausted\, not just because of 1.50 but also because today happened to be the day when my government draws the tax from my bank account.
Isn't there something like Open Source Tax Deduction Program ?

Dan the Encode Maintainer

    1.50 $Date: 2002/04/19 06:13:02 $
    ! Encode.pm Encode.xs Encode/encoding.h
    + t/fallback.pm
      New Fallback API implemented and documented. See "perldoc Encode"
      for details
    ! lib/Encode/JP/JIS7.pm Encode.pm
    + lib/Encode/PerlIO.pod t/perlio.t
      API compliance met. However, it still does not work unless perlio
      implements line buffer. See BUGS section in perldoc Encode::PerlIO
      As a sensible workaround, perlio_ok() added to Encode.
    ! encoding.pm
    ! lib/Encode/Supported.pod
      Doc fixes from jhi
      Message-Id: <20020418174647.J8466@alpha.hut.fi>
    ! CN/CN.pm
      Doc fixes from Autrijus
      Message-Id: <20020418144131.GA10987@not.autrijus.org>
    ! Encode.pm
      perlqq mode documented
    ! t/JP.t
    + t/jisx0201.euc t/jisx0201.ref
    ! t/jisx0208.euc t/jisx0208.ref
      t/JP.t tests more rigorously and with other encodings
      t/jisx0201.* added to test JIS7 encodings. jisx0208 is now PURELY
      in jis0208 (used to contain jisx0201 part).
    ! Encode/Makefile_PL.e2x
      The resulting Makefile.PL that "enc2xs -M" creates now auto-discovers
      enc2xs and encode.h rather than hard-coded. This allows the resulting
      module fully CPANizable.
    ! encoding.pm t/JP.t t/KR.t
      PerlIO detection simplified (checks %INC instead of eval{})
    ! Encode.xs Encode/encode.h
    + Unicode/Makefile.PL Unicode/Unicode.pm Unicode/Unicode.xs
    - lib/Encode/Unicode.pm
      (en|de)code_xs relocated to where it belongs. Source reindented
      to my taste
    ! bin/enc2xs
      Additional (U8 *) cast added as suggested by jhi
      Message-Id: <20020417165916.A28599@alpha.hut.fi>

p5pRT commented 22 years ago

From [Unknown Contact. See original ticket]

Dan Kogai <dankogai@dan.co.jp> writes:

I am daydreaming that I am a caravan member\, driving a herd of disobedient camels on the never-ending desert to an oasis called 5.8.0 when I released new Encode and PerlIO​::encoding. You can get one as follows.

p4 integrated to //depot/perlio for testing.

Without any changes to Tk804 things improved a bit - only the JP.t and KR.t tests were failing\, and those not failing as badly.

Adding ENCODE_FB_QUIET to Tk's encode glue makes those pass as well.

Suggest one small tweak as in attached patch.

The patch turns off utf8_to_uvuni's warning and checks as only thing we are using the UV for is an error message (which in my case isn't going to be printed as I am in FB_QUIET). Otherwise I get noise when Tk is groping about in U+FFXX "page".

The "indent" looks better - but has "cuddled else" - no big deal.

I was a little surprised that Encode/encode.h gets installed in lib rather than archlib/CORE but can live with that (makes a kind of sense it is architecture neutral - but perl.h et. al. go elsewhere). The snag here is that Makefile.PL has added -I to find perl.h\, so I have to #include \<../../Encode/encode.h> which is portability issue as there is no certainty that lib / archlib relative paths work like that. Will tweak Tk's Makefile.PL "configure" to hunt down encode.h.

Will do a spelling patch on the pod(s) when I get a chance.

-- Nick Ing-Simmons http​://www.ni-s.u-net.com/

p5pRT commented 22 years ago

From [Unknown Contact. See original ticket]

Inline Patch
```diff
--- Encode.xs.ship Fri Apr 19 19:25:26 2002
+++ Encode.xs Fri Apr 19 19:27:59 2002
@@ -122,7 +122,7 @@
         if (dir == enc->f_utf8) {
             STRLEN clen;
             UV ch =
-                utf8n_to_uvuni(s+slen, (SvCUR(src)-slen), &clen, 0);
+                utf8n_to_uvuni(s+slen, (SvCUR(src)-slen), &clen, UTF8_ALLOW_ANY|UTF8_CHECK_ONLY);
             if (check & ENCODE_DIE_ON_ERR) {
                 Perl_croak(
                     aTHX_ "\"\\N{U+%" UVxf "}\" does not map to %s, %d",
```
p5pRT commented 22 years ago

From [Unknown Contact. See original ticket]

On Saturday\, April 20\, 2002\, at 03​:45 \, Nick Ing-Simmons wrote​:

Dan Kogai <dankogai@dan.co.jp> writes:

I am daydreaming that I am a caravan member\, driving a herd of disobedient camels on the never-ending desert to an oasis called 5.8.0 when I released new Encode and PerlIO​::encoding. You can get one as follows.

p4 integrated to //depot/perlio for testing.

Without any changes to Tk804 things improved a bit - only the JP.t and KR.t tests were failing\, and those not failing as badly.

I thought I relocated the perlio-related tests in them to t/perlio.t. Are there any left?

Adding ENCODE_FB_QUIET to Tk's encode glue makes those pass as well.

That was my biggest concern. So glad to hear that.

Suggest one small tweak as in attached patch.

The patch turns off utf8_to_uvuni's warning and checks as only thing we are using the UV for is an error message (which in my case isn't going to be printed as I am in FB_QUIET). Otherwise I get noise when Tk is groping about in U+FFXX "page".

Applied\, thanks.

The "indent" looks better - but has "cuddled else" - no big deal.

I was a little surprised that Encode/encode.h gets installed in lib rather than archlib/CORE but can live with that (makes a kind of sense it is architecture neutral - but perl.h et. al. go elsewhere). The snag here is that Makefile.PL has added -I to find perl.h\, so I have to #include \<../../Encode/encode.h> which is portability issue as there is no certainty that lib / archlib relative paths work like that. Will tweak Tk's Makefile.PL "configure" to hunt down encode.h.

I wonder if there is a more sensible way to install NON-PM files to PERL5LIB. For the time being it is at the mercy of MM. Though not a show stopper, I would like Encode to be as clean and standard-compliant as possible. MM is so vast I don't even know how many more features are hidden...

Will do a spelling patch on the pod(s) when I get a chance.

Yes, please. Emacs doesn't do spellcheck-as-you-type like recent mailers in MacOS and Windows :) (I know you can spellcheck in Emacs but I am not sure if it is a good idea to do so in .pm).

Dan the Encode Maintainer

p5pRT commented 22 years ago

From @nwc10

On Sat\, Apr 20\, 2002 at 04​:27​:15AM +0900\, Dan Kogai wrote​:

Yes\, please. Emacs doesn't do spellcheck-as-you-type like recent mailers in MacOS and Windows :) (I know you can spellcheck in Emacs but I am not sure if it is a good idea to to do so in .pm).

You underestimate the power of the dark side.

M-x flyspell-mode

Definitely part of the dark side because here it defaults to American. And then refuses to start because I don't have American dictionaries installed. ispell has no problem "just running" and finding the correct dictionaries.

Nicholas Clark -- Even better than the real thing​: http​://nms-cgi.sourceforge.net/

p5pRT commented 22 years ago

From [Unknown Contact. See original ticket]

Dan Kogai <dankogai@dan.co.jp> writes:

On Saturday\, April 20\, 2002\, at 03​:45 \, Nick Ing-Simmons wrote​:

p4 integrated to //depot/perlio for testing.

Without any changes to Tk804 things improved a bit - only the JP.t and KR.t tests were failing\, and those not failing as badly.

I thought I relocated the perlio-related tests in them to t/perlio.t. Are there any left?

I meant Tk's JP.t and KR.t (Which _display_ copies of the same data files that Encode uses/used.)

I was a little surprised that Encode/encode.h gets installed in lib rather than archlib/CORE but can live with that (makes a kind of sense it is architecture neutral - but perl.h et. al. go elsewhere). The snag here is that Makefile.PL has added -I to find perl.h\, so I have to #include \<../../Encode/encode.h> which is portability issue as there is no certainty that lib / archlib relative paths work like that. Will tweak Tk's Makefile.PL "configure" to hunt down encode.h.

I wonder if there is more sensible way to install NON-PM files to PERL5LIB. For the time being it is at the mercy of MM.

One can arrange to install things almost anywhere - with enough MM overrides.

I am _NOT_ suggesting this is the right thing to do, but Tk itself installs its .h files via:

    sub MY::post_initialize
    {
        ...
        $dir = $self->catdir('$(INST_ARCHLIBDIR)','pTk');
        push(@{$self->{'dir_targets'}},$dir);
        foreach $name (sort(@{$self->{H}},keys %files))
        {
            $self->{PM}->{$name} = $self->catfile($dir,$name);
        }
        ...
    }

I have no idea (not having done a Tk install recently) if that still works with new MM.

Will do a spelling patch on the pod(s) when I get a chance.

Yes\, please. Emacs doesn't do spellcheck-as-you-type like recent mailers in MacOS and Windows :) (I know you can spellcheck in Emacs but I am not sure if it is a good idea to to do so in .pm).

Hmm\, project for ptked - spell check in a POD-aware manner ;-)

-- Nick Ing-Simmons http​://www.ni-s.u-net.com/

p5pRT commented 22 years ago

From [Unknown Contact. See original ticket]

On Saturday\, April 20\, 2002\, at 05​:38 \, Nicholas Clark wrote​:

On Sat\, Apr 20\, 2002 at 04​:27​:15AM +0900\, Dan Kogai wrote​:

Yes\, please. Emacs doesn't do spellcheck-as-you-type like recent mailers in MacOS and Windows :) (I know you can spellcheck in Emacs but I am not sure if it is a good idea to to do so in .pm).

You underestimate the power of the dark side.

M-x flyspell-mode

I knew something like this existed but never checked the mode name :) Hmm.... Requires ispell... Piece of cake with portupgrade (could be the most widely used ruby program in (Free)BSD world).... Oh man! you're right! It even supports mouse (but I usually use emacs only via tty).
But how about perl jargons? "automagical"....Ni!
"barewords"....Ni!.... Hmm. This mode needs some more education :)
Thanks. More than 10 years w/ Emacs and still lost in modes....

Definitely part of the dark side because here it defaults to American.

Does it correct pronunciation of the Britons so "CAN'T do that" sounds less obscene :?

And then refuses to start because I don't have American dictionaries installed. ispell has no problem "just running" and finding the correct dictionaries.

Dan the Emacs User\, not Elisp Hacker   ^^^^^pretty funny. MacOS X Mail underline this but not   "Emacs". Is it smart enough to scan $PATH and make them   correct?

p5pRT commented 22 years ago

From [Unknown Contact. See original ticket]

While (on the other machine) Tk804 seems happy-ish\, on this laptop I am attempting to catch up on my mail using bleadperl + UTF8-aware tkmail but non Unicode Tk800.

It seems to be discovering a snag.

When I try and read Autrijus Tang's latest posting to perl-unicode with that I get a malloc related segfault​:

    Program received signal SIGSEGV, Segmentation fault.
    0x400e3f5a in chunk_realloc () from /lib/libc.so.6
    (gdb) bt
    #0 0x400e3f5a in chunk_realloc () from /lib/libc.so.6
    #1 0x400e3ed4 in realloc () from /lib/libc.so.6
    #2 0x80a0c85 in Perl_safesysrealloc (where=0x8e2f000, size=49) at util.c:122
    #3 0x80b8d13 in Perl_sv_grow (my_perl=0x8152a08, sv=0x8e53d28, newlen=49)
       at sv.c:1580
    #4 0x405d0fc7 in encode_method (my_perl=0x8152a08, enc=0x4078c47c, dir=0x40756e20,
       src=0x8e4cc74, check=0) at Encode.xs:109
    #5 0x405d1698 in XS_Encode__XS_decode (my_perl=0x8152a08, cv=0x8473038)
       at Encode.xs:247
    #6 0x80b65fe in Perl_pp_entersub (my_perl=0x8152a08) at pp_hot.c:2734
    #7 0x80a0743 in Perl_runops_debug (my_perl=0x8152a08) at dump.c:1394
    #8 0x8062f84 in S_call_body (my_perl=0x8152a08, myop=0xbfffe97c, is_eval=0)
       at perl.c:2022
    #9

Note that in _this_ case use of tkmail is using fallbacks - so I suspect (no proof yet) that "new" fallback insert can buffer-overrun\, and so destroy the malloc data structures.

-- Nick Ing-Simmons http​://www.ni-s.u-net.com/

p5pRT commented 22 years ago

From [Unknown Contact. See original ticket]

Nick Ing-Simmons <nick@ing-simmons.net> writes:

While (on the other machine) Tk804 seems happy-ish\, on this laptop I am attempting to catch up on my mail using bleadperl + UTF8-aware tkmail but non Unicode Tk800.

It seems to be discovering a snag.

When I try and read Autrijus Tang's latest posting to perl-unicode with that I get a malloc related segfault​:

Note that in _this_ case use of tkmail is using fallbacks - so I suspect (no proof yet) that "new" fallback insert can buffer-overrun\, and so destroy the malloc data structures.

This looks suspicious​:

Near line 181 of Encode.xs (tail of main while loop)​:

    /* settle variables when fallback */
    dlen = SvCUR(dst);
    d    = (U8*)SvPVX(dst) + dlen;
    s    = (U8*)SvPVX(src) + sdone;
    slen = tlen - sdone;
    break;

When calling do_encode() at top of loop dlen is supposed to be number of bytes _available_ at d\, and you have it as number of bytes _used_. So if we have used most of it and then insert a fallback do_encode() will zoom off the end of the SV.

I think (too late here for morning person like me to think well) that

    d = (U8 *) SvEND(dst);
    dlen = SvLEN(dst) - ddone - 1;

is closer to correct ..
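The distinction between bytes used and bytes available can be shown with a toy buffer. This is only a sketch: `toy_buf` and both functions are invented for illustration, with `cur` playing the role of the bytes already written and `len` the allocated size, and it assumes the allocation is always larger than the used part.

```c
#include <stddef.h>

/* A toy stand-in for a string buffer; field names are invented. */
typedef struct {
    size_t cur;  /* bytes used so far        */
    size_t len;  /* bytes allocated in total */
} toy_buf;

/* The mistake described above: reports the bytes already used
 * as if they were the space still free. */
size_t room_buggy(const toy_buf *b)
{
    return b->cur;
}

/* The suggested shape of the fix: allocated minus used, minus one
 * byte kept back for the trailing NUL. Assumes len > cur. */
size_t room_fixed(const toy_buf *b)
{
    return b->len - b->cur - 1;
}
```

For a buffer of 49 bytes with 40 already used, the first function claims 40 bytes of room while only 8 are really available, which is the kind of mismatch that lets a write run off the end of the allocation.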

-- Nick Ing-Simmons http​://www.ni-s.u-net.com/

p5pRT commented 22 years ago

From [Unknown Contact. See original ticket]

By the time you read this you must have just woken up.

On Saturday\, April 20\, 2002\, at 07​:07 \, Nick Ing-Simmons wrote​:

Near line 181 of Encode.xs (tail of main while loop)​:

    /* settle variables when fallback */
    dlen = SvCUR(dst);
    d    = (U8*)SvPVX(dst) + dlen;
    s    = (U8*)SvPVX(src) + sdone;
    slen = tlen - sdone;
    break;

When calling do_encode() at top of loop dlen is supposed to be number of bytes _available_ at d\, and you have it as number of bytes _used_. So if we have used most of it and then insert a fallback do_encode() will zoom off the end of the SV.

I think (too late here for morning person like me to think well) that

    d = (U8 *) SvEND(dst);
    dlen = SvLEN(dst) - ddone - 1;

is closer to correct ..

I changed Encode.xs accordingly and t/fallback says it is okay. Please tell me if this fixes the problem you found. With this and Autrijus's new ucm problem resolved, I'll go ahead and release 1.51.

Dan

p5pRT commented 22 years ago

From [Unknown Contact. See original ticket]

On Fri\, 2002-04-19 at 17​:27\, Dan Kogai wrote​:

On Saturday\, April 20\, 2002\, at 05​:38 \, Nicholas Clark wrote​:

You underestimate the power of the dark side.

M-x flyspell-mode

I knew something like this existed but never checked the mode name :)

Question​: Does EMACS do "foo"? Answer​: M-x foo-mode

Invariably, I find that the cool thing I wish EMACS did is buried in a weird package I've never heard of. Now, if there were only M-x write-my-perl-for-me-mode, I'd make a key-binding out of that puppy! ;-)

p5pRT commented 22 years ago

From [Unknown Contact. See original ticket]

On 19 Apr 2002 17:47:28 -0400, ajs@perl.org (Aaron Sherman) wrote:

Now, if there were only M-x write-my-perl-for-me-mode, I'd make a key-binding out of that puppy! ;-)

Beware what you ask for, for you might receive it... what if that particular mode were added by one of the infamous Perl code repository authors that newbies love to use ;)

'use strict'? what's that? And I *love* calling subroutines with '&' in front! And what's this "why to kay" problem you keep going on about?

Cheers, Philip

p5pRT commented 22 years ago

From [Unknown Contact. See original ticket]

Dan Kogai <dankogai@dan.co.jp> writes:

When calling do_encode() at the top of the loop, dlen is supposed to be the number of bytes _available_ at d, and you have it as the number of bytes _used_. So if we have used most of it and then insert a fallback, do_encode() will zoom off the end of the SV.

I think (too late here for morning person like me to think well) that

        d = (U8 *) SvEND(dst);
        dlen = SvLEN(dst) - ddone - 1;

is closer to correct ..

I changed Encode.xs accordingly and t/fallback says it is okay. Please tell me if this will fix the problem you found.

With this I can read Autrijus's email, which was previously dumping core, to completion. Also, in the light of day it seems to be the correct fix.

Dan -- Nick Ing-Simmons http​://www.ni-s.u-net.com/

p5pRT commented 22 years ago

From [Unknown Contact. See original ticket]

I just checked in these changes to ext/Encode/... as change 16022 on perlio branch.

- switch to XSLoader
- spelling & trailing whitespace removal.
- remove a "use loop" (Encode loaded PerlIO::encoding, loaded Encode); it never loops, but such things cause problems for imports.
- Changed how LEAVE_SRC was tested: x & ~y is not same as !(x & y)
- Moved Unicode.xs towards supporting same check values.
- Set @Encode::XS::ISA to Encode::Encoding
- added ->needs_lines method with my best guess at which ones do.

I still cannot get the TODO tests in t/perlio.t to pass despite some work on PerlIO::encoding to honour ->needs_lines. I need to study it some more. What I really want to do is to have PerlIO::encoding use fallback schemes. Which ENCODE_FB_XXX flag bit(s) give me fallback characters but still remove translated stuff from the src buffer?

Perhaps "update src" should be an active rather than a passive bit?  

-- Nick Ing-Simmons http​://www.ni-s.u-net.com/

p5pRT commented 22 years ago

From [Unknown Contact. See original ticket]

==== //depot/perlio/ext/Encode/Encode.pm#64 - /home/p4work/perl/perlio/ext/Encode/Encode.pm ==== Index​: perlio/ext/Encode/Encode.pm

Inline Patch ```diff --- perlio/ext/Encode/Encode.pm.~1~ Sat Apr 20 20:36:47 2002 +++ perlio/ext/Encode/Encode.pm Sat Apr 20 20:36:47 2002 @@ -2,12 +2,12 @@ use strict; our $VERSION = do { my @r = (q$Revision: 1.50 $ =~ /\d+/g); sprintf "%d."."%02d" x $#r, @r }; our $DEBUG = 0; +use XSLoader (); +XSLoader::load 'Encode'; -require DynaLoader; require Exporter; +our @ISA = qw(Exporter); -our @ISA = qw(Exporter DynaLoader); - # Public, encouraged API is exported by default our @EXPORT = qw( @@ -19,7 +19,7 @@ our @FB_CONSTS = qw(FB_DEFAULT FB_QUIET FB_WARN FB_PERLQQ FB_CROAK); our @EXPORT_OK = - ( + ( qw( _utf8_off _utf8_on define_encoding from_to is_16bit is_8bit is_utf8 perlio_ok resolve_alias utf8_downgrade utf8_upgrade @@ -27,16 +27,13 @@ @FB_FLAGS, @FB_CONSTS, ); -our %EXPORT_TAGS = +our %EXPORT_TAGS = ( all => [ @EXPORT, @EXPORT_OK ], fallbacks => [ @FB_CONSTS ], fallback_all => [ @FB_CONSTS, @FB_FLAGS ], ); - -bootstrap Encode (); - # Documentation moved after __END__ for speed - NI-S use Carp; @@ -57,7 +54,7 @@ my @modules = (@_ and $_[0] eq ":all") ? 
values %ExtModule : @_; for my $mod (@modules){ $mod =~ s,::,/,g or $mod = "Encode/$mod"; - $mod .= '.pm'; + $mod .= '.pm'; $DEBUG and warn "about to require $mod;"; eval { require $mod; }; } @@ -193,7 +190,7 @@ # This is to restore %Encoding if really needed; # sub predefine_encodings{ - if ($ON_EBCDIC) { + if ($ON_EBCDIC) { # was in Encode::UTF_EBCDIC package Encode::UTF_EBCDIC; *name = sub{ shift->{'Name'} }; @@ -202,7 +199,7 @@ my ($obj,$str,$chk) = @_; my $res = ''; for (my $i = 0; $i < length($str); $i++) { - $res .= + $res .= chr(utf8::unicode_to_native(ord(substr($str,$i,1)))); } $_[1] = '' if $chk; @@ -212,15 +209,15 @@ my ($obj,$str,$chk) = @_; my $res = ''; for (my $i = 0; $i < length($str); $i++) { - $res .= + $res .= chr(utf8::native_to_unicode(ord(substr($str,$i,1)))); } $_[1] = '' if $chk; return $res; }; - $Encode::Encoding{Unicode} = + $Encode::Encoding{Unicode} = bless {Name => "UTF_EBCDIC"} => "Encode::UTF_EBCDIC"; - } else { + } else { # was in Encode::UTF_EBCDIC package Encode::Internal; *name = sub{ shift->{'Name'} }; @@ -232,7 +229,7 @@ return $str; }; *encode = \&decode; - $Encode::Encoding{Unicode} = + $Encode::Encoding{Unicode} = bless {Name => "Internal"} => "Encode::Internal"; } @@ -256,15 +253,14 @@ $_[1] = '' if $chk; return $octets; }; - $Encode::Encoding{utf8} = + $Encode::Encoding{utf8} = bless {Name => "utf8"} => "Encode::utf8"; } } require Encode::Encoding; +@Encode::XS::ISA = qw(Encode::Encoding); -eval qq{ use PerlIO::encoding 0.02 }; -# warn $@ if $@; 1; @@ -281,14 +277,14 @@ =head2 Table of Contents -Encode consists of a collection of modules which details are too big +Encode consists of a collection of modules which details are too big to fit in one document. This POD itself explains the top-level APIs -and general topics at a glance. For other topics and more details, +and general topics at a glance. 
For other topics and more details, see the PODs below; Name Description -------------------------------------------------------- - Encode::Alias Alias defintions to encodings + Encode::Alias Alias definitions to encodings Encode::Encoding Encode Implementation Base Class Encode::Supported List of Supported Encodings Encode::CN Simplified Chinese Encodings @@ -359,7 +355,7 @@ For CHECK see L. For example to convert (internally UTF-8 encoded) Unicode string to -iso-8859-1 (also known as Latin1), +iso-8859-1 (also known as Latin1), $octets = encode("iso-8859-1", $unicode); @@ -439,7 +435,7 @@ @ebcdic = Encode->encodings("EBCDIC"); -To find which encodings are supported by this package in details, +To find which encodings are supported by this package in details, see L. =head2 Defining Aliases @@ -462,7 +458,7 @@ Encode::resolve_alias("iso-8859-12") # false; nonexistent Encode::resolve_alias($name) eq $name # true if $name is canonical -This resolve_alias() does not need C and is +This resolve_alias() does not need C and is exported via C. See L on details. @@ -481,7 +477,7 @@ # via from_to open my $in, $infile or die; open my $out, $outfile or die; - while(<>){ + while(<>){ from_to($_, "shiftjis", "euc", 1); } @@ -508,7 +504,7 @@ place of the malformed character. for UCM-based encodings, EsubcharE will be used. For Unicode, \xFFFD is used. If the data is supposed to be UTF-8, an optional lexical warning (category -utf8) is given. +utf8) is given. =item I = Encode::DIE_ON_ERROR (== 1) @@ -519,10 +515,10 @@ =item I = Encode::FB_QUIET If I is set to Encode::FB_QUIET, (en|de)code will immediately -return proccessed part on error, with data passed via argument -overwritten with unproccessed part. This is handy when have to +return processed part on error, with data passed via argument +overwritten with unprocessed part. This is handy when have to repeatedly call because the source data is chopped in the middle for -some reasons, such as fixed-width buffer. 
Here is a sample code that +some reasons, such as fixed-width buffer. Here is a sample code that just does this. my $data = ''; @@ -547,7 +543,7 @@ representation of the octet that could not be decoded to utf8. And when you encode, '\x{I}' will be placed where I is the Unicode ID of the character that cannot be found in the character -repartoire of the encoding. +repertoire of the encoding. =item The bitmask @@ -616,12 +612,12 @@ L, L, -L, +L, L, -L, -L, -L, -L, +L, +L, +L, +L, the Perl Unicode Mailing List Eperl-unicode@perl.orgE =head1 MAINTAINER ```

==== //depot/perlio/ext/Encode/Encode.xs#65 - /home/p4work/perl/perlio/ext/Encode/Encode.xs ==== Index​: perlio/ext/Encode/Encode.xs

Inline Patch
```diff
--- perlio/ext/Encode/Encode.xs.~1~ Sat Apr 20 20:36:47 2002
+++ perlio/ext/Encode/Encode.xs Sat Apr 20 20:36:47 2002
@@ -193,8 +193,8 @@
         }
     }
  ENCODE_SET_SRC:
-    if (check & ~ENCODE_LEAVE_SRC){
-        sdone = SvCUR(src) - (slen+sdone);
+    if (check && !(check & ENCODE_LEAVE_SRC)){
+        sdone = SvCUR(src) - (slen+sdone);
         if (sdone) {
             sv_setpvn(src, (char*)s+slen, sdone);
         }
```

==== //depot/perlio/ext/Encode/Unicode/Unicode.xs#1 - /home/p4work/perl/perlio/ext/Encode/Unicode/Unicode.xs ==== Index​: perlio/ext/Encode/Unicode/Unicode.xs

Inline Patch ```diff --- perlio/ext/Encode/Unicode/Unicode.xs.~1~ Sat Apr 20 20:36:47 2002 +++ perlio/ext/Encode/Unicode/Unicode.xs Sat Apr 20 20:36:47 2002 @@ -6,6 +6,8 @@ #include "EXTERN.h" #include "perl.h" #include "XSUB.h" +#define U8 U8 +#include "../Encode/encode.h" #define FBCHAR 0xFFFd #define BOM_BE 0xFeFF @@ -80,11 +82,13 @@ MODULE = Encode::Unicode PACKAGE = Encode::Unicode +PROTOTYPES: DISABLE + void -decode_xs(obj, str, chk = &PL_sv_undef) +decode_xs(obj, str, check = 0) SV * obj SV * str -SV * chk +IV check CODE: { int size = SvIV(*hv_fetch((HV *)SvRV(obj),"size",4,0)); @@ -124,14 +128,14 @@ U8 *d; if (size != 4 && invalid_ucs2(ord)) { if (ucs2) { - if (SvTRUE(chk)) { + if (check) { croak("%s:no surrogates allowed %"UVxf, SvPV_nolen(*hv_fetch((HV *)SvRV(obj),"Name",4,0)), ord); } if (s+size <= e) { /* skip the next one as well */ - enc_unpack(aTHX_ &s,e,size,endian); + enc_unpack(aTHX_ &s,e,size,endian); } ord = FBCHAR; } @@ -160,10 +164,12 @@ d = uvuni_to_utf8_flags(d+SvCUR(result), ord, 0); SvCUR_set(result,d - (U8 *)SvPVX(result)); } - if (SvTRUE(chk)) { - if (s < e) { + if (s < e) { Perl_warner(aTHX_ packWARN(WARN_UTF8),"%s:Partial character", SvPV_nolen(*hv_fetch((HV *)SvRV(obj),"Name",4,0))); + } + if (check && !(check & ENCODE_LEAVE_SRC)){ + if (s < e) { Move(s,SvPVX(str),e-s,U8); SvCUR_set(str,(e-s)); } @@ -176,10 +182,10 @@ } void -encode_xs(obj, utf8, chk = &PL_sv_undef) - SV * obj +encode_xs(obj, utf8, check = 0) +SV * obj SV * utf8 -SV * chk +IV check CODE: { int size = SvIV(*hv_fetch((HV *)SvRV(obj),"size",4,0)); @@ -205,7 +211,7 @@ if (size != 4 && invalid_ucs2(ord)) { if (!issurrogate(ord)){ if (ucs2) { - if (SvTRUE(chk)) { + if (check) { croak("%s:code point \"\\x{"UVxf"}\" too high", SvPV_nolen( *hv_fetch((HV *)SvRV(obj),"Name",4,0)) @@ -228,10 +234,12 @@ enc_pack(aTHX_ result,size,endian,ord); } } - if (SvTRUE(chk)) { + if (s < e) { + Perl_warner(aTHX_ packWARN(WARN_UTF8),"%s:Partial character", + SvPV_nolen(*hv_fetch((HV 
*)SvRV(obj),"Name",4,0))); + } + if (check && !(check & ENCODE_LEAVE_SRC)){ if (s < e) { - Perl_warner(aTHX_ packWARN(WARN_UTF8),"%s:Partial character", - SvPV_nolen(*hv_fetch((HV *)SvRV(obj),"Name",4,0))); Move(s,SvPVX(utf8),e-s,U8); SvCUR_set(utf8,(e-s)); } ```

==== //depot/perlio/ext/Encode/lib/Encode/Encoding.pm#8 - /home/p4work/perl/perlio/ext/Encode/lib/Encode/Encoding.pm ==== Index​: perlio/ext/Encode/lib/Encode/Encoding.pm

Inline Patch
```diff
--- perlio/ext/Encode/lib/Encode/Encoding.pm.~1~ Sat Apr 20 20:36:47 2002
+++ perlio/ext/Encode/lib/Encode/Encoding.pm Sat Apr 20 20:36:47 2002
@@ -20,6 +20,8 @@
 
 sub new_sequence { return $_[0] }
 
+sub needs_lines { 0 }
+
 sub DESTROY {}
 
 1;
```

==== //depot/perlio/ext/Encode/lib/Encode/JP/JIS7.pm#2 - /home/p4work/perl/perlio/ext/Encode/lib/Encode/JP/JIS7.pm ==== Index​: perlio/ext/Encode/lib/Encode/JP/JIS7.pm

Inline Patch ```diff --- perlio/ext/Encode/lib/Encode/JP/JIS7.pm.~1~ Sat Apr 20 20:36:47 2002 +++ perlio/ext/Encode/lib/Encode/JP/JIS7.pm Sat Apr 20 20:36:47 2002 @@ -7,8 +7,8 @@ for my $name ('7bit-jis', 'iso-2022-jp', 'iso-2022-jp-1'){ my $h2z = ($name eq '7bit-jis') ? 0 : 1; my $jis0212 = ($name eq 'iso-2022-jp') ? 0 : 1; - - $Encode::Encoding{$name} = + + $Encode::Encoding{$name} = bless { Name => $name, h2z => $h2z, @@ -17,7 +17,10 @@ } sub name { shift->{'Name'} } -sub new_sequence { $_[0] }; + +sub new_sequence { $_[0] } + +sub needs_lines { 1 } use Encode::CJKConstants qw(:all); @@ -87,7 +90,7 @@ ((?:$RE{EUC_C})+|(?:$RE{EUC_KANA})+|(?:$RE{EUC_0212})+) }{ my $chunk = $1; - my $esc = + my $esc = ( $chunk =~ tr/\x8E//d ) ? $ESC{KANA} : ( $chunk =~ tr/\x8F//d ) ? $ESC{JIS_0212} : $ESC{JIS_0208}; ```

==== //depot/perlio/ext/Encode/lib/Encode/KR/2022_KR.pm#3 - /home/p4work/perl/perlio/ext/Encode/lib/Encode/KR/2022_KR.pm ==== Index​: perlio/ext/Encode/lib/Encode/KR/2022_KR.pm

Inline Patch ```diff --- perlio/ext/Encode/lib/Encode/KR/2022_KR.pm.~1~ Sat Apr 20 20:36:47 2002 +++ perlio/ext/Encode/lib/Encode/KR/2022_KR.pm Sat Apr 20 20:36:47 2002 @@ -13,6 +13,8 @@ sub name { return $_[0]->{name}; } +sub needs_lines { 1 } + sub decode { my ($obj,$str,$chk) = @_; @@ -35,14 +37,14 @@ sub iso_euc{ my $r_str = shift; - $$r_str =~ s/$RE{'2022_KR'}//gox; # remove the designator + $$r_str =~ s/$RE{'2022_KR'}//gox; # remove the designator $$r_str =~ s{ # replace chars. in GL \x0e # between SO(\x0e) and SI(\x0f) ([^\x0f]*) # with chars. in GR \x0f } { - my $out= $1; + my $out= $1; $out =~ tr/\x21-\x7e/\xa1-\xfe/; $out; }geox; @@ -51,7 +53,7 @@ sub euc_iso{ my $r_str = shift; - substr($$r_str,0,0)=$ESC{'2022_KR'}; # put the designator at the beg. + substr($$r_str,0,0)=$ESC{'2022_KR'}; # put the designator at the beg. $$r_str =~ s{ # move KS X 1001 chars. in GR to GL ($RE{EUC_C}+) # and enclose them with SO and SI }{ ```

==== //depot/perlio/ext/Encode/t/perlio.t#1 - /home/p4work/perl/perlio/ext/Encode/t/perlio.t ==== Index​: perlio/ext/Encode/t/perlio.t

Inline Patch ```diff --- perlio/ext/Encode/t/perlio.t.~1~ Sat Apr 20 20:36:47 2002 +++ perlio/ext/Encode/t/perlio.t Sat Apr 20 20:36:47 2002 @@ -13,7 +13,8 @@ exit 0; } require Encode; - unless ($INC{"PerlIO/encoding.pm"} + eval { require PerlIO::encoding }; + unless ($INC{"PerlIO/encoding.pm"} and PerlIO::encoding->VERSION >= 0.02 ){ print "1..0 # Skip:: PerlIO::encoding 0.02 or better required\n"; @@ -95,7 +96,7 @@ } close $fh; ok($utext eq $dtext, "<:encoding($e); line-by-line"); - } + } $DEBUG or unlink ($sfile, $pfile); } End of Patch. ```
p5pRT commented 22 years ago

From [Unknown Contact. See original ticket]

On Sunday, April 21, 2002, at 04:50, Nick Ing-Simmons wrote:

I just checked in these changes to ext/Encode/... as change 16022 on perlio branch.

To honor whitespace, I usually rsync perl-core first and then copy the files back to my repository for NI-XS (this works only for patches from those with commit rights to the perl repository, however). But it seems like AS is not new enough, so...

- switch to XSLoader
- spelling & trailing whitespace removal.
- remove a "use loop" (Encode loaded PerlIO::encoding, loaded Encode); it never loops, but such things cause problems for imports.
- Changed how LEAVE_SRC was tested: x & ~y is not same as !(x & y)
- Moved Unicode.xs towards supporting same check values.
- Set @Encode::XS::ISA to Encode::Encoding
- added ->needs_lines method with my best guess at which ones do.

I did this;

* Copy the patch chunk
* perl -i.bak -pe 's/\s+\n/\n/o' patch.file to make sure there is no trailing space before the LF
* patch -l so patch ignores differences in the amount of whitespace

And the resulting patch works pretty well. Among 18 hunks, one failed at Encode.pm, and that was trivial to mend manually. And "make distclean -> breadperl Makefile.PL -> make test" works beautifully.

I still cannot get the TODO tests in t/perlio.t to pass despite some work on PerlIO::encoding to honour ->needs_lines. I need to study it some more. What I really want to do is to have PerlIO::encoding use fallback schemes. Which ENCODE_FB_XXX flag bit(s) give me fallback characters but still remove translated stuff from the src buffer?

Perhaps "update src" should be an active rather than a passive bit?

Please wait till caffeine runs in my bloodstream. I just woke up (because of insomnia or whatever, I was not quite nocturnal last night; it is 5 minutes before 06:00 AM JST).

Dan the Encode Maintainer