Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

Document deprecation of sysread on :utf8 handles #16544

Closed p5pRT closed 5 years ago

p5pRT commented 6 years ago

Migrated from rt.perl.org#133170 (status was 'rejected')

Searchable as RT133170$

p5pRT commented 6 years ago

From @jimav

This is a bug report for perl from jim.avera@gmail.com, generated with the help of perlbug 1.40 running under perl 5.26.1.


`perldoc -f sysread` says that using sysread on :utf8 handles is perfectly okay:

  Note that if the filehandle has been marked as ":utf8", Unicode characters are read instead of bytes (the LENGTH, OFFSET, and the return value of "sysread" are in Unicode characters). The ":encoding(...)" layer implicitly introduces the ":utf8" layer. See "binmode", "open", and the open pragma.

However, doing so provokes this at run time:

  sysread() is deprecated on :utf8 handles. This will be a fatal error in Perl 5.30

Suggest changing the documentation to say that this feature is deprecated, so people don't waste time writing code which will become wrong later.
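
For readers hitting the same warning, here is a minimal sketch of one way to avoid it: open the handle with `:raw`, sysread bytes, and decode afterwards. The helper name `sysread_utf8_text` and the temporary file are assumptions made for illustration only, and the sketch assumes the bytes read end on a character boundary.

```perl
use strict;
use warnings;
use Encode qw(decode);
use File::Temp qw(tempfile);

# Hypothetical helper (not part of perl): read up to $max_bytes raw octets
# with sysread, then decode them as UTF-8 in a separate step.
sub sysread_utf8_text {
    my ($path, $max_bytes) = @_;
    open(my $fh, '<:raw', $path) or die "open $path: $!";
    my $octets = '';
    # On a raw handle, LENGTH and the return value count bytes.
    my $got = sysread($fh, $octets, $max_bytes);
    die "sysread $path: $!" unless defined $got;
    close($fh);
    return (decode('UTF-8', $octets), $got);
}

# Illustrative input: "café\n" encoded as UTF-8.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
binmode($tmp_fh);
print {$tmp_fh} "caf\xC3\xA9\n";
close($tmp_fh) or die "close: $!";

my ($text, $bytes) = sysread_utf8_text($tmp_path, 1024);
printf "%d bytes read, %d characters after decoding\n", $bytes, length($text);
```

Because the handle carries no :utf8 flag, this form should not trigger the deprecation warning, and the byte count and character count are kept visibly separate.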



Flags​:   category=core   severity=low


Site configuration information for perl 5.26.1​:

Configured by Ubuntu at Sat Mar 10 18​:40​:42 UTC 2018.

Summary of my perl5 (revision 5 version 26 subversion 1) configuration​:  
  Platform​:   osname=linux   osvers=4.9.0   archname=x86_64-linux-gnu-thread-multi   uname='linux localhost 4.9.0 #1 smp debian 4.9.0 x86_64 gnulinux '   config_args='-Dusethreads -Duselargefiles -Dcc=x86_64-linux-gnu-gcc -Dcpp=x86_64-linux-gnu-cpp -Dld=x86_64-linux-gnu-gcc -Dccflags=-DDEBIAN -Wdate-time -D_FORTIFY_SOURCE=2 -g -O2 -fdebug-prefix-map=/build/perl-5CtO_8/perl-5.26.1=. -fstack-protector-strong -Wformat -Werror=format-security -Dldflags= -Wl\,-Bsymbolic-functions -Wl\,-z\,relro -Dlddlflags=-shared -Wl\,-Bsymbolic-functions -Wl\,-z\,relro -Dcccdlflags=-fPIC -Darchname=x86_64-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.26 -Darchlib=/usr/lib/x86_64-linux-gnu/perl/5.26 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/x86_64-linux-gnu/perl5/5.26 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.26.1 -Dsitearch=/usr/local/lib/x86_64-linux-gnu/perl/5.26.1 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Duse64bitint -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -Ui_libutil -Ui_xlocale -Uversiononly -DDEBUGGING=-g -Doptimize=-O2 -dEs -Duseshrplib -Dlibperl=libperl.so.5.26.1'   hint=recommended   useposix=true   d_sigaction=define   useithreads=define   usemultiplicity=define   use64bitint=define   use64bitall=define   uselongdouble=undef   usemymalloc=n   default_inc_excludes_dot=define   bincompat5005=undef   Compiler​:   cc='x86_64-linux-gnu-gcc'   ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fwrapv -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'   optimize='-O2 -g'   cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fwrapv -fno-strict-aliasing -pipe -I/usr/local/include'   ccversion=''   gccversion='7.3.0'   gccosandvers=''   intsize=4   longsize=8   ptrsize=8   doublesize=8   byteorder=12345678   doublekind=3   d_longlong=define   
longlongsize=8   d_longdbl=define   longdblsize=16   longdblkind=3   ivtype='long'   ivsize=8   nvtype='double'   nvsize=8   Off_t='off_t'   lseeksize=8   alignbytes=8   prototype=define   Linker and Libraries​:   ld='x86_64-linux-gnu-gcc'   ldflags =' -fstack-protector-strong -L/usr/local/lib'   libpth=/usr/local/lib /usr/lib/gcc/x86_64-linux-gnu/7/include-fixed /usr/include/x86_64-linux-gnu /usr/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib   libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt   perllibs=-ldl -lm -lpthread -lc -lcrypt   libc=libc-2.27.so   so=so   useshrplib=true   libperl=libperl.so.5.26   gnulibc_version='2.27'   Dynamic Linking​:   dlsrc=dl_dlopen.xs   dlext=so   d_dlsymun=undef   ccdlflags='-Wl\,-E'   cccdlflags='-fPIC'   lddlflags='-shared -L/usr/local/lib -fstack-protector-strong'

Locally applied patches​:   DEBPKG​:debian/cpan_definstalldirs - Provide a sensible INSTALLDIRS default for modules installed from CPAN.   DEBPKG​:debian/db_file_ver - https://bugs.debian.org/340047 Remove overly restrictive DB_File version check.   DEBPKG​:debian/doc_info - Replace generic man(1) instructions with Debian-specific information.   DEBPKG​:debian/enc2xs_inc - https://bugs.debian.org/290336 Tweak enc2xs to follow symlinks and ignore missing @​INC directories.   DEBPKG​:debian/errno_ver - https://bugs.debian.org/343351 Remove Errno version check due to upgrade problems with long-running processes.   DEBPKG​:debian/libperl_embed_doc - https://bugs.debian.org/186778 Note that libperl-dev package is required for embedded linking   DEBPKG​:fixes/respect_umask - Respect umask during installation   DEBPKG​:debian/writable_site_dirs - Set umask approproately for site install directories   DEBPKG​:debian/extutils_set_libperl_path - EU​:MM​: set location of libperl.a under /usr/lib   DEBPKG​:debian/no_packlist_perllocal - Don't install .packlist or perllocal.pod for perl or vendor   DEBPKG​:debian/fakeroot - Postpone LD_LIBRARY_PATH evaluation to the binary targets.   DEBPKG​:debian/instmodsh_doc - Debian policy doesn't install .packlist files for core or vendor.   DEBPKG​:debian/ld_run_path - Remove standard libs from LD_RUN_PATH as per Debian policy.   DEBPKG​:debian/libnet_config_path - Set location of libnet.cfg to /etc/perl/Net as /usr may not be writable.   
DEBPKG​:debian/perlivp - https://bugs.debian.org/510895 Make perlivp skip include directories in /usr/local   DEBPKG​:debian/deprecate-with-apt - https://bugs.debian.org/747628 Point users to Debian packages of deprecated core modules   DEBPKG​:debian/squelch-locale-warnings - https://bugs.debian.org/508764 Squelch locale warnings in Debian package maintainer scripts   DEBPKG​:debian/patchlevel - https://bugs.debian.org/567489 List packaged patches for 5.26.1-6 in patchlevel.h   DEBPKG​:fixes/document_makemaker_ccflags - https://bugs.debian.org/628522 [rt.cpan.org #68613] Document that CCFLAGS should include $Config{ccflags}   DEBPKG​:debian/find_html2text - https://bugs.debian.org/640479 Configure CPAN​::Distribution with correct name of html2text   DEBPKG​:debian/perl5db-x-terminal-emulator.patch - https://bugs.debian.org/668490 Invoke x-terminal-emulator rather than xterm in perl5db.pl   DEBPKG​:debian/cpan-missing-site-dirs - https://bugs.debian.org/688842 Fix CPAN​::FirstTime defaults with nonexisting site dirs if a parent is writable   DEBPKG​:fixes/memoize_storable_nstore - [rt.cpan.org #77790] https://bugs.debian.org/587650 Memoize​::Storable​: respect 'nstore' option not respected   DEBPKG​:debian/makemaker-pasthru - https://bugs.debian.org/758471 Pass LD settings through to subdirectories   DEBPKG​:debian/makemaker-manext - https://bugs.debian.org/247370 Make EU​::MakeMaker honour MANnEXT settings in generated manpage headers   DEBPKG​:debian/kfreebsd-softupdates - https://bugs.debian.org/796798 Work around Debian Bug#796798   DEBPKG​:fixes/autodie-scope - https://bugs.debian.org/798096 Fix a scoping issue with "no autodie" and the "system" sub   DEBPKG​:fixes/memoize-pod - [rt.cpan.org #89441] Fix POD errors in Memoize   DEBPKG​:debian/hurd-softupdates - https://bugs.debian.org/822735 Fix t/op/stat.t failures on hurd   DEBPKG​:fixes/math_complex_doc_great_circle - https://bugs.debian.org/697567 [rt.cpan.org #114104] Math​::Trig​: clarify definition of 
great_circle_midpoint   DEBPKG​:fixes/math_complex_doc_see_also - https://bugs.debian.org/697568 [rt.cpan.org #114105] Math​::Trig​: add missing SEE ALSO   DEBPKG​:fixes/math_complex_doc_angle_units - https://bugs.debian.org/731505 [rt.cpan.org #114106] Math​::Trig​: document angle units   DEBPKG​:fixes/cpan_web_link - https://bugs.debian.org/367291 CPAN​: Add link to main CPAN web site   DEBPKG​:fixes/time_piece_doc - https://bugs.debian.org/817925 Time​::Piece​: Improve documentation for add_months and add_years   DEBPKG​:fixes/extutils_makemaker_reproducible - https​://bugs.debian.org/835815 https://bugs.debian.org/834190 Make perllocal.pod files reproducible   DEBPKG​:fixes/file_path_hurd_errno - File-Path​: Fix test failure in Hurd due to hard-coded ENOENT   DEBPKG​:debian/hppa_op_optimize_workaround - https://bugs.debian.org/838613 Temporarily lower the optimization of op.c on hppa due to gcc-6 problems   DEBPKG​:debian/installman-utf8 - https://bugs.debian.org/840211 Generate man pages with UTF-8 characters   DEBPKG​:fixes/file_path_chmod_race - https://bugs.debian.org/863870 [rt.cpan.org #121951] Prevent directory chmod race attack.   DEBPKG​:fixes/extutils_file_path_compat - Correct the order of tests of chmod(). (#294)   DEBPKG​:fixes/getopt-long-2 - [rt.cpan.org #120300] Withdraw part of commit 5d9947fb445327c7299d8beb009d609bc70066c0\, which tries to implement more GNU getopt_long campatibility. GNU   DEBPKG​:fixes/getopt-long-3 - provide a default value for optional arguments   DEBPKG​:fixes/getopt-long-4 - https://bugs.debian.org/864544 [rt.cpan.org #122068] Fix issue #122068.   
DEBPKG​:fixes/test-builder-reset - https://bugs.debian.org/865894 Reset inside subtest maintains parent   DEBPKG​:debian/hppa_opmini_optimize_workaround - https://bugs.debian.org/869122 Lower the optimization level of opmini.c on hppa   DEBPKG​:debian/sh4_op_optimize_workaround - https://bugs.debian.org/869373 Also lower the optimization level of op.c and opmini.c on sh4   DEBPKG​:fixes/json-pp-example - [rt.cpan.org #92793] https://bugs.debian.org/871837 fix RT-92793​: bug in SYNOPSIS   DEBPKG​:debian/perldoc-pager - https://bugs.debian.org/870340 [rt.cpan.org #120229] Fix perldoc terminal escapes when sensible-pager is less   DEBPKG​:debian/prune_libs - https://bugs.debian.org/128355 Prune the list of libraries wanted to what we actually need.   DEBPKG​:debian/configure-regen - https://bugs.debian.org/762638 Regenerate Configure et al. after probe unit changes   DEBPKG​:fixes/rename-filexp.U-phase1 - regen-configure​: rename filexp.U to filexp_path.U\, phase 1   DEBPKG​:fixes/rename-filexp.U-phase2 - regen-configure​: rename filexp.U to filexp_path.U\, phase 2   DEBPKG​:fixes/packaging_test_skips - Skip various tests if PERL_BUILD_PACKAGING is set   DEBPKG​:debian/mod_paths - Tweak @​INC ordering for Debian   DEBPKG​:fixes/encode-alias-regexp - https​://bugs.debian.org/880085 fix https://github.com/dankogai/p5-encode/issues/127   DEBPKG​:fixes/regex-memory-leak - [910a6a8] https://bugs.debian.org/891196 [perl #132892] perl #132892​: avoid leak by mortalizing temporary copy of pattern   DEBPKG​:fixes/CVE-2018-6797 - [perl #132227] (perl #132227) restart a node if we change to uni rules within the node and encounter a sharp S   DEBPKG​:fixes/CVE-2018-6798/pt1 - [perl #132063] Heap buffer overflow   DEBPKG​:fixes/CVE-2018-6798/pt2 - [perl #132063] 5.26.1​: fix TRIE_READ_CHAR and DECL_TRIE_TYPE to account for non-utf8 target   DEBPKG​:fixes/CVE-2018-6798/pt3 - [perl #132063] (perl #132063) we should no longer warn for this code   DEBPKG​:fixes/CVE-2018-6798/pt4 - 
[perl #132063] utf8.c​: Don't dump malformation past first NUL   DEBPKG​:fixes/CVE-2018-6913 - [perl #131844] (perl #131844) fix various space calculation issues in pp_pack.c


@​INC for perl 5.26.1​:   /home/jima/lib/perl   /home/jima/perl5/lib/perl5/x86_64-linux-gnu-thread-multi   /home/jima/perl5/lib/perl5/5.26.1/x86_64-linux-gnu-thread-multi   /home/jima/perl5/lib/perl5/5.26.1   /home/jima/perl5/lib/perl5/x86_64-linux-gnu-thread-multi   /home/jima/perl5/lib/perl5   /etc/perl   /usr/local/lib/x86_64-linux-gnu/perl/5.26.1   /usr/local/share/perl/5.26.1   /usr/lib/x86_64-linux-gnu/perl5/5.26   /usr/share/perl5   /usr/lib/x86_64-linux-gnu/perl/5.26   /usr/share/perl/5.26   /home/jima/perl5/lib/perl5/5.26.0   /home/jima/perl5/lib/perl5/5.26.0/x86_64-linux-gnu-thread-multi   /usr/local/lib/site_perl   /usr/lib/x86_64-linux-gnu/perl-base


Environment for perl 5.26.1​:   HOME=/home/jima   LANG=en_US.UTF-8   LANGUAGE (unset)   LC_COLLATE=C   LD_LIBRARY_PATH (unset)   LOGDIR (unset)   PATH=/home/jima/.local/bin​:/home/jima/perl5/bin​:/bin​:/home/jima/bin​:/home/jima/jima_tools/x86_64/bin​:/home/jima/jima_tools/bin​:/usr/bin​:/usr/sbin​:/sbin​:/usr/bin/X11​:/usr/local/bin​:/usr/local/sbin​:/usr/games​:/usr/local/games​:/snap/bin​:/usr/lib/jvm/java-8-oracle/bin​:/usr/lib/jvm/java-8-oracle/db/bin​:/usr/lib/jvm/java-8-oracle/jre/bin​:.   PERL5LIB=/home/jima/lib/perl​:/home/jima/perl5/lib/perl5/x86_64-linux-gnu-thread-multi​:/home/jima/perl5/lib/perl5   PERL_BADLANG (unset)   PERL_LOCAL_LIB_ROOT=/home/jima/perl5   PERL_MB_OPT=--install_base /home/jima/perl5   PERL_MM_OPT=INSTALL_BASE=/home/jima/perl5   SHELL=/bin/bash

p5pRT commented 6 years ago

From @Leont

Indeed this should be modified.

Leon

p5pRT commented 6 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 6 years ago

From @karenetheridge

IMO this should be considered a blocker for 5.28, as it is a documentation issue for a change in this release.

p5pRT commented 6 years ago

From @tonycoz

How about the attached?

Tony

p5pRT commented 6 years ago

From @tonycoz

0001-perl-133170-document-deprecation-of-sysread-syswrite.patch

```diff
From d338352c918eae0919f56da288492ef9ac23f63a Mon Sep 17 00:00:00 2001
From: Tony Cook
Date: Thu, 3 May 2018 14:19:21 +1000
Subject: (perl #133170) document deprecation of sysread/syswrite/send/recv on :utf8

well, UTF8 flagged handles...
---
 pod/perlfunc.pod | 42 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 34 insertions(+), 8 deletions(-)

diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod
index fa08d4c3e9..170ae4f4e0 100644
--- a/pod/perlfunc.pod
+++ b/pod/perlfunc.pod
@@ -6281,6 +6281,10 @@
 string otherwise. If there's an error, returns the undefined value. This call is actually implemented in terms of the L system call. See L for examples.
 
+Note that using C on a socket that has been marked as C<:utf8>
+is deprecated, and will result in an exception in future versions of
+perl.
+
 Note the I: depending on the status of the socket, either (8-bit) bytes or characters are received. By default all sockets operate on bytes, but for example if the socket has been changed using
@@ -6288,7 +6292,9 @@
 L|/binmode FILEHANDLE, LAYER> to operate with the C<:encoding(UTF-8)> I/O layer (see the L pragma), the I/O will operate on UTF8-encoded Unicode characters, not bytes. Similarly for the C<:encoding> layer: in that
-case pretty much any characters can be read.
+case pretty much any characters can be read. No validation is
+performed on the UTF-8, since any layers that perform such validation
+are bypassed by C.
 
 =item redo LABEL X
@@ -7080,6 +7086,10 @@
 case it does a L syscall. Returns the number of characters sent, or the undefined value on error. The L syscall is currently unimplemented. See L for examples.
 
+Note that using C on a socket that has been marked as C<:utf8>
+is deprecated, and will result in an exception in future versions of
+perl.
+
 Note the I: depending on the status of the socket, either (8-bit) bytes or characters are sent. By default all sockets operate on bytes, but for example if the socket has been changed using
@@ -8720,13 +8730,25 @@
 L|/eof FILEHANDLE> doesn't work well on device files (like ttys) anyway. Use L|/sysread FILEHANDLE,SCALAR,LENGTH,OFFSET> and check for a return value for 0 to decide whether you're done.
 
-Note that if the filehandle has been marked as C<:utf8>, Unicode
-characters are read instead of bytes (the LENGTH, OFFSET, and the
-return value of L|/sysread FILEHANDLE,SCALAR,LENGTH,OFFSET>
-are in Unicode characters). The C<:encoding(...)> layer implicitly
-introduces the C<:utf8> layer. See
-L|/binmode FILEHANDLE, LAYER>,
-L|/open FILEHANDLE,EXPR>, and the L pragma.
+Note that using C on a file that has been marked as C<:utf8>
+is deprecated, and will result in an exception in future versions of
+perl.
+
+If the filehandle has been marked as C<:utf8>, Unicode characters
+assumed to be UTF-8 encoded are read instead of bytes (the LENGTH,
+OFFSET, and the return value of L|/sysread
+FILEHANDLE,SCALAR,LENGTH,OFFSET> are in Unicode characters).
+
+Note that UTF-8 encoded Unicode is read by C even if the the
+C<:utf8> mark is introduced by a C<:encoding()> that isn't C,
+nor is the UTF-8 validated. Any other layers are also ignored, so if
+you've pushed layers to decompress your input and decode the result as
+C, C will treat your compressed UTF-16 data as
+C.
+
+The C<:encoding(...)> layer implicitly introduces the C<:utf8> layer. See
+L|/binmode FILEHANDLE, LAYER>, L|/open
+FILEHANDLE,EXPR>, and the L pragma.
 
 =item sysseek FILEHANDLE,POSITION,WHENCE X X
@@ -8888,6 +8910,10 @@
 B: If the filehandle is marked C<:utf8>, Unicode characters encoded in UTF-8 are written instead of bytes, and the LENGTH, OFFSET, and return value of L|/syswrite FILEHANDLE,SCALAR,LENGTH,OFFSET> are in (UTF8-encoded Unicode) characters.
+
+C on a filehandle marked C<:utf8> is deprecated, and will
+raise an exception in a future version of perl.
+
 The C<:encoding(...)> layer implicitly introduces the C<:utf8> layer. Alternately, if the handle is not marked with an encoding but you attempt to write characters with code points over 255, raises an exception.
-- 
2.11.0
```

p5pRT commented 6 years ago

From @jimav

On 5/2/18 9:21 PM, Tony Cook via RT wrote:

How about the attached?

Hi Tony,

Is it specifically :utf8 which will not be allowed, i.e., other layers might still be allowed on a sysread file handle in v5.30? I didn't understand the new text which discussed interactions between the :utf8 layer and other layers such as :utf16.

Does it all boil down to requiring that the file handle read raw binary octets (e.g. after binmode($fh) is called)? If so, it might be better to just say the file handle must be in :raw mode rather than mention any _specific_ encoding such as utf8.

-Jim

p5pRT commented 6 years ago

From @tonycoz

The problem isn't all layers.

The problem is specifically the way sysread etc handle layers that have the PERLIO_K_UTF8 flag set on them.

This includes the :utf8 layer (which is currently not a real layer) and :encoding() (as the sysread documentation mentions), and a hypothetical :utf16 layer would also set it, assuming it's intended to decode utf-16 characters into perl's internal extended UTF-8 so perl can deal with it as characters.

The underlying problem is that sysread() etc pay attention to only one part of the layer stack - whether that PERLIO_K_UTF8 flag is set, at which point it ignores the rest, slurps in the bytes and marks them as SVf_UTF8.

With non-PERLIO_K_UTF8 layers sysread etc completely ignore the layers - reading (or writing) bytes from/to the underlying stream.
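
The layer-bypassing behaviour described above can be observed with a layer that does not set the UTF-8 flag. The following is only an illustrative sketch, assuming a Unix-like system and using the `:crlf` layer and a temporary file as stand-ins; it is not taken from the ticket.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Illustrative input: two lines with CRLF line endings, written as raw bytes.
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out);
print {$out} "a\r\nb\r\n";
close($out) or die "close: $!";

# Buffered read through the :crlf layer: CRLF is translated to LF.
open(my $layered, '<:crlf', $path) or die "open: $!";
my $line = readline($layered);
close($layered);

# sysread on a handle opened the same way: the layer is bypassed and
# the bytes come straight from the underlying file descriptor.
open(my $bypass, '<:crlf', $path) or die "open: $!";
my $raw = '';
my $n = sysread($bypass, $raw, 100);
die "sysread: $!" unless defined $n;
close($bypass);

printf "readline saw %d bytes, sysread saw %d bytes\n", length($line), $n;
```

The two handles are opened identically; only the read call differs, which is what makes the difference in the returned data attributable to sysread ignoring the layer.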

Tony

p5pRT commented 6 years ago

From @jimav

On 5/3/18 2:40 AM, Tony Cook via RT wrote:

The underlying problem is that sysread() etc pay attention to only one part of the layer stack - whether that PERLIO_K_UTF8 flag is set, at which point it ignores the rest, slurps in the bytes and marks them as SVf_UTF8.

With non-PERLIO_K_UTF8 layers sysread etc completely ignore the layers - reading (or writing) bytes from/to the underlying stream.

Hmm. That's an unfortunate complexity involving perl's internal character representation which users really shouldn't need to be aware of. I hope some solution can be found which doesn't _require_ documenting and user-understanding of this.

Is there any foreseeable path to making sysread() handle arbitrary layers correctly, using buffering when data-transforming layers are present but not otherwise? What if sysread just called fh->read() in those cases?

If buffering is used, then: If the underlying device is seekable, left-over octets in the hidden buffer should be discarded and a seek done so they will be re-read later; that would protect coherency if other cooperating processes might randomly update the file.

If the underlying source is not seekable, then left-over octets would have to stay in the hidden buffer, but that's okay because there is no way for those bytes to mutate before they are called for by the application. Note that for a tty in canonical mode, the OS will only return one line at a time, at least on *nix.

Just some uninformed ideas...

-Jim

p5pRT commented 6 years ago

From @jimav

On 5/3/18 4:40 PM, Jim Avera wrote:

What if sysread just called fh->read() in those cases?

In essence, my proposal is to make sysread() a synonym for fh->read() with the exception that if the underlying source is seekable, then any left-over octets (not needed to satisfy LENGTH characters) would be discarded after each call and a seek done to re-read them later; and, that buffering will be entirely skipped if there is no data-transforming layer on the file descriptor.

Happily, :encoding(utf8) is not data-transforming because that is perl's internal representation so the octets can simply be put into the user's buffer and the utf8 flag set.

Even transforming decoders might often avoid left-over octets (and thus avoid the seek-back) by predicting the number of octets needed in common cases. For example, a UTF-16 decoder could read LENGTH*2 octets and that would suffice if the codepoints happened to be ascii. More realistically, an ISO-8859-1 decoder could guess LENGTH*1 and often be right. In other words, seeking-back might not be a big performance hit in practice. And any really perf-sensitive app shouldn't be using layers at all, but should sysread() a raw file handle and do its own decoding.
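
The last point, sysread() on a raw handle with the application doing its own decoding, could look roughly like the sketch below. It keeps undecoded trailing bytes between reads using Encode's `FB_QUIET` check mode, so a character split across two chunks is still decoded correctly. The helper name `slurp_utf8_via_sysread`, the chunk size, and the temporary file are assumptions for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);
use File::Temp qw(tempfile);

# Hypothetical helper: sysread $chunk bytes at a time from a raw handle
# and decode incrementally, returning the decoded text.
sub slurp_utf8_via_sysread {
    my ($path, $chunk) = @_;
    open(my $fh, '<:raw', $path) or die "open $path: $!";
    my ($pending, $text) = ('', '');
    while (1) {
        # Append new octets after any bytes left undecoded last time.
        my $got = sysread($fh, $pending, $chunk, length $pending);
        die "sysread $path: $!" unless defined $got;
        last if $got == 0;    # end of file
        # FB_QUIET decodes what it can and leaves the rest in $pending.
        $text .= decode('UTF-8', $pending, Encode::FB_QUIET());
    }
    close($fh);
    die "undecodable bytes left at end of $path" if length $pending;
    return $text;
}

# Illustrative input: "café naïve\n" as UTF-8; a 4-byte chunk size splits
# the first two-byte character across two reads.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
binmode($tmp_fh);
print {$tmp_fh} "caf\xC3\xA9 na\xC3\xAFve\n";
close($tmp_fh) or die "close: $!";

my $text = slurp_utf8_via_sysread($tmp_path, 4);
printf "%d characters decoded\n", length($text);
```

Malformed input in the middle of the stream is only reported at end of file in this sketch; a stricter version would cap how many bytes may remain pending.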

-Jim

p5pRT commented 6 years ago

From @Leont

On Fri, May 4, 2018 at 1:40 AM, Jim Avera jim.avera@gmail.com wrote:

Is there any foreseeable path to making sysread() handle arbitrary layers correctly, using buffering when data-transforming layers are present but not otherwise?

If you want that\, why wouldn't you just use read?

Leon
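
As a rough illustration of the suggestion to use read: the buffered read() goes through the layer stack, so on a handle with an encoding layer its LENGTH and return value count characters. The file contents and variable names here are assumptions for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Illustrative input: three accented letters, two bytes each in UTF-8.
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out);
print {$out} "\xC3\xA9\xC3\xA8\xC3\xAA";
close($out) or die "close: $!";

# read() honours the :encoding layer, so LENGTH counts characters.
open(my $in, '<:encoding(UTF-8)', $path) or die "open: $!";
my $buf = '';
my $count = read($in, $buf, 2);
die "read: $!" unless defined $count;
close($in);

printf "read() returned %d characters: %s\n",
    $count, join(' ', map { sprintf 'U+%04X', ord } split //, $buf);
```

Unlike sysread, this path also validates and decodes through whatever layers are on the handle, at the cost of perl's buffering.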

p5pRT commented 6 years ago

From @jimav

On 5/3/18 5:05 PM, Leon Timmermans wrote:

If you want that, why wouldn't you just use read?

Yes, but I gather there is all this complexity (desired by someone) to allow certain layers to work with sysread(). Personally I would be happy if sysread simply disallowed any layers, i.e. required a raw file handle.

On the other hand, if the app wants Unicode characters, it is convenient that perl's internal rep is utf8, so reading from a fh with :encoding(utf8) should be possible with no actual extra overhead (just setting the utf8 flag on the user's buffer). Disallowing that one case seems strange from a user perspective.

-Jim

p5pRT commented 6 years ago

From @grinnz

From a user perspective, the utf8 flag should be irrelevant, and the non-strict :utf8 or :encoding(utf8) layers shouldn't be used.

-Dan

p5pRT commented 6 years ago

From @tonycoz

On Thu, 03 May 2018 16:41:07 -0700, jim.avera@gmail.com wrote:

Is there any foreseeable path to making sysread() handle arbitrary layers correctly, using buffering when data-transforming layers are present but not otherwise? What if sysread just called fh->read() in those cases?

Well, that would completely change the behaviour of sysread() in the case of non-UTF-8 flagged file handles that have other layers on them.

One reason for making this a deprecation warning is so we're not silently changing this behaviour.

This deprecation was originally discussed in​:

https://rt-archive.perl.org/perl5/Ticket/Display.html?id=125760

Tony

p5pRT commented 6 years ago

From @jimav

On 5/6/18 4:35 PM, Tony Cook via RT wrote:

Well, that would completely change the behaviour of sysread() in the case of non-UTF-8 flagged file handles that have other layers on them. ... This deprecation was originally discussed in:

https://rt-archive.perl.org/perl5/Ticket/Display.html?id=125760

That seems to be some kind of secret or protected ticket!

  RT Error   No permission to display that ticket No details

p5pRT commented 6 years ago

From @tonycoz

On Sun, 06 May 2018 21:14:24 -0700, jim.avera@gmail.com wrote:

That seems to be some kind of secret or protected ticket!

  RT Error   No permission to display that ticket No details

I can see it as an anonymous guest (I opened a new browser).

Searching for the ticket number sent me to​:

https://rt.perl.org/Public/Bug/Display.html?id=125760

as did pasting the non-/Public/ address into the address bar.

If you still can't see it you might want to check with perlbug-admin (see the page footer) to see if something is messed up for your account.

Tony

p5pRT commented 5 years ago

From @jkeenan

Tony​: Should the patch you proposed in this RT be applied now?

Thank you very much.

-- James E Keenan (jkeenan@​cpan.org)

p5pRT commented 5 years ago

From @tonycoz

On Wed, 17 Oct 2018 06:13:06 -0700, jkeenan wrote:

Tony​: Should the patch you proposed in this RT be applied now?

No, this ticket is obsoleted by those operators now being fatal on :utf8 handles and the documentation updates that included.

Tony

p5pRT commented 5 years ago

From @jkeenan

On Mon, 22 Oct 2018 23:53:10 GMT, tonyc wrote:

No, this ticket is obsoleted by those operators now being fatal on :utf8 handles and the documentation updates that included.

Ok, closing.

-- James E Keenan (jkeenan@​cpan.org)

p5pRT commented 5 years ago

@jkeenan - Status changed from 'open' to 'rejected'