Perl / perl5


Text::CSV::Encoded is incorrectly forced to parse widechar #15739

Closed p5pRT closed 7 years ago

p5pRT commented 7 years ago

Migrated from (status was 'rejected')

Searchable as RT130199

p5pRT commented 7 years ago


Created by

After upgrading from debian-wheezy to debian-jessie, HTML::Mason started to behave strangely with respect to UTF-8 encoding. Before the upgrade, both web pages and forms worked correctly (in UTF-8) without any special setup. As of jessie with Apache 2.4, UTF-8 no longer works:

1. I had to add binmode(STDOUT,'UTF8') to modules.
2. I had to decode_utf8($_) data from forms before passing them over to psql-db.

I am filing this report with example code showing the erratic behavior of Text::CSV::Encoded, since I could narrow the problem down to just a few lines of test case.

========================

```
#!/usr/bin/perl
use Text::CSV::Encoded;
open(my $FH, shift) or die "open";
binmode($FH, ":encoding(cp1250) :raw :bytes");
local $/ = "\r\n";
my $csv = Text::CSV::Encoded->new ( { encoding_in => "cp1250",
    binary => 1, eol => $/, sep_char => ';',
    } ) or die "Cannot use CSV: ".Text::CSV->error_diag ();
$\ = "\n";
while ( <$FH> ) {
    s/\s+$//;
    print;
    if ($csv->parse( $_ )) {
        print $csv->fields();
    }
}
__END__
10;"SPӣDZIELNIA WARSZAWA";62;"TEST"
```

In this example:

1. the test file (provided "inline") as \ contains two specific characters from code page 1250, one right after the other.
   1a. this test file is NOT UTF-8 encoded.
2. the input stream is correctly marked as CP1250.
3. the module gets correct information about the file's encoding.

... and yet the module complains about encountering a "wide char", which in the above setup should never be possible (I think).

The result of the above program is:

```
$ ./wide-char test-input
10;"SPӣDZIELNIA WARSZAWA";62;"TEST"
Wide character in subroutine entry at /usr/share/perl5/Text/CSV/Encoded/Coder/ line 37, <$FH> chunk 1.
$
```

This result is incorrect, since the file does not contain any "wide chars".

Perl Info

```
Flags:
    category=core
    severity=high

Site configuration information for perl 5.20.2:

Configured by Debian Project at Fri Jul 22 15:47:27 UTC 2016.

Summary of my perl5 (revision 5 version 20 subversion 2) configuration:

  Platform:
    osname=linux, osvers=3.16.0-4-amd64, archname=x86_64-linux-gnu-thread-multi
    uname='linux himalia 3.16.0-4-amd64 #1 smp debian 3.16.7-ckt25-2+deb8u3 (2016-07-02) x86_64 gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -D_FORTIFY_SOURCE=2 -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Dldflags= -Wl,-z,relro -Dlddlflags=-shared -Wl,-z,relro -Dcccdlflags=-fPIC -Darchname=x86_64-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.20 -Darchlib=/usr/lib/x86_64-linux-gnu/perl/5.20 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/x86_64-linux-gnu/perl5/5.20 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.20.2 -Dsitearch=/usr/local/lib/x86_64-linux-gnu/perl/5.20.2 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dusesitecustomize -Duse64bitint -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -Ui_libutil -Uversiononly -DDEBUGGING=-g -Doptimize=-O2 -Duseshrplib -des'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fwrapv -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -g',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fwrapv -fno-strict-aliasing -pipe -I/usr/local/include'
    ccversion='', gccversion='4.9.2', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib/gcc/x86_64-linux-gnu/4.9/include-fixed /usr/include/x86_64-linux-gnu /usr/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib
    libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt, so=so, useshrplib=true, gnulibc_version='2.19'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib -fstack-protector'

Locally applied patches:
    DEBPKG:debian/cpan_definstalldirs - Provide a sensible INSTALLDIRS default for modules installed from CPAN.
    DEBPKG:debian/db_file_ver - Remove overly restrictive DB_File version check.
    DEBPKG:debian/doc_info - Replace generic man(1) instructions with Debian-specific information.
    DEBPKG:debian/enc2xs_inc - Tweak enc2xs to follow symlinks and ignore missing @INC directories.
    DEBPKG:debian/errno_ver - Remove Errno version check due to upgrade problems with long-running processes.
    DEBPKG:debian/libperl_embed_doc - Note that libperl-dev package is required for embedded linking
    DEBPKG:fixes/respect_umask - Respect umask during installation
    DEBPKG:debian/writable_site_dirs - Set umask approproately for site install directories
    DEBPKG:debian/extutils_set_libperl_path - EU:MM: set location of libperl.a under /usr/lib
    DEBPKG:debian/no_packlist_perllocal - Don't install .packlist or perllocal.pod for perl or vendor
    DEBPKG:debian/prefix_changes - Fiddle with *PREFIX and variables written to the makefile
    DEBPKG:debian/fakeroot - Postpone LD_LIBRARY_PATH evaluation to the binary targets.
    DEBPKG:debian/instmodsh_doc - Debian policy doesn't install .packlist files for core or vendor.
    DEBPKG:debian/ld_run_path - Remove standard libs from LD_RUN_PATH as per Debian policy.
    DEBPKG:debian/libnet_config_path - Set location of libnet.cfg to /etc/perl/Net as /usr may not be writable.
    DEBPKG:debian/mod_paths - Tweak @INC ordering for Debian
    DEBPKG:debian/module_build_man_extensions - Adjust Module::Build manual page extensions for the Debian Perl policy
    DEBPKG:debian/prune_libs - Prune the list of libraries wanted to what we actually need.
    DEBPKG:fixes/net_smtp_docs - [ #36038] Document the Net::SMTP 'Port' option
    DEBPKG:debian/perlivp - Make perlivp skip include directories in /usr/local
    DEBPKG:debian/deprecate-with-apt - Point users to Debian packages of deprecated core modules
    DEBPKG:debian/squelch-locale-warnings - Squelch locale warnings in Debian package maintainer scripts
    DEBPKG:debian/skip-upstream-git-tests - Skip tests specific to the upstream Git repository
    DEBPKG:debian/patchlevel - List packaged patches for 5.20.2-3+deb8u6 in patchlevel.h
    DEBPKG:debian/skip-kfreebsd-crash - [perl #96272] Skip a crashing test case in t/op/threads.t on GNU/kFreeBSD
    DEBPKG:fixes/document_makemaker_ccflags - [ #68613] Document that CCFLAGS should include $Config{ccflags}
    DEBPKG:debian/find_html2text - Configure CPAN::Distribution with correct name of html2text
    DEBPKG:debian/perl5db-x-terminal-emulator.patch - Invoke x-terminal-emulator rather than xterm in
    DEBPKG:debian/cpan-missing-site-dirs - Fix CPAN::FirstTime defaults with nonexisting site dirs if a parent is writable
    DEBPKG:fixes/memoize_storable_nstore - [ #77790] Memoize::Storable: respect 'nstore' option not respected
    DEBPKG:debian/regen-skip - Skip a regeneration check in unrelated git repositories
    DEBPKG:fixes/regcomp-mips-optim - [perl #122817] Downgrade the optimization of regcomp.c on mips and mipsel due to a gcc-4.9 bug
    DEBPKG:debian/makemaker-pasthru - Pass LD settings through to subdirectories
    DEBPKG:fixes/perldoc-less-R - [ #98636] Tell the 'less' pager to allow terminal escape sequences
    DEBPKG:fixes/pod_man_reproducible_date - Support POD_MAN_DATE in Pod::Man for the left-hand footer
    DEBPKG:fixes/io_uncompress_gunzip_inmemory - [ #95494] Fix gunzip to in-memory file handle
    DEBPKG:fixes/socket_test_recv_fix - [perl #122657] Compare recv return value to peername in socket test
    DEBPKG:fixes/hurd_socket_recv_todo - [perl #122657] TODO checking the result of recv() on hurd
    DEBPKG:fixes/regexp-performance - [0fa70a0] [perl #123743] simpify and speed up /.*.../ handling
    DEBPKG:fixes/failed_require_diagnostics - [perl #123270] Report inaccesible file on failed require
    DEBPKG:fixes/array-cloning - [perl #124127] [902d169] fix cloning arrays with unused elements
    DEBPKG:fixes/perldb-threads - [perl #124127] [41ef2c6] lib/ Restore noop lock prototype
    DEBPKG:fixes/CVE-2015-8607_file_spec_taint_fix - ensure File::Spec::canonpath() preserves taint
    DEBPKG:fixes/encode-unicode-bom - [ #107043] Address
    DEBPKG:debian/encode-unicode-bom-doc - Document Debian backport of Encode::Unicode fix
    DEBPKG:debian/kfreebsd-softupdates - Work around Debian Bug#796798
    DEBPKG:fixes/CVE-2016-2381_duplicate_env - remove duplicate environment variables from environ
    DEBPKG:debian/debugperl-compat-fix - [perl #127212] Disable PERL_TRACK_MEMPOOL for debugging builds
    DEBPKG:fixes/CVE-2015-8853_regexp_hang - [perl #123562] PATCH [perl #123562] Regexp-matching "hangs"
    DEBPKG:fixes/utf8_regexp_crash - [perl #124109] save_re_context(): do "local $n" with no PL_curpm
    DEBPKG:fixes/regcomp_whitespace_fix - [perl #124109] Perl_save_re_context(): re-indent after last commit
    DEBPKG:fixes/5.20.3/eval_label_crash - [perl #123652] eval {label:} crash
    DEBPKG:fixes/5.20.3/preserve_record_separator - [perl #123218] "preserve" $/ if set to a bad value
    DEBPKG:fixes/5.20.3/test_count_base_rs - Fix test count in t/base/rs.t
    DEBPKG:fixes/5.20.3/remove_get_magic - [perl #123739] Remove get-magic from $/
    DEBPKG:fixes/5.20.3/speed_up_scalar_g - [perl #123202] speed up scalar //g against tainted strings
    DEBPKG:fixes/5.20.3/accidental_all_features - Stop $^H |= 0x1c020000 from enabling all features
    DEBPKG:fixes/5.20.3/multidimensional_arrays_utf8 - [perl #124113] Make check for multi-dimensional arrays be UTF8-aware
    DEBPKG:fixes/5.20.3/unquoted_utf8_heredoc_terminators - Allow unquoted UTF-8 HERE-document terminators
    DEBPKG:fixes/5.20.3/parentheses_ambiguous_warning_utf8_functions - Fix "...without parentheses is ambuguous" warning for UTF-8 function names
    DEBPKG:fixes/5.20.3/leak_namepv_copy - [perl #123786] don't leak the temp utf8 copy of namepv
    DEBPKG:fixes/5.20.3/h2ph_hex_constants - h2ph: correct handling of hex constants for the preamble
    DEBPKG:fixes/5.20.3/leftbracket_XTERMORDORDOR - [perl #123711] Fix crash with 0-5x-l{0}
    DEBPKG:fixes/5.20.3/fatalize_warnings_unwinding - [perl #123398] don't fatalize warnings during unwinding (#123398)
    DEBPKG:fixes/5.20.3/setpgrp - Don’t treat setpgrp($nonzero) as setpgrp(1)
    DEBPKG:fixes/5.20.3/death_unwinding_crash - [perl #124156] RT #124156: death during unwinding causes crash
    DEBPKG:fixes/5.20.3/stashpvn_crash - [perl #125541] Fix crash with %::=(); J->${\"::"}
    DEBPKG:fixes/5.20.3/possessive_quantifier - [perl #125825] PATCH: [perl 125825] {n}+ possessive quantifier broken
    DEBPKG:fixes/5.20.3/quoted_code_crash - [perl #123712] Fix /$a[/ parsing
    DEBPKG:fixes/5.20.3/checking_sub_inwhat - [perl #123712] Don't check sub_inwhat
    DEBPKG:fixes/5.20.3/yylex_loop - Fix hang with "@{"
    DEBPKG:fixes/5.20.3/docs/op - Fix apidocs for OP_TYPE_IS(_OR_WAS) - arguments separated by |, not ,.
    DEBPKG:fixes/5.20.3/docs/encoding - perlpodspec: Corrections/adds to detecting =encoding
    DEBPKG:fixes/5.20.3/docs/SvPV_set - improve SvPV_set's docs, it really shouldn't be public API
    DEBPKG:fixes/5.20.3/docs/autodie - Fix warning message regarding "use autodie" and "use open".
    DEBPKG:fixes/5.20.3/docs/autodie_2_26 - perlunicook: Note that autodie >= 2.26 should be okay with "use open".
    DEBPKG:fixes/5.20.3/docs/setenv - Fix setenv() replacement documentation in perlclib
    DEBPKG:fixes/5.20.3/docs/clib_caution - perlhacktips: Add caution about clib ptr returns to static memory
    DEBPKG:fixes/5.20.3/docs/perlunicook_typos - Fix minor code typos in perlunicook
    DEBPKG:fixes/5.20.3/docs/ook_example - [perl #122322] Update OOK example in perlguts
    DEBPKG:fixes/5.20.3/docs/study_noop - perlfunc: mention that study() is currently a noop
    DEBPKG:fixes/CVE-2016-1238/remove-dot-when-loading - [perl #127834] (perl #127834) remove . from the end of @INC if complex modules are loaded
    DEBPKG:fixes/CVE-2016-1238/remove-dot-in-padwalker - [perl #127834] ensure PadWalker is loaded from standard paths
    DEBPKG:fixes/CVE-2016-1238/remove-dot-in-dist - [perl #127834] dist/: remove . from @INC when loading optional modules
    DEBPKG:fixes/CVE-2016-1238/remove-dot-in-cpan - [perl #127834] cpan/: remove . from @INC when loading optional modules
    DEBPKG:fixes/CVE-2016-1238/customized-encode - Update customized.dat for cpan/Encode/
    DEBPKG:debian/CVE-2016-1238/test-suite-without-dot - [perl #127810] Patch unit tests to explicitly insert "." into @INC when needed.
    DEBPKG:debian/CVE-2016-1238/eumm-without-dot - [perl #127810] Add PERL_USE_UNSAFE_INC support to EU::MM for fortify_inc support.
    DEBPKG:debian/CVE-2016-1238/cpan-without-dot - [perl #127810] Set PERL_USE_UNSAFE_INC for cpan usage
    DEBPKG:debian/CVE-2016-1238/mb-without-dot - Make Module::Build set PERL_USE_UNSAFE_INC
    DEBPKG:debian/CVE-2016-1238/sitecustomize-in-etc - Look for in /etc/perl rather than sitelib on Debian systems
    DEBPKG:fixes/xsloader-eval - [ #115808] Don’t let XSLoader load relative paths

@INC for perl 5.20.2:
    /etc/perl
    /usr/local/lib/x86_64-linux-gnu/perl/5.20.2
    /usr/local/share/perl/5.20.2
    /usr/lib/x86_64-linux-gnu/perl5/5.20
    /usr/share/perl5
    /usr/lib/x86_64-linux-gnu/perl/5.20
    /usr/share/perl/5.20
    /usr/local/lib/site_perl

Environment for perl 5.20.2:
    HOME=/home/rafal
    LANG=pl_PL.utf8
    LANGUAGE=en_US:en
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/rafal/bin
    PERL_BADLANG (unset)
    SHELL=/bin/bash
```
p5pRT commented 7 years ago

From @jkeenan

On Mon, 28 Nov 2016 12:34:02 GMT, rafal@ wrote:

This is a bug report for perl from rafal@, generated with the help of perlbug 1.40 running under perl 5.20.2.

[snip]

It appears that the file does indeed contain characters which satisfy the condition required for the "Wide character" warning. Here's what pod/perldiag.pod in perl-5.24.0 says:

```
=item Wide character in %s

(S utf8) Perl met a wide character (>255) when it wasn't expecting
one. This warning is by default on for I/O (like print). The easiest
way to quiet this warning is simply to add the C<:utf8> layer to the
output, e.g. C<binmode STDOUT, ':utf8'>. Another way to turn off the
warning is to add C<no warnings 'utf8';> but that is often closer to
cheating. In general, you are supposed to explicitly mark the
filehandle with an encoding, see L\ and L<perlfunc/binmode>.
```
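The behavior perldiag describes can be reproduced without any file. A minimal sketch, assuming in-memory filehandles as stand-ins for STDOUT: printing U+0141 (Ł) to a handle with no encoding layer triggers the warning and writes UTF-8 bytes, while the same print through an `:encoding(cp1250)` layer is silent and writes cp1250 bytes. The variable names are illustrative only.

```perl
use strict;
use warnings;

# Collect warnings instead of letting them go to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# "ÓŁ": U+00D3 fits in one byte, U+0141 (above 255) is a "wide" character.
my $text = "\x{D3}\x{141}";

# 1. Output handle without an encoding layer: perl warns, then writes UTF-8 bytes.
open(my $plain, '>', \my $plain_buf) or die "open: $!";
print {$plain} $text;
close $plain or die "close: $!";
my $plain_warnings = grep { /Wide character in print/ } @warnings;

# 2. Same text through an explicit :encoding(cp1250) layer: no warning.
@warnings = ();
open(my $encoded, '>:encoding(cp1250)', \my $encoded_buf) or die "open: $!";
print {$encoded} $text;
close $encoded or die "close: $!";
my $encoded_warnings = grep { /Wide character/ } @warnings;

# %vX prints each character code in hex, joined by dots.
printf "no layer: %d warning(s), bytes %vX\n", $plain_warnings, $plain_buf;
printf ":encoding(cp1250): %d warning(s), bytes %vX\n", $encoded_warnings, $encoded_buf;
```

Only the handle's layers differ between the two cases; the string being printed is identical.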

If I put your test data into a file and run it through 'od -c', I observe two characters in the >255 range.

```
$ od -c warsaw.txt
0000000   1   0   ;   "   S   P 323 243   D   Z   I   E   L   N   I   A
0000020       W   A   R   S   Z   A   W   A   "   ;   6   2   ;   "   T
0000040   E   S   T   "  \n
0000045
```

Text::CSV::Encoded is not part of the Perl 5 core distribution, so I think including it in the test script muddies the waters. Here's a pure-Perl reduction:

```
$ cat
# perl
use strict;
use warnings;

open(my $FH, '<', 'warsaw.txt') or die "open";
binmode($FH, ":encoding(cp1250)");
while ( <$FH> ) {
    s/\s+$//;
    print "$_\n";
}
close $FH or die "close";
```

```
$ perl
Wide character in print at line 9, <$FH> line 1.
10;"SPÓŁDZIELNIA WARSZAWA";62;"TEST"
```

I think that warning is appropriate. However, I concede that I don't have much experience with 'cp1250', so I'm unclear what the expected behavior is. Other people on the list should comment.

Thank you very much.

p5pRT commented 7 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 7 years ago

From @jkeenan

On Mon, 28 Nov 2016 23:03:51 GMT, jkeenan wrote:

[snip]

On #p5p, khw pointed out an error in my analysis: 'od -c' prints non-printable bytes in octal. So 323 and 243 are octal values (decimal 211 and 163), both below \0377, which is equivalent to 255.
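khw's correction can be checked with the core Encode module. A minimal sketch: `oct()` turns the `od -c` digits into byte values, and decoding those two bytes as cp1250 yields U+00D3 (Ó) and U+0141 (Ł). Every input byte is at most 255, but the second decoded character is not, and that decoded character is what counts as "wide" when it is later printed to a handle without an encoding layer.

```perl
use strict;
use warnings;
use Encode qw(decode);

my @octal = ('323', '243');              # as printed by od -c
my @bytes = map { oct $_ } @octal;       # octal digits -> numbers: 211, 163
my $raw   = pack('C*', @bytes);          # the two raw bytes from the file

# Decode the bytes as cp1250 (Windows Central European).
my $decoded    = decode('cp1250', $raw);
my @codepoints = map { ord($_) } split //, $decoded;

for my $i (0 .. $#bytes) {
    printf "octal %s = byte %3d -> U+%04X%s\n",
        $octal[$i], $bytes[$i], $codepoints[$i],
        $codepoints[$i] > 255 ? '  (wide)' : '';
}
```

The first byte maps to U+00D3 and the second to U+0141, so only the second line is flagged as wide.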

Also, in my test program I should have applied binmode to STDOUT as well.

```
# perl
use strict;
use warnings;

open(my $FH, '<', 'warsaw.txt') or die "open";
binmode($FH, ":encoding(cp1250)");
binmode(STDOUT, ":encoding(cp1250)");
while ( <$FH> ) {
    s/\s+$//;
    print "$_\n";
}
close $FH or die "close";
```

```
$ perl
10;"SPӣDZIELNIA WARSZAWA";62;"TEST"
```

And once I 'binmode' STDOUT, the "Wide character" warning goes away. So, notwithstanding my errors, I still think this is not a bug, at least not in perl-5.24.0.

Thank you very much.

-- James E Keenan (jkeenan@

p5pRT commented 7 years ago

From @eserte

On Mon, 28 Nov 2016 04:34:02 -0800, rafal@ wrote:

[snip]

As it seems to make a difference whether the CSV file has DOS or UNIX newlines, can you attach the sample file? (In any case, with either DOS or UNIX newlines I don't see different behavior between Debian's perl in wheezy and jessie.)
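The newline question matters because the reporter's script sets `$/` to `"\r\n"`. A minimal sketch, using two made-up two-line samples rather than the reporter's file, reads the same content with CRLF and with LF line endings under that separator; with LF endings the whole input comes back as a single record. It also shows that perl labels the input position as `chunk` rather than `line` in diagnostics when `$/` is not `"\n"`, which is why the reporter's warning ends in `<$FH> chunk 1`. The helper name `read_records` is hypothetical.

```perl
use strict;
use warnings;

# Read $data through an in-memory handle using the report's record separator,
# $/ = "\r\n", and return the records plus the first warning text seen.
sub read_records {
    my ($data) = @_;
    open(my $fh, '<', \$data) or die "open: $!";
    local $/ = "\r\n";
    my (@records, $probe);
    local $SIG{__WARN__} = sub { $probe //= $_[0] };
    while (my $rec = <$fh>) {
        warn "probe";        # perl appends the input position, e.g. "<$fh> chunk 1"
        $rec =~ s/\s+$//;    # same trailing-whitespace strip as the report's script
        push @records, $rec;
    }
    close $fh or die "close: $!";
    return (\@records, $probe);
}

# Hypothetical samples: identical content, different line endings.
my ($crlf, $crlf_probe) = read_records(qq{10;"A"\r\n20;"B"\r\n});
my ($lf,   $lf_probe)   = read_records(qq{10;"A"\n20;"B"\n});

printf "CRLF input: %d record(s)\n", scalar @$crlf;
printf "LF input:   %d record(s)\n", scalar @$lf;
print  "position in warning: $crlf_probe";
```

With LF input the single record still contains an embedded newline, so a CSV parser would receive a different string than it would from CRLF input.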

p5pRT commented 7 years ago


On 28/11/2016 at 13:34, (via RT) wrote:


Maybe a long shot, but isn't that combination asking for trouble? FWIW, see http://
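One way to check what the layer string from the reporter's `binmode` call, `":encoding(cp1250) :raw :bytes"`, actually leaves on a handle is `PerlIO::get_layers`. The PerlIO documentation describes `:raw` as popping any layer that does not declare itself suitable for binary data, so `:raw` pushed after `:encoding(cp1250)` would be expected to undo the decoding step. The sketch below uses an in-memory handle and a made-up sample line (not the reporter's file) to compare that layer string with a plain `:encoding(cp1250)`; `max_ord` is a hypothetical helper. Running it on your own perl is the real check.

```perl
use strict;
use warnings;

# Hypothetical sample: cp1250 bytes for 10;"SPÓŁDZIELNIA" (0xD3 = Ó, 0xA3 = Ł).
my $bytes = qq{10;"SP\xD3\xA3DZIELNIA"\n};

# Highest character code in a string: > 255 means decoded characters came back.
sub max_ord {
    my ($s) = @_;
    my $max = 0;
    for my $c (split //, $s) {
        $max = ord($c) if ord($c) > $max;
    }
    return $max;
}

# Handle A: only the encoding layer.
open(my $enc, '<', \$bytes) or die "open: $!";
binmode($enc, ':encoding(cp1250)') or die "binmode: $!";
my @enc_layers = PerlIO::get_layers($enc);
my $enc_line   = <$enc>;

# Handle B: the layer string from the report's binmode() call.
open(my $mixed, '<', \$bytes) or die "open: $!";
binmode($mixed, ':encoding(cp1250) :raw :bytes') or die "binmode: $!";
my @mixed_layers = PerlIO::get_layers($mixed);
my $mixed_line   = <$mixed>;

print "A layers: @enc_layers; max char code read: ", max_ord($enc_line), "\n";
print "B layers: @mixed_layers; max char code read: ", max_ord($mixed_line), "\n";
```

If handle B reports no `encoding(...)` layer and a maximum character code of 211, the file handle is delivering raw cp1250 bytes, and any decoding is left to whatever code reads from it.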

p5pRT commented 7 years ago

From @jkeenan

On Tue, 29 Nov 2016 08:24:13 GMT, slaven@ wrote:

On Mon, 28 Nov 2016 04:34:02 -0800, rafal@ wrote:

[snip] As it seems to make a difference whether the CSV file has DOS or UNIX newlines, can you attach the sample file? (In any case, with either DOS or UNIX newlines I don't see different behavior between Debian's perl in wheezy and jessie.)

Rafal\, can you please provide the sample file as an email attachment? We will need this for further diagnosis.

Thank you very much.

-- James E Keenan (jkeenan@

p5pRT commented 7 years ago

From @jkeenan

On Fri, 02 Dec 2016 21:55:42 GMT, jkeenan wrote:

[snip]

If there's no response from the original poster within a week, I will close this ticket.

Thank you very much.

-- James E Keenan (jkeenan@

p5pRT commented 7 years ago

From @jkeenan

On Sun, 25 Dec 2016 02:12:24 GMT, jkeenan wrote:

[snip]

Closing as per schedule. Thank you very much.

-- James E Keenan (jkeenan@

p5pRT commented 7 years ago

@jkeenan - Status changed from 'open' to 'rejected'