Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

perldoc mangles dashes and quotes #8723

Open p5pRT opened 17 years ago

p5pRT commented 17 years ago

Migrated from rt.perl.org#41170 (status was 'open')

Searchable as RT41170$

p5pRT commented 17 years ago

From nospam-abuse@bloodgate.com

perldoc mangles some characters when displaying text on the console. For instance, on my system dashes (-) and single quotes (') are replaced with "fancy" looking characters.

While this might look better, it destroys the possibility to copy & paste code out of perl documentation.

This is especially bad for verbatim paragraphs, which often contain code examples.

Attached is a very primitive sample package with POD, and a text file that was created by doing:

  perldoc A

on my console, then marking the displayed text with the mouse, and then pasting the text into a new file with vi.

Some notes:

* My system is fully Unicode, including the console.
* I use Konsole, the KDE console.
* This problem has existed for a few years; it showed up on SuSE 9.0, and it persists on SuSE 10.1.
* It is not SuSE specific. Paul Johnson said: "For me, on a unicode enabled xterm under ubuntu, the dashes come out fine, but the single quotes are dodgy, as are yours here too."

Since this problem has led to at least one embarrassing bug report from me, I want this fixed now.

Best wishes,

Tels

Perl Info

```
Flags:
    category=core
    severity=medium

This perlbug was built using Perl v5.8.8 - Sat Apr 22 23:31:53 UTC 2006
It is being executed now by Perl v5.8.8 - Sat Apr 22 23:26:49 UTC 2006.

Site configuration information for perl v5.8.8:

Configured by abuild at Sat Apr 22 23:26:49 UTC 2006.

Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
  Platform:
    osname=linux, osvers=2.6.16, archname=x86_64-linux-thread-multi
    uname='linux dvorak 2.6.16 #1 smp mon apr 10 04:51:13 utc 2006 x86_64 x86_64 x86_64 gnulinux '
    config_args='-ds -e -Dprefix=/usr -Dvendorprefix=/usr -Dinstallusrbinperl -Dusethreads -Di_db -Di_dbm -Di_ndbm -Di_gdbm -Duseshrplib=true -Doptimize=-O2 -fmessage-length=0 -Wall -D_FORTIFY_SOURCE=2 -g -Wall -pipe'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=define use64bitall=define uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING -fno-strict-aliasing -pipe -Wdeclaration-after-statement -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -fmessage-length=0 -Wall -D_FORTIFY_SOURCE=2 -g -Wall -pipe',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING -fno-strict-aliasing -pipe -Wdeclaration-after-statement'
    ccversion='', gccversion='4.1.0 (SUSE Linux)', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib64'
    libpth=/lib64 /usr/lib64 /usr/local/lib64
    libs=-lm -ldl -lcrypt -lpthread
    perllibs=-lm -ldl -lcrypt -lpthread
    libc=/lib64/libc-2.4.so, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version='2.4'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib/perl5/5.8.8/x86_64-linux-thread-multi/CORE'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib64'

Locally applied patches:

@INC for perl v5.8.8:
    /usr/lib/perl5/5.8.8/x86_64-linux-thread-multi
    /usr/lib/perl5/5.8.8
    /usr/lib/perl5/site_perl/5.8.8/x86_64-linux-thread-multi
    /usr/lib/perl5/site_perl/5.8.8
    /usr/lib/perl5/site_perl
    /usr/lib/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.8.8
    /usr/lib/perl5/vendor_perl
    .

Environment for perl v5.8.8:
    HOME=/home/te
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/usr/local/bin:/usr/bin:/usr/X11R6/bin:/bin:/usr/games:/opt/gnome/bin:/opt/kde3/bin:/usr/lib/jvm/jre/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin:/home/te/.local/bin
    PERL_BADLANG (unset)
    SHELL=/bin/bash

--
Signed on Wed Jan 3 20:51:04 2007 with key 0x93B84C15.
Get one of my photo posters: http://bloodgate.com/posters
PGP key on http://bloodgate.com/tels.asc or per email.

"Our second big loss has been the "IP" fudge, which is blurring the
distinctions between patents, copyrights, trademarks, trade secrets,
competative advantages, wishful thinking, bullshit, and marketing babble
into one vague pile of lawyer poo." -- MarkusQ (450076), 2004-01-23
```
p5pRT commented 17 years ago

From nospam-abuse@bloodgate.com

A.pm

p5pRT commented 17 years ago

From nospam-abuse@bloodgate.com

  Normal paragraph: Code: "call−>me()". Inline: call−>me(). ’single’ "double".

  Verbatim: Code: C<< call‐>me() >>. Inline: call‐>me(). ’single’ "double".

  Normal text again.
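
For readers who need to recover code from text that was already pasted in this state, the following is a minimal sketch of a post-processing filter. It is not something proposed in the report, and `to_ascii` is a hypothetical name. It assumes the pasted text is UTF-8 and that the substituted characters are MINUS SIGN (U+2212), HYPHEN (U+2010) and the single quotation marks (U+2018, U+2019), which is what the sample above appears to contain.

```shell
# to_ascii: hypothetical filter name, not part of perldoc or groff.
# Maps the typographic characters seen in the pasted sample back to ASCII.
# Each substitution is written separately (no bracket expressions) so that
# the filter also behaves correctly when sed runs in a byte-oriented locale.
to_ascii() {
    sed -e 's/−/-/g' \
        -e 's/‐/-/g' \
        -e "s/‘/'/g" \
        -e "s/’/'/g"
}
```

A pasted file could then be cleaned with something like `to_ascii < pasted.txt`, where `pasted.txt` is a placeholder file name. This only treats the symptom; it does not change what perldoc emits.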

p5pRT commented 17 years ago

From @JohnPeacock

Tels (via RT) wrote:

> * My system is fully Unicode, including the console.
> * I use Konsole, the KDE console.

Happens with Gnome Terminal, as well.

> * This problem has existed for a few years; it showed up on SuSE 9.0, and it persists on SuSE 10.1.

and SuSE 10.2 (which is *really* nice, Tels, you should upgrade!)...

SuSE 9.0 was the first time that they enabled Unicode by default, if I'm not mistaken. I noticed the same thing when I was running Mandrake/Mandriva. My solution to date has been to turn off UTF-8, but then I'm just some stupid American... ;-)

John

-- John Peacock Director of Information Research and Technology Rowman & Littlefield Publishing Group 4501 Forbes Blvd Suite H Lanham, MD 20706 301-459-3366 x.5010 fax 301-429-5747

p5pRT commented 17 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 17 years ago

From rick@bort.ca

On Jan 03 2007, Tels wrote:

> perldoc mangles some characters when displaying text on the console. For instance, on my system dashes (-) and single quotes (') are replaced with "fancy" looking characters.

Don't you see this with other manpages? I guess maybe not because they were probably generated with options similar to what I have below.

> While this might look better, it destroys the possibility to copy & paste code out of perl documentation.
>
> This is especially bad for verbatim paragraphs, which often contain code examples.
>
> Attached is a very primitive sample package with POD, and a text file that was created by doing:
>
>   perldoc A

This should help

  perldoc -n 'nroff -Tascii' A

Then you can just alias that. Or you could just alias John's way

  perldoc="LANG=en_US perldoc"

I think you could also add the switch to $ENV{PERLDOC}.

TMTOWTDI

-- Rick Delaney rick@bort.ca
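
The workarounds in the message above could be made persistent in a shell startup file. The following is a minimal sketch, assuming a Bourne-style shell; only one of the options is needed. The quoting in the `PERLDOC` value assumes that perldoc splits that variable the way a shell would, which is worth checking against the installed perldoc before relying on it.

```shell
# Option 1: always ask nroff for plain ASCII output
# (the switch combination suggested in the message above).
alias perldoc="perldoc -n 'nroff -Tascii'"

# Option 2 (alternative to option 1): run perldoc under a non-UTF-8 locale.
# alias perldoc="LANG=en_US perldoc"

# Option 3: hand the same switch to perldoc through the PERLDOC variable.
# Assumption: perldoc splits PERLDOC shell-style, so the quoted
# 'nroff -Tascii' argument arrives as a single word.
export PERLDOC="-n 'nroff -Tascii'"
```

Note that option 2 changes the locale for the whole perldoc process, so it may affect how non-ASCII POD is displayed.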

p5pRT commented 17 years ago

From nospam-abuse@bloodgate.com


Moin John et al.,

John Peacock <jpeacock@rowman.com> wrote:

> Tels (via RT) wrote:
>
>> * My system is fully Unicode, including the console.
>> * I use Konsole, the KDE console.
>
> Happens with Gnome Terminal, as well.

Thanx for the confirmation. :D

>> * This problem has existed for a few years; it showed up on SuSE 9.0, and it persists on SuSE 10.1.
>
> and SuSE 10.2 (which is *really* nice, Tels, you should upgrade!)...

First: Upgrades of SuSE are always risky, especially with encrypted file systems (and/or "exotic" hardware like RAID, SATA, SCSI etc.). They are simply "Not done[tm]" around here; I'd rather build a new machine and install from scratch. That was how I got from 9.0 to 10.1 :-)

Second: After the MS stunt Novell pulled I will have nothing more to do with SuSE/Novell at all. Using Linux was my way of escaping from stupid monopolies, and I ain't getting shafted through the backdoor by the morons at Novell headquarters.

> SuSE 9.0 was the first time that they enabled Unicode by default, if I'm not mistaken. I noticed the same thing when I was running Mandrake/Mandriva. My solution to date has been to turn off UTF-8, but then I'm just some stupid American... ;-)

I just wish computers were invented in Japan or China, and not in the "128 letters are enough for us" Europe or America. But then, they would probably have messed it up even more than ASCII messed us up until Unicode came along.

best wishes,

tels

> -- John Peacock Director of Information Research and Technology Rowman & Littlefield Publishing Group 4501 Forbes Blvd Suite H Lanham, MD 20706 301-459-3366 x.5010 fax 301-429-5747

-- Signed on Thu Jan 4 18:45:26 2007 with key 0x93B84C15. View my photo gallery: http://bloodgate.com/photos PGP key on http://bloodgate.com/tels.asc or per email.

"Laugh and the world laughs with you, snore and you sleep alone." -- Unknown

p5pRT commented 17 years ago

From nospam-abuse@bloodgate.com


Moin,

Rick Delaney <rick@bort.ca> wrote:

> On Jan 03 2007, Tels wrote:
>
>> perldoc mangles some characters when displaying text on the console. For instance, on my system dashes (-) and single quotes (') are replaced with "fancy" looking characters.
>
> Don't you see this with other manpages? I guess maybe not because they were probably generated with options similar to what I have below.

What other manpages? I am talking about perldoc, not about manpages.

>> While this might look better, it destroys the possibility to copy & paste code out of perl documentation.
>>
>> This is especially bad for verbatim paragraphs, which often contain code examples.
>>
>> Attached is a very primitive sample package with POD, and a text file that was created by doing:
>>
>>   perldoc A
>
> This should help
>
>   perldoc -n 'nroff -Tascii' A

You honestly expect me and the average user to:

* find out that "magic" option combo somehow
* actually remember it
* type it every time you come to a new machine, or
* set it up on every machine so you don't need to type it?

I expect perldoc to NOT modify (or prettify) the written documentation when outputting it. If, as a POD author, I write, for instance, "$self->method()", then I expect the output to be exactly that, and not some variant of it.

> Then you can just alias that. Or you could just alias John's way
>
>   perldoc="LANG=en_US perldoc"

Ah, and I guess that then still works with a Chinese POD file?

> I think you could also add the switch to $ENV{PERLDOC}.

I prefer not to mess with my environment to work round broken software.

> TMTOWTDI

Like, fix the bug?

Sorry if I sound bitter, but I firmly believe the computer should work for the human, not the other way around (in 2052, when the machines rise, they will kill me for that remark, if they can actually manage to work that long without fatally crashing...)

Best wishes,

Tels

-- Signed on Thu Jan 4 18:50:04 2007 with key 0x93B84C15. Get one of my photo posters: http://bloodgate.com/posters PGP key on http://bloodgate.com/tels.asc or per email.

"If Duke Nukem Forever is not out in 2001, something's very wrong." - George Broussard, 2001 (http://tinyurl.com/6m8nh)

p5pRT commented 17 years ago

From rick@bort.ca

On Jan 04 2007, Tels wrote:

> Rick Delaney <rick@bort.ca> wrote:
>
>> On Jan 03 2007, Tels wrote:
>>
>>> perldoc mangles some characters when displaying text on the console. For instance, on my system dashes (-) and single quotes (') are replaced with "fancy" looking characters.
>>
>> Don't you see this with other manpages? I guess maybe not because they were probably generated with options similar to what I have below.
>
> What other manpages? I am talking about perldoc, not about manpages.

According to the first paragraph of perldoc's DESCRIPTION, perldoc is essentially

  pod2man | nroff -man | $PAGER

Which is to say that perldoc's default formatting is that of manpages. On my system (Ubuntu 6) I see just as many fancy quotes in other manpages as I do in perldoc. The culprit for this is nroff (or troff or whatever). That is what is fancying up the quotes.

> While this might look better, it destroys the possibility to copy & paste code out of perl documentation.

Or manpages.

>> This should help
>>
>>   perldoc -n 'nroff -Tascii' A
>
> You honestly expect me and the average user to:
>
> * find out that "magic" option combo somehow
> * actually remember it
> * type it every time you come to a new machine, or
> * set it up on every machine so you don't need to type it?

I expect nothing. I was simply suggesting a method you could use to get output you might like. A workaround. I made and make no judgement about the validity of this bug report. I'd be quite happy if the above were the default behaviour but I won't lose any sleep if it stays the same.

> I expect perldoc to NOT modify (or prettify) the written documentation when outputting it. If, as a POD author, I write, for instance, "$self->method()", then I expect the output to be exactly that, and not some variant of it.

If you don't want the documentation prettified at all, then you might be happy with `perldoc -t`. It should be easy enough to remember.

As for fixing this, there may be something Pod::Man could do to the input it gives nroff, but I really don't know.

-- Rick Delaney rick@bort.ca
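
To see which stage introduces the fancy characters, the pipeline described in the message above can be run by hand. This is a rough sketch only: the function names are hypothetical, the argument is assumed to be a path to a POD or module file such as the attached `A.pm`, and real perldoc does more than this (module lookup, formatter selection, and so on).

```shell
# Hypothetical helper names; the argument is a path to a file containing POD.

# Approximately what a plain `perldoc A` does: POD -> roff -> pager.
perldoc_via_nroff() {
    pod2man "$1" | nroff -man | ${PAGER:-less}
}

# Same pipeline, but asking nroff for an ASCII output device,
# mirroring the `perldoc -n 'nroff -Tascii'` workaround.
perldoc_via_nroff_ascii() {
    pod2man "$1" | nroff -Tascii -man | ${PAGER:-less}
}

# Skip roff entirely and use perldoc's plain-text formatter.
perldoc_plain() {
    perldoc -t "$1"
}
```

Comparing the output of the first two functions on the same file should show whether the substitution happens in nroff's output device, as suggested above, rather than in `pod2man`.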

p5pRT commented 17 years ago

From nospam-abuse@bloodgate.com

Moin Rick,

On Friday 05 January 2007 03:52, you wrote:

> On Jan 04 2007, Tels wrote:
>
>> Rick Delaney <rick@bort.ca> wrote:
>>
>>> On Jan 03 2007, Tels wrote:
>>>
>>>> perldoc mangles some characters when displaying text on the console. For instance, on my system dashes (-) and single quotes (') are replaced with "fancy" looking characters.
>>>
>>> Don't you see this with other manpages? I guess maybe not because they were probably generated with options similar to what I have below.
>>
>> What other manpages? I am talking about perldoc, not about manpages.
>
> According to the first paragraph of perldoc's DESCRIPTION, perldoc is essentially
>
>   pod2man | nroff -man | $PAGER

Ah. Should have read the doc then. *goes hiding in a corner*

[snipabit]

>>> This should help
>>>
>>>   perldoc -n 'nroff -Tascii' A
>>
>> You honestly expect me and the average user to:
>>
>> * find out that "magic" option combo somehow
>> * actually remember it
>> * type it every time you come to a new machine, or
>> * set it up on every machine so you don't need to type it?
>
> I expect nothing. I was simply suggesting a method you could use to get output you might like. A workaround. I made and make no judgement about the validity of this bug report. I'd be quite happy if the above were the default behaviour but I won't lose any sleep if it stays the same.

Ok, I made a test C.pm with Unicode chars in it. One more problem surfaces:

  te@linux:~/perl/perldoc> perldoc -t C
  ./C.pm:23: Unknown command paragraph: =encoding utf8

Ugh. According to perldoc perlpod, "=encoding utf8" is the way to go.

Second problem:

  perldoc C

doesn't even show the Chinese characters, but shows the umlauts (and mangled dashes).

  perldoc -n 'nroff -Tascii' C

Doesn't work either, and it warns a lot about chars it can't find.

  perldoc -n 'nroff -TUTF8' C

Mangles the dashes, and loses the Chinese, again.

Funnily enough, a plain:

  perldoc -t

shows all the characters properly, including Chinese, dashes and umlauts. (It warns about the =encoding directive, too, though.)

"perldoc -t" doesn't look as "nice", e.g. it doesn't have bold headers etc., but at least it works correctly. I think that either:

* perldoc -t should be the default, since nroff seems to be broken
* or: a workaround with nroff could be found, or nroff fixed, and then made the default

Optionally, if people really like the bold headers etc., "-t" could be "spiced up" a bit.

The current situation, where a plain "perldoc" is just wrong, is, well, wrong :)

Sorry if I sounded too harsh in my first reply, thanx a lot for your reply.

best wishes,

tels

-- Signed on Fri Jan 5 12:29:02 2007 with key 0x93B84C15. View my photo gallery: http://bloodgate.com/photos PGP key on http://bloodgate.com/tels.asc or per email.

"Die deutsche Zensoren - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Dummköpfe - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -." Heinrich Heine

p5pRT commented 17 years ago

From nospam-abuse@bloodgate.com

C.pm

p5pRT commented 17 years ago

From @demerphq

On 1/4/07, Tels <nospam-abuse@bloodgate.com> wrote:

> I just wish computers were invented in Japan or China, and not in the "128 letters are enough for us" Europe or America. But then, they would probably have messed it up even more than ASCII messed us up until Unicode came along.

Just be glad it wasn't the Hawaiians, they have only 12 letters. :-)

yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 17 years ago

From rick@bort.ca

On Jan 05 2007, Tels wrote:

> * perldoc -t should be the default, since nroff seems to be broken
> * or: a workaround with nroff could be found, or nroff fixed, and then made the default

A bit of research leads me to believe that Pod::Man should be doing some more escaping of its roff output. In particular, I think anything between C<> or in verbatim paragraphs should be escaped. It is already escaping '-' but it is doing nothing for quotes or other characters.

Perhaps we'll get a fresh shipment of round tuits now that it's a new year.

-- Rick Delaney rick@bort.ca
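
The kind of escaping suggested above can be illustrated with a small filter. This is only a sketch of the idea, not Pod::Man's code: `roff_escape_quotes` is a hypothetical name, and it assumes groff, where `\(aq` names the neutral apostrophe glyph and `\(ga` the grave accent. Other roff implementations may not define these names.

```shell
# roff_escape_quotes: hypothetical helper, not part of Pod::Man.
# Rewrites quote characters in code text into explicit groff glyph escapes,
# so that the output device has no plain ' or ` left to turn into
# typographic quotes.
# Limitation: backslashes in the input are passed through untouched; a real
# implementation would have to escape those first.
roff_escape_quotes() {
    sed -e "s/'/\\\\(aq/g" \
        -e 's/`/\\(ga/g'
}
```

For example, feeding the text `$self->{'key'}` through the filter yields `$self->{\(aqkey\(aq}`, which groff should render with plain ASCII apostrophes.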