Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

unicore/TestProp.pl - big and unused? #10886

Closed p5pRT closed 13 years ago

p5pRT commented 13 years ago

Migrated from rt.perl.org#80480 (status was 'resolved')

Searchable as RT80480$

p5pRT commented 13 years ago

From perl@plan9.de

Created by perl@plan9.de

unicore/ contains a lot of apparently unused files:

  mktables   mktables.lst   *.txt   auxiliary/*   extracted/*

These seem to be unused by perl, and are apparently just a copy of the Unicode data tables.

One of the biggest files, however:

  TestProp.pl

seems to be part of the test suite only (apparently it contains only autogenerated tests), and surely could be removed?

Removing TestProp.pl alone would save 3.5MB(!) of perl installed size on my machine, and not installing the Unicode tables (if indeed unused) would save 11.5MB of installed data size.

This is considerable, especially for embedded systems. Please consider not installing these files if possible.

Perl Info

```
Flags:
    category=core
    severity=wishlist

Site configuration information for perl 5.12.2:

Configured by Marc Lehmann at Mon Nov 22 07:24:35 CET 2010.

Summary of my perl5 (revision 5 version 12 subversion 2) configuration:

  Platform:
    osname=linux, osvers=2.6.32-5-amd64, archname=x86_64-linux
    uname='linux cerebro 2.6.32-5-amd64 #1 smp fri sep 17 21:50:19 utc 2010 x86_64 gnulinux '
    config_args='-Duselargefiles -Duse64bitint -Dusemymalloc=n -Dstatic_ext=Fcntl -Dcc=gcc -Dccflags=-ggdb -gdwarf-2 -g3 -Dcppflags=-DPERL_DISABLE_PMC -DPERL_ARENA_SIZE=1048576 -D_GNU_SOURCE -I/opt/include -Doptimize=-DPERL_DISABLE_PMC -DPERL_ARENA_SIZE=1048576 -D_GNU_SOURCE -I/opt/include -O6 -fno-strict-aliasing -Dcccdlflags=-fPIC -Dldflags=-L/opt/perl/lib -L/opt/lib -Dlibs=-ldl -lm -lcrypt -Dprefix=/opt/perl -Dprivlib=/opt/perl/lib/perl5 -Darchlib=/opt/perl/lib/perl5 -Dvendorprefix=/opt/perl -Dvendorlib=/opt/perl/lib/perl5 -Dvendorarch=/opt/perl/lib/perl5 -Dsiteprefix=/opt/perl -Dsitelib=/opt/perl/lib/perl5 -Dsitearch=/opt/perl/lib/perl5 -Dsitebin=/opt/perl/bin -Dman1dir=/opt/perl/man/man1 -Dman3dir=/opt/perl/man/man3 -Dsiteman1dir=/opt/perl/man/man1 -Dsiteman3dir=/opt/perl/man/man3 -Dman1ext=1 -Dman3ext=3 -Dpager=/usr/bin/less -Uafs -Uusesfio -Uusenm -Uuseshrplib -Ud_dosuid -Dusethreads=undef -Duse5005threads=undef -Duseithreads=undef -Dusemultiplicity=undef -Demail=perl-binary@plan9.de -Dcf_email=perl-binary@plan9.de -Dcf_by=Marc Lehmann -Dlocincpth=/opt/perl/include /opt/include -Dmyhostname=localhost -Dmultiarch=undef -Dbin=/opt/perl/bin -Dxxxusedevel -DxxxDEBUGGING -Dxxxuse_debugging_perl -Dxxxuse_debugmalloc -dEs'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=undef, usemultiplicity=undef
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags ='-ggdb -gdwarf-2 -g3 -fno-strict-aliasing -pipe -I/opt/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-DPERL_DISABLE_PMC -DPERL_ARENA_SIZE=1048576 -D_GNU_SOURCE -I/opt/include -O6 -fno-strict-aliasing',
    cppflags='-DPERL_DISABLE_PMC -DPERL_ARENA_SIZE=1048576 -D_GNU_SOURCE -I/opt/include -ggdb -gdwarf-2 -g3 -fno-strict-aliasing -pipe -I/opt/include'
    ccversion='', gccversion='4.4.5 20100728 (prerelease)', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags ='-L/opt/perl/lib -L/opt/lib -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib /lib64 /usr/lib64
    libs=-ldl -lm -lcrypt
    perllibs=-ldl -lm -lcrypt
    libc=/lib/libc-2.11.2.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.11.2'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -DPERL_DISABLE_PMC -DPERL_ARENA_SIZE=1048576 -D_GNU_SOURCE -I/opt/include -O6 -fno-strict-aliasing -L/opt/perl/lib -L/opt/lib -L/usr/local/lib'

Locally applied patches:

@INC for perl 5.12.2:
    /root/src/sex
    /opt/perl/lib/perl5
    /opt/perl/lib/perl5
    /opt/perl/lib/perl5
    .

Environment for perl 5.12.2:
    HOME=/root
    LANG (unset)
    LANGUAGE (unset)
    LC_CTYPE=en_US.UTF-8
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/root/s2:/root/s:/opt/bin:/opt/sbin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/X11/bin:/usr/games:/usr/local/bin:/usr/local/sbin:/root/pserv:.
    PERL5LIB=/root/src/sex
    PERL5_CPANPLUS_CONFIG=/root/.cpanplus/config
    PERLDB_OPTS=ornaments=0
    PERL_ANYEVENT_DBI_TESTS=1
    PERL_ANYEVENT_EDNS0=1
    PERL_ANYEVENT_NET_TESTS=1
    PERL_ANYEVENT_PROTOCOLS=ipv4,ipv6
    PERL_ANYEVENT_STRICT=1
    PERL_BADLANG (unset)
    PERL_UNICODE=E
    SHELL=/bin/bash
```

p5pRT commented 13 years ago

From @khwilliamson

Removing most of these is planned for 5.14. TestProp.pl is used only by the test suite and can be removed once that has run. mktables and the .txt files are used to generate the tables that the Perl core does use. Those generated tables are not shipped with the distribution, ironically, apparently to save space, even though they take up less space than the source .txt files. Some of the .txt files are also used by Unicode::UCD; it is planned to change this for 5.14 as well. A few CPAN modules expect those .txt files to be in place, however.

Building the tables ahead of time would require some changes due to machine word length issues involving numeric precision with the values of the Unicode characters that denote infinitely repeating fractions.

In the meantime, you can remove TestProp.pl by hand from your machine, and, if you don't intend to recompile, mktables and mktables.lst as well. Alternatively, you can remove the all-comment lines from mktables for some space savings without loss of functionality.
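
Before deleting anything by hand, it may help to see which files actually dominate the directory. The following is a minimal sketch; the function name `largest_files` is an assumption of this example, not something shipped with Perl.

```shell
# List the N largest files under a directory (size in KiB, then path).
# Usage: largest_files DIR [N]   (N defaults to 10)
largest_files() {
  dir=$1
  n=${2:-10}
  find "$dir" -type f -exec du -k {} + | sort -rn | head -n "$n"
}
```

On a machine with perl installed it could be pointed at the installed tables with `largest_files "$(perl -MConfig -e 'print $Config{privlib}')/unicore" 5`.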

p5pRT commented 13 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 13 years ago

From tchrist@perl.com

In-Reply-To: Message from karl williamson <public@khwilliamson.com> of "Fri, 31 Dec 2010 14:22:01 MST." <4D1E4979.1080401@khwilliamson.com>

Tom Christiansen wrote​:

Looking at blead, it appears that seriously fewer files will be installed under unicore/ than currently are.

Am I misreading this?

I ask because some I rely on seem not to migrate at install time.

I don't know what you're referring to; there have been no very recent changes to this that I'm aware of, but it's planned to remove most of the .txt files for 5.14 because of requests about disk usage.

I specifically use "$Config{privlib}/unicore/NamesList.txt" which I hope isn't going away.

I believe it is intended to go away; it's not used by the Perl core.

You can search for the thread(s) on the topic, as to why, and that one can always download one from Unicode.

First, here I believe is the relevant thread:

In-Reply-To: Message from "Karl Williamson via RT" <perlbug-followup@perl.org> of "Thu, 30 Dec 2010 21:40:37 PST." <rt-3.6.HEAD-5425-1293774036-542.80480-15-0@perl.org>

Removing the all-comment lines from mktables seems to save ~256k:

  % perl -nle 'print unless /^\s*#/' mktables | wc mktables /dev/stdin
   14811  76214 613719 mktables
   10583  34637 351108 /dev/stdin
   25394 110851 964827 total
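
The ~256k figure follows directly from the byte counts in the `wc` output above (third column); as a quick check of the arithmetic:

```shell
# Byte counts copied from the wc output: original mktables vs. comment-stripped copy.
orig_bytes=613719
stripped_bytes=351108
saved_bytes=$((orig_bytes - stripped_bytes))
echo "saved: $saved_bytes bytes, about $((saved_bytes / 1024)) KiB"
```

This prints `saved: 262611 bytes, about 256 KiB`.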

If we count only the *.txt files in unicore/, then one *could* always compress them:

  6496 utext-orig
  1236 utext-gzip
  1020 utext-bzip2
   660 utext-7za
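
The message does not say how these figures were produced. One way to get comparable numbers, offered only as a sketch under that caveat (the function name `compressed_bytes` is hypothetical, and it reports bytes rather than the units used in the listing), is to pipe the concatenated files through each compressor and count the output:

```shell
# Print the combined size in bytes of the given files after piping them
# through a compressor command. Pass "cat" to get the uncompressed size.
# Usage: compressed_bytes "gzip -9" FILE...
compressed_bytes() {
  compressor=$1
  shift
  # $compressor is deliberately unquoted so that options such as -9 are split off.
  cat "$@" | $compressor | wc -c | tr -d ' '
}
```

For example, `compressed_bytes cat unicore/*.txt` against `compressed_bytes "gzip -9" unicore/*.txt` or `compressed_bytes "bzip2 -9" unicore/*.txt`, run from the directory that contains unicore/ and assuming those compressors are installed.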

But to return to the original message:

I specifically use "$Config{privlib}/unicore/NamesList.txt" which I hope isn't going away.

I believe it is intended to go away; it's not used by the Perl core. You can search for the thread(s) on the topic, as to why, and that one can always download one from Unicode.

Just because it isn't used by the Perl core doesn't mean it isn't used by things relying on its presence in the Perl core. Put another way, because it has been *shipped* with the Perl core, it may have *been being used* by code that relies on the Perl core. One cannot know what code of other people's one will consequently break by stripping files from the core. This is a general concern, not one restricted to this particular issue.

The program I have that makes use of unicore/NamesList.txt is uninames. I *can* always include that entire file in the \ block, just as I must include in my uniprops program the whole perluniprops.pod for backwards compat before that file's appearance.

It's preferable to *not* do that, because there's some usefulness in binding the unistuff you're looking through to the versions of those things that were/are actually used by a particular release of Perl. I won't be able to do that any longer. Even worse, I no longer automatically upgrade the NamesList when Perl uses an updated one! This creates a maintenance and synchronization problem where one did not previously exist. Previously, it just updated with Perl, so existing code automatically got the new version without my having to do anything. Now I do.

Plus, the \ trick is only reasonable if there is but a single solitary program that alone uses that particular file. It becomes awkward if you have more than one program using that file, because now you have it in more than one place on your system, thereby creating wasted space when one was hoping to preserve it. This works against the very thing one was ostensibly trying to address.
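
One middle ground between relying on the installed file and embedding a copy in every program is to prefer the copy under privlib when it exists and fall back to a single shared copy otherwise. The sketch below is only an illustration of that idea; the function name `nameslist_path` and the fallback location are assumptions, not an existing convention.

```shell
# Print the path of NamesList.txt: prefer the copy installed alongside perl,
# else a caller-supplied fallback file (hypothetical location), else fail.
# Usage: nameslist_path [PRIVLIB_DIR] [FALLBACK_FILE]
nameslist_path() {
  privlib=${1:-$(perl -MConfig -e 'print $Config{privlib}')}
  fallback=${2:-}
  if [ -f "$privlib/unicore/NamesList.txt" ]; then
    printf '%s\n' "$privlib/unicore/NamesList.txt"
  elif [ -n "$fallback" ] && [ -f "$fallback" ]; then
    printf '%s\n' "$fallback"
  else
    echo "NamesList.txt not found" >&2
    return 1
  fi
}
```

This keeps the automatic version tracking wherever the file is still installed, but the fallback copy still has to be kept in sync by hand, which is the maintenance cost described above.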

I realize that nothing delights downstream rebundlers more than pruning off parts of the Perl distribution. I'd like not to rehash my grievances with what some of the Linux people do with their nomenclature of what's core and what isn't core.

Previously when things were stripped out of the core, some effort was made to encapsulate these into some CPAN module so that equivalent functionality was still achievable. I don't believe that such a package has been considered for these.

Apart from you, Karl, I may be one of the few among us to routinely cd unicore and browse around for interesting bits of knowledge. I am perfectly content to continue doing that with the unicore directory in the Perl src distribution rather than in the installed perllib.

But that doesn't address the other concerns I just mentioned. That's a personal matter, and probably shouldn't influence decisions. I can make do for myself.

But there's one more matter, one perhaps also a personal one although I might argue that point if pressed. The problem is that I'm trying to converge on Camel4 chapters to hand over to Larry for final editing, and these things keep moving around out from under me.

  % cd ~/Camel4/pod

  % grep -n -C5 unicore ch05.pod
  2834-notation. For example, C<\N{U+263B}> means the BLACK SMILING FACE
  2835-character. This usage does not require the C\ pragma.
  2836-
  2837-A list of all Unicode character names can be found in your closest
  2838-Unicode standards document, or in
  2839:R<PATH_TO_PERLLIB>I</unicore/NamesList.txt> on your system.
  2840-
  2841-=item C\<\< \o{R\} >>
  2842-
  2843-A character specified using its octal code. Unlike the
  2844-ambiguous C\<\< \R\ >> notation, this can be any
  --
  3213-The Unicode Consortium produces the online resources that turn into the
  3214-various files Perl uses in its Unicode implementation. For more about
  3215-these files, see A<CHP-15>.
  3216-
  3217-=for TODO
  3218:Next statement is WRONG! We do not include any HTML in the unicore
  3219-directory, just:
  3220: /usr/local/lib/perl5/5.12.2/unicore:
  3221- ArabicShaping.txt NamedSqProv.txt
  3222- BidiMirroring.txt NamesList.txt
  3223- Blocks.txt NormalizationCorrections.txt
  3224- CJKRadicals.txt PropList.txt
  3225- CaseFolding.txt PropValueAliases.txt
  --
  3238- Name.pl mktables.lst
  3239- NameAliases.txt version
  3240- NamedSequences.txt
  3241-
  3242-You can get a nice overview of Unicode in
  3243:the document R<PATH_TO_PERLLIB>I</unicore/Unicode3.html> where
  3244-R<PATH_TO_PERLLIB> is what is printed out by:
  3245-
  3246- perl -V:privlib
  3247-
  3248-=for TODO

I've got way too many TODOs in there already. But now it appears I have to add a few more. I guess I have to strip out the reference at line 2839 from ch05.pod now.

I know I have to revamp ch15.pod, the one on Unicode, quite a bit. I almost wish I could just hand it to Larry to fix, since he's the one who wrote it initially. But I'm more in touch with what's going on with perl5 Unicode bits (as shown by this mail), so I should be the one to do that and not make more for him.

So I'd like to know where things stand before we press them onto clay tablets. I feel like every time I turn around, something wiggles a little. One can avoid this by only talking about high-level stuff, so that the low-level wiggles aren't visible. But sometimes I like to mention details you wouldn't get otherwise, and those seem to keep moving about.

I'm sure that "seems" is the operative word here, and that they only appear to change a bunch because I'm not looking at it from the right level of detail and time-space perspective. But I'd like for what I write to be right at least as long as the clay takes to dry. Right now, I'm not sure I'll make that goal.

Hence also my questioning of unicore files going away: I'm trying to figure out what to say and how to say it.

--tom

p5pRT commented 13 years ago

From @khwilliamson

Tom Christiansen wrote​:

First, here I believe is the relevant thread:

Actually that thread is not the relevant one. It's too late at night for me to go looking, but it had to do with downstream rebundlers thinking about pruning Perl, or removing it ENTIRELY, IIRC, from their default distribution. Given that threat, this seemed like a better option, as distasteful as it is to us. I was only peripherally involved in the discussion and decision, saying that yes, the core doesn't use some of these files, and yes, we can figure out a way to remove most of the other big ones.

My response to the bug report was based on my understanding of the plan, which I'll most likely be the one to implement, as far as Unicode goes. Other options, as I recall, included not having Encode generate the large CJK files.

p5pRT commented 13 years ago

From @iabyn

Since none of these files are installed by 5.14.0-RC1, I'm marking this ticket as resolved.

p5pRT commented 13 years ago

@iabyn - Status changed from 'open' to 'resolved'