Closed p5pRT closed 13 years ago
unicore/ contains a lot of apparently unused files:
mktables mktables.lst *.txt auxiliary/* extracted/*
These seem to be unused by perl, and are apparently just a copy of the Unicode data tables.
One of the biggest files, however:
TestProp.pl
seems to be part of the test suite only (apparently it only contains autogenerated tests), and surely could be removed?
Removing TestProp.pl alone would save 3.5MB(!) of perl installed size on my machine, and not installing the Unicode tables (if indeed unused) would save 11.5MB of installed data size.
This is considerable, especially for embedded systems. Please consider not installing these files if possible.
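To check how much space these files take on a given installation, something like the following could be used. It is only a sketch: `unicore_unused_bytes` is a made-up helper name, the file patterns are simply the ones listed in this report, and the unicore directory is passed in by the caller rather than discovered automatically.

```shell
# Sketch: sum the sizes (in bytes) of the files under a unicore directory
# that the report lists as apparently unused at run time.
# unicore_unused_bytes is a hypothetical helper, not something perl ships.
unicore_unused_bytes() {
    dir=$1
    # Match the patterns from the report: auxiliary/*, extracted/*, *.txt,
    # mktables, mktables.lst and TestProp.pl. Each file is listed once.
    find "$dir" \( -path '*/auxiliary/*' -o -path '*/extracted/*' \
        -o -name '*.txt' -o -name 'mktables' -o -name 'mktables.lst' \
        -o -name 'TestProp.pl' \) -type f -exec wc -c {} + |
    # wc prints a "total" line when given several files; skip it and add up
    # the per-file counts ourselves.
    awk '$2 != "total" { sum += $1 } END { print sum + 0 }'
}
```

On a real system the argument would be the unicore directory below the path printed by `perl -V:privlib`. The result is a byte count, so it will not match `du` figures exactly.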
On Thu Dec 09 05:34:30 2010, perl@plan9.de wrote:
This is a bug report for perl from perl@plan9.de, generated with the help of perlbug 1.39 running under perl 5.12.2.
Removing most of these is planned for 5.14. TestProp.pl is only used for the test suite and can be removed after that is run. mktables and the .txt files are used to generate the tables that the Perl core does use. The generated tables are not shipped with the distribution, ironically, apparently to save space, even though the tables take up less space than the source .txt files. Some of these .txt files are, however, used by Unicode::UCD; it is planned to change this for 5.14 as well. A few CPAN modules also expect those .txt files to be in place.
Building the tables ahead of time would require some changes, because the numeric values of the Unicode characters that denote infinitely repeating fractions run into precision issues that depend on the machine word length.
In the meantime, you can remove TestProp.pl by hand from your machine; and if you don't intend to recompile, mktables and mktables.lst as well. Alternatively, you can remove the all-comment lines from mktables for some space savings without loss of functionality.
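The manual cleanup suggested here could be scripted roughly as follows. This is a sketch under stated assumptions: `prune_unicore` is a hypothetical helper name, it defaults to a dry run, and it only touches the three files named in the reply. mktables and mktables.lst should only be removed if the tables will not be regenerated.

```shell
# Sketch: remove TestProp.pl, mktables and mktables.lst from a unicore
# directory. prune_unicore is a hypothetical helper; nothing is deleted
# unless the second argument is the literal word "delete".
prune_unicore() {
    dir=$1
    mode=${2:-dry-run}
    for f in TestProp.pl mktables mktables.lst; do
        # Skip names that are not present in this installation.
        [ -f "$dir/$f" ] || continue
        if [ "$mode" = "delete" ]; then
            rm -f -- "$dir/$f" && echo "removed $dir/$f"
        else
            echo "would remove $dir/$f"
        fi
    done
}
```

Running `prune_unicore /path/to/unicore` first shows what would go, and `prune_unicore /path/to/unicore delete` then does it. The path is a placeholder for the installed unicore directory.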
The RT System itself - Status changed from 'new' to 'open'
In-Reply-To: Message from karl williamson <public@khwilliamson.com> of "Fri, 31 Dec 2010 14:22:01 MST." <4D1E4979.1080401@khwilliamson.com>
Tom Christiansen wrote:
Looking at blead, it appears that seriously fewer files will be installed under unicore/ than currently are.
Am I misreading this?
I ask because some I rely on seem not to migrate at install time.
I don't know what you're referring to; there have been no very recent changes to this that I'm aware of, but it's planned to remove most of the .txt files for 5.14 because of requests about disk usage.
I specifically use "$Config{privlib}/unicore/NamesList.txt" which I hope isn't going away.
I believe it is intended to go away; it's not used by the Perl core.
You can search for the thread(s) on the topic to see why, and one can always download a copy from Unicode.
First, here I believe is the relevant thread:
In-Reply-To: Message from "Karl Williamson via RT" <perlbug-followup@perl.org> of "Thu, 30 Dec 2010 21:40:37 PST." <rt-3.6.HEAD-5425-1293774036-542.80480-15-0@perl.org>
In the meantime, you can remove TestProp.pl by hand from your machine; and if you don't intend to recompile, mktables and mktables.lst as well. Alternatively, you can remove the all-comment lines from mktables for some space savings without loss of functionality.
Removing the all-comment lines from mktables seems to save ~256k:
% perl -nle 'print unless /^\s*#/' mktables | wc mktables /dev/stdin
   14811   76214  613719 mktables
   10583   34637  351108 /dev/stdin
   25394  110851  964827 total
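From the byte counts above, the saving works out to 613719 - 351108 = 262611 bytes, or roughly 256 KiB. The same measurement can be wrapped in a small function. This is a sketch: `comment_savings` is a hypothetical name, and `grep -v` stands in for the perl one-liner. Note that the pattern also drops a leading `#!` line, as well as any `#`-initial lines inside here-documents or POD.

```shell
# Sketch: report how many bytes would be saved by dropping all-comment
# lines (optional whitespace followed by '#') from a file.
# comment_savings is a hypothetical helper; grep -v is a stand-in for
#   perl -nle 'print unless /^\s*#/'
comment_savings() {
    file=$1
    before=$(wc -c < "$file")
    after=$(grep -v '^[[:space:]]*#' "$file" | wc -c)
    echo $((before - after))
}
```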
If we count only the *.txt files in unicore/, then one *could* always compress them:
6496 utext-orig
1236 utext-gzip
1020 utext-bzip2
 660 utext-7za
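A comparison along these lines can be reproduced with a short function. This is a sketch only: `txt_sizes` is a hypothetical name, it measures the concatenated *.txt files in bytes rather than whatever units the listing above uses, and it covers gzip alone because bzip2 and 7za may not be installed.

```shell
# Sketch: print the combined size of the *.txt files in a directory,
# uncompressed and after gzip -9, as "ORIGINAL GZIPPED" in bytes.
# txt_sizes is a hypothetical helper name.
txt_sizes() {
    dir=$1
    orig=$(cat "$dir"/*.txt | wc -c)
    gz=$(cat "$dir"/*.txt | gzip -9 -c | wc -c)
    echo "$orig $gz"
}
```

Compressing the concatenation as a single stream tends to do somewhat better than compressing each file separately, so treat the second number as an optimistic estimate.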
But to return to the original message:
I specifically use "$Config{privlib}/unicore/NamesList.txt" which I hope isn't going away.
I believe it is intended to go away; it's not used by the Perl core. You can search for the thread(s) on the topic to see why, and one can always download a copy from Unicode.
Just because it isn't used by the Perl core doesn't mean it isn't used by things relying on its presence in the Perl core. Put another way, because it has been *shipped* with the Perl core, it may have *been being used* by code that relies on the Perl core. One cannot know what code of other people's one will consequently break by stripping files from the core. This is a general concern, not one restricted to this particular issue.
The program I have that makes use of unicore/NamesList.txt is uninames. I *can* always include that entire file in the \ block, just as I must include in my uniprops program the whole perluniprops.pod for backwards compat before that file's appearance.
It's preferable to *not* do that, because there's some usefulness in binding the unistuff you're looking through to the versions of those things that were/are actually used by a particular release of Perl. I won't be able to do that any longer. Even worse, I no longer automatically upgrade the NamesList when Perl uses an updated one! This creates a maintenance and synchronization problem where one did not previously exist. Previously, it just updated with Perl, so existing code automatically got the new version without my having to do anything. Now I do.
Plus, the \ trick is only reasonable if there is but a single solitary program that alone uses that particular file. It becomes awkward if you have more than one program using that file, because now you have it in more than one place on your system, thereby creating wasted space when one was hoping to preserve it. This works against the very thing one was ostensibly trying to address.
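One way for several programs to share a single copy is to prefer the file installed with perl and fall back to one separately downloaded copy. The sketch below assumes a hypothetical helper named `find_nameslist` and a caller-chosen fallback location. It does not solve the version-synchronization problem described above, since a fallback copy still has to be updated by hand.

```shell
# Sketch: print the first readable NamesList.txt, preferring the copy
# under perl's privlib and falling back to a separately kept copy.
# find_nameslist and both arguments are assumptions for illustration.
find_nameslist() {
    privlib=$1      # e.g. the value of $Config{privlib}
    fallback=$2     # path to a downloaded copy; may be empty
    for candidate in "$privlib/unicore/NamesList.txt" "$fallback"; do
        if [ -n "$candidate" ] && [ -r "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    # Neither location has a readable copy.
    return 1
}
```

A caller might use `find_nameslist "$(perl -MConfig -e 'print $Config{privlib}')" "$HOME/NamesList.txt"`, where the second path is just an example location.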
I realize that nothing delights downstream rebundlers more than pruning off parts of the Perl distribution. I'd like not to rehash my grievances with what some of the Linux people do with their nomenclature of what's core and what isn't core.
Previously when things were stripped out of the core, some effort was made to encapsulate these into some CPAN module so that equivalent functionality was still achievable. I don't believe that such a package has been considered for these.
Apart from you, Karl, I may be one of the few among us to routinely cd unicore and browse around for interesting bits of knowledge. I am perfectly content to continue doing that with the unicore directory in the Perl src distribution rather than in the installed perllib.
But that doesn't address the other concerns I just mentioned. That's a personal matter, and probably shouldn't influence decisions. I can make do for myself.
But there's one more matter, one perhaps also a personal one although I might argue that point if pressed. The problem is that I'm trying to converge on Camel4 chapters to hand over to Larry for final editing, and these things keep moving around out from under me.
% cd ~/Camel4/pod
% grep -n -C5 unicore ch05.pod
2834-notation. For example, C<\N{U+263B}> means the BLACK SMILING FACE
2835-character. This usage does not require the C\
3216-
3217-=for TODO
3218:Next statement is WRONG! We do not include any HTML in the unicore
3219-directory, just:
3220: /usr/local/lib/perl5/5.12.2/unicore:
3221- ArabicShaping.txt NamedSqProv.txt
3222- BidiMirroring.txt NamesList.txt
3223- Blocks.txt NormalizationCorrections.txt
3224- CJKRadicals.txt PropList.txt
3225- CaseFolding.txt PropValueAliases.txt
--
3238- Name.pl mktables.lst
3239- NameAliases.txt version
3240- NamedSequences.txt
3241-
3242-You can get a nice overview of Unicode in
3243:the document R<PATH_TO_PERLLIB>I</unicore/Unicode3.html> where
3244-R<PATH_TO_PERLLIB> is what is printed out by:
3245-
3246- perl -V:privlib
3247-
3248-=for TODO
I've got way too many TODOs in there already. But now it appears I have to add a few more. I guess I have to strip out the reference at line 2839 from ch05.pod now.
I know I have to revamp ch15.pod, the one on Unicode, quite a bit. I almost wish I could just hand it to Larry to fix, since he's the one who wrote it initially. But I'm more in touch with what's going on with perl5 Unicode bits (as shown by this mail), so I should be the one to do that and not make more for him.
So I'd like to know where things stand before we press them onto clay tablets. I feel like every time I turn around, something wiggles a little. One can avoid this by only talking about high-level stuff, so that the low-level wiggles aren't visible. But sometimes I like to mention details you wouldn't get otherwise, and those seem to keep moving about.
I'm sure that "seems" is the operative word here, and that they only appear to change a bunch because I'm not looking at it from the right level of detail and time-space perspective. But I'd like for what I write to be right at least as long as the clay takes to dry. Right now, I'm not sure I'll make that goal.
Hence also my questioning of unicore files going away: I'm trying to figure out what to say and how to say it.
--tom
Tom Christiansen wrote:
First, here I believe is the relevant thread:
Actually that thread is not the relevant one. It's too late at night for me to go looking, but it had to do with downstream rebundlers thinking about pruning Perl, or removing it ENTIRELY, IIRC, from their default distribution. Given that threat, this seemed like a better option, as distasteful as it is to us. I was only peripherally involved in the discussion and decision, saying that yes the core doesn't use some of these files, and yes we can figure out a way to remove most of the other big ones.
My response to the bug report was based on my understanding of the plan, which I'll most likely be the one to implement, as far as Unicode goes. Other options considered, as I recall, included not having Encode generate the large CJK files.
Since all these files are not installed by 5.14.0-RC1, I'm marking this ticket as resolved.
@iabyn - Status changed from 'open' to 'resolved'
Migrated from rt.perl.org#80480 (status was 'resolved')
Searchable as RT80480