Perl / perl5

šŸŖ The Perl programming language
https://dev.perl.org/perl5/

CvFILE on threaded, and gv_fetchfile on all perls, burn mad memory #14725

Open p5pRT opened 9 years ago

p5pRT commented 9 years ago

Migrated from rt.perl.org#125296 (status was 'open')

Searchable as RT125296$

toddr commented 4 years ago

@bulk88 @iabyn None of this seems to have made it into blead. Are we continuing to pursue this?

bulk88 commented 4 years ago

Dave Mitchell disagrees with the new COW type being storable inside SVs because "I don't want to see HV code changed (because bulk88's primarily filename COW type is a derivative of the shared hash key infrastructure) because I don't fully understand it myself and I'll understand it even less if there are modifications to it". Dave wants, if I understand correctly, the filename buffers, if they are exposed in PP, to be copied into a new SV as one-off malloc buffers, rather than any attempt at COW, or at having many SVs and many non-PP-exposed structs hold refcnt ownership on the same unique filename malloc buffer. My argument is that basing the COWed filename strings off SVs directly is impossible, because SVs can't encode any more COW or non-1-to-1 malloced string buffers.

I decided on some of the struct names, but I am bad at naming structs and wanted someone to agree or disagree with my suggested struct names. Since nobody but me and Dave ever reads this ticket, or understands perl guts well enough to comment on it, it stalled. I should finish it one day; it is the fastest to implement and the largest memory win in the interp I can think of off the top of my head, other than dropping a member from a high-frequency core struct (an automatic 4/8/16 byte win).

I stopped working on it because I didn't understand the impact the patch would have on B::CC and cperl, and on a thought I have had for a long time of writing out CV optrees to memory mapped files (B::CC is really badly designed: it generates C files which must be compiled with a platform-specific CC, rather than dumping optree linked lists straight to disk with a pointer relocation list). I also didn't understand whether the SHEK/CHEK struct format can be stored in RO memory or not. AFAIK any attempt to memory map PP subs/optrees to disk files would require non-random hash seeds to get the largest memory win from sharing between perl procs. Otherwise all the hashes stored on disk have to have new hash keys written into HEKs, and linked lists and HV arrays reorganized, on "serialized module load" into a perl proc.
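To make the ownership model under discussion concrete (many holders keeping a refcount on one unique filename malloc buffer, instead of each holder getting its own copy), here is a minimal C sketch. It is not the code from the branch and does not use perl's SHEK/CHEK or HEK structs; `shared_fn`, `fn_share`, `fn_release`, and `fn_table` are hypothetical names invented for illustration, and a plain linked list stands in for a real hash table.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical struct: one heap buffer per unique filename, shared by refcount. */
typedef struct shared_fn {
    struct shared_fn *next;   /* next entry in the intern list */
    size_t refcnt;            /* number of current holders */
    size_t len;               /* length of name, excluding the NUL */
    char name[];              /* the filename bytes, NUL terminated */
} shared_fn;

/* Hypothetical intern list; a real implementation would use a hash table. */
static shared_fn *fn_table = NULL;

/* Return the shared buffer for name, creating it on first use. */
shared_fn *fn_share(const char *name) {
    size_t len = strlen(name);
    for (shared_fn *p = fn_table; p; p = p->next) {
        if (p->len == len && memcmp(p->name, name, len) == 0) {
            p->refcnt++;              /* existing buffer: just take a reference */
            return p;
        }
    }
    shared_fn *p = malloc(sizeof(*p) + len + 1);
    if (!p)
        return NULL;
    p->refcnt = 1;
    p->len = len;
    memcpy(p->name, name, len + 1);   /* copy including the trailing NUL */
    p->next = fn_table;
    fn_table = p;
    return p;
}

/* Drop one reference; free the buffer when the last holder lets go. */
void fn_release(shared_fn *fn) {
    assert(fn && fn->refcnt > 0);
    if (--fn->refcnt)
        return;
    for (shared_fn **pp = &fn_table; *pp; pp = &(*pp)->next) {
        if (*pp == fn) {
            *pp = fn->next;           /* unlink from the intern list */
            break;
        }
    }
    free(fn);
}
```

With this shape, a thousand subs compiled from the same file cost one buffer plus a thousand pointer-sized references, rather than a thousand separately malloced copies of the path.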

toddr commented 4 years ago

Much of what you describe related to serialization is already available via cPanel's B::C https://github.com/cpanel/perl-compiler. It is not currently available on CPAN primarily because it requires an unthreaded perl with several patches to Perl related to memory. Both of these can be overcome if there is interest from someone besides cPanel.

At this point we have made the strings for sheks static but all hash arrays have to be allocated and initialized on startup in order for hash randomization to work correctly.

Our conclusion at this point is that B::CC will never be achievable without essentially re-implementing libperl. What would be more useful is if a module at CHECK did peephole-ish optimizations to the op tree before it was handed to B::C. It could take as long as it likes since the code is serialized for rapid re-run later. This would also be super useful for daemonized processes like catalyst, dancer, mojo, etc. who don't care (within reason) how slow a process is to compile.

We are available on IRC if you would like to learn more about what has already been done for B::C.

bulk88 commented 3 weeks ago

https://github.com/Perl/perl5/commit/6760f691a95ab3a37fd59212795de2b1a7cf7888

My guess is that the demerphq commit fixed 80% or 90% of the memory bloat, vs this branch, which fixes 100%. The demerphq code has better API names than everything proposed in this ticket.

But the demerphq commit is very cut down (size/new lines/features) vs this branch. In this branch I record hash numbers for later passing to the gv_fetch*() APIs; his API is missing that.
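The point about recording hash numbers can be sketched generically: if the hash of a filename is computed once and kept next to the string, later lookups can pass it in instead of rehashing the same bytes. The C below only illustrates that idea. It is not perl's API and not the code from either the branch or the commit; `table`, `entry`, `table_store`, and `table_fetch_hashed` are hypothetical names, and FNV-1a is used as a stand-in hash function, not the one perl uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NBUCKETS 64

/* Hypothetical entry that keeps its precomputed hash beside the key. */
typedef struct entry {
    struct entry *next;
    uint32_t hash;      /* recorded once when the entry is stored */
    const char *key;
    size_t len;
    int value;
} entry;

typedef struct {
    entry *buckets[NBUCKETS];
} table;

/* 32-bit FNV-1a, used here only as a stand-in hash function. */
uint32_t fnv1a(const char *s, size_t len) {
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)s[i];
        h *= 16777619u;
    }
    return h;
}

/* Store an entry, recording its hash so later callers can reuse it. */
void table_store(table *t, entry *e) {
    assert(t && e && e->key);
    e->hash = fnv1a(e->key, e->len);
    entry **bucket = &t->buckets[e->hash % NBUCKETS];
    e->next = *bucket;
    *bucket = e;
}

/* Look up a key using a hash the caller already has; no rehashing here. */
entry *table_fetch_hashed(table *t, const char *key, size_t len, uint32_t hash) {
    for (entry *e = t->buckets[hash % NBUCKETS]; e; e = e->next) {
        if (e->hash == hash && e->len == len && memcmp(e->key, key, len) == 0)
            return e;
    }
    return NULL;
}
```

A caller that already holds an `entry` can pass `e->hash` straight to `table_fetch_hashed` on another table, which is the kind of saving a recorded hash number makes possible.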