dlang-community / Pegged

A Parsing Expression Grammar (PEG) module, using the D programming language.

A lot of time and memory required to generate the D parser. #98

chadjoan closed 11 years ago

chadjoan commented 11 years ago

When I try to compile the parser generated from the grammar in pegged.examples.dgrammar, my system tends to lurch badly and the compile then fails with an out-of-memory error. I have about 765 megs of RAM free when I do this. I can make it succeed by closing Firefox, which frees up most of my remaining memory (almost 3 GB free), but my build times still crawl, and development is inconvenient without a browser with adblock/noscript/etc. I would expect 765 megs to be easily sufficient for compiling a recursive descent parser.

This is not with mixins. I am outputting the parser to a separate file and then compiling that.

I have tried with and without memoization. The parser tends to be about 4k to 10k LoC, depending on memoization. Both easily OOM on me.
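
For readers who want to reproduce this setup, here is a minimal sketch of writing a parser to a separate module with and without memoization. It assumes Pegged's `asModule` function and `Memoization` enum as found in recent Pegged versions, which may differ from the version used in this report. The grammar and module names are hypothetical stand-ins for the much larger D grammar.

```d
// Sketch only: assumes Pegged is on the import path (e.g. via dub).
import pegged.grammar;

// Hypothetical toy grammar standing in for pegged.examples.dgrammar.
enum demoGrammar = `
Demo:
    Keyword <- "if" / "else"
`;

void main()
{
    // Generate a memoizing parser; with the two-argument form the
    // output file is named after the module (expected: demo_memo.d).
    asModule!(Memoization.yes)("demo_memo", demoGrammar);

    // Generate a non-memoizing parser (expected: demo_plain.d).
    asModule!(Memoization.no)("demo_plain", demoGrammar);
}
```

The generated modules can then be compiled on their own, which separates the cost of generating the parser from the cost of compiling it.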

My suspicion is that the heavy use of templates is doing this. dmd is normally a pretty efficient compiler, but it's always struggled speed-wise and memory-wise on the metaprogramming stuff.

Here is my memory measurement when Firefox is running, which is a condition for OOM and therefore a lower bound on how much memory is required for compiling the D parser:

chad@Hugin /mnt/bulk/dprojects/xdc $ free -m
             total       used       free     shared    buffers     cached
Mem:          3896       3130        765          0         18        289
-/+ buffers/cache:       2823       1073
Swap:            0          0          0

Here is my memory measurement when Firefox is closed, a condition in which I can build the parser, albeit slowly. This is an upper bound on the memory required for compiling the D parser:

chad@Hugin /mnt/bulk/dprojects/xdc $ free -m
             total       used       free     shared    buffers     cached
Mem:          3896        989       2906          0         26        304
-/+ buffers/cache:        658       3238
Swap:            0          0          0

callumenator commented 11 years ago

What compiler version are you using? I have had trouble in the past compiling the D parser, but it currently goes OK on Windows with git head (as of a few weeks ago). Granted, I probably had a gig or two of RAM free. Another problem I ran into was the Keywords rule causing a stack overflow when compiling, but I got around that by splitting it into two rules:

Keyword <- 'blah' / 'blah' / Keyword2 
Keyword2 <- 'more' / 'blah'
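
As a rough illustration of that split, the sketch below uses a hypothetical grammar name and a small keyword subset (neither is taken from the real D grammar) and assumes Pegged's `grammar` mixin API. The last alternative of the first rule delegates to an overflow rule, so the set of accepted keywords stays the same while each choice expression is shorter.

```d
// Sketch only: assumes Pegged is on the import path (e.g. via dub).
import pegged.grammar;

// Hypothetical grammar name and keyword subset, for illustration.
mixin(grammar(`
KeywordDemo:
    Keyword  <- "abstract" / "alias" / "assert" / Keyword2
    Keyword2 <- "typeof" / "while"
`));

void main()
{
    // Matched directly by the first rule.
    assert(KeywordDemo("alias").successful);
    // Matched only through the overflow rule Keyword2.
    assert(KeywordDemo("while").successful);
    // Matched by neither rule.
    assert(!KeywordDemo("banana").successful);
}
```

Because PEG choice is ordered, keywords moved to the second rule are tried after all those in the first, so any keyword that is a prefix of another still needs to come after the longer one.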

Just some thoughts

chadjoan commented 11 years ago

The gig or two of RAM may have saved you.

I'm using a dmd somewhere around commit #f3d5843fcb52600ddc0edcc2b04aa86ce48bfab1 with added hacks to make it produce running executables on my machine and also provide __ctfewriteln(...). I think it's slightly older than 2.060. My last commit is on Jul 29 while 2.060 was released on Aug 2.

I'm running on Gentoo Linux.

I could see someone saying that it is "fast" given that it takes only a few seconds. Still, that's slower than "instantaneous" which is what I'm used to with using dmd on small projects. I worry about scale-up on that, though the running out of RAM is the thing that's actually really hazardous to my workflow right now.