akeep / nanopass-framework

The new nanopass framework; an embedded DSL for writing compilers in Scheme
MIT License

Nanopass uses an excessive amount of memory #4

Closed eholk closed 7 years ago

eholk commented 11 years ago

I'm seeing the Harlan compiler regularly use about 3GB of memory to do all the Nanopass macro expansions. This is kind of a lot, and it also means we have to use the 64-bit version of Petite or Chez to compile programs. Is there a way to significantly reduce this?

eholk commented 11 years ago

Here's a related issue for Harlan: https://github.com/eholk/harlan/issues/99

akeep commented 11 years ago

I've taken a quick look at this, and it seems to be peculiar to petite. When I run harlanc with petite, it seems to max out over 3G. When I run it with the full compiler (note: not pre-compiled), I am only seeing about 500 MB at the maximum.

I'll try to look into what is leading to this, but I must admit, it seems a little strange. Do you have Harlan running on the IU linux machines? (I think it might be interesting to do the same test there with petite vs. full Chez and see if we see the same behavior.)

-andy:)

akeep commented 11 years ago

So, after looking at this a bit closer, it seems that most of the memory is consumed when we are creating all of the record types for the language. (Each language has a record type for the language, a record type for each nonterminal, and a record type for each S-expression production.) Currently, each record is created using define-record-type, along with a checking constructor and indirect exports of the record accessors.

Anyway, long story short: I think it might be possible to generate the record-type-descriptors here and build the record accessors only when we need them, or build a single accessor that pulls all the record fields at once. This is going to take some work to accomplish, so it might take me a few days to get it all working.
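
For reference, the idea is roughly the following sketch using the R6RS procedural record layer (illustrative only, not the framework's actual code; the expr type and its fields are made up):

(import (rnrs))

;; Illustrative only: build the record-type descriptor directly,
;; instead of expanding a define-record-type form, so accessors can
;; be created lazily, one at a time, only when a pass needs them.
(define expr-rtd
  (make-record-type-descriptor
    'expr                 ; type name
    #f                    ; no parent rtd
    #f                    ; no uid (a nongenerative type would pass one)
    #f #f                 ; not sealed, not opaque
    '#((immutable x) (immutable y) (immutable z))))

(define expr-rcd (make-record-constructor-descriptor expr-rtd #f #f))
(define make-expr (record-constructor expr-rcd))
(define expr? (record-predicate expr-rtd))
(define expr-x (record-accessor expr-rtd 0))   ; created on demand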

eholk commented 11 years ago

I was looking into this some with @rrnewton the other day, and we found that if we used full Chez to compile nanopass.ss then the Harlan compiler would run efficiently even under Petite. Would it be possible to make binaries of Nanopass available?

akeep commented 11 years ago

I have added pre-compiled versions of the library that can be used with petite on the 32-bit and 64-bit Intel/AMD platforms for Linux and Mac OS X, for both threaded and non-threaded versions of Chez. (I'm not sure why my response to this email didn't show up here.) I'm also getting closer to finishing up a change that, I hope, will help reduce the memory needed when languages are created. This change has caused me to get into some of the dark corners of the define-pass code, which has taken a little more time than I had hoped. Even if this doesn't help the memory, it is still a step in the right direction, and I have some other tricks up my sleeve that may help.

rrnewton commented 11 years ago

That sounds like a reasonable solution -- the Harlan build process will just pull down .so's from some predefined URL to get nanopass?

I wonder if we should do the same thing when we (finally) switch P423 to nanopass next semester.

akeep commented 11 years ago

Maybe. I've not tried running Harlan with the pre-compiled binaries to see if that helps with the memory. Part of the core problem is that when the languages are expanded, many records are created (even in some cases where we only need parts of the record to be defined, or where the record only acts to provide a step in the inheritance hierarchy). Since we have the information we need to create the records through the procedural interface at expand time, I've changed the code to do that, and I've also altered the code so that a single procedure accesses all of the record elements at once. Effectively, where we would have created an accessor procedure for each record field and generated code in the pass like:

(let* ([x (language-record-x rec)]
       [y (language-record-y rec)]
       [z (language-record-z rec)]
       ---)
  expr
  expr* ...)

I am now creating a single accessor that grabs all the fields at once and producing code that looks like:

(language-record-get-all rec
  (lambda (x y z ---)
    expr
    expr* ...))

But this requires a bit of a change to the generating code because the old method mixed in accessing fields, checking types, and matching sub-patterns, so we have to change slightly how we build the pass.
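
To make that concrete, here is a rough sketch of how such a get-all accessor could be built with the procedural record layer (assumed code, not what the framework actually generates):

;; Assumed sketch, not the framework's real code: one procedure that
;; pulls every field out of a record and passes them to a receiver,
;; as in the language-record-get-all call above.
;; (Assumes (rnrs) is imported, as in the earlier sketch.)
(define (make-get-all rtd)
  (let* ([n (vector-length (record-type-field-names rtd))]
         [accessors
          (let loop ([i 0] [acc '()])
            (if (= i n)
                (reverse acc)
                (loop (+ i 1) (cons (record-accessor rtd i) acc))))])
    (lambda (rec receiver)
      (apply receiver (map (lambda (get) (get rec)) accessors)))))

;; usage, given the expr-rtd from the earlier sketch:
;; ((make-get-all expr-rtd) some-expr (lambda (x y z) (list x y z)))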

Anyway, hopefully these changes will help. I've pushed this part of the way into the define-pass code, but there is still a bit left to do.

Also, I'm thinking of adding a define-unparser syntax that should allow us to replace the wrapper code with more sophisticated unparsers, which I think are a better solution.

eholk commented 11 years ago

I just modified Harlan to use the precompiled nanopass binaries when possible. It seems to have shaved about 4 seconds off the compile time, but I'm sadly still getting about 3.4 gigs of memory usage.

akeep commented 11 years ago

Yep. As I mentioned in the update to Ryan, I don't think pre-compiling is necessarily going to fix this particular problem. It all seems related to the record creation in the library code. I'm close to having a fix in for this. Hopefully, I'll be able to get it in and tested by the end of this coming weekend, and hopefully those changes will help.

eholk commented 11 years ago

Yeah, I guess I didn't expect this to make much of a difference. Thanks for adding the binaries though! I'm looking forward to your next changes.


eholk commented 11 years ago

@akeep - Out of curiosity, how many forms does Chez have in its intermediate languages? I just added another production to the beginning Harlan language and now my compile time is 15-30s longer than before. Is the nanopass code generation step perhaps exponential in the number of productions and Harlan just has an overly large set of productions?

eholk commented 11 years ago

Oops, false alarm. I turned off debugging spew and my timings were back to normal. I forgot how much that slows things down.

akeep commented 11 years ago

I have pushed a new version of the nanopass framework that no longer uses define-record-type to create the records associated with languages.

This change seems to slightly reduce both the memory high-water mark and the overall compile time. That said, it still may not be enough to fix the problem you are seeing in harlan.

akeep commented 11 years ago

Eric: In response to your question about Chez: there are 35 intermediate languages in the Chez Scheme compiler, used across 51 passes. Since I last looked at it, the compiler has gotten faster; it can now compile its own source code in less than 10 seconds for 32-bit and less than 15 seconds for 64-bit. I don't believe the expansion of the nanopass framework is exponential, for two reasons. First, the most expensive part of the process is building the language records, and this is inherently linear in the number of fields of the productions, since it creates one accessor for each field of a production. Second, if it were exponential somewhere, I would expect the pass-generation code to be the expensive part, and in that case I would expect Chez Scheme to be much slower to compile, since we have so many passes.

eholk commented 11 years ago

Thanks for these changes! I just tried them in Harlan, and they made a significant improvement. Before, compilation took about 33 seconds, and around 4 gigs of RAM (I forgot to pay close attention). After these changes, compilation time is down to 26 seconds, and memory usage seems to top out at about 2.5 gigs.

Incidentally, I accidentally made a grammar the other day that had productions like A -> B and then B -> A, and this appeared to use unbounded amounts of memory (I killed the program off when it got up to 6 gigs). I'm guessing this is a different issue entirely, but it makes me wonder if there are ways to design grammars so they are easily handled by Nanopass. After all, Chez is significantly more complex than Harlan at this point and it performs far better.

akeep commented 11 years ago

Glad these changes helped. I agree there is more I'd like to do.

As far as designing languages goes, in Chez Scheme we tried to keep the number of forms relatively minimal. So we have case-lambda, letrec, and letrec* as binding forms, but no let (since let is a direct application of case-lambda), and we represent some "special forms" in terms of primitives. For instance, the multiple-return-values code, call/cc, and other forms that might be considered "special" are represented as primitives. Things like and, or, not, and cond have all been reduced to if forms, and we no longer have a one-armed if, just a two-armed if. Things like when and unless are also represented as a two-armed if with void in the other arm.
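
To illustrate the kind of reduction meant here (a sketch in ordinary macros, not Chez's actual compiler passes):

;; Sketch only, not Chez's actual passes: the reductions above,
;; written as ordinary macros over two-armed if.
(define-syntax my-when            ; (when e0 e1 ...) => two-armed if
  (syntax-rules ()
    [(_ e0 e1 e2 ...) (if e0 (begin e1 e2 ...) (void))]))

(define-syntax my-unless          ; (unless e0 e1 ...) => two-armed if
  (syntax-rules ()
    [(_ e0 e1 e2 ...) (if e0 (void) (begin e1 e2 ...))]))

(define-syntax my-or2             ; (or e0 e1) => if via a temporary
  (syntax-rules ()
    [(_ e0 e1) (let ([t e0]) (if t t e1))]))

;; and let itself is just a direct application of case-lambda:
;; (let ([x e]) body) == ((case-lambda [(x) body]) e)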

I noticed in harlan you have both forms of if, and you have many forms, like make-vector, unsafe-vector-ref, unsafe-vec-ptr, length, print, etc., that I might represent as just primitive calls. The other thing to note is that you have forms like (int i), (float f), (bool b), etc., but these could simply be i, f, b, etc.; the nanopass framework knows how to tell them apart because you have provided the terminals, and if you eliminate the extra wrapper, no record will be created for these. In pattern matching you can still say simply ,i ,f or ,b and you will only get integer, float, or boolean matches, so you can keep the types separate if that is important.
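
For example, a hypothetical fragment (not Harlan's actual grammar) showing untagged literals:

;; Hypothetical fragment, not Harlan's actual grammar: with these
;; terminal declarations, productions use bare i / f / b, patterns can
;; match ,i ,f ,b directly, and no wrapper record is created for
;; literals, unlike (int i), (float f), (bool b) forms.
(define-language Lexample
  (terminals
    (integer (i))       ; matched by the predicate integer?
    (flonum  (f))       ; matched by the predicate flonum?
    (boolean (b)))      ; matched by the predicate boolean?
  (Expr (e)
    i
    f
    b
    (if e0 e1 e2)))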

I also noticed that you have several languages with nonterminals that are not reachable from the entry nonterminal and that stick around for a few languages longer than they need to. For each form in each of these languages a new record is created for each new language, whether you ever use it or not. (I noticed that in at least one case you were using the form to create a new record and then discarding the record without ever using it---this was the ClosureGroup in the output of remove-lambdas, which is not needed.) I have a pruned version of this from the public github, but I don't know if you've moved on from that in your working copy; if not, I can fork and make a pull request for you.

On this last point, I added another tool (and fixed an existing one) to help out. I fixed the prune-language form so that it now functions correctly, where it didn't before. I also added a define-pruned-language form so that I could more easily compare the languages. Here was my process:

(import (harlan middle languages) (nanopass))
(define-pruned-language M3 pruned-M3)
(diff-languages M3 pruned-M3)

The diff shows what can be removed from M3 to get to pruned-M3, which has the unreachable terminals and nonterminals removed.

Then I cleaned up the language definitions based on this. I think this will also help to make things faster and use a little less memory. I did find a couple of places (in remove-lambdas) where I needed to do a little clean-up, but it was pretty minimal, and the make succeeded once I cleaned those up.

Finally, on the mutually recursive languages: I've added a new issue for that, but I have a question for you. Do you think it is worthwhile to support mutually recursive nonterminals, or do you think this should raise an error? (The latter is the easier fix to make, and I cannot think of a good example where mutually recursive nonterminals are useful, but that doesn't mean it isn't useful, just that I don't have a use for it ;)

Anyway, let me know what you think. -andy:)

eholk commented 11 years ago

We had the literals in Harlan tagged originally to distinguish them in the match-based compiler, and particularly when we were using miniKanren for type inference. It's probably possible at this point to use untagged literals and use guards instead, since we aren't using miniKanren anymore.

We can probably trim out a lot of the primitive forms as well, by maybe adding a prim-proc terminal or something like that. It'd probably make the compiler more maintainable too. Right now adding a new primitive is rather tedious, which leads me to find uglier ways of implementing things just to avoid adding new primitives.
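
Something along these lines, maybe (prim-proc and the primitive list here are made up for illustration):

;; Made-up illustration: a single prim-proc terminal instead of one
;; grammar form per primitive; adding a primitive then means adding a
;; symbol to the list rather than a new production.
(define harlan-primitives '(make-vector vector-ref length print))
(define (prim-proc? x) (and (symbol? x) (memq x harlan-primitives) #t))

(define-language Lprims
  (terminals
    (prim-proc (pr))    ; matched by the predicate prim-proc? above
    (symbol (x)))
  (Expr (e)
    x
    pr
    (e0 e* ...)))       ; primitive calls are ordinary applications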

eholk commented 10 years ago

I wonder if some of the code generation size comes from creating parsers. Because Harlan weaves in and out of nanopass, we have about 6 parsers so far, and will probably add a few more as we transition more of the compiler over to Nanopass. This may be why our memory usage behavior is so different from Chez's. Fortunately, if over time we can rewrite the whole compiler in Nanopass, we won't need these extra parsers.
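
Each of those boundaries presumably amounts to something like the following (M3 is just an example name here):

;; Roughly what each boundary parser amounts to (M3 is only an example
;; name): define-parser generates a procedure that converts an
;; S-expression into a term of the given language, so every re-entry
;; into nanopass needs one.
(define-parser parse-M3 M3)

;; e.g. (parse-M3 '(if #t 1 2)) would yield an M3 term, assuming M3
;; has a matching if production.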

eholk commented 10 years ago

I just split Harlan's languages.scm file in two and the memory usage went down dramatically. A file that used to take 30-60 seconds to compile is now down to 18.

Further splitting this file will probably make things even faster. It seems like it's probably better to put one pass per file, with its associated languages, rather than all the languages in one file like I had been doing.

akeep commented 7 years ago

Issue moved to new repository: nanopass/nanopass-framework-scheme

jasonknight commented 6 years ago

I am unable to build due to this issue; it seems it hasn't been fixed. Is there no better way to compile this than on a system with > 3GB of memory?

eholk commented 6 years ago

Out of curiosity, are you running Petite or full Chez Scheme? Chez Scheme is open source now, so it's worth giving a try if you are having trouble with Petite.

jasonknight commented 6 years ago

I am running Chez Scheme. A simple user-side solution to the problem is to spin up a virtual machine with the same OS as the target server (in my case Debian 9 amd64), make sure it has > 3G of RAM assigned, build, then rsync to the server, and finally run make install. Worked like a charm.

The only issue is that the target server has less memory than needed to compile scheme.

Now I can use Scheme on my servers, which makes me a happy bunny.