ryzhyk opened this issue 5 years ago:
I know I've asked this before, and ultimately the problem is in the Rust compiler, but given the growing community of people using differential, I was wondering whether there is some accumulated collective wisdom for dealing with the slow-compilation problem, e.g., some tricks or best practices people use.
Hi Leo,
Let me throw my two cents in, and perhaps others (e.g. @comnik) can chime in as well.
There is for sure an issue with compilation, much of which, in my experience, lives between Rust and LLVM codegen: Rust deals with monomorphisation by dropping a very large amount of work on LLVM. The remedies to this are probably out of our reach, short of learning more about what costs so much and perhaps dialing back some optimization opportunities (e.g. more boxed traits rather than generic parameters).
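As a minimal sketch of that trade-off (hypothetical functions, not differential's actual API): a generic parameter is monomorphized once per closure type, while a boxed trait object is compiled exactly once.

```rust
// Generic parameter: a fresh copy of this function is monomorphized for
// every distinct closure type `F`, all of which LLVM must then optimize.
fn apply_generic<F: Fn(u64) -> u64>(f: F, x: u64) -> u64 {
    f(x)
}

// Boxed trait object: compiled exactly once; each call pays a virtual
// dispatch instead of producing more code to compile.
fn apply_boxed(f: Box<dyn Fn(u64) -> u64>, x: u64) -> u64 {
    f(x)
}

fn main() {
    // Two closure types -> two monomorphized copies of `apply_generic`...
    assert_eq!(apply_generic(|x| x + 1, 1), 2);
    assert_eq!(apply_generic(|x| x * 2, 2), 4);
    // ...but only one compiled body of `apply_boxed`.
    assert_eq!(apply_boxed(Box::new(|x| x + 1), 1), 2);
    assert_eq!(apply_boxed(Box::new(|x| x * 2), 2), 4);
}
```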
What I bet is the most promising in the near term is what the Clockworks folks have done with 3DF, what the `interactive` project does, and what Materialize is doing: create what is in effect an IR, translate much of your query to that, and have compiled code that generates dataflow from the IR. As an example, `interactive` presumes a general data type `V` and believes all collections will have a `Vec<V>`. All operators---joins, distincts, filters---are parameterized by data (indices) rather than Rust closures. This means that they get compiled once, and can now deal with arbitrarily complicated queries without any further codegen.
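To make that concrete, here is a minimal sketch of an interpreted IR (hypothetical types and names, not the actual `interactive` API): operators carry column indices as data, so one compiled artifact serves every query.

```rust
/// A hypothetical general value type; every collection holds `Vec<V>` rows.
#[derive(Clone, Debug, PartialEq)]
enum V {
    Int(i64),
    Str(String),
}

/// Operators are parameterized by data (column indices), not Rust
/// closures, so this enum is compiled exactly once for all queries.
enum Op {
    /// Keep rows whose column `col` equals `value`.
    Filter { col: usize, value: V },
    /// Keep only the named columns, in order.
    Project { cols: Vec<usize> },
}

/// A tiny interpreter: apply a pipeline of IR operators to a collection.
fn interpret(ops: &[Op], mut rows: Vec<Vec<V>>) -> Vec<Vec<V>> {
    for op in ops {
        rows = match op {
            Op::Filter { col, value } => rows
                .into_iter()
                .filter(|r| &r[*col] == value)
                .collect(),
            Op::Project { cols } => rows
                .into_iter()
                .map(|r| cols.iter().map(|c| r[*c].clone()).collect())
                .collect(),
        };
    }
    rows
}

fn main() {
    let rows = vec![
        vec![V::Int(1), V::Str("a".into())],
        vec![V::Int(2), V::Str("b".into())],
    ];
    // An arbitrarily complicated query is just data; no further codegen.
    let query = [
        Op::Filter { col: 0, value: V::Int(2) },
        Op::Project { cols: vec![1] },
    ];
    println!("{:?}", interpret(&query, rows)); // [[Str("b")]]
}
```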
This approach has some limits: it isn't going to produce the fastest code, nor the most memory-efficient code (you would rather have a `[V; 3]` or a `(usize, String)` than a `Vec` of some enum of general types). But it is perhaps good enough for 95% of your query, and if there is an expensive computational core, that part could still be compiled.
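As a rough illustration of that memory cost (hypothetical `V`; sizes are for a typical 64-bit target):

```rust
use std::mem::size_of;

// A hypothetical general value enum, as in the interpreted approach.
#[allow(dead_code)]
enum V {
    Int(i64),
    Str(String),
}

fn main() {
    // A concrete record: one usize plus one String, stored inline.
    println!("(usize, String): {} bytes", size_of::<(usize, String)>());
    // The general representation: a Vec header per record, and each field
    // additionally pays an enum tag and lives in a separate heap buffer.
    println!("Vec<V> header:   {} bytes", size_of::<Vec<V>>());
    println!("each V element:  {} bytes", size_of::<V>());
}
```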
If you find that you have identical code fragments that are re-used a lot, Rust would almost certainly be happier if you created a named method for them and re-used it rather than re-generating code for each of them. My sense is that a lot of what LLVM has to do is code deduplication, because most of the types and methods of the numerous joins are all roughly identical. I don't know your code all that well, but if you have such opportunities this might help.
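A minimal sketch of that suggestion (hypothetical names): hoist a repeated closure into a named function, so generic callees see one function item instead of a distinct anonymous closure type at every use site.

```rust
/// One named key extractor, compiled once and reusable by every join
/// over (key, payload) pairs.
fn key_of(record: &(u64, String)) -> u64 {
    record.0
}

fn main() {
    let left = vec![(1u64, "a".to_string()), (2, "b".to_string())];
    let right = vec![(2u64, "c".to_string())];

    // Re-generated variant: each `|r| r.0` closure is a fresh anonymous
    // type, so generic callees are monomorphized again at every use site.
    let _keys_a: Vec<u64> = left.iter().map(|r| r.0).collect();
    let _keys_b: Vec<u64> = right.iter().map(|r| r.0).collect();

    // Reused variant: both calls instantiate `map` with the same
    // `fn(&(u64, String)) -> u64` item, deduplicating the generated code.
    let keys_a: Vec<u64> = left.iter().map(key_of).collect();
    let keys_b: Vec<u64> = right.iter().map(key_of).collect();
    assert_eq!((keys_a, keys_b), (vec![1, 2], vec![2]));
}
```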
Thanks, Frank!
I am also using the IR approach, with all collections containing a single type `V` and a data structure that describes the dataflow and is interpreted by the engine at runtime. I do not store `Vec<V>` in collections, but rather just `V`, which means that `V` is a gigantic enum of all types stored in any of the collections. Trouble is, despite all this decoupling, as the enum grows, compilation becomes slow. I am still trying to nail down what parts of my code make things slow, but it almost definitely has to do with static dispatch and rustc+LLVM being aggressive about trying to optimize things globally.
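For context, a sketch of that shape (hypothetical variants; the real generated enum is much larger): every type stored in any collection becomes a variant, so the enum, and every `match` over it, grows with the program.

```rust
/// Hypothetical sketch of the single value type: one variant per type
/// stored in any collection of the generated dataflow.
#[derive(Debug)]
enum V {
    Bool(bool),
    U64(u64),
    Str(String),
    Pair(Box<(V, V)>),
    // ... in practice, one variant per relation record type; this list
    // grows with the source program, and so does every match over V.
}

fn main() {
    let v = V::Pair(Box::new((V::U64(7), V::Str("x".into()))));
    // Every operation on V dispatches over all variants.
    match v {
        V::Bool(b) => println!("bool {:?}", b),
        V::U64(n) => println!("u64 {:?}", n),
        V::Str(s) => println!("str {:?}", s),
        V::Pair(p) => println!("pair {:?}", p),
    }
}
```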
Frankly, at this point I am a fan of using dynamic dispatch in differential, as long as the performance implications are mild. Compilation times have a surprisingly profound impact on the entire development process; e.g., it is nearly impossible to maintain a large test suite, since it takes forever to build and run. And of course the fact that compiling a bigger DDlog program takes 6 minutes does not help with adoption :)
Having said that, I am currently trying to gradually strip down my code to figure out what parts contribute to long compilation times. For instance, I found that commenting out all the join code does not make much difference, hence at least that particular type of duplication is not significantly contributing to compilation times.
I also understand that Rust folks are trying to make the compiler faster by moving more optimizations to MIR. Perhaps one day this will magically solve my problem :)
Depending on your energy levels: the Rust folks have indicated an appetite for improving compiler performance, but were lacking specific, actionable metrics. It sounded like they could plausibly respond better to people who showed up with specifics about slowness, potential remedies, etc. I think they get a lot of "it's slow, wtf!" and fewer detailed set-ups with clear "fruit" that could be low- or medium-hanging.
I can try and put you in touch with the Rust folks doing this. The link I use to get there is https://rust-lang.zulipchat.com/#narrow/stream/187831-t-compiler.2Fwg-self-profile, though it may be that this has some access control on it (it is the Rust Zulip, and the subproject related to self-profiling).
I have little to contribute here, I fear. Definitely a problem. I am of course a fan of the declarative approach, which does ease the pain for test suites: we usually define test cases as data and then have a single test function executing them all (e.g. here). However, there have been times where, seemingly overnight, compile times jumped up.
I have never taken the time to investigate root causes, though, so that is all I have on this topic...
If either of you has theories I am happy to try things.