crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0
19.43k stars 1.62k forks

Why the compiler is slow and consumes a lot of memory #4864

Closed asterite closed 7 years ago

asterite commented 7 years ago

So, I tried to reduce the memory consumed by the compiler. The first thing that came to my mind was to clear up bindings between nodes after the semantic phase and the call GC.collect. A binding between AST nodes is this: if a node x is bound to a node y, whenever y's type changes x's type is updated. This is used in the type inference algorithm.

It turns out that by doing this I was able to free about 50MB of the total of 1GB consumed by compiling the compiler. This is insignificant and won't meaningfully reduce memory usage. My hope was to reduce the memory held before the codegen phase so that codegen wouldn't need to allocate much more on top of it.

So, the next thing is seeing what the compiler is doing with our code. Why isn't more memory freed by calling GC.collect? And why is so much memory consumed? The compiler isn't that big after all.

To investigate this, I added some quick and dirty code to show what's going on. To understand the output we need to understand how the compiler works. It's not hard. When we have a method like this...

def double(x)
  x * x
end

... Crystal doesn't compile anything yet. Only when you invoke the method will it be instantiated with the argument types. So if you invoke double(1) a method double<Int32> will be instantiated and typed (for method calls inside that method we do the same). If you then invoke double(1.5) a method double<Float64> will be instantiated, and so on. We call each of these an "instantiation" of a method, similar to how you instantiate a generic type with concrete types, because every method in Crystal is like a "generic" method. An instantiation means cloning the AST nodes for the method and type-checking them, and keeping this type-check result around until codegen (this consumes memory that can't be freed).
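For example (the call sites here are hypothetical, not from the compiler), these invocations produce two separate instantiations of the method above:

```crystal
def double(x)
  x * x
end

double(1)    # instantiates and type-checks double<Int32>
double(1.5)  # instantiates and type-checks double<Float64>
double(2)    # reuses the existing double<Int32> instantiation
```

Each instantiation clones and type-checks the method body, and that result stays alive until codegen.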

This gist shows the list of all types and methods instantiated by compiling the compiler. Now we can understand why so much memory is consumed:

There's a lot of redundant information and processing done here. This not only results in bigger compile times and memory consumption, but also in code bloat in the generated executable.

This is not a rant: I'm explaining this so the team at Manas, core contributors, and anyone who reads this can brainstorm ideas to improve this situation.

Some ideas I have in mind are:

1. When we have Bar < Foo and we invoke Bar#foo, turn the receiver into Foo+ and use that. Similarly, if we have Foo#foo: turn it into Foo+#foo.

Now, this is a breaking change, because:

class Foo
  def foo
    1
  end
end

class Bar < Foo
  def foo
    "hello"
  end
end

Foo.new.foo # => 1 (Int32)
Bar.new.foo # => "hello" (String)

# But with this change:
Foo.new.foo # => 1 (Int32 | String)
Bar.new.foo # => "hello" (Int32 | String)

Basically, invoking foo will always result in a dispatch and in the broader type. Maybe it's not a big issue.

  2. When a method argument has a type restriction, make that restriction be the argument type.

This means that if I write def foo(x : Foo) and I pass Bar < Foo, Bar would be upcast to Foo+. Or, well, following change 1, maybe there won't be a difference anymore between Foo and Foo+.

That way the method foo will only be instantiated once.
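A sketch of the difference (the Foo/Bar classes here are hypothetical):

```crystal
class Foo; end
class Bar < Foo; end

def foo(x : Foo)
  x
end

# Today: foo(Foo.new) and foo(Bar.new) create two instantiations,
# foo<Foo> and foo<Bar>, because x keeps the exact type of each argument.
# With this change: the argument would be upcast to Foo+ on entry,
# so both calls would share a single instantiation.
foo(Foo.new)
foo(Bar.new)
```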

Like change 1, this is a breaking change for more or less the same reasons.

  3. Require mandatory types in all method arguments. Generic methods need to use free variables.

So for example we can't write this anymore:

def double(x)
  x * x
end

We must write it like this:

def double(x : T) forall T
  x * x
end

Or choose one type (or a union type, etc.):

def double(x : Int32)
  x * x
end

There doesn't seem to be a point to this, because we can just use forall all over the place. The idea would be to document that generic methods like that are instantiated once per argument type, that they slow down compilation and bloat the generated executable, and that an explicit type should always be preferred when known. We'd remove the possibility to write def double(x) because it's too easy to write, so everybody would use it instead of thinking about the correct type.

Because the norm would be for method arguments to be typed, we can type-check them without waiting for someone to invoke them! This means we can report errors before a method is invoked, and providing IDE support for autocompletion or showing the type of a local variable becomes a piece of cake (or, well, much easier and much faster).
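For instance (a hypothetical method, not from the issue), a fully typed definition carries enough information to be checked on its own:

```crystal
# With both parameters and the return type annotated, the compiler could
# type-check this body at its definition, without waiting for a call site:
def scale(x : Int32, factor : Int32) : Int32
  x * factor
end

# Today the body is only typed when the method is invoked, so a mistake in
# an uncalled method goes unnoticed; with mandatory types it could be
# reported immediately, and an IDE could autocomplete on x as an Int32.
```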

We can still use some level of duck-typing with forall. For example, I'd like IO#<<(obj) to be generic so I can append any object to it, at the cost of the method being instantiated once per object, because the method definition is very short (just two lines).
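For reference, a simplified sketch of what such a generic IO#<< looks like (reopening the class; the real stdlib definition may differ in details):

```crystal
class IO
  # Accepts any object and delegates to its to_s(io) overload, so the
  # method is instantiated once per appended type -- acceptable here
  # because the body is tiny.
  def <<(obj) : self
    obj.to_s(self)
    self
  end
end
```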

Another idea would be to keep things just like now, and provide a tool (like the diff above) that would inform of methods that are instantiated too many times and could benefit from adding a type restriction. But ideally, just like with instance variables, it would be nice if the norm were to write the types down.


I don't have a solution for reducing the amount of generic type instantiations without complicating the type system a lot (like, becoming Java, C# or Rust), and without requiring mandatory return types, though mandatory return types are also nice to improve compile times, memory usage and understanding of the code. And of course all of this could help to implement incremental/modular compilation (though I'm not sure).


Another overall "solution" is to just say "Well, the language is awesome (it is) but the cost is slow compile times, a lot of memory consumption compiling code, and code bloat". That's fine too, if we accept that limitation.

akzhan commented 7 years ago

Restrictions on argument and return types look to me like a good deal for compile times and for the predictability/readability of code.

They don't touch the union types feature.

konovod commented 7 years ago

I really like idea (2). The argument type being not the actual type but just a restriction was always counterintuitive for me (e.g. my first issue #2977 appeared partially due to this misunderstanding: I thought that if I pass an Enumerable and it compiles, then it should work for any Enumerable), and I've seen some people on reddit complaining about it. I think this idea is the least "breaking" and strictly improves the language: users still have a choice, either write def double(x) and get nice type inference (perfect for small scripts), or specify types and get better compilation speed/memory plus a compiler check that the interface matches the implementation (ideal for big programs). Will it be enough (if all possible methods in the compiler/stdlib have their types restricted)? Because the other two could be more controversial (idea 1 means that Bar.clone will return Object+? And idea 3 means every piece of code written so far will be broken, and copying from Ruby won't be so easy either).

classyPimp commented 7 years ago

Mandatory return types are not bad at all. Type unions really deliver the "dynamicness" feel to Crystal. I guess people are interested in Crystal not because it's Ruby that runs fast, but because it's a static language with the sexy syntax of Ruby. I personally type everything in my Crystal scripts, including arguments, and value the safety and readability it gives more than the performance. P.S. Welcome back to Crystal @asterite; without you Crystal felt stalled.

vegai commented 7 years ago

Mandatory argument and return types for methods seems like a good choice, and not only for performance reasons.

akzhan commented 7 years ago

Modern C++ has type inference for return types.

It's needed because some types are hard to write by hand (or can only be inferred at compile time).

RX14 commented 7 years ago

@asterite First of all, thank you for performing some research and confirming what we all thought: the largest problem with scaling Crystal is the combinatorial expansion of types * method arguments.

Second of all, I want to point out early in this discussion that the Crystal compiler and language works fantastically well for a lot of people as-is. I mainly code small, self-contained, useful services in crystal which solve small problems. These are an absolute dream to code in Crystal because you can write your entire application with very few type restrictions and take full advantage of the power of the crystal compiler. And they all compile in around (or less than) a second. I don't want this to change, not only because I want to continue coding my applications without thinking too much about types, but because beginners do this too. I think we should be careful that any changes we make don't drive away newcomers by making the language feel too rigid and "Java-like".

In addition to that, the fact that crystal works fantastically well for most people as-is gives us a lot of time to get any changes we make to enable Crystal not to go exponential in large apps exactly right. We shouldn't rush into any large-scale changes to the language without being sure that it's the right decision.


I believe that solution 2 (and perhaps some small non-breaking optimizations around generics) is very powerful and will solve this problem with almost no major changes to the "feel" of the language. My hypothesis is that modules (parts of the stdlib, shards, etc.) will naturally form strongly typed interfaces between themselves simply by documenting their API. Providing type annotations on a public API is something that should be done already: it makes the documentation much easier to read, and it creates better errors. By annotating a method's type signature you not only give useful documentation to yourself and anyone else using the library, you tell the compiler enough information - most of the time - to instantiate your method only once.

The problem right now is that the compiler can't take advantage of that information, because it can only do it "most of the time". Arguments are not cast to their type restriction when they arrive in a method. This allows you to write code where a method will only correctly compile for some subset of the types which are actually covered by the type restriction. This is genuinely confusing behaviour which we should consider changing whether or not it gives us a massive performance improvement (and I think it gives us a massive one). By changing the semantics such that we essentially call as(TypeRestriction) every time we pass an argument which has a type restriction, we can take full advantage of this information and reduce compile times.
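To illustrate the confusing behaviour (with hypothetical classes): a method can compile for only a subset of the types its restriction claims to accept:

```crystal
class Animal; end

class Dog < Animal
  def bark
    "woof"
  end
end

# The restriction says "any Animal", but the body only compiles when the
# argument's actual type defines #bark:
def make_noise(x : Animal)
  x.bark
end

make_noise(Dog.new)      # compiles today, because x is typed Dog here
# make_noise(Animal.new) # would fail: undefined method 'bark' for Animal

# With the proposed as(TypeRestriction) semantics, x would always be an
# Animal inside the method, so the error would surface for every caller.
```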

However, what we shouldn't do is enforce that such type restrictions exist. It makes the language much less flexible, and in my view is entirely unnecessary. If the public API of a logical "module" of code is annotated with type restrictions (and it doesn't even need to be 100% coverage, only 90%), we still vastly reduce the number of possible method instantiations on the external API. This in turn vastly reduces the possible number of ways internal methods without any type restrictions are instantiated. Instead of a single hunk of code which exhibits O(types * method arguments) number of instantiations, we end up with many smaller modules, each of which has O(types * method arguments) complexity inside it, but also has vastly fewer types and method arguments. Between these modules exists a strongly typed API which is both good for the compiler, and good for the human reading the documentation.

I believe this method preserves all that I love about crystal, the dynamic and exciting coding style, while solving the large-program problem. All while strongly encouraging good documentation by tying it to compile-time performance.

straight-shoota commented 7 years ago

My hypothesis is that modules (parts of the stdlib, shards, etc.) will naturally form strongly typed interfaces between themselves simply by documenting their API. Providing type annotations on a public API is something that should be done already: it makes the documentation much easier to read, and it creates better errors.

That's what I was thinking as well: especially for libraries it is good practice to specify argument and return types, because they usually expect something specific. All-over duck-typing can only go as far as there are no instance variables involved.

Even in stdlib it's sometimes difficult to figure out what return type you can expect from a method or what type an argument can be without looking at the source code. In some cases it is actually documented in the doc text, but that information should really be in the code. It helps so much to understand exactly what you can do with a method.

RX14 commented 7 years ago

Even in stdlib it's sometimes difficult to figure out what return type you can expect from a method or what type an argument can be without looking at the source code.

That's because the stdlib is badly documented :)

ysbaddaden commented 7 years ago

Well:

  1. having a descendant class's method return a different type than the super method's feels like wrong design; maybe an acceptable tradeoff?

  2. strict type constraints... well, that sounds just like how it should be; that will break a few cases, for example collection : Enumerable then assume collection[] to be defined (reminiscent of #4010), but I believe these are abusing the... bug?

  3. requiring types on method arguments and return values... it definitely kills the "never have to type anything" goal, trading developer happiness for the sake of the compiler, the opposite of Ruby's hackability & sexiness.

  4. Crystal is awesome!

I wonder if, instead of memoizing a duplicated method body over and over again, the duplication could be used only to determine the arg/return types, then be collected, memoizing only the arg/return type pairs (which could be simplified), and eventually regenerating the method body during LLVM IR code generation (or not, if types didn't change since the last compilation).

refi64 commented 7 years ago

Option 1 doesn't seem bad to me; subclasses need to return compatible types anyway. Here's my question: in this case:

class Super; end
class Sub < Super; end

class Foo
  def meth
    Super.new
  end
end

class Bar
  def meth
    Sub.new
  end
end

Bar.new.meth  # Super or Sub?

If it's Super, wouldn't that be kind of annoying when dealing with tree-like types?

Option 2 is great. IMO it should've been this way from the beginning.

Option 3... I feel like this kills one of Crystal's main points: the ability to prototype easily and throw on type annotations when you need to. I think it'd be worth checking first how the language composes anyway.

kazzkiq commented 7 years ago

But how does this "compiling inefficiency" translate in numbers?

Is the memory consumption something that grows exponentially with the project, to the point it will consume infeasible amounts of RAM?

I've written half a dozen "micro" projects in Crystal (the largest has only about 2k lines of code), and I've never experienced that much RAM usage while compiling.

My question is straight to the point: if a project reaches 10, 20, 50, 100k LOC, will the memory consumption grow linearly (or even stabilize after a certain amount), or could there be a point where Crystal simply requires too much memory (hardware-deterrent) to be acceptable?

If the memory consumption is not something that will eventually reach blocking hardware usage as the project grows, I honestly don't see an issue at all. If you want to run your project binaries on a low-end machine (less than ~1GB RAM), you can always replicate the production environment on a build machine and then port the binary to production. In fact, you could even use Vultr/DigitalOcean VMs to do the build and simply kill their VM later, it would cost just a penny per production build.

sdogruyol commented 7 years ago

I think @bararchy can give us a great feedback about a really big Crystal project 👍 @kazzkiq

konovod commented 7 years ago

@kazzkiq Right now many PRs to crystal fail on CI because the CI machines apparently don't have enough memory to compile all the specs. There is a partial solution to this (splitting the specs into two parts), but it shows that memory consumption isn't going to stabilize. For smaller projects compilation time is of course more important, and it's not very good either (at least compared to languages like Rust or D).

bararchy commented 7 years ago

So, my two cents (and use case)

As opposed to the majority of Crystal users, who usually write micro-services or webapps, I use Crystal for machine learning and data analysis. My projects get big, and fast; the usual size is 40K lines or more.

Compiling takes time and memory, especially when using --release mode: about 1.2Gb, which is quite absurd. Think about trying a few different builds at the same time while also trying to use the same server for tests and runs.

akzhan commented 7 years ago

The compiler could support two separate modes (like JavaScript does with the "use strict" pragma).

The first, default one allows global type inference; the second doesn't.

It's common practice to distinguish the requirements of small apps, libraries, and big projects using pragmata, etc.

unleashy commented 7 years ago

@akzhan That could work, and could be applied to specific methods instead of all methods? But then again, it's basically forall T but the reverse.

I think option 2 is best; it's how it works in essentially any sane statically compiled language, and it's how it should've been from the start. Option 3 is fine for me too, but it would most definitely break a lot of stuff and be too "rigid" compared to now, I guess; I don't have too much of a problem with that, coming from C, C++ and Java.

ghost commented 7 years ago

@akzhan maybe something like https://github.com/crystal-lang/crystal/issues/4603?

akzhan commented 7 years ago

@totakaro yes, it is. But enabled by some pragma in source code instead of a compiler option.

There are a lot of reasons to configure compiler behavior from source code, and it's already widely used by programming languages.

See Ruby's pragmata for utf-8 strings and frozen string literals, Perl's utf8 and strict, JavaScript's 'use strict', etc.

RX14 commented 7 years ago

@akzhan most of those are hacks to remove unwanted and unneeded behaviour while keeping backwards compatibility. If those languages had been well designed in the first place, such hacks wouldn't be required or wanted. We don't have the shackles of backwards compatibility, so we shouldn't have multiple versions of the crystal language.

faustinoaq commented 7 years ago

@RX14 sure, we don't want to repeat the Python 2 vs 3 story.

akzhan commented 7 years ago

@RX14 anyway, we can't provide good shards/modules and large apps without strictures on type inference.

No way.

And I should note that it's not a backward-compatibility question.

It's required because we have two different target audiences.

faustinoaq commented 7 years ago

@akzhan Always exists a way to solve things :smile:

akzhan commented 7 years ago

@faustinoaq yes, it is. Something may be solved by precompiling ASTs (the parser phase).

But type inference should be simplified, because right now it's hard on the compiler and its results aren't yet uniquely determined.

RX14 commented 7 years ago

we can't provide good shards/modules and large apps without strictures on type inference.

I don't follow at all. I believe we should enable the compiler to take advantage of annotated types to compile much faster, but that the trade off between compile time and annotating types should be entirely up to the user. I do not believe we should force anything on anyone. We don't have to: the practicalities around compile times will force them to if their shard gets popular enough.

asterite commented 7 years ago

I tried to work on this but failed miserably. I won't have time to play with this, but if someone does, I suggest cleaning up and refactoring the compiler code first because it's a bit messy.

It also changes the semantics of the language so it's a breaking change.

ozra commented 7 years ago

Thanks for the report @asterite!

I've been thinking about this one in the back of my head for some time, though I haven't touched the code in months.

My baseline of thought has been: technical solution, not spec changing solution.

The idea has mainly been to separate the "pure" AST nodes from the AST-node annotations / semantics (typing). Each instantiation of a func (for example) would be a linear list of annotation nodes. The original syntax nodes which vary get an index upon first traversal, and the instance-specific type info is referenced via the syntax node's index into the specific instantiation (this is obviously done through a get-annotation-obj method). I have experimented a little with this, but the ideas should be seen as vapor; I haven't yet looked into how well it could be black-boxed from the majority of the inference and generation code. This is indexes-as-sort-of-relative-pointers: it means more dereferencing, thus more cycles used. And I frankly have no idea what savings it would yield in practice - perhaps even a similar measly 50MB :-/
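A rough sketch of that separation (all names here are hypothetical, invented for illustration):

```crystal
# Shared, immutable syntax nodes get a stable index on first traversal;
# each instantiation then stores only a flat array of per-node annotations,
# so the AST itself is never cloned per instantiation.
class SyntaxNode
  property index : Int32 = -1
end

record TypeAnnotation, type_name : String

class Instantiation
  getter annotations = [] of TypeAnnotation

  # The index acts as a sort-of-relative pointer into this instantiation:
  # one extra dereference per lookup, in exchange for sharing the tree.
  def annotation_for(node : SyntaxNode) : TypeAnnotation
    annotations[node.index]
  end
end
```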

The thoughts are also founded in the wish to come up with a way of keeping the program structure cached in a compilation server (as has also been discussed before), which obviously requires keeping its space requirements down, and to mutate only the parts of trees affected by source changes. If that were achieved, the higher cycle cost would be offset by limiting the extent of work.

Those are all big ifs. And much work. And much added complexity to the compiler.

I routinely type restrict or use explicit generics for params and return type always, since catching errors at function boundaries leads to much faster development and much less bugs. But -- the reason I love the "implicit genericness" in Crystal vs. having to template it explicitly in C++ is that you can start off with PoC-code and algorithms and even script-like code first: quickly throwing it away, quickly rewriting, and then tighten it up when happy with the concept.

Writing entire programs without typing func params/ret is very bad practice imo (with no benefits whatsoever, from what I've seen in studies and my own work). But it should still be fast XD

RX14 commented 7 years ago

@asterite does that mean you think solution 2 is the way to go? Do you have any more comments/ideas about the ideas in my comment?

Regarding the compiler change, it would be great if you could give a high-level overview of the kind of refactoring or other work required. To my uninitiated mind (I haven't worked on the typing code in the compiler) it's as simple as casting all the restricted params of a function and attempting to cache instantiations. What am I missing?

asterite commented 7 years ago

@RX14 This comment explains one of the difficulties of implementing this: method restrictions aren't typed immediately; they are typed over and over when trying to match a method call. So it's impossible to "cast the arguments to the restrictions", because when you match an argument to a method, the type that you get is the restricted one: the original one, dictated by the parameter, is lost in the matching. This gets more complicated with union types, and also with things like:

def foo(x : Enumerable)
end

which right now works because x gets the type of the argument at the call site, but with this change it would be a combinatorial explosion of all possible Enumerable types (in fact, that can't be represented by the compiler yet, so it won't work).
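That is, today each call site fixes the concrete type (sum_all is a hypothetical method for illustration):

```crystal
def sum_all(x : Enumerable)
  x.sum
end

sum_all([1, 2, 3])  # x is Array(Int32) here: one instantiation
sum_all({1.0, 2.0}) # x is Tuple(Float64, Float64) here: another
# Under the proposed change, x : Enumerable would have to stand for every
# possible Enumerable at once, which the compiler can't represent today.
```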

Maybe these small details need to be discussed, decided, and tackled, slowly improving the language specification and its implementation while tackling (and thinking about) cases like the above.

RX14 commented 7 years ago

Hmm, it would be nice if we could find a robust way to do this without forward declarations (perhaps have it deferred to a later pass where we have all the types).

Personally I'm a bit scared of the compiler, I haven't ever submitted patches to the complex parts, and find it hard to hold such things in my head. I don't think a large scale refactor is a good way to get started working on the compiler.

unleashy commented 7 years ago

method restrictions aren't typed immediately, they are typed over and over when trying to match a method call.

@asterite What do you mean by this?

asterite commented 7 years ago

@unleashy Compile this:

def foo(x : IDontExist)
end

It works, no errors. You only get an error when you try to invoke foo.

A solution that would also allow forward declarations would be to type methods after all types and methods are "added" to the hierarchy.

unleashy commented 7 years ago

@asterite So you mean type restrictions are checked similarly to a template? That's pretty odd. I think type restrictions should be their own "thing" and not templates in disguise, and should be statically type-checked at compile time, after all the types and methods are resolved, as you said.

ghost commented 7 years ago

I'm a bit confused reading these comments. Is this issue one of the big breaking changes before 1.0?

Article "Crystal new year resolutions for 2017: 1.0" third point is Type system.

I hope we find the way to improve Crystal. :-)

Crystal is awesome!

shelvacu commented 7 years ago

Is there some way to keep the efficiency gained by generating specific methods? In any of the three options you propose, we lose the performance gain of compiling specifically for Float64, and specifically for Int32. Is this a significant performance benefit?


@asterite said:

It's bad because if T is a reference type (represented as a pointer) then the generated code, at least for an Array or a Hash, is the same, so there's no point to do this.

So why not detect that it's the same and point to a previously generated piece of code? It's the same or possibly a longer computation time, but it should use less memory, right? Is it feasible?

RX14 commented 7 years ago

Is there some way to keep the efficiency gained by generating specific methods? In any of the three options you propose, we lose the performance gain of compiling specifically for Float64, and specifically for Int32. Is this a significant performance benefit?

We should be able to keep the old codegen as an option for release mode.

faustinoaq commented 7 years ago

Maybe :point_down: will be true again :sweat_smile:

compiling...

https://xkcd.com/303/

faustinoaq commented 7 years ago

Unfortunately, there are limits on this kind of type inference. It looks like their closures require type annotations, for example, and they've reported pretty serious performance issues with compiling projects greater than about 40k LOC right now. This isn't to criticize the project, which is doing good work, but these kinds of limitations are well understood by PL theorists. My impression is that they haven't really engaged much with that work. [0]

So, can we ask a theorist to review Crystal? Or did I ask a bad question? :sweat_smile:

[0] - https://www.reddit.com/r/rust/comments/6yngn0/love_from_crystal_developers/

asterite commented 7 years ago

I'm closing this as I just wanted to express the problems with the current approach. There are things to try and improve, but there's no point keeping this open as it's not an issue.

akzhan commented 7 years ago

I want something like Crystal, but with explicitly typed contracts (member definitions).

The absence of strictness in these declarations looks like a barrier between Crystal and industry.

asterite commented 7 years ago

Indeed. I personally would like to see type annotations on all method arguments and return types, and so would many on the core team at Manas. So the language might still change drastically.

paulcsmith commented 7 years ago

FWIW I love that instance variables have to be typed but you don't need return types for methods. That inference is really nice, but I understand why some want to force type annotations. Just throwing in my vote to keep it as-is (optional).

faustinoaq commented 7 years ago

I'm okay with Types everywhere if it helps to compile faster 😅

ansarizafar commented 7 years ago

I second @paulcsmith: typed instance variables, but optional return types for methods.

RX14 commented 7 years ago

If you want crystal with strictly typed contracts, create a tool which enforces that on your code. It may well make your code better and compile faster, but I don't think we should enforce it on the entire language. I'd certainly not use crystal if that were the case.

RX14 commented 7 years ago

I think we should have a new issue (perhaps in the RFCs repo) which tracks the proposal of making x : Foo in a method definition cast x to Foo.

faustinoaq commented 7 years ago

Could Crystal compile faster if we implemented a JIT compiler using some kind of Crystal bytecode?

I mean, the slowest phase of the current compilation process is the codegen phase.

Maybe we could use a JIT for fast development and AOT for fast production/execution? That would be the best of both worlds, no?

Would it be possible to implement a JIT compiler for Crystal using LLVM IR?

I'm writing this comment inspired by the Mono documentation; they do something similar with .NET [0]

[0] - http://www.cbs.dtu.dk/cgi-bin/nph-runsafe?man=mono

refi64 commented 7 years ago

But isn't the most memory-intensive part the type inference?

Also, a JIT built on LLVM would still likely use a ton of memory, and making one from scratch would probably be a massive time sink.

.NET and Crystal are two very different beasts...

ghost commented 7 years ago

Yeah, you're right, I think I was a bit confused +1


vegai commented 7 years ago

How bad is it, though? Scala has a famously slow and resource-hungry compiler, but I still see companies use it left and right.

The compiler itself is > 100k LOC but compiles in release mode in acceptable time (5-10 minutes or so). Are you wishing for >1M LOC projects to be viable in Crystal? I'm not sure I see why that's something to strive for.

All that said, less resource usage is obviously better, and forcing stricter typing in some places would raise code quality as well. Would it be possible to have the slow full type-inference engine enabled when compiling in debug mode but disabled when building with release=1? That would kind of give the best of all worlds in my mind: developer freedom when coding in storming mode, and faster compilation and stricter requirements for code quality when moving to production.

lribeiro commented 7 years ago

Can't we have both?

Specify the types and the compiler is blazing fast; don't, and wait for type inference to do its thing?

I really like being able not to strictly define types while prototyping, and then making things more specific as the code moves towards 1.0.