crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0
19.36k stars 1.62k forks source link

Status of incremental compilation in 2021 #10568

Open UnsolvedCypher opened 3 years ago

UnsolvedCypher commented 3 years ago

Discussion

What aspect of the language would you like to see improved?

I would like to see some form of incremental compilation implemented in Crystal so compilation does not slow down significantly for small changes in large projects. There has been some discussion on this in the past, but I think it would be good to revisit this as it hasn't been touched for a while.

Personally, I really love Crystal's static type checking, and while type inference is also nice, I would be happy to annotate some parts of my code with types if this means it can be compiled incrementally. I am interested in other perspectives from the community so there can be some consensus on what people are willing to do for faster compile times. This does not have to be a change to the language necessarily, I would be happy with the compiler performing incremental compilation if types are specified where needed, or otherwise fall back to regular compilation when type inference is desired everywhere.

What are the reasons?

I am ready to start using Crystal in production, but knowing that compile times could grow significantly as the project grows is very unappealing. This is particularly true for web development where it's common to make a small change and want to try it quickly. Furthermore, IDE tooling could benefit significantly from fast compilation. In my view, tooling that is stable and reliable and quickly catches errors can help newbies get started in Crystal.

Optionally add one (or more) proposals to improve the current situation.

I propose that the community decides where types would be required for incremental compilation. For example, all function parameters must be typed as must all global variables. Then, the compiler could run in incremental mode if types are specified where needed, otherwise it would compile normally. In this case, it would be nice to have a compiler flag that would check if the project is appropriately type-annotated for incremental compilation so that users can tell where they are missing types if they want faster compile times.

bararchy commented 3 years ago

I see people keep bringing this up, but I don't know how painful this issue really is. The slow part is only in --release based compilation, and if you need quick tests you obviously won't use --release, also, we have a full CI/CD pipe using --release and without --release for specific parts and tests and there is no large impact on CI times.

I think this issue is raised mainly by people not actually using Crystal in production, and I can maybe suggest, give it a try, and you might notice compilation times are not that bad without --release for quick changes and tests.

oprypin commented 3 years ago

For example, all function parameters must be typed

Keep in mind that currently method parameters aren't "optionally typed", they cannot be typed, that is simply not implemented. All that the type restrictions do is validate the input type for a match, but it has no effect on the outcome. Crystal will still generate one method body per combination of actual types.

def f(x : Int32 | String)
  p typeof(x)  # Int32  # not (Int32 | String) as you might've wanted
end

f(5)

A related fun fact is that, for any program that currently works, removing all parameter annotations can never make it not compile.

So, currently, type annotations will not get you any advantage in terms of incremental compilation, because the actual types will still be deduced. Might as well try to do it without. Which is kinda counter-productive: imagine, the program needs to look through every type that's ever passed to a method, and only then decide whether it needs to be re-compiled. And that applies to every method, which can be interdependent. So there's probably cycles in that graph.

Changing the language so that the type annotations actually make it so the parameter has only one type would be a huge breaking change, and that's just not on the table (I'm guessing). The challenge here would not only be one of the biggest technical challenges for anything ever done with this language, but also just generally very hard to push through even if it worked, for fear of getting a separate programming language as the end product.


So I think there's nothing new that this proposal brings. Nobody's going to argue that we don't want incremental compilation, so there's also not much need to argue for it. It's all been said in the last ~5 years, and there has also been no change in that regard in the same time frame.

Even the explanation I wrote here is kinda similar to this one, as I realize now: https://forum.crystal-lang.org/t/are-inferred-types-frozen-for-crystal-shards-and-the-standard-library/1434/2


Well, in fact, there was a change in status, and that is the release of version 1.0, which puts a much tougher (in my view, near impossible) restriction on any implementation of incremental compilation: disallowing backwards-incompatible changes. See also: https://forum.crystal-lang.org/t/when-crystal-lang-1-x-will-be-released/1563/20

straight-shoota commented 3 years ago

A related fun fact is that, for any program that currently works, removing all parameter annotations can never make it not compile.

Just a note on that: Without type restrictions on parameters, it would be impossible to disambiguate type-based overloads. If you have those, removing type restrictions would likely break the program.

konovod commented 3 years ago

A related fun fact is that, for any program that currently works, removing all parameter annotations can never make it not compile.

offtopic, but while i totally agree to the rest of post, this one sentence isn't true - there is function overloading. So

def foo(a : String)
  p a+"!"
end

def foo(a : Int32)
  p a / 2
end

foo(1)
foo("a")
foo(rand < 1.5 ? "a" : 1)

https://carc.in/#/r/apk1 won't compile if you remove restrictions.

oprypin commented 3 years ago

Oops sorry about that

UnsolvedCypher commented 3 years ago

@oprypin Thank you for your explanation.

imagine, the program needs to look through every type that's ever passed to a method, and only then decide whether it needs to be re-compiled. And that applies to every method, which can be interdependent. So there's probably cycles in that graph.

My understanding is that it is the looking through every type ever passed to a method that makes incremental compilation problematic, is that correct? If that's true, could the compiler not just compile the function for the specified parameter types, and then throw an error if it encounters a call that's passing in a different type? Please correct me if I'm wrong, but to me it seems like this would limit the scope of where the compiler has to look in order to infer types and thus also to decide whether the file needs to be recompiled (if file is unchanged, no need to recompile).

Well, in fact, there was a change in status, and that is the release of version 1.0, which puts a much tougher (in my view, near impossible) restriction on any implementation of incremental compilation: disallowing backwards-incompatible changes.

Yes, I certainly don't think a breaking change would be a good idea. What I am suggesting is to allow users to have their code conform to slightly more strict typing requirements to enable incremental compilation rather than to change what is allowed in the language.

@bararchy I have primarily used Crystal for web development, and I have to say, Lucky is by far the nicest web framework I've had the pleasure to use. What is unfortunate is everything around the framework- the slow reload times and the slow IDE tooling. This was a pain point during development even without the --release flag.

oprypin commented 3 years ago

just compile the function for the specified parameter types

Specified where exactly?

You specify : Foo, I make a new file containing class Bar < Foo. So you're saying there should be an error then?

asterite commented 3 years ago

I would just like to remind you that mandatory type signatures won't solve this issue.

See these:

I probably talked about this in more places.

j8r commented 3 years ago

I believe one could experiment with it, as a subset of the language with several features removed. That's definitely not a light work. Will most people prefer a more limited language with incremental compilation vs what we have now? Not sure.

UnsolvedCypher commented 3 years ago

Thank you @asterite those links were very helpful in understanding the problem further. In this issue, you mentioned that the following restriction would allow for incremental compilation:

When declaring a type (class, struct) you have to specify its instance variables and their types. The same applies to class variables and global variables.

Is that still the case, or have things changed since then?

oprypin commented 3 years ago

@UnsolvedCypher That was a super huge blocker for incremental compilation, and it is gone, but turns out there are still big blockers (the ones mentioned here) and it's not doable.

n-pn commented 3 years ago

I think we should only care about the compilation speed in the development mode. Many of us are familiar with the short feedback loop provided by other scripting languages, so waiting 30s or more for crystal to recompile a medium size project in dev mod is unacceptable.

I have a small project with just 5k lines of code and use minimal shards (kemal, kemal-session and myhtml), and yet the compilation time is already more than 5s, this is really worrisome.

rubyFeedback commented 3 years ago

bararchy commented Apr 1, 2021

I see people keep bringing this up, but I don't know how painful this issue really is.

Well, I am a crystal-noob right now, though I know quite a lot of ruby.

The ~recent 1.0 release garnered some attention I would assume.

You can then read on reddit that for larger files compiling crystal slows down things quite a bit. At the least some accounts state so.

As to whether this is correct or not, I can not say; I don't have any experience with larger crystal code yet. I am still in the find-out stage.

But I think merely based on the PERCEPTION, one could see whether there is truth to these statements. This then, in turn, taps into "incremental compilation" mostly because I think it originates from that use case (e. g. developers who may become impatient waiting for the compiler in crystal to finish).

nipinium commented Apr 2, 2021:

I have a small project with just 5k lines of code and use minimal shards (kemal, kemal-session and myhtml), and yet the compilation time is already more than 5s, this is really worrisome.

I have no experience to share myself, so I have to rely on others. But this is something that keeps on being written on various areas including reddit. I think this is not solely the issue of what the crystal devs may think, but simply how others perceive. Look at ruby - it got faster from 1.8.x to 3.0 but people still refer to 1.8.x days in many speed comparison discussions. Perception issues do happen to exist. :P

straight-shoota commented 3 years ago

Yes, it's a bit of a perception issue. The Crystal compiler is comparatively slow. That's necessary because it does a lot of work. I think everyone interested in Crystal accepts that general trade-off in general. But it obviously begs the question: How slow is it? How does it affect the user experience? And that's a very difficult question. There is no answer for everyone. Each use case and every developer have different expectations and requirements. It's obviously one of the most criticized aspects of Crystal that users perceive it as too slow. So a lot of people talk about it. But what's talked on Reddit doesn't really represent the active user base. Happy users rarely dive into Reddit threads to argue that it works for them. They're probably busy coding Crystal. Those who perceive it as too slow, are heard more. And their stories repel new users. If people on Reddit say it's too slow for real-world use cases, that has an effect. Even if that assessment only fits for their specific use case and priorities. Maybe it would be a good idea to talk about these aspects in success stories. Developers who use Crystal in production can certainly tell a thing or two about how their development process works and what effect they experience with compilation times. And obviously, we can focus on the good parts of Crystal. You know, the Compiler does that hard work so that devs don't have to.

UnsolvedCypher commented 3 years ago

Would the following restrictions solve the problem?

If so, this seems like a reasonable compromise for running the compiler in incremental mode. To be clear I'm not suggesting a change to the language, just that incremental compilation would only work if the restrictions are met, otherwise compilation would take place normally. I think it's great that Crystal has the expressiveness that it does, and it is very cool that you can subclass methods to different types and the compiler will figure it all out.

However, I think these restrictions would generally be followed by large codebases anyways since unpredictable function return types and subclass types can make a large codebase difficult to reason about. I think letting users opt-in to incremental compilation is the best of both worlds- small projects can use Crystal to its full extent and large projects can add some types here and there to keep compilation fast and to keep the code sane.

oprypin commented 3 years ago

Would the following restrictions solve the problem?

No.

UnsolvedCypher commented 3 years ago

No.

Would you mind providing a counter-example so I can understand the problem?

asterite commented 3 years ago

Check one of the links I shared. The problem is that dependencies of a method aren't clear.

Fryguy commented 3 years ago

I've read both of the links multiple times and I still don't really understand the problem. In general, to many of us outside of the compiler who want to help (or at least help in coming up with a plan), it is unclear what the obstacle actually is.

The main issue is that it’s not immediately clear what are all the dependencies of a given LLVM bc/ll file and how to compute that. That is the main challenge involving incremental compilation.

Is it that the dependencies of a method are not currently collected or not possible to collect, or that the associations are lost somewhere along the way? For me, personally, I don't understand the current relationship between a .cr file and the LLVM bc/ll, and what is lost or missing. That we get a compiled result at all implies to me that at some point the compiler knows everything, and that sort of implies it can be stored and compared in a subsequent compilation.


macros definitely complicate things, I admit. I think for those we need a way to know if a macro is deterministic or not, or if conditionally deterministic, a simple way to know if that condition has changed. Based on the below, maybe we can assume they are deterministic except in a few specific cases that perhaps can be annotated?

asterite commented 3 years ago

Let's take this program:

# foo.cr
require "./bar"

class Foo
  def foo
    1
  end
end

Bar.bar
# bar.cr
class Bar
  def self.bar(foo)
    puts foo.foo
  end
end

I guess the compiler could see that Bar.bar is called with a Foo, so it can say that it depends on Foo, and specifically on Foo#foo. Then it also depends on the top-level puts.

Now I write another program:

# file baz.cr
require "./bar"

class Baz
  def foo
    "hello"
  end
end

Bar.bar(Baz.new)

Okay, this is a separate program, baz.cr. The compiler sees that Bar.bar depends on Baz, and specifically on Baz#foo.

As you can see, Bar can't be compiled independently from the files that call it. In every other compiled language, Bar can be compiled independently from the files that call it. This is simply not true in Crystal.

But, let's say we say "We'll do incremental compilation as long as the entrypoint is always the same." That is, we cache dependencies of a previous compilation per entrypoint.

Let's go back to the original foo.cr, which was this:

# foo.cr
require "./bar"

class Foo
  def foo
    1
  end
end

Bar.bar

Let's add this:

# still in foo.cr
class Bar
  def self.puts(x)
     ::puts "Hello: #{x}"
  end
end

Oh! Now the puts call in Bar will not call the top-level anymore. It will call Bar.puts. But... none of Bar's dependencies changed! Well, I guess for every top-level call that we find, we must also include that same type (and actually all the type hierarchy) just in case someone defines a method with the name of a top-level method.

But this can be done, no problem. It just seems that we have to track more and more information, it's getting harder.

Let's continue... let's say we now add this to foo.cr:

class Fox(T) < Foo
  def initialize(@x : T)
  end

  def foo
    @x.foo
  end
end

Oh! Bar depended on Foo, and now it's subclassed, so I guess we need to depend on Fox(T) now. Okay, no problem... we still have this method foo on it, which is of type... hmmm... we don't know? It depends on the type of T? What's T? Can we know it? No. Well, the compiler actually knows it, at the end of the compilation: it's every generic type you instantiated. But those instantiations happen inside methods! Or can happen inside methods, at least. So the compiler has to analyze the entire program to figure that out. But... wasn't the whole point of incremental compilation to avoid having to look at the entire program?

Well, that's the problem.

Of course one can say "Well, in this case it's bad, but maybe programs aren't like that, or we can have a workaround like this," but you can't program a compiler around workarounds or just hope people won't code in certain ways.

Maybe we can ask that T is restricted somehow, so you'll write:

class Fox(T) < Foo where T < SomeOtherType
end

That way we can know what SomeOtherType is... though we'll have to know what's the type of SomeOtherType#foo is. So we can make that mandatory:

class SomeOtherType
  def foo : Int32 # This is now mandatory in all methods, it can't be omitted!

Oh, but look at this Enumerable method sum:

[1, 2, 3].sum # => 6
['a', 'b', 'c'].sum # Compile error!

So sum is actually defined on all types... but it only compiles on certain types. I guess we'd have to change the language to be able to define sum just when T has some properties...

You can see where this is going. The simplicity of Crystal will be gone to the trash.

Fryguy commented 3 years ago

@asterite First, thanks so much for the detailed reply!

I think what's confusing to me about the example there is that you continually change foo.cr. In an incremental model, everything in foo.cr would get recompiled anyway because that file is what changed. In general, the changes you keep making to foo.cr are to change the nature of or relationships to Bar (i.e. a class initially defined in a separate file), showing that bar.cr would need to be recompiled. However, I fully expect it to be recompiled, so that's not surprising. If a relationship exists in metadata somewhere that both foo.cr and bar.cr manipulate or depend on the the Bar class, then when when the Bar class is changed you'd also naturally have to recompile both foo.cr and bar.cr. What you wouldn't have to recompile is the dozens to hundreds of other files in your entire application stack.

But... wasn't the whole point of incremental compilation to avoid having to look at the entire program?

No. The whole point of incremental compilation is to filter down the potential list of things to compile to only those things that the user affected via their changes, thereby avoiding recompiling what has already been compiled. If the user changes something that globally affects class hierarchies, then I'd expect the entire program (or a large part of it that's related to those class hierarchies) to have to be recompiled. If the user changes puts "fo" to puts "foo", I'd expect it to reuse everything it already knew before, which in this case is just about everything as even the argument types didn't change.

didactic-drunk commented 3 years ago

I don't think incremental compilation needs to be perfect. If half a program is suitable for incremental compilation how much time is saved? Maybe some classes or methods won't qualify such as those @asterite mentioned. I don't see why many or most methods can't qualify for incremental compilation. Example:

def add(x : Int, y : Int) : Int
  x + y
end

@asterite's example is somewhat synthetic. Yes the language may allow overriding top level methods, but let's be real. How often does that happen? By shard percentage of top x hundred shards please. Instead, can you flag methods/classes/whatever as unsuitable for incremental compilation if it's hard or unsolvable?

Rather than complete incremental compilation maybe we should incrementally implement incremental compilation.

Stage 1:

Stage 2:

Stage n:

didactic-drunk commented 3 years ago

One minor optimization that's probably obvious to the core team but possibly not to others attempting to do incremental compilation: throw all uncachable code in it's own cached unit. If code changes only affect a specific cached unit the "uncachable" cached portion is still valid.

al6x commented 1 year ago

Hi, I saw recently demo of Lobster language https://www.youtube.com/watch?v=uuPeBKdnBOI

To my understanding it's very close to Crystal, different syntax but similar semantics, and same powerful auto infer not requiring explicit types. And it compiles almost instantly (see video), maybe Crystal may adopt some of its technic to speed up compilation.

femto commented 8 months ago

what's the status of this now? I copied my propose here:

every file can be compiled into a compilation unit of bytecode, let's call it .crgb, (.crc well is crc...), so I choose to use .crb here which means crystal bytecode, then .crgb means crystal general bytecode. the gb is kind of same of *.cr, it only includes the prototype of function.

#sum.cr
def sum(a,b)
  a+b
end
then let's say foo.cr depends on sum.cr
#foo.cr
sum(1,2)

crystal foo.cr, the compiler process will fetch foo.{cr,crgb} and sum.crgb to produce final result, no need to recompile sum.cr. when sum(1,2), compiler realize it needs to instantiate the function to a sum(Int32,Int32) instance, it will do it, whether it choose to save the bytecode to sum.crb that's an another choice.

#foo.cr
sum("a","b")

crystal foo.cr, the compiler process will fetch foo.{cr,crgb} and sum.crgb to produce final result, no need to recompile sum.cr. when sum(1,2), compiler realize it needs to instantiate the function to a sum(String,String) instance, it will do it.

step 1 --> *.crgb is already performance improvement and I think can already handle very large projects.

step 2--> .crb is problematic because we don't know beforehand all the instantiations, there may be some optimization in the future to clever knows a .crb doesn't needs to recompile. for now let's just only compile to *.crgb

AlexMcConnell commented 3 months ago

Is this still not added? Dang, I really liked this language, but I abandoned learning it a few years ago because of this. Any language where I can't run my unit tests at an appropriate speed for using them to validate incremental changes is the opposite of developer friendly.