enso-org / enso

Hybrid visual and textual functional programming.
https://enso.org
Apache License 2.0

Add benchmarks to `runtime-compiler` #8419

Closed: Akirathan closed this issue 7 months ago

Akirathan commented 10 months ago

There are no benchmarks for parsing or compiling. Let's add benchmarks to the `runtime-compiler` and/or `runtime-parser` projects. Ideally, make sure that these benchmarks are visible at https://enso-org.github.io/engine-benchmark-results/engine-benchs.html

This will be good for #7054.

JaroslavTulach commented 9 months ago

Let's create a `compiler` directory next to the existing `semantic` one and put the benchmarks there. They will then automatically appear at https://enso-org.github.io/engine-benchmark-results/engine-benchs.html

JaroslavTulach commented 9 months ago

Let the benchmark generate typical end-user code:

from Standard.Base import all

main =
    operator1 = File.read "blabla"
    operator2 = operator1.xyz 2 where=Location.Start
    operator3 = operator1.abc "Hi3"

If we can generate such code, then we can have benchmarks for files of 100, one thousand, and ten thousand lines and compare the scalability of our implementations.
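
For illustration, a minimal Java sketch of such a generator (the class and method names here are hypothetical, not taken from the actual benchmark code):

```java
// Hypothetical generator (not the actual benchmark code): repeats the
// three-operator pattern `groups` times, producing roughly 3 * groups
// lines of Enso code inside `main`.
final class EnsoSourceGenerator {
    static String generateSource(int groups) {
        var sb = new StringBuilder("from Standard.Base import all\n\nmain =\n");
        int i = 1;
        for (int g = 0; g < groups; g++, i += 3) {
            sb.append("    operator").append(i).append(" = File.read \"blabla\"\n");
            sb.append("    operator").append(i + 1)
              .append(" = operator").append(i).append(".xyz 2 where=Location.Start\n");
            sb.append("    operator").append(i + 2)
              .append(" = operator").append(i).append(".abc \"Hi3\"\n");
        }
        return sb.toString();
    }
}
```

Calling `generateSource(34)`, `generateSource(334)`, and `generateSource(3334)` would then yield roughly 100-, 1000-, and 10000-line files.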

JaroslavTulach commented 9 months ago

What shall we measure? We want to measure the creation of the IR and the application of compiler passes to it - probably how long it takes to invoke the `Compiler.run` method. However, that method requires an implementation of `CompilerContext.Module`, which isn't easy to get. One way is to mock it, but it is probably easier to just call `org.graalvm.polyglot.Context.eval("enso", ...)` and get a reference to the `main` method (without invoking it). That should be simpler (as the API already exists) and good enough to begin with.
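
A rough sketch of that approach (assuming the Enso language is available on the polyglot class path; how the module value exposes `main` is an assumption here and may differ in practice):

```java
import org.graalvm.polyglot.Context;
import org.graalvm.polyglot.Source;
import org.graalvm.polyglot.Value;

public class CompileOnlyExample {
    public static void main(String[] args) throws Exception {
        String code = """
            from Standard.Base import all

            main = 42
            """;
        try (Context ctx = Context.newBuilder("enso").allowAllAccess(true).build()) {
            Source src = Source.newBuilder("enso", code, "Main.enso").build();
            // eval parses and compiles the module; this is the work we want to time
            Value module = ctx.eval(src);
            // looking up `main` forces resolution but does not execute it
            Value mainFn = module.getMember("main");   // assumed member lookup
            System.out.println("main resolved: " + (mainFn != null));
        }
    }
}
```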

radeusgd commented 8 months ago

> Let the benchmark generate typical end-user code:
>
> from Standard.Base import all
>
> main =
>     operator1 = File.read "blabla"
>     operator2 = operator1.xyz 2 where=Location.Start
>     operator3 = operator1.abc "Hi3"
>
> If we can generate such code, then we can have benchmarks for files of 100, one thousand, and ten thousand lines and compare the scalability of our implementations.

Won't this hide the issue of varying levels of complexity? I.e. a 10-line function with lots of variables and dependencies may be more complex to analyze (especially if something in it were O(N^2)) than 10 independent, very trivial 1-2 line functions.

Maybe we could try using our Standard.Base library as the 'corpus' for the benchmarks? It should contain methods of varying levels of complexity and is probably the best 'example' we can currently get of a big codebase in Enso that uses various kinds of patterns.

What do you think?

Akirathan commented 8 months ago

@radeusgd I think that the proposal from @JaroslavTulach makes more sense for now, as it more closely resembles what is actually parsed and compiled in the IDE.

Besides, when you use `from Standard.Base import all`, all the transitively reachable modules are compiled as well. So I am not sure I follow your reasoning here.

radeusgd commented 8 months ago

> Besides, when you use `from Standard.Base import all`, all the transitively reachable modules are compiled as well. So I am not sure I follow your reasoning here.

🤦 oh I have somehow completely missed that. Then indeed my suggestion is moot, you are 100% right.

radeusgd commented 8 months ago

Well, I guess one point still stands: I don't think we should be generating sources by repeating the 3-line example many times (10s, 100s, 1000s).

Because then the timing will be dominated by parsing copies of these simple 3 lines, instead of by the time needed to compile Standard.Base - which I imagine is much more complicated to compile and provides a better benchmark of practical usage.

Maybe both are worth measuring, though.

JaroslavTulach commented 8 months ago

> Maybe we could try using our Standard.Base library as the 'corpus' for the benchmarks?

> complex to analyze (especially if something in it were O(N^2))

The point of having files of various sizes is exactly to identify the complexity of our algorithms! We don't want O(N^2) algorithms in places where speed matters.

> 10 independent, very trivial 1-2 line functions

There can also be a benchmark that generates 10, 100, or 1000 simple functions. That checks scalability from another angle; a sketch of such a generator follows.
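
For example, another hypothetical method that could sit next to the `generateSource` sketch above:

```java
// Hypothetical: emits `n` trivial one-line functions plus a `main` that
// references one of them, so each function is analyzed independently.
static String generateTinyFunctions(int n) {
    var sb = new StringBuilder("from Standard.Base import all\n\n");
    for (int i = 0; i < n; i++) {
        sb.append("fn").append(i).append(" x = x + ").append(i).append("\n\n");
    }
    sb.append("main = fn0 1\n");
    return sb.toString();
}
```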

> I don't think we should be generating sources by repeating

The important goal is to have the benchmarking infrastructure in place, run some benchmarks, and collect the results. And, most importantly, to make it easy to add new benchmarks to the system when a new performance problem is found.
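
To give an idea of the shape this could take, here is a rough JMH-style skeleton (assuming the engine benchmark harness is JMH-based; the class name, parameter values, and the `EnsoSourceGenerator` helper come from the earlier sketches and are assumptions, not the code that was eventually merged):

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;
import org.graalvm.polyglot.Context;
import org.graalvm.polyglot.Source;
import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
public class ModuleCompilationBenchmark {
    // number of generated operator triples: ~100, ~1000, ~10000 lines of code
    @Param({"34", "334", "3334"})
    int groups;

    Context ctx;
    Source src;

    // a fresh Context per invocation, so module caching does not hide compilation cost
    @Setup(Level.Invocation)
    public void setUp() throws IOException {
        ctx = Context.newBuilder("enso").allowAllAccess(true).build();
        src = Source.newBuilder("enso",
                EnsoSourceGenerator.generateSource(groups), "Main.enso").build();
    }

    @TearDown(Level.Invocation)
    public void tearDown() {
        ctx.close();
    }

    @Benchmark
    public Object compileOnly() {
        // eval parses and compiles the module; `main` is never invoked
        return ctx.eval(src);
    }
}
```

Recreating the `Context` per invocation is meant to keep module caching from hiding the compilation work, in line with the later standup note about ensuring that only the compiler is measured.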

enso-bot[bot] commented 7 months ago

Pavel Marek reports a new STANDUP for today (2024-02-23):

Progress: - Created some sensible categories for benchmarks

enso-bot[bot] commented 7 months ago

Pavel Marek reports a new STANDUP for today (2024-02-26):

Progress: - Adding more benchmarks, ensuring that only the compiler is measured. It should be finished by 2024-03-01.

enso-bot[bot] commented 7 months ago

Pavel Marek reports a new STANDUP for today (2024-02-27):

Progress: - First batch of benchmarks is ready for review. It should be finished by 2024-03-01.

enso-bot[bot] commented 7 months ago

Pavel Marek reports a new STANDUP for today (2024-02-29):

Progress: - Quick fix for failing benchmark builds at https://github.com/enso-org/enso/pull/9220

enso-bot[bot] commented 7 months ago

Pavel Marek reports a new STANDUP for today (2024-03-01):

Progress: - Initial look at benchmark regressions