Closed Akirathan closed 7 months ago
Let's create a directory `compiler` next to the existing `semantic` one and put the benchmark there. It will then automatically appear at https://enso-org.github.io/engine-benchmark-results/engine-benchs.html
Let the benchmark generate traditional end user code:

```
from Standard.Base import all

main =
    operator1 = File.read "blabla"
    operator2 = operator1.xyz 2 where=Location.Start
    operator3 = operator1.abc "Hi3"
```
If we can generate such code, then we can have benchmarks for 100-, 1,000-, and 10,000-line files and compare the scalability of our implementations.
What shall we measure? We want to measure creation of the IR and the application of compiler passes to it - probably how long it takes to invoke the `Compiler.run` method. However, that method requires an implementation of `CompilerContext.Module`, which isn't easy to get. One way is to mock it, but it is probably easier to just call `org.graalvm.polyglot.Context.eval("enso", ...)` and get a reference to the `main` method (without invoking it). That should be simpler (as the API already exists) and good enough to begin with.
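The eval-without-invoking approach could be sketched as follows. The generator part is plain Java; the polyglot evaluation is left in comments because it needs the Enso language on a GraalVM classpath. All names here (`CompileOnlyBench`, `generateSource`, the statement shape) are illustrative, not the actual benchmark code.

```java
public class CompileOnlyBench {
    // Builds an N-statement "end user" module of the shape shown above.
    // The statement body is an illustrative placeholder.
    static String generateSource(int statements) {
        StringBuilder sb = new StringBuilder("from Standard.Base import all\n\nmain =\n");
        for (int i = 1; i <= statements; i++) {
            sb.append("    operator").append(i).append(" = File.read \"blabla\"\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String src = generateSource(100);
        System.out.println(src.split("\n").length + " lines generated");

        // Compile-only measurement (sketch; requires org.graalvm.polyglot and the
        // Enso language on the classpath, so it stays commented out here):
        //
        // try (org.graalvm.polyglot.Context ctx =
        //         org.graalvm.polyglot.Context.newBuilder("enso").build()) {
        //     ctx.eval(org.graalvm.polyglot.Source
        //             .newBuilder("enso", src, "Bench.enso")
        //             .buildLiteral());   // forces parse + compile; main is never invoked
        // }
    }
}
```

Keeping the timed region around only the `eval` call (not the source generation) is what restricts the measurement to parsing and compilation.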
> Let the benchmark generate traditional end user code:
>
> ```
> from Standard.Base import all
>
> main =
>     operator1 = File.read "blabla"
>     operator2 = operator1.xyz 2 where=Location.Start
>     operator3 = operator1.abc "Hi3"
> ```
>
> If we can generate such code, then we can have a benchmark for 100, thousand and ten thousand line file and compare the scalability of our implementations.
Won't this hide the issue of various levels of complexity? I.e. a 10 line function with lots of variables and dependencies may be more complex to analyze (especially if something in it were O(N^2)) than 10 independent 1-2 line functions that are very trivial.
Maybe we could try using our `Standard.Base` library as the 'corpus' for the benchmarks? It should contain methods of varying levels of complexity and is probably the best 'example' we can currently get of a big codebase in Enso that uses various kinds of patterns. What do you think?
@radeusgd I think that the proposal from @JaroslavTulach makes more sense for now, as it resembles more closely what is actually parsed and compiled in the IDE. Besides, when you write `from Standard.Base import all`, all the modules that are transitively reachable are compiled. So I am not sure I follow your reasoning here.
> Besides, when you write `from Standard.Base import all`, all the modules that are transitively reachable are compiled. So I am not sure I follow your reasoning here.
🤦 oh I have somehow completely missed that. Then indeed my suggestion is moot, you are 100% right.
Well, I guess what still stands is: I don't think we should be generating sources by multiplying the 3-line example many (10s, 100s, 1000s of) times, because then the timing will get saturated by the time to parse these simple 3 lines, instead of the time needed to compile `Standard.Base` - which I imagine is much more complicated to compile and provides a better benchmark of practical usage. Maybe both are worth measuring, though.
> Maybe we could try using our `Standard.Base` library as the 'corpus' for the benchmarks?
`Standard.Base` is a moving target, not really suitable for a "unit benchmark" of the compiler.

Because of changes in `runtime/compiler` or because of changes in `Standard.Base`?

> complex to analyze (especially if something in it were O(N^2))
The point of having files of various sizes is exactly to identify the complexity of our algorithms! We don't want O(N^2) algorithms in places where speed matters.
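To make that concrete (a back-of-the-envelope check with made-up numbers, not part of any benchmark harness): from timings at two input sizes one can estimate the empirical scaling exponent k in t ≈ c·n^k, which distinguishes linear from quadratic behavior.

```java
public class ScalingCheck {
    // Empirical scaling exponent k from two (size, time) benchmark points,
    // assuming t ~ c * n^k. k near 1 suggests linear, near 2 quadratic.
    static double exponent(double n1, double t1, double n2, double t2) {
        return Math.log(t2 / t1) / Math.log(n2 / n1);
    }

    public static void main(String[] args) {
        // Hypothetical timings, not real measurements:
        System.out.printf("5 ms @ 100 lines, 520 ms @ 10000 lines: k = %.2f%n",
                exponent(100, 5, 10_000, 520));
        System.out.printf("5 ms @ 100 lines, 48 s @ 10000 lines: k = %.2f%n",
                exponent(100, 5, 10_000, 48_000));
    }
}
```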
> 10 independent 1-2 line functions that are very trivial.
There can be a benchmark that generates 10, 100, 1000 simple functions as well. That checks scalability from another angle.
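That variant is just as easy to generate; a minimal sketch (the function shape `fnI x = x + I` and the class name are made up for illustration):

```java
public class ManyFunctionsGen {
    // Generates N independent, trivial top-level functions plus an empty main,
    // to benchmark how compilation scales with the number of definitions
    // rather than with the length of a single body.
    static String generate(int n) {
        StringBuilder sb = new StringBuilder("from Standard.Base import all\n\n");
        for (int i = 1; i <= n; i++) {
            sb.append("fn").append(i).append(" x = x + ").append(i).append("\n\n");
        }
        sb.append("main = Nothing\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(generate(1000).lines().count() + " lines for 1000 functions");
    }
}
```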
> I don't think we should be generating sources by multiplying
The important goal is to have the benchmarking in place, run some benchmarks, collect the results. And, most importantly, make it easy to add new benchmarks to the system when a new performance problem is found.
Pavel Marek reports a new STANDUP for today (2024-02-23):
Progress: - Created some sensible categories for benchmarks
Pavel Marek reports a new STANDUP for today (2024-02-26):
Progress: - Adding more benchmarks, ensuring that only the compiler is measured. It should be finished by 2024-03-01.
Pavel Marek reports a new STANDUP for today (2024-02-27):
Progress: - First batch of benchmarks is ready for review. It should be finished by 2024-03-01.
Pavel Marek reports a new STANDUP for today (2024-02-29):
Progress: - Quick fix for failing benchmark builds at https://github.com/enso-org/enso/pull/9220
Pavel Marek reports a new STANDUP for today (2024-03-01):
Progress: - Initial look at benchmark regressions
There are no benchmarks for parsing or compiling. Let's add benchmarks to the `runtime-compiler` and/or `runtime-parser` projects. Ideally, make sure that these benchmarks are visible at https://enso-org.github.io/engine-benchmark-results/engine-benchs.html

Will be good for #7054.