enso-org / enso

Hybrid visual and textual functional programming.
https://ensoanalytics.com
Apache License 2.0

Speed `Standard.Base` initialization in simple Hello World example up! #6100

Closed JaroslavTulach closed 11 months ago

JaroslavTulach commented 1 year ago

#6062 introduced testing infrastructure that allows us to verify the consistency of our IR caches more reliably. Now it is time to use it and deliver some caching improvements. Enable an additional check:

enso$ git diff
diff --git engine/runtime/src/test/java/org/enso/compiler/SerdeCompilerTest.java engine/runtime/src/test/java/org/enso/compiler/SerdeCompilerTest.java
index 5eb17f12d5..afcaa4b906 100644
--- engine/runtime/src/test/java/org/enso/compiler/SerdeCompilerTest.java
+++ engine/runtime/src/test/java/org/enso/compiler/SerdeCompilerTest.java
@@ -31,7 +31,7 @@ public class SerdeCompilerTest {
   @Test
   public void testFibTest() throws Exception {
     var testName = "Fib_Test";
-    final String forbiddenMessage = null; // "Parsing module [local.Fib_Test.Arith].";
+    final String forbiddenMessage = "Parsing module [local.Fib_Test.Arith].";
     parseSerializedModule(testName, forbiddenMessage);
   }

and make sure the .ir file for the Arith module isn't read, by storing the necessary information in the caches.

JaroslavTulach commented 1 year ago

The original title of this issue was "Avoid loading .ir for module [local.Fib_Test.Arith] when just importing", but that doesn't seem catchy enough and, moreover, it doesn't express the real problem, i.e. that a simple:

import Standard.Base.IO
main = [ IO, "Hello World!" ]

or main = IO.println <| "Hello World!" takes too much time!

enso-bot[bot] commented 12 months ago

Jaroslav Tulach reports a new STANDUP for yesterday (2023-10-25):

Progress: - replacing getMetadata(BindingsAnalysis) with Context.getBindingsMap()

Next Day: Speeding up startup

enso-bot[bot] commented 12 months ago

Jaroslav Tulach reports a new STANDUP for yesterday (2023-10-26):

Progress: - runImportsAndExportsResolution takes just a few milliseconds now: https://github.com/enso-org/enso/pull/8160

Next Day: Speeding up startup

enso-bot[bot] commented 12 months ago

Jaroslav Tulach reports a new STANDUP for yesterday (2023-10-27):

Progress: - Fixes and CI fighting and addressing comments in: https://github.com/enso-org/enso/pull/8160

Next Day: Speeding up startup

JaroslavTulach commented 12 months ago

#8160 speeds up the runImportsExportsResolution phase, but overall it doesn't have a big impact on speed, because megabytes of .ir caches are still loaded later.

Idea

One way to address this is to split the IR caches into the structure and the method bodies, as method bodies aren't needed until the method gets executed. That can be done by overriding replaceObject in ModuleCache:

diff --git engine/runtime/src/main/java/org/enso/compiler/ModuleCache.java engine/runtime/src/main/java/org/enso/compiler/ModuleCache.java
index c23790b688..b4122edece 100644
--- engine/runtime/src/main/java/org/enso/compiler/ModuleCache.java
+++ engine/runtime/src/main/java/org/enso/compiler/ModuleCache.java
@@ -161,6 +166,9 @@ public final class ModuleCache extends Cache<ModuleCache.CachedModule, ModuleCac

         @Override
         protected Object replaceObject(Object obj) throws IOException {
+          if (obj instanceof Expression) {
+            return null;
+          }
           if (obj instanceof UUID) {
             return null;
           }

This change lowers the size of the caches to roughly 40% (full size in the first column, body-less size in the second):

Full    Body-less  Library
1.5M    640K       Standard/AWS
29M     13M        Standard/Base
28K     28K        Standard/Builtins
19M     7.4M       Standard/Database
19M     7.7M       Standard/Table
1.5M    652K       Standard/Visualization

If we read only the body-less part first, we should be able to shorten the initial load time to roughly 40%, assuming load time scales with cache size. Only then would we incrementally load the remaining parts, as methods get executed.
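The effect of the replaceObject trick can be tried outside of Enso with a small, self-contained sketch. It assumes hypothetical stand-in classes (Expression, Literal, Block, Method and the BodylessCacheDemo holder), not Enso's real IR, and simply compares the serialized size with and without bodies; the actual ratio depends entirely on the data being serialized.

```java
import java.io.*;
import java.util.*;

/** Sketch: drop "method bodies" while serializing to estimate the body-less cache size. */
final class BodylessCacheDemo {
  // Hypothetical stand-ins for IR classes (not Enso's real IR).
  interface Expression extends Serializable {}
  record Literal(String text) implements Expression {}
  record Block(List<Expression> statements) implements Expression {}
  record Method(String name, List<String> args, Expression body) implements Serializable {}

  /** Serializes obj; if bodyless is true, every Expression is written as null. */
  static byte[] serialize(Object obj, boolean bodyless) throws IOException {
    var bytes = new ByteArrayOutputStream();
    try (var out = new ObjectOutputStream(bytes) {
          { enableReplaceObject(bodyless); } // replaceObject is only consulted when enabled

          @Override
          protected Object replaceObject(Object o) {
            // Same idea as the diff above: null is written instead of the body.
            return (o instanceof Expression) ? null : o;
          }
        }) {
      out.writeObject(obj);
    }
    return bytes.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    var methods = new ArrayList<Method>();
    for (int i = 0; i < 100; i++) {
      var body = new Block(List.of(new Literal("statement " + i), new Literal("result " + i)));
      methods.add(new Method("method" + i, List.of("self", "x"), body));
    }
    int full = serialize(methods, false).length;
    int bodyless = serialize(methods, true).length;
    System.out.printf("full: %d B, body-less: %d B (%.0f%%)%n", full, bodyless, 100.0 * bodyless / full);
  }
}
```

Because replaceObject returns null, a reader of the body-less bytes gets a Method whose body() is null, so some other mechanism is needed to fetch the body once the method actually runs.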

Implementation

We can either have two .ir files (one body-less, one full), or place the Expression parts at the end of the stream. The following changes in the serialization stream would do:

private final Map<Integer, Expression> pendingExpressions = new LinkedHashMap<>();
private int counter; // set to -1 to stop replacing expressions
...

// in replaceObject(Object obj):
if (obj instanceof Expression exp && counter >= 0) {
   var refExpr = new RefExpression(++counter);
   pendingExpressions.put(counter, exp);
   return refExpr;
}

Store the module with stream.writeObject(entry.moduleIR()), replacing each Expression with just a delayed reference, and then:

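stream.counter = -1; // stop replacing expressions
for (var entry : stream.pendingExpressions.entrySet()) {
  stream.writeInt(entry.getKey());
  // note: ObjectOutputStream maps an object it has already replaced to its replacement,
  // so writing the very same Expression instance emits a back-reference to its RefExpression;
  // writing a copy of the expression avoids that
  stream.writeObject(entry.getValue());
}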

The good thing is that all references among objects (mostly metadata) in the single stream are shared between the body-less part and the expression-bodies part. We simply don't read the second part when we are not interested in it.

Reading the second part requires a custom ObjectInputStream that calls enableResolveObject(true), overrides resolveObject and keeps a list of all pending RefExpression references. When reading an integer ID and its associated real expression, inject that expression into the matching pending RefExpression.
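A minimal, self-contained sketch of that single-stream layout follows. It assumes hypothetical IR stand-ins (Expression, Call, Meta, Method, ModuleIr) and hypothetical helper names (SplitOutput, LazyInput, loadBodies, shallowCopy); it is not Enso code. Two details go beyond the description above and are assumptions of the sketch: the number of pending expressions is written before them, and each deferred expression is written as a shallow copy. The copy is needed because ObjectOutputStream remembers which object it replaced, so writing the same Expression instance again would only emit a back-reference to its RefExpression. The copy's fields still point at objects already written in the first part, so those remain shared through back-references.

```java
import java.io.*;
import java.util.*;

/** Sketch: IR structure first, deferred expression bodies at the end of the same stream. */
final class LazyBodiesDemo {
  // Hypothetical stand-ins for IR classes (not Enso's real IR).
  interface Expression extends Serializable {}
  record Meta(String info) implements Serializable {}
  record Call(String fn, Meta meta) implements Expression {}
  record Method(String name, Meta meta, Expression body) implements Serializable {}
  record ModuleIr(List<Method> methods) implements Serializable {}

  /** Placeholder written instead of a body; the real body is injected later. */
  static final class RefExpression implements Expression {
    final int id;
    transient Expression target; // stays null until the bodies part is read
    RefExpression(int id) { this.id = id; }
  }

  /** Replaces each Expression with a RefExpression while counter >= 0. */
  static final class SplitOutput extends ObjectOutputStream {
    final Map<Integer, Expression> pendingExpressions = new LinkedHashMap<>();
    int counter = 0; // set to -1 to stop replacing expressions

    SplitOutput(OutputStream out) throws IOException {
      super(out);
      enableReplaceObject(true);
    }

    @Override
    protected Object replaceObject(Object obj) {
      if (obj instanceof Expression exp && !(obj instanceof RefExpression) && counter >= 0) {
        var ref = new RefExpression(++counter);
        pendingExpressions.put(counter, exp);
        return ref;
      }
      return obj;
    }
  }

  /** Remembers every RefExpression it reads so bodies can be injected on demand. */
  static final class LazyInput extends ObjectInputStream {
    final Map<Integer, RefExpression> pending = new HashMap<>();

    LazyInput(InputStream in) throws IOException {
      super(in);
      enableResolveObject(true);
    }

    @Override
    protected Object resolveObject(Object obj) {
      if (obj instanceof RefExpression ref) {
        pending.put(ref.id, ref);
      }
      return obj;
    }

    /** Reads the second part and injects each body into its placeholder. */
    void loadBodies() throws IOException, ClassNotFoundException {
      int n = readInt();
      for (int i = 0; i < n; i++) {
        int id = readInt();
        pending.get(id).target = (Expression) readObject();
      }
    }
  }

  static byte[] write(ModuleIr module) throws IOException {
    var bytes = new ByteArrayOutputStream();
    try (var out = new SplitOutput(bytes)) {
      out.writeObject(module);                     // part 1: structure with placeholders
      out.counter = -1;                            // no more expression replacements
      out.writeInt(out.pendingExpressions.size()); // assumption: count prefix for the reader
      for (var entry : out.pendingExpressions.entrySet()) {
        out.writeInt(entry.getKey());
        out.writeObject(shallowCopy(entry.getValue())); // original is mapped to its placeholder
      }
    }
    return bytes.toByteArray();
  }

  static Expression shallowCopy(Expression e) {
    if (e instanceof Call c) {
      return new Call(c.fn(), c.meta()); // same Meta instance, shared with part 1
    }
    throw new IllegalArgumentException("no copy rule for " + e);
  }

  public static void main(String[] args) throws Exception {
    var meta = new Meta("shared metadata");
    var module = new ModuleIr(List.of(new Method("main", meta, new Call("IO.println", meta))));
    try (var in = new LazyInput(new ByteArrayInputStream(write(module)))) {
      var method = ((ModuleIr) in.readObject()).methods().get(0); // part 1 only
      var ref = (RefExpression) method.body();
      System.out.println("body loaded: " + (ref.target != null));
      in.loadBodies(); // part 2, read only when a body is needed
      var call = (Call) ref.target;
      System.out.println(call.fn() + ", metadata shared: " + (call.meta() == method.meta()));
    }
  }
}
```

If loadBodies() is never called, the second part is simply left unread, which is where the start-up saving would come from.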

enso-bot[bot] commented 12 months ago

Jaroslav Tulach reports a new STANDUP for the last Saturday (2023-10-28):

Progress: - Integrated: https://github.com/enso-org/enso/pull/8160#discussion_r1374873510

Next Day: Speeding up startup

enso-bot[bot] commented 11 months ago

Jaroslav Tulach reports a new STANDUP for yesterday (2023-10-30):

Progress: - Integrated Node elimination: https://github.com/enso-org/enso/pull/8172

Next Day: Speeding up startup

enso-bot[bot] commented 11 months ago

Jaroslav Tulach reports a new STANDUP for yesterday (2023-10-31):

Progress: - Checking benchmarks & merging ApplicationSaturation removal: https://github.com/enso-org/enso/pull/8181

Next Day: Speeding up startup

JaroslavTulach commented 11 months ago

The new idea is to store the IR in the .ir files in a format that can be accessed "lazily", reading parts on demand when they are needed.

Let's start with an analysis. By overriding the module cache reading code, one can see that a simple IO.println deserializes objects of 123 different classes:

@@ -46,7 +50,14 @@ public final class ModuleCache extends Cache<ModuleCache.CachedModule, ModuleCac

     @Override
     protected CachedModule deserialize(EnsoContext context, byte[] data, Metadata meta, TruffleLogger logger) throws ClassNotFoundException, IOException, ClassNotFoundException {
-        try (var stream = new ObjectInputStream(new ByteArrayInputStream(data))) {
+        try (var stream = new ObjectInputStream(new ByteArrayInputStream(data)) {
+          @Override
+          protected ObjectStreamClass readClassDescriptor() throws IOException, ClassNotFoundException {
+            var clazz = super.readClassDescriptor();
+            System.err.println("CLASS: " + clazz.getName());
+            return clazz;
+          }
+        }) {
           if (stream.readObject() instanceof Module ir) {
               try {
                   return new CachedModule(ir,CompilationStage.valueOf(meta.compilationStage()), module.getSource());

Here is the list:

java.util.UUID
org.enso.compiler.core.ir.CallArgument$Specified
org.enso.compiler.core.ir.DefinitionArgument$Specified
org.enso.compiler.core.ir.DiagnosticStorage
org.enso.compiler.core.ir.Expression$Binding
org.enso.compiler.core.ir.Expression$Block
org.enso.compiler.core.ir.expression.Application$Force
org.enso.compiler.core.ir.expression.Application$Prefix
org.enso.compiler.core.ir.expression.Application$Sequence
org.enso.compiler.core.ir.expression.Case$Branch
org.enso.compiler.core.ir.expression.Case$Expr
org.enso.compiler.core.ir.expression.Foreign$Definition
org.enso.compiler.core.ir.Function$Lambda
org.enso.compiler.core.ir.IdentifiedLocation
org.enso.compiler.core.ir.Literal$Number
org.enso.compiler.core.ir.Literal$Text
org.enso.compiler.core.ir.MetadataStorage
org.enso.compiler.core.ir.Module
org.enso.compiler.core.ir.module.scope.Definition$Data
org.enso.compiler.core.ir.module.scope.Definition$Type
org.enso.compiler.core.ir.module.scope.definition.Method$Conversion
org.enso.compiler.core.ir.module.scope.definition.Method$Explicit
org.enso.compiler.core.ir.module.scope.Export$Module
org.enso.compiler.core.ir.module.scope.Import$Module
org.enso.compiler.core.ir.module.scope.imports.Polyglot
org.enso.compiler.core.ir.module.scope.imports.Polyglot$Java
org.enso.compiler.core.ir.Name$Blank
org.enso.compiler.core.ir.Name$BuiltinAnnotation
org.enso.compiler.core.ir.Name$GenericAnnotation
org.enso.compiler.core.ir.Name$Literal
org.enso.compiler.core.ir.Name$MethodReference
org.enso.compiler.core.ir.Name$Qualified
org.enso.compiler.core.ir.Name$Self
org.enso.compiler.core.ir.Pattern$Constructor
org.enso.compiler.core.ir.Pattern$Literal
org.enso.compiler.core.ir.Pattern$Name
org.enso.compiler.core.ir.Pattern$Type
org.enso.compiler.core.ir.Type$Error
org.enso.compiler.core.ir.Type$Function
org.enso.compiler.core.ir.type.Set$Union
org.enso.compiler.data.BindingsMap
org.enso.compiler.data.BindingsMap$Cons
org.enso.compiler.data.BindingsMap$ExportedModule
org.enso.compiler.data.BindingsMap$ModuleMethod
org.enso.compiler.data.BindingsMap$ModuleReference$Abstract
org.enso.compiler.data.BindingsMap$PolyglotSymbol
org.enso.compiler.data.BindingsMap$Resolution
org.enso.compiler.data.BindingsMap$ResolvedConstructor
org.enso.compiler.data.BindingsMap$ResolvedImport
org.enso.compiler.data.BindingsMap$ResolvedMethod
org.enso.compiler.data.BindingsMap$ResolvedModule
org.enso.compiler.data.BindingsMap$ResolvedPolyglotField
org.enso.compiler.data.BindingsMap$ResolvedPolyglotSymbol
org.enso.compiler.data.BindingsMap$ResolvedType
org.enso.compiler.data.BindingsMap$SymbolRestriction$All$
org.enso.compiler.data.BindingsMap$SymbolRestriction$AllowedResolution
org.enso.compiler.data.BindingsMap$SymbolRestriction$Hiding
org.enso.compiler.data.BindingsMap$SymbolRestriction$Only
org.enso.compiler.data.BindingsMap$SymbolRestriction$Union
org.enso.compiler.data.BindingsMap$Type
org.enso.compiler.pass.analyse.AliasAnalysis$
org.enso.compiler.pass.analyse.AliasAnalysis$Graph
org.enso.compiler.pass.analyse.AliasAnalysis$Graph$Link
org.enso.compiler.pass.analyse.AliasAnalysis$Graph$Occurrence$Def
org.enso.compiler.pass.analyse.AliasAnalysis$Graph$Occurrence$Use
org.enso.compiler.pass.analyse.AliasAnalysis$Graph$Scope
org.enso.compiler.pass.analyse.AliasAnalysis$Info$Occurrence
org.enso.compiler.pass.analyse.AliasAnalysis$Info$Scope$Child
org.enso.compiler.pass.analyse.AliasAnalysis$Info$Scope$Root
org.enso.compiler.pass.analyse.BindingAnalysis$
org.enso.compiler.pass.analyse.CachePreferenceAnalysis$
org.enso.compiler.pass.analyse.CachePreferenceAnalysis$WeightInfo
org.enso.compiler.pass.analyse.DataflowAnalysis$
org.enso.compiler.pass.analyse.DataflowAnalysis$DependencyInfo
org.enso.compiler.pass.analyse.DataflowAnalysis$DependencyInfo$Type$Dynamic
org.enso.compiler.pass.analyse.DataflowAnalysis$DependencyInfo$Type$Static
org.enso.compiler.pass.analyse.DataflowAnalysis$DependencyMapping
org.enso.compiler.pass.analyse.GatherDiagnostics$
org.enso.compiler.pass.analyse.GatherDiagnostics$DiagnosticsMeta
org.enso.compiler.pass.analyse.TailCall$
org.enso.compiler.pass.analyse.TailCall$TailPosition$NotTail$
org.enso.compiler.pass.analyse.TailCall$TailPosition$Tail$
org.enso.compiler.pass.resolve.DocumentationComments$
org.enso.compiler.pass.resolve.DocumentationComments$Doc
org.enso.compiler.pass.resolve.ExpressionAnnotations$
org.enso.compiler.pass.resolve.GenericAnnotations$
org.enso.compiler.pass.resolve.GlobalNames$
org.enso.compiler.pass.resolve.IgnoredBindings$
org.enso.compiler.pass.resolve.IgnoredBindings$State$Ignored$
org.enso.compiler.pass.resolve.IgnoredBindings$State$NotIgnored$
org.enso.compiler.pass.resolve.MethodCalls$
org.enso.compiler.pass.resolve.MethodDefinitions$
org.enso.compiler.pass.resolve.ModuleAnnotations$
org.enso.compiler.pass.resolve.ModuleAnnotations$Annotations
org.enso.compiler.pass.resolve.Patterns$
org.enso.compiler.pass.resolve.TypeNames$
org.enso.compiler.pass.resolve.TypeSignatures$
org.enso.compiler.pass.resolve.TypeSignatures$Signature
org.enso.pkg.QualifiedName
org.enso.syntax.text.Location
scala.collection.generic.DefaultSerializationProxy
scala.collection.generic.SerializeEnd$
scala.collection.immutable.HashMap$
scala.collection.immutable.HashSet$
scala.collection.immutable.List$
scala.collection.immutable.Map$EmptyMap$
scala.collection.immutable.Map$Map1
scala.collection.immutable.Map$Map2
scala.collection.immutable.Map$Map3
scala.collection.immutable.Map$Map4
scala.collection.immutable.Set$EmptySet$
scala.collection.immutable.Set$Set1
scala.collection.immutable.Set$Set2
scala.collection.immutable.Set$Set3
scala.collection.immutable.Set$Set4
scala.collection.IterableFactory$ToFactory
scala.collection.MapFactory$ToFactory
scala.collection.mutable.HashMap$DeserializationFactory
scala.None$
scala.Option
scala.runtime.ModuleSerializationProxy
scala.Some
scala.Tuple2
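The census above can be reproduced with a standalone variant of the same readClassDescriptor override. The sketch below (ClassCensus, classesIn and serialize are hypothetical names) collects the names into a sorted set, which makes the unique count easy to compare. readClassDescriptor is invoked once per class descriptor in a stream, serializable superclasses included, so sets from several .ir files would have to be merged to get a per-run total.

```java
import java.io.*;
import java.util.*;

/** Sketch: record which classes a serialized stream makes the reader resolve. */
final class ClassCensus {
  /** Deserializes data and returns the names of all class descriptors read. */
  static SortedSet<String> classesIn(byte[] data) throws IOException, ClassNotFoundException {
    var names = new TreeSet<String>();
    try (var in = new ObjectInputStream(new ByteArrayInputStream(data)) {
          @Override
          protected ObjectStreamClass readClassDescriptor() throws IOException, ClassNotFoundException {
            var desc = super.readClassDescriptor();
            names.add(desc.getName()); // later uses of the same class are back-references
            return desc;
          }
        }) {
      in.readObject();
    }
    return names;
  }

  static byte[] serialize(Object obj) throws IOException {
    var bytes = new ByteArrayOutputStream();
    try (var out = new ObjectOutputStream(bytes)) {
      out.writeObject(obj);
    }
    return bytes.toByteArray();
  }

  public static void main(String[] args) throws Exception {
    var sample = new ArrayList<Object>(List.of("Hello World!", 42, true));
    var names = classesIn(serialize(sample));
    System.out.println(names.size() + " classes: " + names);
  }
}
```

Strings are written with their own type code rather than a class descriptor, so java.lang.String does not show up, which is consistent with its absence from the list above.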
enso-bot[bot] commented 11 months ago

Jaroslav Tulach reports a new 🔴 DELAY for yesterday (2023-11-01):

Summary: There is a 10-day delay in the implementation of the Speed Standard.Base initialization in simple Hello World example up! (#6100) task. It will cause a 10-day delay in the delivery of this weekly plan.

There has been some progress, but more work is needed.

Delay Cause: IR caches and import resolution are a complicated part of our code base that nobody wants to touch. I am moving forward, but slowly.

Possible solutions: Yesterday I got a new idea and started https://github.com/enso-org/enso/pull/8207. I am thrilled now, as I believe this direction has its merits.

enso-bot[bot] commented 11 months ago

Jaroslav Tulach reports a new STANDUP for yesterday (2023-11-01):

Progress: - runtime-compiler project created & merged: https://github.com/enso-org/enso/pull/8197

Next Day: "on demand" IR caches

enso-bot[bot] commented 11 months ago

Jaroslav Tulach reports a new STANDUP for yesterday (2023-11-02):

Progress: - "on demand" serde: https://github.com/enso-org/enso/pull/8207/commits/a25bfb6ca9575e79128ccec4726248e9d5c32cba

Next Day: "on demand" IR caches

enso-bot[bot] commented 11 months ago

Jaroslav Tulach reports a new STANDUP for today (2023-11-03):

Progress: - Investigating GraalVM for JDK21 update status: https://github.com/enso-org/enso/pull/7991/files#r1381210135

Next Day: "on demand" IR caches

enso-bot[bot] commented 11 months ago

Jaroslav Tulach reports a new STANDUP for yesterday (2023-11-08):

Progress: - review of inline evaluation: https://discord.com/channels/@me/955430343308095518/1170940214991130654

Next Day: "on demand" IR caches

enso-bot[bot] commented 11 months ago

Jaroslav Tulach reports a new STANDUP for yesterday (2023-11-09):

Progress: - IR persist everything: https://github.com/enso-org/enso/pull/8207/commits/5413e2f60554e6f243a1d1cae12a5e487db5a392

Next Day: "on demand" IR caches

enso-bot[bot] commented 11 months ago

Jaroslav Tulach reports a new STANDUP for yesterday (2023-11-10):

Progress: - able to write, read & use caches: https://github.com/enso-org/enso/pull/8207#issuecomment-1805659682

Next Day: "on demand" IR caches

enso-bot[bot] commented 11 months ago

Jaroslav Tulach reports a new 🔴 DELAY for the last Friday (2023-11-10):

Summary: There is a 7-day delay in the implementation of the Speed Standard.Base initialization in simple Hello World example up! (#6100) task. It will cause a 7-day delay in the delivery of this weekly plan.

Serialization and deserialization work, but we still need to benefit from their possibilities.

Delay Cause: Two holidays and one conference day. Changes in restoreFromSerialization.

Possible solutions: The idea in https://github.com/enso-org/enso/pull/8207 seems to be working, but needs a few more days.

enso-bot[bot] commented 11 months ago

Jaroslav Tulach reports a new STANDUP for yesterday (2023-11-12):

Progress: - fixing broken CI build: https://github.com/enso-org/enso/pull/8281/commits/01e0d20d0f61778ac73fbd54300415b50f825aab

Next Day: "on demand" IR caches

enso-bot[bot] commented 11 months ago

Jaroslav Tulach reports a new STANDUP for yesterday (2023-11-13):

Progress: - updating Frgaal 21: https://github.com/enso-org/enso/pull/8286

Next Day: "on demand" IR caches

enso-bot[bot] commented 11 months ago

Jaroslav Tulach reports a new STANDUP for yesterday (2023-11-14):

Progress: - working on https://github.com/enso-org/enso/pull/8207

Next Day: "on demand" IR caches

enso-bot[bot] commented 11 months ago

Jaroslav Tulach reports a new STANDUP for yesterday (2023-11-15):

Progress: - Found bug in PersistableProcessor: https://github.com/enso-org/enso/pull/8207/commits/5cb4b6848e6cfb78bab80bd4f4304fa74bbcff97

Next Day: "on demand" IR caches

enso-bot[bot] commented 11 months ago

Jaroslav Tulach reports a new STANDUP for yesterday (2023-11-16):

Progress: - persistance project for faster startup: https://github.com/enso-org/enso/pull/8207/commits/79482a1ab8c28be07e21a0979bbc096118cafdb5

Next Day: "on demand" IR caches

enso-bot[bot] commented 11 months ago

Jaroslav Tulach reports a new STANDUP for yesterday (2023-11-17):

Progress: - fixing license after Lookup library removal from runtime dependencies

Next Day: merge "on demand" IR caches

JaroslavTulach commented 11 months ago

There is a 16% speed-up of the simple "hello world" application with the integration of: