enso-org / enso

Enso Analytics is a self-service data prep and analysis platform designed for data teams.
https://ensoanalytics.com
Apache License 2.0
7.39k stars 323 forks source link

Cannot create a project in cloud #9789

Closed farmaazon closed 4 months ago

farmaazon commented 7 months ago

Discord username

farmaazon

What type of issue is this?

Permanent – Occurring repeatably

Is this issue blocking you from using Enso?

Is this a regression?

What issue are you facing?

When I create a new project in the cloud, it loads but does not show any nodes. When I stop the project and re-run, it does not open even after long waiting (10 minutes).

Expected behaviour

I see the starting node after creating new project; running it should open it in a reasonable time.

How we can reproduce it?

No response

Screenshots or screencasts

No response

Logs

No response

Enso Version

a786ad2d0adb2b3eab1ac405a2937f85c219a40f

Browser or standalone distribution

Standalone distribution (cloud project)

Browser Version or standalone distribution

standaolne

Operating System

Linux

Operating System Version

Garuda

Hardware you are using

No response

PabloBuchu commented 7 months ago

So there is bunch of errors in messages and logs from language server. @hubertp could you take a look? @somebody1234 have anything changed in communication with the server?

Screenshot 2024-04-29 at 13 54 05 Screenshot 2024-04-29 at 13 56 23 Screenshot 2024-04-29 at 13 55 18
hubertp commented 7 months ago

Received InboundMessage with unknown payload type [1] looks like a culprit.

somebody1234 commented 7 months ago

seems like this is the reason why:

farmaazon commented 7 months ago

seems like this is the reason why:

I'm doubtful.

The flatbuffer's format does not change between different versions of flatc - so the format emitted by code generated with never version of flatc should be happily read by code generated by older version, if the schema files were the same - and those haven't been touched from a long time.

And the fact, that the local projects are working properly seems to confirm it. Any issue with binary connection should be visible when running local projects too. But this does not happen.

somebody1234 commented 7 months ago

ah, weird... in that case i'm not sure why payload type 1 is an error then: https://github.com/enso-org/enso/blob/660c5e7a9db732c3ebc7b32cf151f283babdf49c/app/gui2/shared/binaryProtocol.ts#L708

since it seems to be the session initialization payload...

somebody1234 commented 7 months ago

so it seems line engine does not handle duplicate INIT_SESSION messages.

https://github.com/enso-org/enso/blob/660c5e7a9db732c3ebc7b32cf151f283babdf49c/engine/language-server/src/main/scala/org/enso/languageserver/protocol/binary/BinaryConnectionController.scala#L192-L207

i assume then, that this is due to the same 15 second timeout issue from before?

somebody1234 commented 7 months ago

this is also suspicious: image

maybe the dashboard is starting GUI2 twice...?

hubertp commented 7 months ago

In my case I'm getting a blank page as well but no errors on backend. Maybe one of the console errors explains? Screenshot from 2024-04-29 15-40-18 Screenshot from 2024-04-29 15-38-31

somebody1234 commented 7 months ago

i assume cloud.enso.org still doesn't work as it's still using GUI1?

hubertp commented 7 months ago

i assume cloud.enso.org still doesn't work as it's still using GUI1?

Whaaa? Well that would explain things. But I talked to @PabloBuchu yesterday and he said it is running develop.

somebody1234 commented 7 months ago

hmm. i guess it's running develop, but the logic to fetch GUI from S3 has not been changed - and i guess S3 has not been updated with GUI2

it's worth noting that cloud.enso.org does not work anyway (afaict), because we do not yet have the Y.js server (for syncing GUI state and text file state) running on the cloud.

hubertp commented 7 months ago

I'm seeing (with trace level logs) something suspicious:

May 02 15:40:30 ip-172-31-0-127.eu-west-1.compute.internal enso_runtime[2573]: [TRACE] [2024-05-02T15:40:30Z] [org.enso.librarymanager.local.DefaultLocalLibraryProvider] Local library Standard.Base not found at [***/libraries].
May 02 15:40:30 ip-172-31-0-127.eu-west-1.compute.internal enso_runtime[2573]: [TRACE] [2024-05-02T15:40:30Z] [org.enso.librarymanager.local.DefaultLocalLibraryProvider] Failed to load the candidate library package description at [***/data_root].
May 02 15:40:30 ip-172-31-0-127.eu-west-1.compute.internal enso_runtime[2573]: org.enso.pkg.PackageManager$PackageNotFound: The package file does not exist.
May 02 15:40:30 ip-172-31-0-127.eu-west-1.compute.internal enso_runtime[2573]:         at org.enso.runtime/org.enso.pkg.PackageManager.loadPackage(Package.scala:348)
May 02 15:40:30 ip-172-31-0-127.eu-west-1.compute.internal enso_runtime[2573]:         at org.enso.runtime/org.enso.librarymanager.local.DefaultLocalLibraryProvider.$anonfun$findCandidates$2(DefaultLocalLibraryProvider.scala:79)
May 02 15:40:30 ip-172-31-0-127.eu-west-1.compute.internal enso_runtime[2573]:         at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:178)
May 02 15:40:30 ip-172-31-0-127.eu-west-1.compute.internal enso_runtime[2573]:         at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
May 02 15:40:30 ip-172-31-0-127.eu-west-1.compute.internal enso_runtime[2573]:         at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
May 02 15:40:30 ip-172-31-0-127.eu-west-1.compute.internal enso_runtime[2573]:         at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1939)
May 02 15:40:30 ip-172-31-0-127.eu-west-1.compute.internal enso_runtime[2573]:         at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
May 02 15:40:30 ip-172-31-0-127.eu-west-1.compute.internal enso_runtime[2573]:         at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
May 02 15:40:30 ip-172-31-0-127.eu-west-1.compute.internal enso_runtime[2573]:         at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:575)
May 02 15:40:30 ip-172-31-0-127.eu-west-1.compute.internal enso_runtime[2573]:         at java.base/java.util.stream.AbstractPipeline.evaluateToArrayNode(AbstractPipeline.java:260)
May 02 15:40:30 ip-172-31-0-127.eu-west-1.compute.internal enso_runtime[2573]:         at java.base/java.util.stream.ReferencePipeline.toArray(ReferencePipeline.java:616)
May 02 15:40:30 ip-172-31-0-127.eu-west-1.compute.internal enso_runtime[2573]:         at java.base/java.util.stream.ReferencePipeline.toArray(ReferencePipeline.java:622)
May 02 15:40:30 ip-172-31-0-127.eu-west-1.compute.internal enso_runtime[2573]:         at java.base/java.util.stream.ReferencePipeline.toList(ReferencePipeline.java:627)
May 02 15:40:30 ip-172-31-0-127.eu-west-1.compute.internal enso_runtime[2573]:         at org.enso.runtime/org.enso.librarymanager.local.DefaultLocalLibraryProvider.findCandidates(DefaultLocalLibraryProvider.scala:77)
May 02 15:40:30 ip-172-31-0-127.eu-west-1.compute.internal enso_runtime[2573]:         at org.enso.runtime/org.enso.librarymanager.local.DefaultLocalLibraryProvider.findLibraryHelper(DefaultLocalLibraryProvider.scala:39)
May 02 15:40:30 ip-172-31-0-127.eu-west-1.compute.internal enso_runtime[2573]:         at org.enso.runtime/org.enso.librarymanager.local.DefaultLocalLibraryProvider.findLibrary(DefaultLocalLibraryProvider.scala:24)
May 02 15:40:30 ip-172-31-0-127.eu-west-1.compute.internal enso_runtime[2573]:         at org.enso.runtime/org.enso.librarymanager.LibraryResolver.resolveLibraryVersion(LibraryResolver.scala:36)
May 02 15:40:30 ip-172-31-0-127.eu-west-1.compute.internal enso_runtime[2573]:         at org.enso.runtime/org.enso.librarymanager.DefaultLibraryProvider.findLibrary(DefaultLibraryProvider.scala:52)
May 02 15:40:30 ip-172-31-0-127.eu-west-1.compute.internal enso_runtime[2573]:         at org.enso.runtime/org.enso.interpreter.runtime.DefaultPackageRepository.ensurePackageIsLoaded(DefaultPackageRepository.scala:411)
May 02 15:40:30 ip-172-31-0-127.eu-west-1.compute.internal enso_runtime[2573]:         at org.enso.runtime/org.enso.compiler.phase.ImportResolverForIR.loadLibraryModule(ImportResolverForIR.java:124)
May 02 15:40:30 ip-172-31-0-127.eu-west-1.compute.internal enso_runtime[2573]:         at org.enso.runtime/org.enso.compiler.phase.ImportResolverForIR.loadLibraryModule(ImportResolverForIR.java:23)
May 02 15:40:30 ip-172-31-0-127.eu-west-1.compute.internal enso_runtime[2573]:         at org.enso.runtime/org.enso.compiler.phase.ImportResolverAlgorithm.tryResolveImportNew(ImportResolverAlgorithm.java:141)
May 02 15:40:30 ip-172-31-0-127.eu-west-1.compute.internal enso_runtime[2573]:         at org.enso.runtime/org.enso.compiler.phase.ImportResolverAlgorithm.tryResolveImport(ImportResolverAlgorithm.java:72)
May 02 15:40:30 ip-172-31-0-127.eu-west-1.compute.internal enso_runtime[2573]:         at org.enso.runtime/org.enso.compiler.phase.ImportResolver.$anonfun$mapImports$3(ImportResolver.scala:74)
May 02 15:40:30 ip-172-31-0-127.eu-west-1.compute.internal enso_runtime[2573]:         at org.enso.runtime/scala.collection.immutable.List.map(List.scala:246)
May 02 15:40:30 ip-172-31-0-127.eu-west-1.compute.internal enso_runtime[2573]:         at org.enso.runtime/org.enso.compiler.phase.ImportResolver.analyzeModule$1(ImportResolver.scala:72)

As if all libraries were gone

hubertp commented 7 months ago

Actually that doesn't seem to be the cause of the problem. I will need to investigate why it happens in the first place but it seems to recover from this problem.

hubertp commented 7 months ago

Not sure how relevant that is but why is GUI reporting a wrong version in cloud? I thought it's using latest nightly? Screenshot from 2024-05-03 13-01-54

jdunkerley commented 7 months ago

hmm. i guess it's running develop, but the logic to fetch GUI from S3 has not been changed - and i guess S3 has not been updated with GUI2

it's worth noting that cloud.enso.org does not work anyway (afaict), because we do not yet have the Y.js server (for syncing GUI state and text file state) running on the cloud.

We have had the GUI2 running with a cloud backend in electron. Yes we have to run the y.js server locally but it was working.

somebody1234 commented 7 months ago

yup - by "cloud.enso.org" i mean specifically the website hosted at cloud.enso.org.

when we run through electron (and on the browser of the electron server!) i believe we do use the local version of the GUI, as the check for whether we need to download the GUI, is determined by whether the Local Backend (the Project Manager backend) is supported

hubertp commented 7 months ago

So I have a fix for the timeout that is being reported when using electron app and connection to the cloud. The issue is basically... weak hardware. On my machine first compilation takes max 3-4 seconds on a bad day. In the cloud setup it takes at least 25 seconds alone. Given that we have timeouts for serving API requests set to 5 seconds, the problem becomes rather common.

When running via browser, I believe it is not running the latest GUI? At least the type of requests seem a bit outdated and different from what electron (nightly) is doing.

enso-bot[bot] commented 7 months ago

Hubert Plociniczak reports a new STANDUP for the provided date (2024-05-03):

Progress: Paused work on #9736 to investigate cloud issues. Looks like slow compilations are delaying handling of requests leading to timeouts. Added workaround in multiple handlers so that we don't return timeout immediately. It should be finished by 2024-05-06.

Next Day: Next day I will be working on the #9789 task. Continue the investigation. Figure out if we can profile the startup to find the root cause.

hubertp commented 7 months ago

Some progress. Now failing with distilled to https://github.com/enso-org/enso/issues/9876

May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]: [ERROR] [2024-05-07T10:17:23Z] [org.enso.languageserver.runtime.RuntimeConnector$Endpoint] Failed to deserialize runtime API envelope
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]: com.fasterxml.jackson.databind.exc.InvalidTypeIdException: Could not resolve type id 'org.enso.polyglot.data.Tree$Node' as a subtype of `org.enso.polyglot.data.Tree$Node<org.enso.polyglot.runtime.Runtime$Api$SuggestionUpdate>`: Not a subtypetime[11629]:  at [Source: (byte[])[1561 bytes]; byte offset: #398] (through reference chain: org.enso.polyglot.runtime.Runtime$Api$Response["payload"]->org.enso.polyglot.runtime.Runtime$Api$SuggestionsDatabaseModuleUpdateNotification["updates"]->org.enso.polyglot.data.Tree$Root["children"]->com.fasterxml.jackson.module.scala.deser.GenericFactoryDeserializerResolver$BuilderWrapper[0])ception.java:43)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//com.fasterxml.jackson.databind.DeserializationContext.invalidTypeIdException(DeserializationContext.java:2084)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//com.fasterxml.jackson.databind.DatabindContext._throwNotASubtype(DatabindContext.java:290)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//com.fasterxml.jackson.databind.DatabindContext.resolveAndValidateSubType(DatabindContext.java:251)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//com.fasterxml.jackson.databind.jsontype.impl.ClassNameIdResolver._typeFromId(ClassNameIdResolver.java:72)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//com.fasterxml.jackson.databind.jsontype.impl.ClassNameIdResolver.typeFromId(ClassNameIdResolver.java:66)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//com.fasterxml.jackson.databind.jsontype.impl.TypeDeserializerBase._findDeserializer(TypeDeserializerBase.java:159)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer._deserializeTypedForId(AsPropertyTypeDeserializer.java:151)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject(AsPropertyTypeDeserializer.java:136) 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeWithType(BeanDeserializerBase.java:1296)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//com.fasterxml.jackson.databind.deser.std.CollectionDeserializer._deserializeFromArray(CollectionDeserializer.java:361)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize(CollectionDeserializer.java:244)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//com.fasterxml.jackson.module.scala.deser.GenericFactoryDeserializerResolver$Deserializer.deserialize(GenericFactoryDeserializerResolver.scala:125)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//com.fasterxml.jackson.databind.deser.SettableBeanProperty.deserialize(SettableBeanProperty.java:545)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeWithErrorWrapping(BeanDeserializer.java:570)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased(BeanDeserializer.java:439)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault(BeanDeserializerBase.java:1409)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//com.fasterxml.jackson.databind.deser.BeanDeserializer.deserializeFromObject(BeanDeserializer.java:352)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeOther(BeanDeserializer.java:220)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:187)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer._deserializeTypedForId(AsPropertyTypeDeserializer.java:170)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject(AsPropertyTypeDeserializer.java:136)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//com.fasterxml.jackson.databind.deser.AbstractDeserializer.deserializeWithType(AbstractDeserializer.java:263)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//com.fasterxml.jackson.databind.deser.SettableBeanProperty.deserialize(SettableBeanProperty.java:542)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeWithErrorWrapping(BeanDeserializer.java:570)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased(BeanDeserializer.java:439)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault(BeanDeserializerBase.java:1409)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//com.fasterxml.jackson.databind.deser.BeanDeserializer.deserializeFromObject(BeanDeserializer.java:352)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeOther(BeanDeserializer.java:220)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:187)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer._deserializeTypedForId(AsPropertyTypeDeserializer.java:170)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject(AsPropertyTypeDeserializer.java:136)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//com.fasterxml.jackson.databind.deser.AbstractDeserializer.deserializeWithType(AbstractDeserializer.java:263)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//com.fasterxml.jackson.databind.deser.SettableBeanProperty.deserialize(SettableBeanProperty.java:542)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeWithErrorWrapping(BeanDeserializer.java:570)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased(BeanDeserializer.java:439)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault(BeanDeserializerBase.java:1409)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//com.fasterxml.jackson.databind.deser.BeanDeserializer.deserializeFromObject(BeanDeserializer.java:352)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeOther(BeanDeserializer.java:220)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:187)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer._deserializeTypedForId(AsPropertyTypeDeserializer.java:170)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject(AsPropertyTypeDeserializer.java:136)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//com.fasterxml.jackson.databind.deser.AbstractDeserializer.deserializeWithType(AbstractDeserializer.java:263)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//com.fasterxml.jackson.databind.deser.impl.TypeWrappedDeserializer.deserialize(TypeWrappedDeserializer.java:74)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:323)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4825)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3833)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//org.enso.polyglot.runtime.Runtime$Api$.$anonfun$deserializeApiEnvelope$1(Runtime.scala:1979)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//scala.util.Try$.apply(Try.scala:210)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//org.enso.polyglot.runtime.Runtime$Api$.deserializeApiEnvelope(Runtime.scala:1979)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.IsolatedClassLoader//org.enso.languageserver.runtime.RuntimeConnector$Endpoint.sendBinary(RuntimeConnector.scala:210)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.graalvm.truffle/com.oracle.truffle.api.instrumentation.TruffleInstrument$Env$MessageTransportProxy$MessageEndpointProxy.sendBinary(TruffleInstrument.java:1222)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.runtime/org.enso.interpreter.instrument.Endpoint.sendToClient(Endpoint.scala:39)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.runtime/org.enso.interpreter.instrument.job.AnalyzeModuleJob$.sendModuleUpdate(AnalyzeModuleJob.scala:143)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.runtime/org.enso.interpreter.instrument.job.AnalyzeModuleJob$.doAnalyzeModule(AnalyzeModuleJob.scala:118)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.runtime/org.enso.interpreter.instrument.job.AnalyzeModuleJob$.analyzeModule(AnalyzeModuleJob.scala:58)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.runtime/org.enso.interpreter.instrument.job.EnsureCompiledJob.$anonfun$ensureCompiledModule$2(EnsureCompiledJob.scala:134)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.runtime/scala.util.Either.map(Either.scala:382)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.runtime/org.enso.interpreter.instrument.job.EnsureCompiledJob.$anonfun$ensureCompiledModule$1(EnsureCompiledJob.scala:123)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.runtime/scala.Option.map(Option.scala:242)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.runtime/org.enso.interpreter.instrument.job.EnsureCompiledJob.ensureCompiledModule(EnsureCompiledJob.scala:121)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.runtime/org.enso.interpreter.instrument.job.EnsureCompiledJob.$anonfun$ensureCompiledFiles$2(EnsureCompiledJob.scala:91)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.runtime/scala.collection.StrictOptimizedIterableOps.map(StrictOptimizedIterableOps.scala:100)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.runtime/scala.collection.StrictOptimizedIterableOps.map$(StrictOptimizedIterableOps.scala:87)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.runtime/scala.collection.mutable.ArrayDeque.map(ArrayDeque.scala:39)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.runtime/org.enso.interpreter.instrument.job.EnsureCompiledJob.ensureCompiledFiles(EnsureCompiledJob.scala:91)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.runtime/org.enso.interpreter.instrument.job.EnsureCompiledJob.run(EnsureCompiledJob.scala:67)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.runtime/org.enso.interpreter.instrument.job.EnsureCompiledJob.run(EnsureCompiledJob.scala:50)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.enso.runtime/org.enso.interpreter.instrument.execution.JobExecutionEngine.$anonfun$runInternal$1(JobExecutionEngine.scala:135)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at java.base/java.lang.Thread.run(Thread.java:1583)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.graalvm.truffle/com.oracle.truffle.polyglot.PolyglotThread.access$001(PolyglotThread.java:53)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.graalvm.truffle/com.oracle.truffle.polyglot.PolyglotThread$1.execute(PolyglotThread.java:106)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.graalvm.truffle/com.oracle.truffle.polyglot.PolyglotThread$ThreadSpawnRootNode.executeImpl(PolyglotThread.java:140)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.graalvm.truffle/com.oracle.truffle.polyglot.PolyglotThread$ThreadSpawnRootNode.execute(PolyglotThread.java:131)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.graalvm.truffle.runtime/com.oracle.truffle.runtime.OptimizedCallTarget.executeRootNode(OptimizedCallTarget.java:745)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.graalvm.truffle.runtime/com.oracle.truffle.runtime.OptimizedCallTarget.profiledPERoot(OptimizedCallTarget.java:669)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.graalvm.truffle.runtime/com.oracle.truffle.runtime.OptimizedCallTarget.callBoundary(OptimizedCallTarget.java:602)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.graalvm.truffle.runtime/com.oracle.truffle.runtime.OptimizedCallTarget.doInvoke(OptimizedCallTarget.java:586)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.graalvm.truffle.runtime/com.oracle.truffle.runtime.OptimizedCallTarget.callIndirect(OptimizedCallTarget.java:519)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.graalvm.truffle.runtime/com.oracle.truffle.runtime.OptimizedCallTarget.call(OptimizedCallTarget.java:500)
May 07 10:17:23 ip-172-31-10-107.eu-west-1.compute.internal enso_runtime[11629]:         at org.graalvm.truffle/com.oracle.truffle.polyglot.PolyglotThread.run(PolyglotThread.java:102)
hubertp commented 7 months ago

The stacktrace might be unrelated. I assumed it's the cause of Screenshot from 2024-05-07 11-12-55 but not sure anymore.

EDIT: Yeah, I think it's the wss setup that is broken.

enso-bot[bot] commented 7 months ago

Hubert Plociniczak reports a new STANDUP for yesterday (2024-05-06):

Progress: Continued investigating cloud issues. Added PR for collecting profiling data from docker containers. It should be finished by 2024-05-06.

Next Day: Next day I will be working on the #9789 task. Continue the investigation.

enso-bot[bot] commented 6 months ago

Hubert Plociniczak reports a new 🔴 DELAY for yesterday (2024-05-07):

Summary: There is 3 days delay in implementation of the Cannot create a project in cloud (#9789) task. It will cause 0 days delay for the delivery of this weekly plan.

Delay Cause: Underestimated cloud issue

enso-bot[bot] commented 6 months ago

Hubert Plociniczak reports a new STANDUP for yesterday (2024-05-07):

Progress: Trying to bisect different gui builds to locate the culrpit change that is closing websockets. It should be finished by 2024-05-09.

Next Day: Next day I will be working on the #9789 task. Continue the investigation.

hubertp commented 6 months ago

I've been testing this quite extensively and it is a bit random when it comes to the stability. Also, the blank page... isn't often so blank. The node appears either after some time or it is out of the usual scene and you have zoom out/in to see it. Adding nodes typically works, once it loads. The problem of slow processing is still there. I'm trying to get hold of the profiling data.

For testing/bisecting I've been building IDE with cloud support and things started to go visibly worse around mid April but I can't seem to pinpoint it to the exact commit.

enso-bot[bot] commented 6 months ago

Hubert Plociniczak reports a new STANDUP for the provided date (2024-05-08):

Progress: Continued an effort to pinpoint the change that broke cloud support. Looks like it is mostly cloud-setup dependent as sometimes gui build works and sometimes not. Waiting on profiling data and more logs info to confirm. In the meantime continued my work on #9736, trying to get a draft PR out. It should be finished by 2024-05-09.

Next Day: Next day I will be working on the #9789 task. Continue the investigation.

enso-bot[bot] commented 6 months ago

Hubert Plociniczak reports a new STANDUP for the provided date (2024-05-09):

Progress: Draft PR for #9736, waiting on the option to gather cloud profile data. It should be finished by 2024-05-09.

Next Day: Next day I will be working on the #9789 task. Continue the investigation.

enso-bot[bot] commented 6 months ago

Hubert Plociniczak reports a new 🔴 DELAY for the provided date (2024-05-10):

Summary: There is 6 days delay in implementation of the Cannot create a project in cloud (#9789) task. It will cause 0 days delay for the delivery of this weekly plan.

Delay Cause: Still haven't figured out why cloud setup affects performance. Unable to reproduce problems locally.

enso-bot[bot] commented 6 months ago

Hubert Plociniczak reports a new STANDUP for the provided date (2024-05-10):

Progress: Analyzing profiling data, adding more logs, reducing potential bottlenecks to unclog cloud setup (#9927, #9915) It should be finished by 2024-05-15.

Next Day: Next day I will be working on the #9789 task. Continue the investigation.

enso-bot[bot] commented 6 months ago

Hubert Plociniczak reports a new STANDUP for the provided date (2024-05-13):

Progress: Addressing review on PRs, still trying to investigate bottlenecks based on the logs and profiling data. It should be finished by 2024-05-15.

Next Day: Next day I will be working on the #9789 task. Continue the investigation.

hubertp commented 6 months ago

Regarding websocket being randomly closed on startup I was able to reproduce the problem by adding a simple Thread.sleep in https://github.com/enso-org/enso/blob/develop/engine/language-server/src/main/scala/org/enso/languageserver/http/server/BinaryWebSocketServer.scala#L71. Except in my case in the logs I was able to see that the connection was being closed. Screenshot from 2024-05-17 15-59-02

The conclusion is that I believe the server is simply overloaded on startup and slow to respond to a new websocket connection request, leading to some kind of timeout on the GUI side. On a subsequent try from GUI it typically connects without any problems so the problem can be more or less ignored.

The performance in the cloud, given the current resources, is still unsatisfactory and can lead to such random failures. We don't want to "recover" from such issues, we simply don't want to have them to appear in the first place.

GitHub
enso/engine/language-server/src/main/scala/org/enso/languageserver/http/server/BinaryWebSocketServer.scala at develop · enso-org/enso
Hybrid visual and textual functional programming. Contribute to enso-org/enso development by creating an account on GitHub.
hubertp commented 6 months ago

I finally managed to get hold of profiling data from a low spec machine that more closely resembles users' and current cloud spec. [Uploading profiling.npss.txt…]() It looks like a considerable amount of time is spent in jackson serde.

enso-bot[bot] commented 6 months ago

Hubert Plociniczak reports a new 🔴 DELAY for today (2024-05-20):

Summary: There is 7 days delay in implementation of the Cannot create a project in cloud (#9789) task. It will cause 0 days delay for the delivery of this weekly plan.

Delay Cause: Regressions when testing on low spec machine. Temporarily moved on to work on other issues.

enso-bot[bot] commented 6 months ago

Hubert Plociniczak reports a new STANDUP for today (2024-05-20):

Progress: Finding culprit change for #9993. Continued work on replacing jackson to potentially speedup startup, based on the collected profiling data. Devising plan on further improvements to startup. It should be finished by 2024-05-22.

Next Day: Next day I will be working on the #9789 task. Continue the investigation.

enso-bot[bot] commented 6 months ago

Hubert Plociniczak reports a new STANDUP for the provided date (2024-05-16):

Progress: Not much progress with gathering profiling data from cloud. Switched to a local low-spec machine. It should be finished by 2024-05-22.

Next Day: Next day I will be working on the #9789 task. Continue the investigation.

enso-bot[bot] commented 6 months ago

Hubert Plociniczak reports a new STANDUP for the provided date (2024-05-17):

Progress: Added profiling option to AppImage so that profiling data can be collected from a local setup. Found a recent regression on startup and narrowed it down to a range of commits. Analyzing collected data and trying to come up with some perf improvements. It should be finished by 2024-05-22.

Next Day: Next day I will be working on the #9789 task. Continue the investigation.

enso-bot[bot] commented 6 months ago

Hubert Plociniczak reports a new STANDUP for yesterday (2024-05-21):

Progress: Still tinkering with jackson replacement. Dealing with non-deterministic bug in deserialization in jsoniter. Will file a bug upstream. It should be finished by 2024-05-22.

Next Day: Next day I will be working on the #9789 task. Continue the investigation.

enso-bot[bot] commented 5 months ago

Hubert Plociniczak reports a new 🔴 DELAY for the provided date (2024-06-06):

Summary: There is 19 days delay in implementation of the Cannot create a project in cloud (#9789) task. It will cause 0 days delay for the delivery of this weekly plan.

Delay Cause: Draft PR was parked until refactoring of ModuleScope was finished.

enso-bot[bot] commented 5 months ago

Hubert Plociniczak reports a new STANDUP for the provided date (2024-06-06):

Progress: Bringing old PR up-to-date. Trying workarounds for some encoding issues with Option values. Initial performance results are very promising. It should be finished by 2024-06-10.

Next Day: Next day I will be working on the #9789 task. Continue integrating new JSON serde

enso-bot[bot] commented 5 months ago

Hubert Plociniczak reports a new STANDUP for the provided date (2024-06-07):

Progress: Testing the final version of PR. Confirmed performance results. Removing jackson whenever possible. Added a workaround for Option values that caused issues when deserializing them. It should be finished by 2024-06-10.

Next Day: Next day I will be working on the #9789 task. Deal with licensing, finish removing jackson

enso-bot[bot] commented 5 months ago

Hubert Plociniczak reports a new STANDUP for the provided date (2024-06-10):

Progress: Finished replacement of jackson to jsoniter. Started looking into performance issues in the cloud #10231. Profiling startup. It should be finished by 2024-06-10.

Next Day: Next day I will be working on the #10231 task. Continue investigating startup issues

hubertp commented 4 months ago

This is currently blocked on Ydoc integration in the cloud.

hubertp commented 4 months ago

Partially related: https://github.com/enso-org/enso/issues/9993#issuecomment-2217467717 Blocked by #10495