Open b1conrad opened 3 years ago
It is unclear what the purpose of the updateManifoldAppList
rule is. It loops over all of the things, and sends each one an event that will be selected by every application ruleset installed in it. Then the results of all this is cached in an entity variable which doesn't seem to be used.
What raises the manifold:update_app_list
? Cron?
The event is raised by the Manifold SPA here, called by React as soon as this component has been rendered into the DOM tree.
Ah, wait, I see the point of your question. The manifold:update_app_list
should exhibit the same symptom, since is selects the rule which fails. And it appears to do so. When the list of applications is brought up when setting up notifications for an app in my thing, everything seems to work fine. Expect that I begin getting the error pop-up, and the console log shows the same kind of failure.
So, the problem is the rule fails during evaluation. Avoiding selecting that rule with manifold:update_app_list
allows Manifold to present the thing cards. Getting the cards (picos) for this owner's things happens here so we would always select the problematic rule when getting those picos.
We need to understand why this rule fails in line 73, and fix that.
Another question is why did the failure begin to appear sometime after IIW (April 20-22) and before the certificate expired (May 16)?
I don't see evidence of code changes anywhere. The latest commit to the Manifold repo was March 5, when I changed the Google profile keys. Looking at the rulesets of the instance... There are the changes Beto and I made to the Manifold app io.picolabs.manifold_cloud_agent
on April 21, but these helped the app, and shouldn't have impacted notifications. Another innocuous change was made to io.picolabs.notifications
on May 13.
Shouldn't line 73 be using wrangler:pico_query()
so that it doesn't go over HTTP?
That's was my first reaction, but it is an event and not a query, that is being sent to one of the things. Also, the v0 pico engine doesn't have a primitive for calling a function in another ruleset without using HTTP.
Manifold sends an event and then a query in quick succession, to get the things. The logging for the event is shown in the first issue comment, and has a two minute gap. The logging for the query fits into that two minute gap:
2021-05-19T15:16:43.727Z episode_start query/YGqfdevaDDtgU85acjzzKM/io.picolabs.manifold_pico/getManifoldInfo
2021-05-19T15:16:43.727Z query received: io.picolabs.manifold_pico/getManifoldInfo
2021-05-19T15:16:43.728Z query added to pico queue: cjkvb40p701sghtlsrrput24q
2021-05-19T15:18:07.741Z {"things":{...},"communities":{}}
2021-05-19T15:18:07.741Z episode_stop
The query correctly puts together the information and returns it to the browser, but meanwhile nginx
has timed out and the browser thinks there is a CORS error because of a missing header (which, of course, nginx
has not provided).
OK, so that begs the question why is it using http:get()
instead of event:send()
?
Because it is gathering all the directives that come in from all the rules (in all the rulesets installed in a thing pico) which select on the manifold:apps
event. Perhaps this would be a good place to use the scatter-gather pattern instead. That is, instead of returning the results in directives, each rule would use event:send
to supply the information.
This is why event:send()
in the Classic engine had a synchronous mode. You could add the attribute _async
with value 0 to the attributes and then gather results (from a system:send_complete
event)
As we discovered, having people using HTTP to query picos on the same engine often results in deadlock situations. That seems to be what's happening here, no?
But, as you say, using scatter-gather is probably the right solution since someone with lots of things is going to have a long wait for synchronous interactions to finish.
Recently, Manifold stopped working. It would log the owner in, but show none of her thing cards. Errors were thrown in the console log of the browser, but no other indication was given. The screen was indistinguishable from that of an owner who had not yet created any things.
It appears that an event (
manifold:update_app_list
) sent from Manifold was the culprit. The event stopped the pico engine for two minutes, as shown by this extract from the logs.The
updateManifoldAppList
rule is selected and after its firstklog
the pico-engine is busy for two minutes until a timeout wakes it up again. A couple of queries come in, but they are not processed until after the timeout (and that was too late, because thenginx
gateway had already timed out by then).A temporary patch was made by changing the
io.picolabs.notifications
ruleset to not select on that event. See the #manifold channel in our Pico Labs slack organization for details.This situation needs to be investigated, the root cause identified, and the problem fixed properly.