Closed eweiter closed 7 years ago
It appears that on further inspection it seems that this issue isn't related to triggers or rules. When I invoked solely the action with wsk action invoke longAction
it waited the 2 minutes but then produced the same error as above.
So the steps to reproduce might simply be
wsk action invoke
that action you just createdIf your action is in fact the one above, then isn't this user error since your action is not actually returning a JSON object?
setTimeout(resolve, time))
should be setTimeout(function() { resolve({}) }, time)
no? (or does resolve() == resolve({})
?)
That said, I'm able to reproduce the timeout after 2 minutes.
FYI @csantanapr
@rabbah doing resolve()
it's valid result would be {}
its true resolve() == resolve({})
Ref: https://github.com/openwhisk/openwhisk/blob/master/core/nodejsActionBase/runner.js#L111
if (typeof resolvedResult === "undefined") {
resolvedResult = {};
}
resolve(resolvedResult);
I'm able to reproduce this problem, with similar code
If I set it to be done at 1.5 mins time = 90000;
then it works fine. But if I set it to 2 mins time = 120000;
then same problem as stated above.
Thanks - we must have a recent regression; since we don't have a test that runs for the full 5 minutes :( The action is in fact getting the right deadline, so something is wrong around the boundary of the HTTP call to the container. I don't recall any changes in that part of the code recently... will need to inspect more.
When I closed this issue https://github.com/openwhisk/openwhisk/issues/1246 I ran a 5 minute action... or so I thought. Should've added a test then.
I've confirmed the problem is either in the HttpUtils
class or the container runtime with a standalone app running natively and communication with the container. Adding a keep alive strategy doesn't help. Focusing on the container next.
That was my thinking, a regression. We have a unit test but not one that test this long up to two minutes or even 5 minutes. I double check and was able to recreate local deployment on mac
Shameless plug - found another use for this: https://medium.com/openwhisk/advanced-debugging-of-openwhisk-actions-518414636932#.vw69jm4db. 🔨
tag teaming with @csantanapr, we found that server.timeout = 0
is missing from the node proxy runtime. Testing a fix - will add a test for jenkins but not travis this time: we'll need three instances, one for node, one for python (covers swift and native/docker), and one for java :(
Here is the proposed patch:
diff --git a/core/nodejsActionBase/src/service.js b/core/nodejsActionBase/src/service.js
index a505569..f92da47 100644
--- a/core/nodejsActionBase/src/service.js
+++ b/core/nodejsActionBase/src/service.js
@@ -63,6 +63,7 @@ function NodeActionService(config, logger) {
var host = server.address().address;
var port = server.address().port;
});
+ server.timeout = 0; // disable automatic timeouts
}
I suggest repurposing this test: https://github.com/openwhisk/openwhisk/blob/master/tests/src/whisk/core/cli/test/WskBasicUsageTests.scala#L514 to run for the max allowed duration instead of 3 seconds. I verified Python actions are OK - can run for 5 minutes unchanged. Need to add a test regardless. That leaves Java (@psuter might you know).
Verified to work for this action:
> cat t.js
function main(params) {
var time = 300000 - 1000;
return new Promise((resolve) => setTimeout(resolve, time));
}
> wsk action create t.js t.js -t 300000
ok: created action t.js
> wsk action invoke t.js
ok: invoked /_/t.js with id 9809d2e03c564bd38e9edb608926e598
# wait 5 minutes
> wsk activation get 9809d2e03c564bd38e9edb608926e598 duration
ok: got activation 9809d2e03c564bd38e9edb608926e598, displaying field duration
299744
@loganherr FYI.
We need to also look at how to improve the logging in invoker for this case where there is an exception from /run
Here are the logs I got, but was not able to tell what happened that the proxy code timeout
👻 $ cat /Users/Shared/wsklogs/invoker0/invoker0_logs.log | grep tid_395
[2017-01-13T04:08:37.233Z] [INFO] [#tid_395] [Invoker] [marker:invoker_activation_start:121]
[2017-01-13T04:08:37.236Z] [INFO] [#tid_395] [Invoker] guest/long guest 39b2b10313614790ab24bccc60bebab8
[2017-01-13T04:08:37.241Z] [INFO] [#tid_395] [WhiskAction] [GET] serving from datastore: id: guest/long, rev: 3-10841ba57f740e4972f502258efb8cc0 [marker:database_cacheMiss_count:129]
[2017-01-13T04:08:37.242Z] [INFO] [#tid_395] [CouchDbRestStore] [GET] 'csantana23_carlosmacbookpro_whisks' finding document: 'id: guest/long, rev: 3-10841ba57f740e4972f502258efb8cc0' [marker:database_getDocument_start:130]
[2017-01-13T04:08:37.285Z] [INFO] [#tid_395] [CouchDbRestStore] [GET] 'csantana23_carlosmacbookpro_whisks' completed: found document 'id: guest/long, rev: 3-10841ba57f740e4972f502258efb8cc0' [marker:database_getDocument_finish:173:41]
[2017-01-13T04:08:37.292Z] [INFO] [#tid_395] [ContainerPool] Getting container for guest/long@0.0.3 of kind nodejs:6 with 23bc46b1-71f6-4ed5-8c54-816aa4f8c502: myPos = 4 completed = 3 slack = 4 activeCount = 0 toBeRemoved = 0 startingCounter = 0
[2017-01-13T04:08:37.296Z] [INFO] [#tid_395] [ContainerPool] Obtained a pre-warmed container
[2017-01-13T04:08:37.301Z] [INFO] [#tid_395] [ContainerPool] Obtained cold container d6423ab2982d913f287959aa26d4ca7cd30ee39749eb6350a58d7feca16fa77e - about to initialize
[2017-01-13T04:08:37.302Z] [INFO] [#tid_395] [WhiskContainer] sending initialization to container [wsk0_22_warmJsContainer_20170113T040036434Z] [d6423ab2982d913f287959aa26d4ca7cd30ee39749eb6350a58d7feca16fa77e] [172.17.0.9:8080]
[2017-01-13T04:08:37.591Z] [INFO] [#tid_395] [WhiskContainer] initialization result: RunResult(Interval(2017-01-13T04:08:37.303Z,2017-01-13T04:08:37.591Z),Some((200,{"OK":true})))
[2017-01-13T04:08:37.593Z] [INFO] [#tid_395] [Invoker] sending arguments to guest/long@0.0.3 container [wsk0_22_warmJsContainer_20170113T040036434Z] [d6423ab2982d913f287959aa26d4ca7cd30ee39749eb6350a58d7feca16fa77e] [172.17.0.9:8080] [marker:invoker_activationRun_start:481]
[2017-01-13T04:10:37.624Z] [INFO] [#tid_395] [Invoker] finished running activation id: 39b2b10313614790ab24bccc60bebab8 [marker:invoker_activationRun_finish:120512:120030]
[2017-01-13T04:10:37.639Z] [INFO] [#tid_395] [Invoker] log cursor has not advanced, trying 15 more times
[2017-01-13T04:10:37.642Z] [INFO] [#tid_395] [KafkaProducerConnector] sent message: completed[0][3]
[2017-01-13T04:10:37.649Z] [INFO] [#tid_395] [Invoker] posted completion of activation 39b2b10313614790ab24bccc60bebab8
[2017-01-13T04:10:37.743Z] [INFO] [#tid_395] [Invoker] recording the activation result to the data store
[2017-01-13T04:10:37.744Z] [INFO] [#tid_395] [ContainerPool] putBack returning container 19 delete = true completed = 4 slack = 3 maxActive = 4 activeCount = 1 startingCounter = 0
[2017-01-13T04:10:37.749Z] [INFO] [#tid_395] [WhiskActivation] write initiated on new cache entry
[2017-01-13T04:10:37.758Z] [INFO] [#tid_395] [CouchDbRestStore] [PUT] 'csantana23_carlosmacbookpro_whisks' saving document: 'id: guest/39b2b10313614790ab24bccc60bebab8, rev: null' [marker:database_saveDocument_start:120646]
[2017-01-13T04:10:37.785Z] [INFO] [#tid_395] [CouchDbRestStore] [PUT] 'csantana23_carlosmacbookpro_whisks' completed document: 'id: guest/39b2b10313614790ab24bccc60bebab8, rev: null', response: '{"ok":true,"id":"guest/39b2b10313614790ab24bccc60bebab8","rev":"1-06ec20b56cd4dfc1d18b516dd535d48a"}' [marker:database_saveDocument_finish:120673:23]
[2017-01-13T04:10:37.803Z] [INFO] [#tid_395] [WhiskActivation] write all done, caching id: guest/39b2b10313614790ab24bccc60bebab8, rev: null Cached
[2017-01-13T04:10:37.807Z] [INFO] [#tid_395] [Invoker] recorded activation
@jasonpet want to take this one?
Steps to reproduce the issue:
When trying to recreate a user issue the other day I stumbled upon what seems to be an error with the OW backend. When I created a NodeJS action that had a long execution duration (120000ms) and then created a trigger and rule for that action it produced an error once the action finished executing. If I changed the duration of the action to be shorter (2000ms) it worked. I created the action with a timeout of 300000ms (5min)
The error produced is
The action did not produce a valid response and exited unexpectedly.
I used the following as my sample code.
wsk action create longAction ~/<pathToFile>.js -t 300000
Here is the output of one of the failing invocations of my action