zaataylor closed this pull request 1 year ago
Looks like the Redis tests are failing due to a timeout issue 🤔 Not sure if the dependency upgrade is causing this, or if a re-run might yield different results 🙂
2023/03/01 00:19:03 |ERROR| error while polling for workflow task error=reading workflow instance: unmarshaling instance state: unexpected end of JSON input
e2e.go:156:
Error Trace: /home/runner/work/go-workflows/go-workflows/backend/redis/e2e.go:156
/home/runner/work/go-workflows/go-workflows/backend/redis/e2e.go:588
Error: Received unexpected error:
workflow did not finish in time: workflow did not finish in specified timeout
Test: Test_EndToEndRedisBackend/SideEffect_Simple
That "unmarshaling instance state: unexpected end of JSON input" looks suspicious.
It looks suspicious indeed! Thanks for catching this; I hadn't seen the entire error message while skimming through the failures in the logs. I plan to take a closer look at this soon and see if I can resolve it! 🙂
EDIT: Taking a closer look now, I see that all of the Test_EndToEndRedisBackend tests are failing while trying to read the instance state when getting the next workflow task, at: https://github.com/cschleiden/go-workflows/blob/d2b463129b7f70ebfb202d3efaf1e34abfa2fa5e/backend/redis/workflow.go#L81-L84
From what I see in the code called by the snippet above, here: https://github.com/cschleiden/go-workflows/blob/d2b463129b7f70ebfb202d3efaf1e34abfa2fa5e/backend/redis/instance.go#L181-L197, it seems that the Redis pipeline GET command is somehow not returning an error, but the result of the command is invalid JSON, most likely the empty string "". I've been able to reproduce this exactly once locally, and I'm trying to figure out how to reproduce it again so that I can get more insight into what's going on.
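For reference, the error text itself is consistent with an empty payload; this minimal standalone sketch (not go-workflows code) reproduces the exact message:

```go
package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// Unmarshaling an empty byte slice is enough to produce the exact
	// error seen in the test logs.
	var v map[string]interface{}
	err := json.Unmarshal([]byte(""), &v)
	fmt.Println(err) // unexpected end of JSON input
}
```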
One thing I do know changed between v8 and v9 of go-redis is that Pipeline is no longer thread-safe, per what I see at https://github.com/redis/go-redis/blob/master/CHANGELOG.md#breaking. But I'm not sure if or how this might be contributing to what I'm seeing, as I think only one pipeline command is being executed by one thread at this point. I'm still not ✨ great ✨ at determining the best ways to examine Redis changes while the tests are running (or in general, Redis noob here), so I could be wrong about this.
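For context, this is roughly the pattern the v9 changelog note implies for concurrent callers; a sketch only, with an illustrative helper name (readInstanceState is made up, not the go-workflows function):

```go
import (
	"context"

	"github.com/redis/go-redis/v9"
)

// Each caller/goroutine builds and Execs its own pipeline instead of sharing
// a single Pipeline value, since a Pipeline is no longer safe for concurrent
// use in go-redis v9.
func readInstanceState(ctx context.Context, rdb *redis.Client, key string) (string, error) {
	pipe := rdb.Pipeline() // fresh pipeline per call
	cmd := pipe.Get(ctx, key)
	if _, err := pipe.Exec(ctx); err != nil {
		return "", err
	}
	return cmd.Result()
}
```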
To get a better idea of what and how many commands are being executed, I'm shortly going to push some temporary changes that add fmt.Printf logging in the read-instance code so I can see what's going on. If I'm able to figure out the issue after that, I'll open a new PR with a cleaner history and those changes removed. I can convert this one back to a draft while I work on that if you'd like (👍 / 👎).
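The instrumentation is along these lines (a sketch; variable names are illustrative, only the printed format matches the log output further down):

```go
// Print what the pipeline executed and whether Exec itself errored,
// before any unmarshaling happens.
cmds, err := pipe.Exec(ctx)
fmt.Printf("Executed commands: %v\n", cmds)
fmt.Printf("Errors while executing commands: %v\n", err)
```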
Well, it looks as though, for some reason, the context is being canceled in the tests while the pipeline command is being executed. Check out the partial log lines below for one of the tests, Test_EndToEndRedisBackend/SimpleWorkflow_ExpectedHistory. Notice the failure of the second get command right after the first successful command targeting the instance:
2023-03-01T16:36:12.2433283Z 2023/03/01 16:36:12 |DEBUG| Created new workflow instance
2023-03-01T16:36:12.2434057Z 2023/03/01 16:36:12 |DEBUG| Created workflow instance instance_id=d39622e3-5342-4a90-b518-7c097210774d execution_id=207545da-f666-4fc9-92c7-87eb048f2235
2023-03-01T16:36:12.2435389Z Executed commands: [get instance:d39622e3-5342-4a90-b518-7c097210774d: {"instance":{"instance_id":"d39622e3-5342-4a90-b518-7c097210774d","execution_id":"207545da-f666-4fc9-92c7-87eb048f2235"},"metadata":{},"created_at":"2023-03-01T16:36:12.241931037Z"}]
2023-03-01T16:36:12.2435997Z Errors while executing commands: <nil>
2023-03-01T16:36:12.2436680Z Executed commands: [get instance:d39622e3-5342-4a90-b518-7c097210774d: ]
2023-03-01T16:36:12.2437031Z Errors while executing commands: context canceled
2023-03-01T16:36:12.2442377Z 2023/03/01 16:36:12 |ERROR| error while polling for workflow task error=reading workflow instance: unmarshaling instance state: unexpected end of JSON input
2023-03-01T16:36:12.2442985Z value of cmd.Result() creating the error was:
I think that because the context is canceled while the pipeline command is executing, the return value of the command is effectively empty/non-existent? That would help explain why the json.Unmarshal failure is happening.
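If that's the case, one way to make the failure mode clearer would be to surface the command's own error before attempting to unmarshal (a sketch only; instanceState is an assumed name for the stored struct):

```go
// If the context was canceled, cmd.Result() returns "context canceled" here,
// instead of the problem surfacing later as a confusing JSON error.
val, err := cmd.Result()
if err != nil {
	return nil, fmt.Errorf("reading instance state: %w", err)
}

var state instanceState // assumed name, for illustration
if err := json.Unmarshal([]byte(val), &state); err != nil {
	return nil, fmt.Errorf("unmarshaling instance state: %w", err)
}
```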
I've included the full logs from this test below:
🤔 Are we that much slower that we run into the timeout? Or what is canceling the context?
> 🤔 Are we that much slower that we run into the timeout?
I think the timeout could possibly be a side effect of the original failure to read the instance? Or the fact that the workflow worker's runPoll timeout (30 seconds) is longer than the time the test waits for the workflow to finish (10 seconds), so if the poll fails the first time, the worker isn't able to re-poll before the test times out? I could try to reduce the worker timeout to a value lower than the test timeout and see if that makes a difference.
> Or what is canceling the context?
I'm still stumped on this, unfortunately ☹️. I thought those pipeline thread-safety changes might have something to do with it, but I'm not seeing a very clear connection yet :/
EDIT: The only place I'm seeing a context being canceled that's relevant to the current scenario is in the poll() function, at: https://github.com/cschleiden/go-workflows/blob/d2b463129b7f70ebfb202d3efaf1e34abfa2fa5e/internal/worker/workflow.go#L229-L232
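To illustrate the suspected mechanism end to end, here is a standalone sketch against go-redis v9 (not the actual go-workflows code): if the context passed to the pipeline goes away while Exec is in flight, Exec reports "context canceled" and the queued GET's value stays empty, which then fails in json.Unmarshal exactly as in the logs above.

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

func main() {
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

	// Simulate the worker's poll context being canceled mid-request.
	ctx, cancel := context.WithCancel(context.Background())
	go func() {
		time.Sleep(time.Millisecond)
		cancel()
	}()

	pipe := rdb.Pipeline()
	getCmd := pipe.Get(ctx, "instance:some-id")
	_, err := pipe.Exec(ctx)

	fmt.Println("Exec error:", err)          // likely: context canceled
	fmt.Println("GET result:", getCmd.Val()) // likely: "" (empty)
}
```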
> I could try to reduce the worker timeout to a value lower than the test timeout and see if that makes a difference.
This didn't make a difference; I'm really not sure what the root issue is.
Superseded by #179
This PR updates the go-redis dependency to v9!