Open BlakeMScurr opened 6 years ago
Hey @BlakeMScurr, sorry, I overlooked your questions at the end.
So perhaps helm delete deletes the cache but not the store, does that seem correct?
Correct, and this is by design.
How can I manually reset the event store?
kubectl -n fission get po -o name | grep nats | xargs kubectl -n fission delete
To reset workflows completely without reinstalling, simply delete the workflows pod afterwards; it will read the now-empty store after restarting. I use this short script for it while developing:
#!/bin/bash
kubectl -n fission get po -o name | grep nats | xargs kubectl -n fission delete
kubectl -n fission-function get po -o name | grep workflow | xargs kubectl -n fission-function delete
Of course, this is still a bug, as the workflow engine should not crash on past data/invocations. Looking at the trace, I think 0.3.0 fixes this issue.
If you get around to testing it, could you verify whether this bug is still present?
@erwinvaneyk I am on the latest versions of Fission (0.10) and Fission Workflows (0.5), and I get a similar (probably the same) error when I increase the concurrency:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x8347f9]
goroutine 73 [running]:
github.com/fission/fission-workflows/pkg/types/aggregates.(*WorkflowInvocation).ApplyEvent(0xc4202feb00, 0xc4202cc000, 0x16, 0x7f3b40bd76c8)
/go/src/github.com/fission/fission-workflows/pkg/types/aggregates/invocation.go:92 +0x349
github.com/fission/fission-workflows/pkg/fes.(*SimpleProjector).project(0x1eff3a0, 0x16655c0, 0xc4202feb00, 0xc4202cc000, 0xc420638b58, 0x835a8a)
/go/src/github.com/fission/fission-workflows/pkg/fes/projectors.go:33 +0xe7
github.com/fission/fission-workflows/pkg/fes.(*SimpleProjector).Project(0x1eff3a0, 0x16655c0, 0xc4202feb00, 0xc42054bbe8, 0x1, 0x1, 0xc4202feae0, 0x0)
/go/src/github.com/fission/fission-workflows/pkg/fes/projectors.go:15 +0x6d
github.com/fission/fission-workflows/pkg/fes.Project(0x16655c0, 0xc4202feb00, 0xc420638be8, 0x1, 0x1, 0x0, 0x0)
/go/src/github.com/fission/fission-workflows/pkg/fes/projectors.go:8 +0x5f
github.com/fission/fission-workflows/pkg/fes.(*SubscribedCache).ApplyEvent(0xc42061e7c0, 0xc4202cc000, 0x1, 0x1)
/go/src/github.com/fission/fission-workflows/pkg/fes/caches.go:218 +0x40a
github.com/fission/fission-workflows/pkg/fes.NewSubscribedCache.func1(0x16651c0, 0xc4205b8000, 0xc42042c2c0, 0xc42061e7c0)
/go/src/github.com/fission/fission-workflows/pkg/fes/caches.go:175 +0x276
created by github.com/fission/fission-workflows/pkg/fes.NewSubscribedCache
/go/src/github.com/fission/fission-workflows/pkg/fes/caches.go:161 +0x16c
@thenamly thanks for your update on this issue. From your description, your issue sounds unrelated to this one, which is about invalid invocations preventing recovery of the engine. Can you share a bit more detail on your setup?
This happens pretty rarely. From what I understand, it happens before an OOM kill, because it has never happened on bigger nodes.
Hi all :)
My workflow fission-function is in a long-running CrashLoopBackOff after I cancelled a workflow and reinstalled fission-workflows (which I did because I thought the workflow was hanging due to the fission environment being corrupted).
I noticed that the workflow function was erroring:
Reading through the stack trace, it seems that we're trying to apply a cancel event from the event store to a nil workflow invocation in the cache. So perhaps helm delete deletes the cache but not the store; does that seem correct? How can I manually reset the event store? I have a snapshot of a VM with Fission Workflows working, so this isn't a pressing issue for me, but I thought it would be worth making note of.
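For reference, the failure mode described above boils down to applying an event to an aggregate that a cache lookup never populated. A minimal Go sketch of that pattern, and of a defensive guard that would avoid the panic (the Invocation type and ApplyEvent signature here are hypothetical stand-ins, not the actual fission-workflows code):

```go
package main

import (
	"errors"
	"fmt"
)

// Invocation is a hypothetical stand-in for the aggregate held in the cache.
type Invocation struct {
	Status string
}

// ApplyEvent guards against a nil receiver. Without the guard, the field
// assignment below would dereference nil and panic with SIGSEGV, which is
// the crash pattern seen in the stack trace above.
func (inv *Invocation) ApplyEvent(event string) error {
	if inv == nil {
		// Defensive guard: reject events for unknown aggregates instead of panicking.
		return errors.New("cannot apply event to unknown invocation")
	}
	inv.Status = event
	return nil
}

func main() {
	// Simulates the reported state: the event store still holds a cancel
	// event, but the (freshly reinstalled) cache has no matching entry.
	var missing *Invocation
	if err := missing.ApplyEvent("CANCELED"); err != nil {
		fmt.Println("recovered gracefully:", err)
	}

	known := &Invocation{}
	if err := known.ApplyEvent("CANCELED"); err == nil {
		fmt.Println("applied:", known.Status)
	}
}
```

Note that in Go, calling a method on a nil pointer receiver is legal; the panic only occurs once the method dereferences the receiver, which is why the nil check inside ApplyEvent is enough to turn the crash into a recoverable error.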
Kubernetes version: 1.10.2
Fission version: 0.6.0
Fission Workflows version: 0.2.0