fission / keda-connectors

Generic connectors for Keda which can be used as worker images as part of scaleTargetRef.
Apache License 2.0

NATS Jetstream MQT goes into an infinite loop if the function returns a status code other than 200 #118

Open umeshgtank opened 1 year ago

umeshgtank commented 1 year ago

Fission/Kubernetes version

Fission version 1.17 / Kubernetes version 1.24

$ fission --version
client:
  fission/core:
    BuildDate: "2022-09-16T13:24:57Z"
    GitCommit: b36e0516
    Version: v1.17.0
server:
  fission/core:
    BuildDate: "2022-09-16T13:24:57Z"
    GitCommit: b36e0516
    Version: v1.17.0

$ kubectl version
Client Version: v1.25.3
Kustomize Version: v4.5.7
Server Version: v1.24.8

Kubernetes platform (e.g. Google Kubernetes Engine)

On-prem

Describe the bug

I am building a data pipeline whose workflow looks like Producer -> NATS Jetstream -> MQT -> Consumer, following the Fission documentation here: https://fission.io/docs/usage/triggers/message-queue-trigger-kind-keda/nats-jetstream/#producer-function. While testing the workflow, I found that if the consumer function returns an error (400 in my case), the MQT keeps calling the consumer function with the same message in a loop that never stops.

To investigate further, I looked at the keda-connectors code for NATS Jetstream (https://github.com/fission/keda-connectors/blob/main/nats-jetstream-http-connector/main.go). The handleHTTPRequest function acks a message received from Jetstream only if the HTTP request succeeds; on failure it sends no ack to Jetstream. According to the Jetstream documentation (https://docs.nats.io/nats-concepts/jetstream/consumers), if the server does not receive an ack within the AckWait time, Jetstream redelivers the message. Since the redelivered message also results in an error (the request is still bad), this repeats forever.
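The redelivery loop can be sketched as a small, self-contained Go simulation. It does not use the NATS client library: `message`, `invokeFunction` and `consumeAckOnSuccess` are hypothetical names, and AckWait-driven redelivery is modelled simply as another loop iteration. Per the issue title, it assumes only 200 OK counts as success.

```go
package main

import (
	"fmt"
	"net/http"
)

// message is a hypothetical stand-in for a JetStream message; only the
// fields needed for this simulation are modelled.
type message struct {
	data       string
	deliveries int
}

// invokeFunction is a hypothetical stand-in for the HTTP call the connector
// makes to the Fission function. It always returns 400 Bad Request,
// mimicking a consumer that rejects the payload.
func invokeFunction(data string) int {
	return http.StatusBadRequest
}

// consumeAckOnSuccess models the behaviour described in the issue: the
// message is acked only when the function returns 200. An unacked message
// is redelivered after AckWait, so a permanently failing message keeps
// coming back. maxRounds exists only to stop the simulation.
func consumeAckOnSuccess(msg *message, maxRounds int) bool {
	for round := 0; round < maxRounds; round++ {
		msg.deliveries++
		if invokeFunction(msg.data) == http.StatusOK {
			return true // ack sent: the server stops redelivering
		}
		// No ack: after AckWait the server redelivers the same message.
	}
	return false
}

func main() {
	msg := &message{data: "bad payload"}
	acked := consumeAckOnSuccess(msg, 5)
	fmt.Printf("acked=%v deliveries=%d\n", acked, msg.deliveries)
}
```

Raising `maxRounds` only raises the delivery count; the message is never acked, which is the loop the issue describes.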

To Reproduce

Return 400 from the handler function in hello.go. The sample is available here: https://fission.io/docs/usage/triggers/message-queue-trigger-kind-keda/nats-jetstream/#producer-function.

Expected result

The MQT shouldn't go into a never-ending loop.

Actual result

We can see that the MQT keeps calling the consumer function again and again with the same message.

Screenshots/Dump file

$ fission support dump

Additional context

A potential fix may be to simply ack the message regardless of success or failure. Fission already handles failures in two ways: it first retries, and if the message still fails after the retries, it is pushed to the error queue. So I believe it would be safe to ack as soon as the message is received from Jetstream. The bigger problem is authentication failure: the function never gets a chance to execute because the router returns an auth failure error, so in that scenario the loop is unavoidable.

sjk7524068 commented 1 year ago

Big issue indeed. The same bug occurs with Keda-Rabbitmq: unacked messages keep being consumed in a loop and the queue never moves on.