GoogleCloudPlatform / functions-framework-nodejs

FaaS (Function as a service) framework for writing portable Node.js functions
Apache License 2.0
1.29k stars 158 forks source link

Provide a mechanism to retry event processing without logging an error #488

Open flovouin opened 1 year ago

flovouin commented 1 year ago

Although Cloud Functions do allow a retry mechanism when processing events, this mechanism is currently not very flexible. Indeed, in order for the processing to be retried, the function must throw an error.

Because the functions framework does not know the nature of the thrown error (Is it an uncaught error in a dependency? Is it an error thrown on purpose?), it will always log it as such along with its stack trace. While this makes sense in the general case, it also has effects on other Google services:

There are many valid reasons why one would want to retry processing without explicitly logging an error. Actually, I would argue that the main use case detailed in the documentation (transient errors) should probably not log an error. If an error is transient (e.g. a timeout or a network error when calling an API) and is expected to occur from time to time, there's not much value in counting it as a generic error in Error Reporting and Cloud Logging / Monitoring. In fact, those transient errors can probably be reported more precisely using Cloud Logging and Monitoring directly (not relying on the functions-framework wrapper), along with other statistics about the API calls (e.g. latency, endpoint name, etc).

Conversely, I'd argue that we want an error to be logged and picked us as such by downstream GCP services when it is unexpected. Moreover if the error is unexpected, it should probably not be retried because it could be a bug in the code (as the GCP documentation indeed calls out).

✨ Ideal behaviour

Summing up the previous points, the ideal behaviour for me would be to:

This would reduce the risk of setting up a Cloud Function with the retry flag and ending up with endless retries. The solutions provided by the documentation involve custom logic and are in my opinion prone to errors and/or suboptimal.

🛠️ Current workaround

My current workaround is actually a wrapper around all my Cloud Functions which:

Obvisouly, this workaround does not fully eliminate the problem because retryable errors are still logged as errors by the functions-framework. However it does help to:

💡 Breaking change-free suggestion

The ideal behaviour described above would result in a huge breaking change and the idea would probably be quickly dismissed. However I think there is a smaller, non-breaking, change that can bring more flexibility to developers: simply provide an explicit value or error that can be returned/thrown by the function, which does not involve logging an error but still causes the functions-framework to return a 500 status code. This would be a small change to sendResponse, checking that result or err is of a specific type (or whatever check you feel most comfortable with).

This solution does solve the main problem which is that processing cannot currently be retried without logging an error. Because uncaught errors would still be retried (and also logged as errors), this does not alleviate the risk of an event being endlessly retried in case of a bug in the function's code. However this is clearly documented and can be solved in user code (e.g. like it is in my current workaround).

josephlewis42 commented 11 months ago

I was able to look into this today, it seems like CloudEvent handlers also support a (little documented) callback which can accept an (error, response) callback; but unfortunately the response body can't be used in an express friendly way to set the code without sending a body since FF also checks for integers (which would be interpreted as the HTTP response code) and casts them to strings first: https://github.com/GoogleCloudPlatform/functions-framework-nodejs/blob/978054b786ce180e350b8484b65e0ea116abba1f/src/invoker.ts#L54

Agree that a custom, sentinel error is probably cleanest here that'll allow extension without breaking backward compatibility.