Open aagrawal207 opened 3 years ago
Hi @as2d3 Thanks for opening the feature request. I am wondering what would the UX be like? If I understand correctly, Intention here is to predict which of the exceptions thrown while processing an event is temporary or permanent?
Since API don't control what is the logic inside implementation of SqsMessageHandler
, it's difficult to predict for the library which ones are permanent/temporary.
Could you please elaborate a bit on this? With some examples?
Hi @pankajagrawal16, currently SQS Batch processing throws only SQSBatchProcessingException
. Basically, I am two new exceptions instead of this old one, something like SQSBatchProcessingTemporaryException
and SQSBatchProcessingPermanentException
.
So, currently I have implemented my own exceptions like these:
public class NotificationValidationException extends SQSBatchProcessingException {
public NotificationValidationException(final String message, final Throwable cause) {
super(message, cause);
}
}
and
public class MyServiceRetryableException extends SQSBatchProcessingException {
public NotificationValidationException(final String message, final Throwable cause) {
super(message, cause);
}
}
My SQS processor throws NotificationValidationException
and MyServiceRetryableException
instead of SQSBatchProcessingException
. My batch processing Lambda Handler catches SQSBatchProcessingException
like this:
public String handleRequest(final SQSEvent sqsEvent, final Context context) {
try {
LOGGER.debug("Processing SQS message batch with SQSEvent: {}", sqsEvent);
awsLambdaPowertoolsSqsUtilsWrapper.batchProcessor(sqsEvent, notificationProcessor);
} catch (SQSBatchProcessingException sqsBatchProcessingException) {
final List<SQSEvent.SQSMessage> failures = sqsBatchProcessingException.getFailures();
LOGGER.error("(Partial) batch of SQS messages failed: {}", failures);
throw sqsBatchProcessingException;
}
return "{\"statusCode\": 200}";
}
The issue is that after the handler processes the batch, all the failed messages move back to main queue. Instead the logic could be like this:
public String handleRequest(final SQSEvent sqsEvent, final Context context) {
try {
LOGGER.debug("Processing SQS message batch with SQSEvent: {}", sqsEvent);
awsLambdaPowertoolsSqsUtilsWrapper.batchProcessor(sqsEvent, notificationProcessor);
} catch (SQSBatchProcessingTemporaryException sqsBatchProcessingTemporaryException) {
final List<SQSEvent.SQSMessage> failures = sqsBatchProcessingTemporaryException.getFailures();
LOGGER.error("(Partial) batch of SQS messages failed, will be moved back to main queue: {}", failures);
throw sqsBatchProcessingTemporaryException;
} catch (SQSBatchProcessingPermanentException sqsBatchProcessingPermanentException) {
final List<SQSEvent.SQSMessage> failures = sqsBatchProcessingPermanentException.getFailures();
LOGGER.error("(Partial) batch of SQS messages failed, will be moved to DLQ directly: {}", failures);
throw sqsBatchProcessingPermanentException;
}
return "{\"statusCode\": 200}";
}
Here, the ask is basically, if the message schema validation type exception is thrown, then there is no reason to move it back to the main queue and process it again later. We would like to directly move these to DLQ or delete it.
Thanks for the response @as2d3. I understand what is the intent of the feature request now.
That being said, it will be too much assumption on behalf of library to classify which of the errors/exceptions should be retried and which one should not be. That's why we provide generic implementation with SQSBatchProcessingException
wrapping all the details of exception which is thrown by user provided implemenation of SqsMessageHandler
.
So in your use case, the SQSBatchProcessingException
still has all the underlying exception for you to decide what should be done with the SqsEvent next. It might be, you will like to move some to DLQ or want them to be retried.
I guess one way to support this use case better will be to accept from user which exception types are classified as "Move to DLQ exception" and then the utility can take care of moving those messages to DLQ instead of letting them being retried.
I just wanted to present an example, design could be something else. What I really want from the library is to provide an easy way to move the failed message directly to DLQ instead of moving it back to the main queue. I think this is a very useful use-case and (imho) is a must feature of an SQS consumer library.
Library does not need to define temporary and permanent exceptions I would say, but there should be a simple utility to move the message to DLQ or delete it completely instead which can be used in exception handling at implementation level.
I just wanted to present an example, design could be something else. What I really want from the library is to provide an easy way to move the failed message directly to DLQ instead of moving it back to the main queue. I think this is a very useful use-case and (imho) is a must feature of an SQS consumer library.
Library does not need to define temporary and permanent exceptions I would say, but there should be a simple utility to move the message to DLQ or delete it completely instead which can be used in exception handling at implementation level.
Rite, I definitely agree with that. It will be an excellent feature to be supported as part of this utility.
@heitorlessa I brought this issue to roadmap now, Since i believe this might be an interesting use case to support in both python and java version as both has SQS batch processor utility.
Idea here is to provide an easy way where user can configure utility to mark certain failures while processing a SQS message in the batch to be moved to a DLQ.
Initial thought was maybe we could have api accept a list of user supplied exception, which when thrown while processing the message, utility can move those messages to DLQ instead.
Let me know your thoughts?
Just moved to backlog as we should do it. @cakepietoast and I discussed in the original RFC but didn't come around to do it.
That said, @pankajagrawal16 do you wanna create a RFC so we can discuss some implementation details? We can link to this issue, then tackle that until TS and C# work on their core implementations.
This will help @saragerion @sliedig @t1agob to chime in as Exceptions don't translate well in TS, for example.
@heitorlessa Sounds good. I will do that sometime today. I have some ideas on how this can look from UX perspective in Java.
Perfect. Gimme a ping so I don't miss :D I can add the Python bit later
On Tue, 27 Jul 2021 at 09:10, Pankaj Agrawal @.***> wrote:
@heitorlessa https://github.com/heitorlessa Sounds good. I will do that sometime today. I have some ideas on how this can look from UX perspective in Java.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/awslabs/aws-lambda-powertools-roadmap/issues/21#issuecomment-887268939, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZPQBCBTMVZPJBV5YKYAYTTZZLVRANCNFSM4736MPDQ .
@heitorlessa Here you go https://github.com/awslabs/aws-lambda-powertools-roadmap/issues/29
cc @saragerion @cakepietoast as I added all_runtimes tag
@as2d3 this is now available for Java https://awslabs.github.io/aws-lambda-powertools-java/utilities/batch/#move-non-retryable-messages-to-a-dead-letter-queue.
https://github.com/awslabs/aws-lambda-powertools-java/releases/tag/v1.7.3
Try it out and let us know in case of any issues.
Hi @pankajagrawal16,
Thanks for implementing the feature request so quickly. I tried the change recently and have few pointers:
nonRetryableException
list, the exception for it is consumed without any trace for it even when suppressException
is false
. I think there should be a way to know which of the messages moved to DLQ and why. I understand the intention is to safely exit the function if the messages are moved to DLQ or deleted but then we might want to give that data to upstream which is currently not possible.Class<? extends Exception>... nonRetryableExceptions
) is generally not preferred. I understand that this might be done to keep methods backward compatible but a better way would have been to introduce a list of exceptions instead.My usecase: I am invoking my Lambda function directly from my integration tests for both positive happy paths and negative paths as well. For the negative paths, I configured the messages to be completely deleted which is happening as per the requirement. This issue I am facing is that the exception for which the message is deleted is not reported back to the user, and so I am unable to validate the same in my integration tests. The only way the user can validate it is by checking logs manually for the deleted message.
I would love to know your thoughts on this.
Is your feature request related to a problem? Please describe. While processing SQS messages in batch, I want to distinguish between Temporary and Permanent Exceptions so that in case of temporary failures, message return to the main queue and in case of permanent failure, it moves to either DLQ or deleted completely as per the requirement.
Describe the solution you'd like Simple conversion of exception thrown from SqsMessageHandler implementation from their respective exception to TemporaryFailureException or PermanentFailureException with some defined logic. Then handler logic defined as well for those exceptions.
Describe alternatives you've considered Writing my own alternative exception handler which I am still figuring out.
Additional context This seems to me a fairly common use-case and should be a well-understood problem.