Open mullermp opened 2 years ago
Also reported in the SDK for Java repo: https://github.com/aws/aws-sdk-java-v2/issues/3173.
This is a very serious issue. The functionality does not match the documentation https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-putevents.html#eb-failure-handling.
I don't expect the whole call to fail, just to have the response entries populate correctly so I can detect the error. Instead the JSON response from the service is {"Entries":[{"EventId":"e1797b69-dcd6-f7e1-36a0-7f6e80bdce98"}],"FailedEntryCount":0}
The service never returns an error state at all. How is my code supposed to know that this event was never actually published? If publishing to a non existent bus "succeeds", what other potential errors are not being returned to the caller?
This pretty much rules out our ability to use eventbridge at all for our applications.
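To make the problem concrete, here is a minimal Python sketch that applies the failure check from the linked documentation (a non-zero FailedEntryCount, or an ErrorCode on an individual entry) to the response quoted above. The helper name `find_failures` is hypothetical. The response body is copied from this comment, and ErrorCode is the per-entry field the PutEvents documentation describes for failed entries.

```python
import json

# Response body quoted above for a PutEvents call against a non-existent bus.
RAW_RESPONSE = (
    '{"Entries":[{"EventId":"e1797b69-dcd6-f7e1-36a0-7f6e80bdce98"}],'
    '"FailedEntryCount":0}'
)


def find_failures(response):
    """Return (index, entry) pairs for response entries that carry an ErrorCode."""
    return [
        (index, entry)
        for index, entry in enumerate(response.get("Entries", []))
        if "ErrorCode" in entry
    ]


response = json.loads(RAW_RESPONSE)
failures = find_failures(response)
print(response["FailedEntryCount"], failures)  # prints: 0 []
```

Neither signal fires, so from the caller's side this response looks the same as a successful publish.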
@mullermp @debora-ito @ajredniwja Can we get some insight here? This is a pretty major issue.
(x-posting from https://github.com/aws/aws-sdk-ruby/issues/2657#issuecomment-1227229808)
@mullermp @debora-ito We are encountering this issue on a large implementation. We have concerns that our AWS API client code is returning false-positive success and there's no way to detect failure and retry/alarm.
Can you ask the service team or EventBridge product owner to consider the following approach? I believe this would put the EventBridge API closer in line with the SNS API:
1. putEvents API sends a success response but increments FailedEntryCount. Similar to the SNS:PublishBatch API in that it responds HTTP 200 but the response payload indicates what failed. I think you'd probably need an array of Failed events in the response so that clients can manage those.
2. putEvent API that accepts a single event and responds HTTP 5xx on service failure. Similar to the SNS:Publish API.

This ensures support for existing client batch behavior while adding support for client-side failure handling.
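A rough Python sketch of the client-side handling this proposal would enable, assuming failures are reported per entry the way the PutEvents documentation describes: response entries come back in the same order as the request, and failed ones carry ErrorCode and ErrorMessage. The function name, bus name, and sample data are hypothetical and for illustration only.

```python
def entries_to_retry(request_entries, response):
    """Return the request entries whose matching response entry has an ErrorCode."""
    result_entries = response.get("Entries", [])
    if len(result_entries) != len(request_entries):
        raise ValueError("response entry count does not match request")
    return [
        request
        for request, result in zip(request_entries, result_entries)
        if "ErrorCode" in result
    ]


# Hypothetical request and response, for illustration only.
request_entries = [
    {"Source": "my.app", "DetailType": "OrderCreated", "Detail": "{}", "EventBusName": "orders"},
    {"Source": "my.app", "DetailType": "OrderShipped", "Detail": "{}", "EventBusName": "orders"},
]
response = {
    "FailedEntryCount": 1,
    "Entries": [
        {"EventId": "11111111-2222-3333-4444-555555555555"},
        {"ErrorCode": "InternalFailure", "ErrorMessage": "Unexpected error"},
    ],
}

retry = entries_to_retry(request_entries, response)
print([entry["DetailType"] for entry in retry])  # prints: ['OrderShipped']
```

This only helps once the service actually reports the failure. With the current behavior for a non-existent bus, every entry gets an EventId and the retry list would be empty.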
@kylejw2, @jwicks, @jasongerard, and everyone impacted by this:
We got a statement from the EventBridge team. Quote:
EventBridge (formerly CloudWatch Events) has supported putting events on event buses since 2016. Ever since, we have returned 200 responses when calling PutEvents API on a non-existent bus. We are investigating a change in this behaviour but have to ensure all existing customer applications are not affected. As ever, any unauthorized, unauthenticated, or invalid calls will continue to fail.
Sincerely, The EventBridge team
We don't have a timeline to share, but I'll keep pushing the EventBridge team to move forward with a fix for this behavior. I will post any updates here.
Thank you @debora-ito for the update. How will the EventBridge PutEvents API respond if the bus name is valid but there is an internal service error in processing the event?
@jwicks Will forward your question to the EventBridge team.
This is a major issue that is also affecting me at the moment. Just leaving my 2c for the EventBridge team to be aware that this is affecting many people trying to have confidence in EventBridge.
Same error here. How can we trust that an event was sent successfully if the call returns success even when the named bus does not exist? This problem is also affecting me.
Adding my frustrations here: we are still experiencing the same problem with the Ruby SDK more than a year after the bug was reported in multiple issues: reference 1, reference 2, and reference 3.
Bumping this again since our team is encountering the same issue. We were surprised when our localstack integration tests started passing without ever creating the bus we're putting events to.
Thanks to everyone on this thread for reporting, and apologies for the lack of updates here.
I just checked in with the service team: this is being actively discussed, and they are working on resolving the behavior. (ref: P70611272, V1268450858)
Please understand that this needs to be fixed by the service team and we (the SDK team) don't have much control over it. However, I'll make sure to keep checking in here with updates.
Did we ever get a conclusion on this issue? It's quite frustrating, especially if you're doing direct service-to-service integration with API Gateway: you are left with no chance at error handling or responding to clients with a genuine failure response.
Is there any update on this @aBurmeseDev? Ensuring that a message has been published is critically important to an event-based system.
X-post from https://github.com/aws/aws-sdk-ruby/issues/2657
Description:
I'm using the SDK to put events on an event bus that I've created. When I instantiate the EventBridge client, I can get a list of all available event buses in my account. I noticed that EventBridge isn't reporting errors that I would expect it to report. For example, I changed the event bus name to one that didn't show up in the list of event buses I pulled. When I executed the put_events method for that nonexistent event bus, I received a success message and no error. I looked at the source code for the put_events command and couldn't find any issues with it, so I think this is probably an error in the AWS API. Receiving a success response when I know my event fell off the radar seems like buggy behavior.
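One possible client-side guard until the service behavior changes is to confirm the bus appears in the ListEventBuses result before calling PutEvents, as in the manual check described above. The following is a minimal Python sketch, not an official workaround: `event_bus_exists` and `FakeEventsClient` are hypothetical names, and the fake client stands in for a real SDK client (for example one created with `boto3.client("events")`) so the example runs without AWS credentials.

```python
def event_bus_exists(client, bus_name):
    """Return True if a bus named exactly bus_name appears in ListEventBuses."""
    kwargs = {"NamePrefix": bus_name}
    while True:
        page = client.list_event_buses(**kwargs)
        if any(bus.get("Name") == bus_name for bus in page.get("EventBuses", [])):
            return True
        token = page.get("NextToken")
        if not token:
            return False
        # Follow pagination in case many buses share the prefix.
        kwargs["NextToken"] = token


class FakeEventsClient:
    """Hypothetical stand-in for an SDK client; returns a single page of results."""

    def __init__(self, names):
        self._names = names

    def list_event_buses(self, NamePrefix="", **_):
        matches = [name for name in self._names if name.startswith(NamePrefix)]
        return {"EventBuses": [{"Name": name} for name in matches]}


client = FakeEventsClient(["default", "orders"])
print(event_bus_exists(client, "orders"), event_bus_exists(client, "order"))
# prints: True False
```

This costs an extra API call and permission to list event buses, and it cannot rule out the bus being deleted between the check and the put, so it reduces the risk of silent drops rather than eliminating it.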
Internal TT created with EventBridge. OSDS can follow up.