Eventuous / eventuous

Event Sourcing library for .NET
https://eventuous.dev
Apache License 2.0
442 stars 70 forks source link

Health check stays "healthy" even if Event Handler throws an exception #306

Closed PehrGit closed 7 months ago

PehrGit commented 7 months ago

Describe the bug I am trying to get the health checks working, as described here: https://eventuous.dev/docs/subscriptions/subs-diagnostics/#health-checks.

I have added the required code but when an exception occurs, the health status remains "healthy".

To Reproduce I have modified the dotnet-sample repo to always throw an exception on the RoomBooked event. Steps to reproduce the behavior:

  1. Clone my fork https://github.com/PehrGit/dotnet-sample-healthchecks
  2. Run docker-compose up
  3. Run the application
  4. Navigate to https://[localhost:12110/hc](https://localhost:12110/hc)
  5. Notice the state is "Healthy"
  6. Navigate to https://localhost:12110/swagger/index.html
  7. Execute the POST /booking/book to add a new booking
  8. Notice in the console output of the application, that an exception occurs.
  9. Go back to https://[localhost:12110/hc](https://localhost:12110/hc)
  10. Note the state is still "Healthy"

Expected behavior I expect the state to become "Unhealthy" because the subscription has been dropped. I have set breakpoints in the SubscriptionHealthCheck class and I see that ReportHealthy is called at the time of Subscribe, but the ReportUnhealthy is never called.

Screenshots image

Desktop (please complete the following information):

Additional context

Log output:

[16:27:34 DBG] ChannelInfo being produced...        
[16:27:34 INF] Selected Unspecified/localhost:12113.
[16:27:34 DBG] ChannelInfo being produced...        
[16:27:34 INF] Selected Unspecified/localhost:12113.
[16:27:34 WRN] OpenTelemetry LoggerProvider was not found in application services. Logging will remain disabled.
[16:27:34 DBG] Starting subscription BookingsProjections                                                        
[16:27:34 INF] [BookingsProjections] Loaded checkpoint BookingsProjections from MongoCheckpointStore: Checkpoint { Id = BookingsProjections, Position = 39899, IsEmpty = False }
[16:27:35 DBG] ChannelInfo produced!
[16:27:35 DBG] ChannelInfo produced!
[16:27:35 INF] [BookingsProjections] Started                               
[16:27:35 DBG] Subscription 076f3591-d346-4195-a761-997fe92aaae6 confirmed.
[16:27:35 INF] Started subscription BookingsProjections                    
[16:27:35 DBG] Starting subscription PaymentIntegration                    
[16:27:35 VRB] Subscription 076f3591-d346-4195-a761-997fe92aaae6 received event Booking-string13@0 C:40731/P:40731
[16:27:35 INF] [PaymentIntegration] Started                                              
[16:27:35 DBG] Persistent Subscription PaymentsIntegration::PaymentIntegration confirmed.
[16:27:35 INF] Started subscription PaymentIntegration                                   
[16:27:35 VRB] [BookingsProjections] Received V1.RoomBooked from Booking-string13:40731 seq 0
[16:27:35 INF] Now listening on: https://localhost:12110
[16:27:35 INF] Now listening on: http://localhost:12111
[16:27:35 INF] Application started. Press Ctrl+C to shut down.
[16:27:35 INF] Hosting environment: Development
[16:27:35 INF] Content root path: C:\tfs\eventuous-dotnet-sample\Bookings
[16:27:35 ERR] [BookingsProjections] Message handling failed at BookingStateProjection for message 300fa451-412f-4bfa-bbda-1f9b7edec298
System.Exception: Demonstration exception
   at Bookings.Application.Queries.BookingStateProjection.HandleRoomBooked(IMessageConsumeContext`1 ctx, UpdateDefinitionBuilder`1 update) in C:\tfs\eventuous-dotnet-sample\Bookings\Application\Queries\BookingStateProjection.cs:line 33
   at Eventuous.Projections.MongoDB.MongoOperationBuilder`2.UpdateBuilder`1.<>c__DisplayClass11_0.<UpdateFromContext>b__0(IMessageConsumeContext`1 ctx, UpdateDefinitionBuilder`1 update)
   at Eventuous.Projections.MongoDB.MongoOperationBuilder`2.UpdateBuilder`1.GetUpdateWithOptions(IMessageConsumeContext`1 ctx)
   at Eventuous.Projections.MongoDB.MongoOperationBuilder`2.UpdateOneBuilder.<Eventuous.Projections.MongoDB.MongoOperationBuilder<TEvent,T>.IMongoProjectorBuilder.Build>b__3_0(MessageConsumeContext`1 ctx, IMongoCollection`1 collection, CancellationToken token)
   at Eventuous.Projections.MongoDB.MongoProjector`1.HandleEvent(IMessageConsumeContext context)
   at Eventuous.Subscriptions.TracedEventHandler.HandleEvent(IMessageConsumeContext context)
   at Eventuous.Subscriptions.Consumers.DefaultConsumer.<>c__DisplayClass2_0.<<Consume>g__Handle|0>d.MoveNext()
[16:27:35 VRB] [BookingsProjections] MyBookingsProjection handled V1.RoomBooked Booking-string13:40731 seq 0
[16:27:35 ERR] [BookingsProjections] Message handling failed at AsyncHandlingFilter for message 300fa451-412f-4bfa-bbda-1f9b7edec298
System.Exception: Demonstration exception
   at Eventuous.Subscriptions.Filters.AsyncHandlingFilter.DelayedConsume(WorkerTask workerTask, CancellationToken ct)
[16:27:35 WRN] [BookingsProjections] Message V1.RoomBooked not acknowledged at 40731
System.Exception: Demonstration exception
   at Eventuous.Subscriptions.Filters.AsyncHandlingFilter.DelayedConsume(WorkerTask workerTask, CancellationToken ct)

To make matters even stranger, I noticed that the breakpoint in SubscriptionHealthCheck.ReportUnhealthy(), which I am expecting to be hit, will be hit when I leave the debugger hanging on the exception for 20 seconds or more, but not before that. Videos showing the behaviour:

https://github.com/Eventuous/eventuous/assets/3782086/5f3fadab-bef6-482d-ab3e-9b291b406839

https://github.com/Eventuous/eventuous/assets/3782086/16772e99-61cf-4a10-882e-238b6f91c44f

alexeyzimarev commented 7 months ago

If you let the app run after throwing, it will report unhealthy right away. Because you hold the execution, it drops on timeout (20 sec) and then the function to report unhealthy sub is called. When you hold there, the exception is not bubbled up to the sub, and it doesn't crash the event handling callback, so it just waits. Then, it times out and drops.

alexeyzimarev commented 7 months ago

I'd expect you to see log messages from this:

        Log.WarnLog?.Log(exception, "Dropped: {Reason}", reason);

It's the second line of the Dropped function, which is called on subscription drop.

alexeyzimarev commented 7 months ago

I was also writing wrong about resubscribe. Until it actually resubscribes, the subscription should remain dropped.

alexeyzimarev commented 7 months ago

@PehrGit shall we conclude this as resolved since I merged your PR?

PehrGit commented 7 months ago

Yes let's do that. My initial assumption, that an exception thrown during handling should make the check unhealthy, was incorrect. As you pointed out in the Discord, the health check is intended to ensure connectivity to the database. To verify that events are being handled, we should use the metrics (see #309).

Thanks for assisting and clarifying that!