Open bording opened 7 years ago
ASB will raise a critical error and it's up to the endpoint host to decide what to do.
As already mentioned in the original PR SQL will trigger circuit breaker and if the problem persists raise critical error.
BTW, should the question here be more: "Should Core enforce behavior on the transport when onError throws"? (to that my answer would be 'no')
I'd regard onError
as something that the core should guarantee to never fail unless something catastrophic has occurred. It follows from that the transport should not swallow the exception, but let it propagate right out.
As already mentioned in the original PR SQL will trigger circuit breaker and if the problem persists raise critical error.
@tmasternak Looking at the code it appears that SQL skips the circuit breaker and directly raises the critical error, just like MSMQ: https://github.com/Particular/NServiceBus.SqlServer/blob/develop/src/NServiceBus.SqlServer/Receiving/ReceiveStrategy.cs#L79
ASB will raise a critical error and it's up to the endpoint host to decide what to do.
@SeanFeldman I took a look at the ASB code, and it appears that it actually does arm the circuitbreaker, and not raise a critical error directly:
@bording we arm the circuit breaker upon unrecoverable errors. When CB is armed and triggered, the registered trigger action is to raise a critical error
@SeanFeldman Right, the point I'm trying to make is that you arm the circuitbreaker. You aren't raising the critical error directly like SQL and MSMQ are.
Ah, yes. That is correct @bording .
I'd regard onError as something that the core should guarantee to never fail unless something catastrophic has occurred. It follows from that the transport should not swallow the exception, but let it propagate right out.
@adamralph From what I understand the scenarios that could throw are:
Case 1 and 2 could be transient for most of the transports since there is a connection involved, so trying the message again could successfully send it if the connection has been reestablished.
Case 3 is the hard one, because it could either be transient or it's bad code and will always throw.
@tmasternak I like your suggestion for a clearer title.
Ultimately, I wanted to show that currently we have lots of different behavior here. Is this something that is OK, or should there be consistent behavior that is enforced via a TransportTest?
I'm leaning toward:
consistent behavior that is enforced via a TransportTest
@tmasternak Looking at the code it appears that SQL skips the circuit breaker and directly raises the critical error, just like MSMQ: https://github.com/Particular/NServiceBus.SqlServer/blob/develop/src/NServiceBus.SqlServer/Receiving/ReceiveStrategy.cs#L79
You are correct, my bad.
Just to add to the discussion. AFIAK the intention of CriticalError
was to enable transport to rais unrecoverable errors to the Core
. If my understanding is correct then if and what a transport regards as a unrecoverable error depends on concrete transport and probably should not be enforced. E.g. in Sql we made a decision that persistent problems with connection to the broker (db instance) should result in endpoint shutdown. I am not sure this can be said about all transports that we have.
@bording given the changes in v7 around the critical error behavior, it does seem like raising a critical error seems like a reasonable action? Can we create a new issue with some actionable tasks and close this discussion or is there further need for discussion?
@timbussmann It's not immediately apparent to me how the v7 changes make any difference here.
It's been a while, but looking back over this issue, the main thing that stands out to me is deciding if onError
throwing should be considering a catastrophic, stop the world failure or not.
Based on https://github.com/Particular/NServiceBus/issues/4568#issuecomment-288853679, I'm not sure that treating any and all failures here as unrecoverable is the correct approach.
Of course, given that the default v7 behavior for a critical error action is to do nothing, I suppose we need to take another look at the behavior of the transports in that light. Perhaps they aren't so different then?
throwing should be considering a catastrophic, stop the world failure or not.
I meant that with the v7 changes it has become more clear that we should never stop/shutdown on our behalf.
am I missing something?
Talking to @timbussmann we think it would make sense to enforce this via a test and have RabbitMQ+ASQ raise a critical error since that would give the users a chance to react to this situation? (We assume that RabbitMq+ASQ currently just rolls the message back to the queue)
Note: In v7 the default critical error behavior is to just retry indefinitely
As brought up in the comments to #4561 starting here, there is a difference in how the various transports handle the scenario of the
onError
func throwing an exception.Currently, MSMQ and SQL raise a critical error, but RabbitMQ and ASQ don't (not sure about what ASB does, but I'm guessing it doesn't either).
CC: @Particular/azure-maintainers @Particular/rabbitmq-transport-maintainers @Particular/sqlserver-transport-maintainers