AxonFramework / extension-springcloud

Axon Framework extension for Spring Cloud's Discovery mechanism integration to distribute Command messages.
https://axoniq.io/
Apache License 2.0
26 stars 16 forks source link

Instance without Command Handlers cannot be discovered #327

Closed kagof closed 10 months ago

kagof commented 10 months ago

Basic information

Steps to reproduce

Have an application using the SpringCloudCommandRouter with the RestCapabilityDiscoveryMode which sends commands, but does not have any command handlers.

Expected behaviour

I'd expect this class to be able to send commands to other applications registered with the SpringCloudCommandRouter.

Actual behaviour

We get a "No node known to accept command" error when attempting to send any command from the application with no handlers.

We see the following message in our logs:

o.a.e.s.c.m.RestCapabilityDiscoveryMode : Failed to receive the capabilities from ServiceInstance [<application name>] under host [<ip address>] and port [<port number>]. Will temporarily set this instance to deny all incoming messages.

for each instance the application finds through service discovery. This explains why it cannot send commands; the application has an exception when attempting to discover capabilities on any other application, and so records them as not having any capabilities.

Enabling debug logging allows us to see the stacktrace from the RestCapabilityDiscoveryMode:

RestCapabilityDiscoveryMode  : Service Instance [<instance info>] is denying all messages due to the following exception: 

java.lang.NullPointerException: null
    at org.axonframework.extensions.springcloud.commandhandling.mode.RestCapabilityDiscoveryMode.isLocalServiceInstance(RestCapabilityDiscoveryMode.java:123)
    at org.axonframework.extensions.springcloud.commandhandling.mode.RestCapabilityDiscoveryMode.requestMessageRoutingInformation(RestCapabilityDiscoveryMode.java:107)
    at org.axonframework.extensions.springcloud.commandhandling.mode.RestCapabilityDiscoveryMode.capabilities(RestCapabilityDiscoveryMode.java:91)
    at org.axonframework.extensions.springcloud.commandhandling.SpringCloudCommandRouter.updateMemberships(SpringCloudCommandRouter.java:223)
    at org.axonframework.extensions.springcloud.commandhandling.SpringCloudCommandRouter.resetLocalMembership(SpringCloudCommandRouter.java:183)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
...

According to the stacktrace the error appears to be coming from this method:

    private boolean isLocalServiceInstance(ServiceInstance serviceInstance) {
        return Objects.equals(serviceInstance, localInstance.get())
                || Objects.equals(serviceInstance.getUri(), localInstance.get().getUri());
    }

Debugging through it confirms that localInstance is an empty AtomicReference, and so the NPE comes from calling .get().getUri() on it.

Extra info

It looks like the RestCapabilityDiscoveryMode expects AbstractCapabilityDiscoveryMode#updateLocalCapabilities to be called at least once, as it has the side effect of setting the reference for the localInstance. But it seems that since the localCapabilities are already correct by default if there are no command handlers on the local instance, that method never gets called and thus the side effect never occurs.

This appears to be very similar to #1; maybe you could call it a regression, although that issue was for the old Metadata based approach, so that's maybe unfair.

Workarounds

Adding a class like this to the application:

@Service
public class FooCommandHandler {
    @CommandHandler
    public void handle(final Foo command) {}

    public static class Foo {

    }
}

causes the updateLocalCapabilities method to be called and gets rid of the NullPointerException thus allowing commands to be routed properly.

A (slightly?) more graceful workaround we've found is to just call

commandBus.updateLoadFactor(commandBus.getLoadFactor());

in the method where we configure the DistributedCommandBus bean, which also forces the updateLocalCapabilities method to be called and allows command routing.

smcvb commented 10 months ago

Fix

Thanks for this detailed bug description, @kagof!

You have definitely found a misser on our end here. And a scenario that should be applicable, namely services that only dispatch commands but don't handle any.

Checking the implementation, I think we would also be saved by setting a fixed NoOpServiceInstance in the constructor of the AbstractCapabilityDiscoveryMode. As long as we ensure it never matches with the ServiceInstance passed during the CapabilityDiscoveryMode#capabilities(ServiceInstance) call, we should be good.

What's your thought on that?

Release

Although small, I do think the fix would merit a release. Especially since the traffic on our extensions is low, such an issue with a workaround is still worthy of a new release I think. However, we are currently on Axon Framework 4.9.0 and Axon Spring Cloud Extension 4.9.0.

If I would release a fix in, let's say, 4.6.1 of the Spring Cloud Extension, we'd be required to release a 4.7.1, 4.8.1, and 4.9.1 too. To safe us (read: the framework team) some work, it would be awesome if you'd be helped with a 4.9.1 release of this.

So, if possible, I'd like your 2-cents on that. Thanks in advance! :pray:

kagof commented 10 months ago

Hey @smcvb , thanks for the speedy reply. Your proposed solution makes sense to me!

I do understand not wanting to manage a backport of this fix all the way back to 4.6, I'm sure that's a lot of extra effort manage 4 versions instead of just 1.

Since the workaround we mentioned above is quite simple (just calling commandBus#updateLoadFactor at startup time) and seems to work, I think it is pretty reasonable to just keep the fix in the 4.9.x stream. For now, we'll just keep the workaround in place until we're able to upgrade to 4.9 and pick up the more elegant solution.

smcvb commented 10 months ago

Thanks for the feedback, @kagof! I went ahead and made the fix today, and merged it already :-) A release will follow soon.

Thanks again for bringing this to our attention!!

kagof commented 9 months ago

Much appreciated, thanks for resolving this so quickly @smcvb 🚀