EventStore / EventStore-Client-Go

Go Client for Event Store version 20 and above.
Apache License 2.0

issue: Fail to create Persistent Subscription on multi-node deployment #177

Closed seanppayne closed 3 months ago

seanppayne commented 3 months ago

Relevant ESDB thread: https://discuss.eventstore.com/t/persistentsubscription-issues-with-multi-node-deployment/5268

[ErrorCodeNotLeader] the request needing a leader node was executed on a follower node

When I try to create a persistent subscription locally with docker, it works fine. However, when I try to connect to a cluster that I have deployed in staging I receive the above error.

We are using an ESDB cluster with DNS. Is there anything specific in terms of configuration that I could be missing which is causing this to not find the leader node? I have RequiresLeader set to true in the go client library.

Example:

func (s *subscription) CreateSubscription(ctx context.Context, resultPrefix string) error {
    options := esdb.PersistentAllSubscriptionOptions{
        Filter: &esdb.SubscriptionFilter{
            Type:     esdb.StreamFilterType,
            Prefixes: []string{resultPrefix},
        },
        RequiresLeader: true,
    }

    return s.EventStore.client.CreatePersistentSubscriptionToAll(ctx, s.SubscriptionGroup, options)
}

I was able to get this working by setting the Node preference via:

settings.NodePreference = esdb.NodePreferenceLeader
client, err := esdb.NewClient(settings)

However, I am not certain this would work in 100% of cases since this is a 'preference' which I'm assuming is not a guarantee. And it seems like the default behavior for persistent subscriptions should be to connect to the leader rather than requiring the user to set these values.

ylorph commented 3 months ago

@seanppayne: what does your connection string look like for the cluster? esdb:// or esdb+discover://? And what do you pass as the address after the //?

YoEight commented 3 months ago

Hey @seanppayne!

I'm not convinced this is a bug because we have tests for this exact scenario, and they pass without issues. As @ylorph mentioned earlier, could you please share the connection string you used? I'm fairly confident the issue stems from there.

The NotLeaderException is not a bug; it indicates that you're trying to run an operation requiring a leader node but targeted a follower node instead.

Assuming your connection string is correct for your ESDB cluster configuration, it's still possible that, after the client discovered the leader of the cluster, that node moved to a follower state before your request reached it. This is a rare but possible occurrence.

If you hit this issue consistently, you might have set up your connection string to always connect to the same node, which happens to be a follower each time. In that case, using esdb+discover:// or specifying all the nodes in your connection string (esdb://{node1_host}:{node1_port},{node2_host}:{node2_port},{node3_host}:{node3_port}) should resolve the issue.
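To make the two connection string shapes concrete, here is a minimal Go sketch that only assembles the strings. The helper names (discoverConnectionString, gossipSeedConnectionString) and the example hosts are hypothetical, not part of the client library; 2113 is the default EventStoreDB port, so adjust it to your deployment.

```go
package main

import (
	"fmt"
	"strings"
)

// discoverConnectionString (hypothetical helper) builds the
// esdb+discover:// form, which points the client at a single DNS name
// and lets it discover the cluster nodes from there.
func discoverConnectionString(host string, port int) string {
	return fmt.Sprintf("esdb+discover://%s:%d", host, port)
}

// gossipSeedConnectionString (hypothetical helper) builds the esdb://
// form that lists every node explicitly as "host:port" entries
// separated by commas.
func gossipSeedConnectionString(nodes []string) string {
	return "esdb://" + strings.Join(nodes, ",")
}
```

Either result can then be handed to esdb.ParseConnectionString, and the parsed settings to esdb.NewClient, the same way the settings value is used earlier in this thread. Any query parameters your cluster needs (TLS, credentials, and so on) are left out of this sketch.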

Rather than stopping your application when this happens, retry the operation with the same client handle; it should then proceed successfully.

If my earlier assumptions are incorrect, it might be a bug in the Go client where the client disregards the leader node during the discovery process due to incorrect implementation. A similar issue occurred in the Java client and was fixed here: https://github.com/EventStore/EventStoreDB-Client-Java/pull/261

seanppayne commented 3 months ago

Hey guys, thanks for getting back to me. I apologize, because I asked right before my vacation. I believe you're right and we are not using the discover URL.

I will confirm in a few days when I am back in the office and let you know if that resolves the issue.

seanppayne commented 3 months ago

We confirmed that we were not using the discover URL, and that switching to it works as expected. We can find the leader node automatically now. Thank you for your help!