dgraph-io / dgraph

The high-performance database for modern applications
https://dgraph.io

Issue Connecting Dgraph Alpha Instance to Zero Leader Across Different Servers #9198

Closed sivkr closed 1 month ago

sivkr commented 1 month ago

Question.

What I want to do

I have set up Dgraph Zero and Dgraph Alpha on one instance (xxx.xxx.x.27), and another Dgraph Alpha instance on a different server (xxx.xxx.x.28). I want to connect the Alpha instance on the second server (xxx.xxx.x.28) to the Zero service running on the first server (xxx.xxx.x.27:5080).

What I did

  1. Set up Dgraph Zero and Alpha on the instance xxx.xxx.x.27.
  2. Zero is configured to listen on xxx.xxx.x.27:5080.
  3. Alpha is configured to connect to the Zero instance at xxx.xxx.x.27:5080.
  4. Set up another Dgraph Alpha instance on a separate server (xxx.xxx.x.28), configured to connect to the Zero instance on xxx.xxx.x.27:5080.

Zero Service Configuration (xxx.xxx.x.27):

ExecStart=/usr/local/bin/dgraph zero --my=xxx.xxx.x.27:5080 --replicas=1 --wal /var/lib/dgraph/zw

Alpha Service Configuration on another VM (xxx.xxx.x.28):

ExecStart=/usr/local/bin/dgraph alpha --my=xxx.xxx.x.28:7080 --zero=xxx.xxx.x.27:5080 --logtostderr -v=2 -p /var/lib/dgraph/p -w /var/lib/dgraph/w --port_offset=8180

Error

When trying to connect the Alpha instance from xxx.xxx.x.28 to Zero at xxx.xxx.x.27:5080, I get the following error:

Oct 14 11:10:54 AI-ML18 dgraph[536169]: I1014 11:10:54.738624 536182 groups.go:750] Found connection to leader: localhost:5080
Oct 14 11:10:54 AI-ML18 dgraph[536169]: I1014 11:10:54.739088 536182 groups.go:704] No healthy Zero leader found. Trying to find a Zero leader…
Oct 14 11:10:54 AI-ML18 dgraph[536169]: E1014 11:10:54.752493 536182 groups.go:1229] Error during SubscribeForUpdates for prefix “\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00”: unable to find any servers for group: 1. closer err:
Oct 14 11:10:54 AI-ML18 dgraph[536169]: I1014 11:10:54.840109 536182 run.go:786] Caught Ctrl-C. Terminating now (this may take a few seconds)…
Oct 14 11:10:54 AI-ML18 dgraph[536169]: I1014 11:10:54.851876 536182 run.go:791] Stopped before initialization completed
Oct 14 11:10:54 AI-ML18 dgraph[536169]: I1014 11:10:54.854678 536182 groups.go:750] Found connection to leader: localhost:5080
Oct 14 11:10:54 AI-ML18 dgraph[536169]: I1014 11:10:54.854742 536182 groups.go:704] No healthy Zero leader found. Trying to find a Zero leader…

What could be causing the Alpha instance on xxx.xxx.x.28 to not find a healthy Zero leader, even though it connects? Are there any specific configurations or settings I should adjust to ensure proper connection and healthy leader discovery between Alpha on xxx.xxx.x.28 and Zero on xxx.xxx.x.27? Do I need to configure anything on the Zero instance to support connecting Alphas from multiple servers?

Can anyone help me resolve this issue?

rarvikar commented 1 month ago

@sivkr, the issue here is the value specified for the --port_offset flag. The value should be something small like 2 or 3, and it must be consistent with the port specified in the --my flag, which is the address the Alpha service binds to.

If a port offset of 8180 is specified, then the Alpha port specified using --my must be 7080+8180, which is 15260. So we must use --my=xxx.xxx.x.28:15260 and not --my=xxx.xxx.x.28:7080.

More importantly, the --port_offset flag is not needed if the Alpha is set up on a dedicated host, where there is no potential conflict on port 7080 from another Alpha service.

Can you let me know how the second Alpha node is being set up (same host or different host)?

If it is on a different host, please remove the --port_offset flag unless there is a conflict on port 7080. If it is on the same host, then specify port 15260 in the --my flag and retry.
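For anyone who wants to sanity-check a command line before restarting the service, here is a rough sketch of the arithmetic above in Python. The helper names (flag_value, my_port_check) are made up for illustration and are not part of Dgraph; the only facts assumed are the default internal port 7080 and the flags already shown in this thread.

```python
import shlex

# Default internal gRPC port for Alpha, as cited in this thread.
ALPHA_INTERNAL_GRPC_PORT = 7080


def flag_value(command, name):
    """Return the value of a flag written as `name=value` or `name value`, else None."""
    tokens = shlex.split(command)
    for i, token in enumerate(tokens):
        if token.startswith(name + "="):
            return token.split("=", 1)[1]
        if token == name and i + 1 < len(tokens):
            return tokens[i + 1]
    return None


def my_port_check(command):
    """Return (advertised_port, expected_port) for a `dgraph alpha` command line."""
    my = flag_value(command, "--my")
    if my is None:
        raise ValueError("command has no --my flag")
    # A missing --port_offset is treated as an offset of 0.
    offset = int(flag_value(command, "--port_offset") or 0)
    advertised = int(my.rsplit(":", 1)[1])
    return advertised, ALPHA_INTERNAL_GRPC_PORT + offset


# The ExecStart command from the original post.
original = (
    "/usr/local/bin/dgraph alpha --my=xxx.xxx.x.28:7080 "
    "--zero=xxx.xxx.x.27:5080 --logtostderr -v=2 "
    "-p /var/lib/dgraph/p -w /var/lib/dgraph/w --port_offset=8180"
)

advertised, expected = my_port_check(original)
print(advertised, expected)  # 7080 15260, so --my and --port_offset disagree
```

If the two numbers differ, either drop --port_offset (dedicated host) or change the port in --my to the expected value (shared host).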

All of this is adequately documented here: https://dgraph.io/docs/deploy/security/ports-usage/

Port Offset

To make it easier for users to set up a cluster, Dgraph has default values for the ports used by Dgraph nodes. To support multiple nodes running on a single machine or VM, you can set a node to use different ports using an offset (using the command option --port_offset). This command increments the actual ports used by the node by the offset value provided. You can also use port offsets when starting multiple Dgraph Zero nodes in a development environment. For example, when a user runs Dgraph Alpha with the --port_offset 2 setting, then the Alpha node binds to port 7082 (gRPC-internal-private), 8082 (HTTP-external-public) and 9082 (gRPC-external-public), respectively.
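As a quick illustration of the quoted paragraph, the sketch below lists the Alpha ports for a given offset. The base ports 7080, 8080 and 9080 are inferred from the docs example (7082, 8082, 9082 with offset 2); the function name alpha_ports is hypothetical.

```python
# Base Alpha ports, inferred from the documented offset-2 example.
DEFAULT_ALPHA_PORTS = {
    "grpc_internal_private": 7080,
    "http_external_public": 8080,
    "grpc_external_public": 9080,
}


def alpha_ports(offset=0):
    """Return the ports an Alpha node would bind to for the given --port_offset."""
    return {name: port + offset for name, port in DEFAULT_ALPHA_PORTS.items()}


for name, port in alpha_ports(2).items():
    print(name, port)
```

With an offset of 8180, the same function gives 15260 for the internal gRPC port, which is the port the --my flag would need to carry.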

Thanks!

rarvikar commented 1 month ago

Since this is a configuration problem and not a bug, I'll be closing this issue for now. Please feel free to re-open it if any problems remain after making the configuration changes recommended above.

Thanks and best of luck!