Closed jonathansnell closed 2 months ago
It looks like the problem here is that the nodes are defaulting to their public IP addresses, so I tried forcing the initial cluster bootstrap to use the private address space with the following command:
$ sudo microceph cluster bootstrap --microceph-ip 10.24.0.10 --mon-ip 10.24.0.10 --public-network 10.24.0.0/24 --cluster-network 10.24.0.0/24
This completes without error and yields the following:
$ sudo microceph status
MicroCeph deployment summary:
- node-1 (10.24.0.10)
Services: mds, mgr, mon
Disks: 0
And I can see that the microceph daemon is bound to the private IP address:
$ ss -tunl | grep 7443
tcp LISTEN 0 4096 10.24.0.10:7443 0.0.0.0:*
However, there doesn't appear to be a way to specify the IP address that microceph on the joining node should bind to, resulting in the following status after the join:
$ sudo microceph status
MicroCeph deployment summary:
- node-1 (10.24.0.10)
Services: mds, mgr, mon
Disks: 0
- node-2 ($NODE_2_PUBLIC_IP)
Services: mds, mgr, mon
Disks: 0
And I can see that microceph has bound to the public IP address on this node:
$ ss -tunl | grep 7443
tcp LISTEN 0 4096 $NODE_2_PUBLIC_IP:7443 0.0.0.0:*
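The manual `ss` check above can be scripted when several nodes need to be verified. The following is a minimal sketch, not part of MicroCeph: the helper name `listener_in_network` is hypothetical, it assumes the six-column line layout shown in the `ss -tunl` output above, and `203.0.113.20` is a documentation-range placeholder standing in for $NODE_2_PUBLIC_IP.

```python
import ipaddress


def listener_in_network(ss_line: str, network: str, port: int = 7443) -> bool:
    """Return True if an `ss -tunl` line shows `port` bound inside `network`.

    Hypothetical helper. Assumed columns:
    Netid State Recv-Q Send-Q Local:Port Peer:Port
    """
    fields = ss_line.split()
    if len(fields) < 6:
        return False  # not a listener line in the assumed layout
    # Split "address:port" on the last colon so IPv6 addresses survive.
    host, _, port_str = fields[4].rpartition(":")
    if port_str != str(port):
        return False
    host = host.strip("[]")  # ss prints IPv6 addresses in brackets
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # wildcard ("*") or interface-scoped address
    return addr in ipaddress.ip_network(network)


if __name__ == "__main__":
    private_net = "10.24.0.0/24"
    node_1 = "tcp LISTEN 0 4096 10.24.0.10:7443 0.0.0.0:*"
    node_2 = "tcp LISTEN 0 4096 203.0.113.20:7443 0.0.0.0:*"  # placeholder IP
    print(listener_in_network(node_1, private_net))
    print(listener_in_network(node_2, private_net))
```

A wildcard listener such as `0.0.0.0:7443` is reported as outside the private network, because the sketch only accepts an explicit bind address inside the subnet.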
I believe I could work around the issue above if I could force the joining node to bind to the private address space and advertise itself as such, but I can't find a flag for this in the documentation.
@masnax can you please take a look at the comment above? It appears that the joining nodes do not strictly follow the subnet of the microceph-ip parameter passed to the microcluster.App.NewCluster() call. In the output shared by @jonathansnell, the joining node binds to the public IP available on the node (which is not ideal where network isolation is required), while the bootstrapped node conforms to the provided microceph-ip.
@jonathansnell the fix for configuring the MicroCeph IP on cluster join was merged.
Marking this issue as closed; please feel free to reopen it if this still doesn't work out for you (the fix should be available in reef).
Issue report
What version of MicroCeph are you using ?
18.2.0+snap0a1f14ce2a from latest/stable
What are the steps to reproduce this issue ?
sudo microceph cluster bootstrap
sudo microceph cluster add node-2
sudo microceph cluster join $JOIN_KEY
What happens (observed behaviour) ?
The node joins the cluster with the following error:
Error: failed to generate the configuration: failed to locate IP on public network $NODE_1_PUBLIC_IP/32: no IP belongs to provided subnet $NODE_1_PUBLIC_IP/32
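The error message suggests that the public network was recorded as a /32 containing only node-1's public address, in which case no address on node-2 can ever match it. The sketch below illustrates that subnet-membership check with Python's `ipaddress` module. It is an illustration of the check the message describes, not MicroCeph's actual code: the function name `find_ip_on_network` is hypothetical, `10.24.0.11` is an assumed private address for node-2, and the `203.0.113.x` addresses are documentation-range placeholders for the redacted public IPs.

```python
import ipaddress


def find_ip_on_network(local_ips, subnet):
    """Return the first local IP that belongs to `subnet`, or None.

    Hypothetical helper mirroring the check implied by the error text.
    """
    net = ipaddress.ip_network(subnet, strict=False)
    for ip in local_ips:
        if ipaddress.ip_address(ip) in net:
            return ip
    return None


if __name__ == "__main__":
    # Placeholder addresses, not taken from the report.
    node_1_ips = ["203.0.113.10", "10.24.0.10"]
    node_2_ips = ["203.0.113.20", "10.24.0.11"]

    # A /32 built from node-1's public IP matches node-1 only.
    print(find_ip_on_network(node_1_ips, "203.0.113.10/32"))
    print(find_ip_on_network(node_2_ips, "203.0.113.10/32"))

    # A shared private subnet matches an address on both nodes.
    print(find_ip_on_network(node_2_ips, "10.24.0.0/24"))
```

Under these assumptions the lookup on node-2 fails regardless of routing, which is consistent with the nodes being able to ping each other while the join still reports the error.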
Running
sudo microceph cluster list
will then list both nodes as expected; however, operations such as
sudo microceph disk add /dev/sdX --wipe
result in the following error:
What were you expecting to happen ?
I expect the node to join without error and the disk to be added to node-2 without error.
Relevant logs, error output, etc.
Additional comments.
Although the error appears to suggest that the public IP for node-1 cannot be reached by node-2, and looks like a routing error, this is not the case. Observe the following ping output: