Open daniva6 opened 1 month ago
following timeout error when trying to add a new node: 'Error: Post "http://control.socket/cluster/control": context deadline exceeded'
Did you get anywhere with this issue? I have the same problem.
Hi Nick, unfortunately not. If someone knows how to update the 'mon host =' statement with the correct IPs, that might solve the problem.
The 'mon host =' statement is populated using the mon.host.$hostname config entries from MicroCeph's internal DQlite table.
With a little bit of SQL magic you can remove and insert entries in the table; MicroCeph will then repopulate the conf file from them (within a few minutes).
See config entry 3 (marked with *) below and the ceph.conf file.
$ sudo microceph cluster sql "select * from config"
+----+----------------------+------------------------------------------+
| id | key | value |
+----+----------------------+------------------------------------------+
| 1 | fsid | a307994e-03ed-4122-9ca3-3bb289af9665 |
| 2 | keyring.client.admin | AQAsCBpnPTElMRAA1CRvvsceRWWdm8f/SByOJw== |
| 3* | mon.host.workbook | 192.168.29.152 |
| 4 | public_network | 192.168.29.152/24 |
+----+----------------------+------------------------------------------+
$ pwd
/var/snap/microceph/current/conf
$ cat ceph.conf
# # Generated by MicroCeph, DO NOT EDIT.
[global]
run dir = /var/snap/microceph/current/run
fsid = a307994e-03ed-4122-9ca3-3bb289af9665
mon host = 192.168.29.152
public_network = 192.168.29.152/24
auth allow insecure global id reclaim = false
ms bind ipv4 = true
ms bind ipv6 = false
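The remove-and-insert step described above might look roughly like the sketch below. It only builds and prints the SQL statements so they can be reviewed first. The function names, the hostnames oldnode and newnode, and the address 192.168.29.153 are hypothetical placeholders; the key and value column names are taken from the table output above, and the insert assumes the id column is assigned automatically, which has not been verified here.

```shell
#!/bin/sh
# Sketch only: build the SQL statements for swapping a stale mon.host entry.
# oldnode, newnode and 192.168.29.153 are hypothetical placeholders.

# Statement that removes the mon.host entry of a node that no longer exists.
mon_delete_sql() {
    printf "delete from config where key = 'mon.host.%s'" "$1"
}

# Statement that records the IP address of a monitor host.
# Assumes the id column is filled in automatically (unverified).
mon_insert_sql() {
    printf "insert into config (key, value) values ('mon.host.%s', '%s')" "$1" "$2"
}

# Print the statements for review; nothing is executed against the cluster.
mon_delete_sql oldnode; echo
mon_insert_sql newnode 192.168.29.153; echo

# To apply a statement, pass it to the command shown above, for example:
#   sudo microceph cluster sql "$(mon_delete_sql oldnode)"
```

Running "select * from config" again before and after each statement shows whether the table ended up as intended.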
@daniva6 @nickwales can you please try the above method (read: hack) and see if that solves it for you?
@sabaini #446
@UtkarshBhatthere I applied the hack and was able to remove the non-existing nodes. Unfortunately, trying to add a new node now results in a timeout error. The microceph commands seem to work (cluster list, status, disk list), but the ceph command hangs and the node cannot bring up the OSDs anymore.
Is there a possibility to extract data from the disks? Or to import an osd in another cluster?
Hey @daniva6 can you please provide a bit more information:
1. sudo microceph cluster sql "select * from config"
2. ceph mon dump
3. Hostnames and IP address for member nodes to compare what goes where.
@UtkarshBhatthere I did set up a new ceph cluster and had to delete the old one for space reasons - this was a good test for my backups :)
Issue report
What version of MicroCeph are you using ?
18.2.4 reef (stable)
Use this section to describe the channel/revision which produces the unexpected behaviour.
I had to forcefully remove a node due to a hardware failure. Afterwards I wanted to join a new node with the previous node's name and IP address; unfortunately it ran into a timeout. Now I'm getting the following error: 'Error: failed to record mon db entries: failed to record mon host: This "config" entry already exists'. I then tried to add a new node with a new IP address and name, unfortunately with the same result.
What are the steps to reproduce this issue ?
What happens (observed behaviour) ?
Error: failed to record mon db entries: failed to record mon host: This "config" entry already exists …
What were you expecting to happen ?
new node should have joined the cluster …
Relevant logs, error output, etc.
If it’s considerably long, please paste to https://gist.github.com/ and insert the link here.
Additional comments.
I've realized that in the ceph.conf entry 'mon host = ' only the IPs of the new hosts are present; the IPs of the two currently operating nodes are missing …
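To compare what ceph.conf actually lists against the mon.host entries in the config table, the 'mon host' line can be pulled out of the file with a small helper. This is a minimal sketch: mon_hosts is a hypothetical helper name, and the path is the conf directory shown in this thread.

```shell
#!/bin/sh
# Sketch: print the addresses on the "mon host" line of a ceph.conf file.
# mon_hosts is a hypothetical helper name, not part of MicroCeph.
mon_hosts() {
    # Split on "=", match the "mon host" key, strip blanks from the value.
    awk -F'=' '/^[ \t]*mon host[ \t]*=/ { gsub(/[ \t]/, "", $2); print $2 }' "$1"
}

# Path as shown in this thread; skipped quietly if the file is not readable.
conf=/var/snap/microceph/current/conf/ceph.conf
if [ -r "$conf" ]; then
    mon_hosts "$conf"
fi
```

The printed list can then be checked by eye against the values of the mon.host.$hostname rows.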