kvaps opened this issue 3 years ago
Satellite logs do not show anything specific:
13:43:41.042 [MainWorkerPool-1] INFO LINSTOR/Satellite - SYSTEM - Storage pool 'thindata' updated.
13:43:41.695 [MainWorkerPool-1] INFO LINSTOR/Satellite - SYSTEM - Storage pool 'thindata' updated.
13:46:21.420 [MainWorkerPool-1] INFO LINSTOR/Satellite - SYSTEM - Resource 'one-vm-9134-disk-0' created for node 'm10c37'.
13:46:21.420 [MainWorkerPool-1] INFO LINSTOR/Satellite - SYSTEM - Resource 'one-vm-9134-disk-0' created for node 'm13c16'.
13:46:21.420 [MainWorkerPool-1] INFO LINSTOR/Satellite - SYSTEM - Resource 'one-vm-9134-disk-0' created for node 'm15c5'.
DRBD dmesg logs:
one-vm-9134-disk-0-m10c37-dmesg.log one-vm-9134-disk-0-m13c16-dmesg.log one-vm-9134-disk-0-m15c5-dmesg.log
one-vm-9120-disk-0-m13c22-dmesg.log one-vm-9120-disk-0-m14c19-dmesg.log one-vm-9120-disk-0-m14c36-dmesg.log
Can you please also show drbdadm status one-vm-9120-disk-0 from, e.g., m13c22?
If that still shows as "Inconsistent", can you try to manually reconnect that one DRBD resource and verify if that starts the sync?
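A minimal sketch of such a manual reconnect, with <peer_node> as a placeholder for the peer connection to cycle (a plain disconnect/connect of the whole resource works as well):

drbdadm disconnect one-vm-9120-disk-0:<peer_node>
drbdadm connect one-vm-9120-disk-0:<peer_node>
drbdadm status one-vm-9120-disk-0

The status should then show replication:SyncTarget with sync progress if the resync has actually started.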
Sure, from m13c22:
root@m13c22:~# drbdadm status one-vm-9120-disk-0
one-vm-9120-disk-0 role:Secondary
disk:Inconsistent
m14c19 role:Secondary
replication:SyncTarget peer-disk:UpToDate
m14c36 role:Primary
peer-disk:Diskless
root@m13c22:~# drbdsetup status one-vm-9120-disk-0 --verbose --statistics
one-vm-9120-disk-0 node-id:1 role:Secondary suspended:no
write-ordering:flush
volume:0 minor:2796 disk:Inconsistent quorum:yes
size:41946328 read:28132 written:66812 al-writes:39 bm-writes:0 upper-pending:0 lower-pending:0 al-suspended:no blocked:no
m14c19 node-id:0 connection:Connected role:Secondary congested:no ap-in-flight:0 rs-in-flight:0
volume:0 replication:SyncTarget peer-disk:UpToDate resync-suspended:no
received:10752 sent:0 out-of-sync:0 pending:0 unacked:0
m14c36 node-id:3 connection:Connected role:Primary congested:no ap-in-flight:0 rs-in-flight:0
volume:0 replication:Established peer-disk:Diskless peer-client:yes resync-suspended:no
received:56060 sent:0 out-of-sync:0 pending:0 unacked:0
From the diskless primary:
root@m14c36:~# drbdadm status one-vm-9120-disk-0
one-vm-9120-disk-0 role:Primary
disk:Diskless
m13c22 role:Secondary
peer-disk:Inconsistent resync-suspended:peer
m14c19 role:Secondary
peer-disk:UpToDate
root@m14c36:~# drbdsetup status one-vm-9120-disk-0 --verbose --statistics
one-vm-9120-disk-0 node-id:3 role:Primary suspended:no
write-ordering:none
volume:0 minor:2796 disk:Diskless client:yes quorum:yes
size:41946328 read:0 written:0 al-writes:0 bm-writes:0 upper-pending:0 lower-pending:0 al-suspended:no blocked:no
m13c22 node-id:1 connection:Connected role:Secondary congested:no ap-in-flight:0 rs-in-flight:0
volume:0 replication:Established peer-disk:Inconsistent resync-suspended:peer
received:0 sent:56348 out-of-sync:0 pending:0 unacked:0
m14c19 node-id:0 connection:Connected role:Secondary congested:no ap-in-flight:0 rs-in-flight:0
volume:0 replication:Established peer-disk:UpToDate resync-suspended:no
received:1216 sent:31123276 out-of-sync:0 pending:0 unacked:0
If that still shows as "Inconsistent", can you try to manually reconnect that one DRBD resource and verify if that starts the sync?
The first resource was solved by:
root@m13c22:~# drbdadm down one-vm-9120-disk-0
root@m13c22:~# drbdadm up one-vm-9120-disk-0
The second resource was solved by:
root@m13c16:~# drbdadm disconnect one-vm-9134-disk-0:m10c37
root@m13c16:~# drbdadm connect one-vm-9134-disk-0:m10c37
Both are UpToDate now.
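As a quick heuristic for confirming that nothing on a node is still Inconsistent (the grep pattern just matches the disk:/peer-disk: fields in the status output):

drbdadm status | grep -c Inconsistent

which should print 0 once everything has synced.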
@ghernadi almost all of these resources are hitting a similar error, unexpected repl_state (Established) in receive_bitmap (see the note after the logs for a quick way to list the affected resources):
[31360.825059] drbd one-vm-8765-disk-0 m15c30: conn( Connecting -> Connected ) peer( Unknown -> Primary )
[31360.825061] drbd one-vm-8765-disk-0/0 drbd1119 m15c30: pdsk( DUnknown -> Diskless ) repl( Off -> Established )
[31360.825115] drbd one-vm-8765-disk-0/0 drbd1119 m14c37: strategy = target-clear-bitmap
[31360.825116] drbd one-vm-8765-disk-0/0 drbd1119 m14c37: drbd_sync_handshake:
[31360.825118] drbd one-vm-8765-disk-0/0 drbd1119 m14c37: self FEC0BACF34C1DF98:0000000000000000:40ECA803415CDCE4:0000000000000000 bits:10479414 flags:4
[31360.825120] drbd one-vm-8765-disk-0/0 drbd1119 m14c37: peer B6011241A49F553E:40ECA803415CDCE4:0000000000000000:0000000000000000 bits:10486582 flags:1000
[31360.825121] drbd one-vm-8765-disk-0/0 drbd1119 m14c37: uuid_compare()=target-clear-bitmap by rule 52
[31360.825122] drbd one-vm-8765-disk-0/0 drbd1119 m14c37: Resync source provides bitmap (node_id=3)
[31360.840978] drbd one-vm-8765-disk-0/0 drbd1119: bitmap WRITE of 2241 pages took 16 ms
[31360.840980] drbd one-vm-8765-disk-0/0 drbd1119 m14c37: Becoming WFBitMapT because primary is diskless
[31360.840986] drbd one-vm-8765-disk-0: State change failed: Can not start OV/resync since it is already active
[31360.840988] drbd one-vm-8765-disk-0/0 drbd1119 m15c30: Failed: resync-susp( connection dependency -> no )
[31360.840989] drbd one-vm-8765-disk-0/0 drbd1119 m14c37: Failed: repl( SyncTarget -> WFBitMapT )
[31360.840989] drbd one-vm-8765-disk-0/0 drbd1119 m14c37: ...postponing this until current resync finished
[33858.619064] drbd one-vm-8765-disk-0/0 drbd1119: rs_discard_granularity changed to 524288
[23598.893266] drbd one-vm-8750-disk-0 m10c17: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
[23598.893268] drbd one-vm-8750-disk-0/0 drbd1070 m10c17: pdsk( DUnknown -> Diskless ) repl( Off -> Established )
[23598.965818] drbd one-vm-8750-disk-0/0 drbd1070 m14c37: Resync done (total 1 sec; paused 0 sec; 37118584 K/sec)
[23598.965825] drbd one-vm-8750-disk-0/0 drbd1070 m14c37: 0 % had equal checksums, eliminated: 22016K; transferred 37096568K total 37118584K
[23598.965827] drbd one-vm-8750-disk-0/0 drbd1070 m14c37: Peer was unstable during resync
[23598.965842] drbd one-vm-8750-disk-0/0 drbd1070 m14c37: repl( SyncTarget -> Established )
[23598.965845] drbd one-vm-8750-disk-0/0 drbd1070 m10c29: resync-susp( connection dependency -> no )
[23598.965848] drbd one-vm-8750-disk-0/0 drbd1070 m10c17: resync-susp( connection dependency -> no )
[23598.965866] drbd one-vm-8750-disk-0/0 drbd1070 m14c37: repl( Established -> WFBitMapT )
[23598.965899] drbd one-vm-8750-disk-0/0 drbd1070 m14c37: helper command: /sbin/drbdadm after-resync-target
[23598.965941] drbd one-vm-8750-disk-0/0 drbd1070 m14c37: Resync done (total 1 sec; paused 0 sec; 37118584 K/sec)
[23598.965948] drbd one-vm-8750-disk-0/0 drbd1070 m14c37: repl( WFBitMapT -> Established )
[23598.974844] drbd one-vm-8750-disk-0/0 drbd1070 m14c37: helper command: /sbin/drbdadm after-resync-target exit code 0
[23598.981438] drbd one-vm-8750-disk-0/0 drbd1070 m14c37: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 6168(2), total 6168; compression: 99.7%
[23598.981443] drbd one-vm-8750-disk-0/0 drbd1070 m14c37: unexpected repl_state (Established) in receive_bitmap
[26102.846015] drbd one-vm-8750-disk-0/0 drbd1070: rs_discard_granularity changed to 524288
[27344.165002] drbd one-vm-8738-disk-0 m14c42: conn( Connecting -> Connected ) peer( Unknown -> Primary )
[27344.165004] drbd one-vm-8738-disk-0/0 drbd1054 m14c42: pdsk( DUnknown -> Diskless ) repl( Off -> Established )
[27344.165104] drbd one-vm-8738-disk-0/0 drbd1054 m14c37: strategy = target-clear-bitmap
[27344.165105] drbd one-vm-8738-disk-0/0 drbd1054 m14c37: drbd_sync_handshake:
[27344.165107] drbd one-vm-8738-disk-0/0 drbd1054 m14c37: self 937E8252872EE878:0000000000000000:63171F02BF6E7AFA:0000000000000000 bits:2395502 flags:4
[27344.165109] drbd one-vm-8738-disk-0/0 drbd1054 m14c37: peer 71C2C524771332C6:63171F02BF6E7AFA:0000000000000000:0000000000000000 bits:2405574 flags:1000
[27344.165110] drbd one-vm-8738-disk-0/0 drbd1054 m14c37: uuid_compare()=target-clear-bitmap by rule 52
[27344.165111] drbd one-vm-8738-disk-0/0 drbd1054 m14c37: Resync source provides bitmap (node_id=3)
[27344.168098] drbd one-vm-8738-disk-0/0 drbd1054 m14c37: Becoming WFBitMapT because primary is diskless
[27344.168102] drbd one-vm-8738-disk-0: State change failed: Can not start OV/resync since it is already active
[27344.168103] drbd one-vm-8738-disk-0/0 drbd1054 m14c42: Failed: resync-susp( connection dependency -> no )
[27344.168104] drbd one-vm-8738-disk-0/0 drbd1054 m14c37: Failed: repl( SyncTarget -> WFBitMapT )
[27344.168105] drbd one-vm-8738-disk-0/0 drbd1054 m14c37: ...postponing this until current resync finished
[27344.169207] drbd one-vm-8738-disk-0/0 drbd1054 m14c37: Retrying drbd_rs_del_all() later. refcnt=1
[27344.273317] drbd one-vm-8738-disk-0/0 drbd1054 m14c37: Resync done (total 1 sec; paused 0 sec; 9623944 K/sec)
[27344.273319] drbd one-vm-8738-disk-0/0 drbd1054 m14c37: 0 % had equal checksums, eliminated: 19168K; transferred 9604776K total 9623944K
[27344.273319] drbd one-vm-8738-disk-0/0 drbd1054 m14c37: Peer was unstable during resync
[27344.273324] drbd one-vm-8738-disk-0/0 drbd1054 m14c42: resync-susp( connection dependency -> no )
[27344.273325] drbd one-vm-8738-disk-0/0 drbd1054 m14c37: repl( SyncTarget -> Established )
[27344.273329] drbd one-vm-8738-disk-0/0 drbd1054 m14c37: repl( Established -> WFBitMapT )
[27344.273336] drbd one-vm-8738-disk-0/0 drbd1054 m14c37: helper command: /sbin/drbdadm after-resync-target
[27344.273353] drbd one-vm-8738-disk-0/0 drbd1054 m14c37: Resync done (total 1 sec; paused 0 sec; 9623944 K/sec)
[27344.273356] drbd one-vm-8738-disk-0/0 drbd1054 m14c37: repl( WFBitMapT -> Established )
[27344.276765] drbd one-vm-8738-disk-0/0 drbd1054 m14c37: helper command: /sbin/drbdadm after-resync-target exit code 0
[27344.279748] drbd one-vm-8738-disk-0/0 drbd1054 m14c37: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 4688(2), total 4688; compression: 98.6%
[27344.279750] drbd one-vm-8738-disk-0/0 drbd1054 m14c37: unexpected repl_state (Established) in receive_bitmap
[29878.308730] drbd one-vm-8738-disk-0/0 drbd1054: rs_discard_granularity changed to 524288
[34368.492648] drbd one-vm-8737-disk-0 m14c37: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
[34368.492654] drbd one-vm-8737-disk-0/0 drbd1053: disk( UpToDate -> Outdated )
[34368.492660] drbd one-vm-8737-disk-0/0 drbd1053 m14c37: pdsk( DUnknown -> UpToDate ) repl( Off -> WFBitMapT )
[34368.497177] drbd one-vm-8737-disk-0/0 drbd1053 m14c37: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 1804(1), total 1804; compression: 99.5%
[34368.498432] drbd one-vm-8737-disk-0/0 drbd1053 m14c37: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 1804(1), total 1804; compression: 99.5%
[34368.498461] drbd one-vm-8737-disk-0/0 drbd1053 m14c37: helper command: /sbin/drbdadm before-resync-target
[34368.507462] drbd one-vm-8737-disk-0/0 drbd1053 m14c37: helper command: /sbin/drbdadm before-resync-target exit code 0
[34368.507513] drbd one-vm-8737-disk-0/0 drbd1053: disk( Outdated -> Inconsistent )
[34368.507517] drbd one-vm-8737-disk-0/0 drbd1053 m14c37: repl( WFBitMapT -> SyncTarget )
[34368.507521] drbd one-vm-8737-disk-0/0 drbd1053 m10c27: resync-susp( no -> connection dependency )
[34368.507605] drbd one-vm-8737-disk-0/0 drbd1053 m14c37: Began resync as SyncTarget (will sync 10129080 KB [2532270 bits set]).
[34368.519984] drbd one-vm-8737-disk-0: Preparing cluster-wide state change 498235068 (0->3 499/146)
[34368.552230] drbd one-vm-8737-disk-0: State change 498235068: primary_nodes=8, weak_nodes=FFFFFFFFFFFFFFF4
[34368.552233] drbd one-vm-8737-disk-0: Committing cluster-wide state change 498235068 (32ms)
[34368.552263] drbd one-vm-8737-disk-0 m10c27: conn( Connecting -> Connected ) peer( Unknown -> Primary )
[34368.552265] drbd one-vm-8737-disk-0/0 drbd1053 m10c27: pdsk( DUnknown -> Diskless ) repl( Off -> Established )
[34368.552317] drbd one-vm-8737-disk-0/0 drbd1053 m14c37: strategy = target-clear-bitmap
[34368.552318] drbd one-vm-8737-disk-0/0 drbd1053 m14c37: drbd_sync_handshake:
[34368.552321] drbd one-vm-8737-disk-0/0 drbd1053 m14c37: self 3EA929BE52F32716:0000000000000000:833F433ABB25F9D2:0000000000000000 bits:2530418 flags:4
[34368.552323] drbd one-vm-8737-disk-0/0 drbd1053 m14c37: peer 3F407A5F64582A82:833F433ABB25F9D2:0000000000000000:0000000000000000 bits:2532270 flags:1000
[34368.552324] drbd one-vm-8737-disk-0/0 drbd1053 m14c37: uuid_compare()=target-clear-bitmap by rule 52
[34368.552326] drbd one-vm-8737-disk-0/0 drbd1053 m14c37: Resync source provides bitmap (node_id=3)
[34368.556545] drbd one-vm-8737-disk-0/0 drbd1053 m14c37: Becoming WFBitMapT because primary is diskless
[34368.556549] drbd one-vm-8737-disk-0: State change failed: Can not start OV/resync since it is already active
[34368.556551] drbd one-vm-8737-disk-0/0 drbd1053 m14c37: Failed: repl( SyncTarget -> WFBitMapT )
[34368.556552] drbd one-vm-8737-disk-0/0 drbd1053 m10c27: Failed: resync-susp( connection dependency -> no )
[34368.556553] drbd one-vm-8737-disk-0/0 drbd1053 m14c37: ...postponing this until current resync finished
[34368.557710] drbd one-vm-8737-disk-0/0 drbd1053 m14c37: Retrying drbd_rs_del_all() later. refcnt=13
[34368.664479] drbd one-vm-8737-disk-0/0 drbd1053 m14c37: Resync done (total 1 sec; paused 0 sec; 10134968 K/sec)
[34368.664485] drbd one-vm-8737-disk-0/0 drbd1053 m14c37: 0 % had equal checksums, eliminated: 1724K; transferred 10133244K total 10134968K
[34368.664488] drbd one-vm-8737-disk-0/0 drbd1053 m14c37: Peer was unstable during resync
[34368.664503] drbd one-vm-8737-disk-0/0 drbd1053 m14c37: repl( SyncTarget -> Established )
[34368.664507] drbd one-vm-8737-disk-0/0 drbd1053 m10c27: resync-susp( connection dependency -> no )
[34368.664525] drbd one-vm-8737-disk-0/0 drbd1053 m14c37: repl( Established -> WFBitMapT )
[34368.664558] drbd one-vm-8737-disk-0/0 drbd1053 m14c37: helper command: /sbin/drbdadm after-resync-target
[34368.664602] drbd one-vm-8737-disk-0/0 drbd1053 m14c37: Resync done (total 1 sec; paused 0 sec; 10134968 K/sec)
[34368.664609] drbd one-vm-8737-disk-0/0 drbd1053 m14c37: repl( WFBitMapT -> Established )
[34368.674825] drbd one-vm-8737-disk-0/0 drbd1053 m14c37: helper command: /sbin/drbdadm after-resync-target exit code 0
[34368.677924] drbd one-vm-8737-disk-0/0 drbd1053 m14c37: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 1809(1), total 1809; compression: 99.5%
[34368.677927] drbd one-vm-8737-disk-0/0 drbd1053 m14c37: unexpected repl_state (Established) in receive_bitmap
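A quick way to list the resources affected by this message on a node (a simple grep over the kernel log; adjust the pattern as needed):

dmesg | grep "unexpected repl_state"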
Hi, I guess this issue is unrelated to https://github.com/LINBIT/linstor-server/issues/216, since it happened on a different LINSTOR version on a different cluster, so I'm filing a new issue.
We had a set of nodes with resources deployed on them. After a while we decided to replace the drives on some nodes, so those nodes were shut down and the drives were replaced. After booting up, these nodes had empty storage with no LVM volume group created.
All resources in LINSTOR expectedly went into the Unknown state. linstor v l was showing these devices in an error state, with a bunch of errors at the end:
Then I removed all the failed resources in the standard way, e.g.:
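The exact commands are not included here; the standard removal would look roughly like this, with <node_name> and <resource_name> as placeholders:

linstor resource delete <node_name> <resource_name>
# or, to drop the whole definition including all placements:
linstor resource-definition delete <resource_name>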
And after that, I prepared LVM for the storage pools and created them again:
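The exact commands are likewise not shown; a plausible sequence for a thin pool named thindata (as in the satellite log above), with /dev/sdX and vg_linstor as hypothetical device and VG names:

pvcreate /dev/sdX                                    # new physical volume on the replaced drive
vgcreate vg_linstor /dev/sdX                         # recreate the volume group
lvcreate -l 95%FREE --thinpool thindata vg_linstor   # thin pool backing the storage pool
linstor storage-pool create lvmthin <node_name> thindata vg_linstor/thindata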
However, the created resources are stuck in the Inconsistent state:
All nodes run the latest DRBD version, 9.0.27, and LINSTOR 1.11.0.
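For reference, the versions can be double-checked per node; the dpkg line assumes a Debian-based install:

cat /proc/drbd
drbdadm --version
dpkg -l | grep -E 'drbd|linstor'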