ceph / ceph-csi

CSI driver for Ceph
Apache License 2.0

Need a workaround when the ceph-csi rbdplugin pod fails "fsck" on the disk. #1285

Closed · yanchicago closed this issue 4 years ago

yanchicago commented 4 years ago

Describe the bug

After the docker daemon was restarted, an application pod using an RBD CSI volume was rescheduled and can no longer mount its volume: the rbdplugin maps the image, runs "fsck" on it, and fsck reports errors it cannot correct automatically. Because the image is unmapped again when staging fails, no device is left on the node to run a manual fsck against.

Environment details

Rook-deployed Ceph cluster (clusterID "rook-ceph") using the rbd.csi.ceph.com driver; the node runs fsck from util-linux 2.23.2.

Steps to reproduce

Steps to reproduce the behavior:

  1. Setup details: a rook-ceph cluster is deployed, with an application using a CSI (RBD) volume.
  2. Deployment to trigger the issue: the docker daemon was restarted and the application pod was rescheduled; the rescheduled application pod failed to mount the volume.
  3. See error: the rbdplugin pod logs show the error messages below repeatedly. The complete log is attached.
    
    I0727 17:06:41.056271    9037 utils.go:125] ID: 501047 GRPC response: {"usage":[{"available":3919605760,"total":5150212096,"unit":1,"used":1213829120},{"available":327195,"total":327680,"unit":2,"used":485}]}
    I0727 17:06:53.137613    9037 utils.go:119] ID: 501048 GRPC call: /csi.v1.Node/NodeGetCapabilities
    I0727 17:06:53.137636    9037 utils.go:120] ID: 501048 GRPC request: {}
    I0727 17:06:53.138088    9037 utils.go:125] ID: 501048 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}}]}
    I0727 17:06:53.276368    9037 utils.go:119] ID: 501049 GRPC call: /csi.v1.Node/NodeStageVolume
    I0727 17:06:53.276389    9037 utils.go:120] ID: 501049 GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-3c092a35-b824-46b5-a18f-c1e5db034cfd/globalmount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"clusterID":"rook-ceph","imageFeatures":"layering","imageFormat":"2","pool":"csireplpool","storage.kubernetes.io/csiProvisionerIdentity":"1591172062273-8081-rook-ceph.rbd.csi.ceph.com"},"volume_id":"0001-0009-rook-ceph-0000000000000001-b9a43413-a652-11ea-9a78-7ef490e8cee5"}
    I0727 17:06:53.278296    9037 rbd_util.go:477] ID: 501049 setting disableInUseChecks on rbd volume to: false
    I0727 17:06:53.325286    9037 rbd_util.go:140] ID: 501049 rbd: status csi-vol-b9a43413-a652-11ea-9a78-7ef490e8cee5 using mon 10.254.239.237:6789,10.254.1.104:6789,10.254.229.126:6789, pool csireplpool
    W0727 17:06:53.387644    9037 rbd_util.go:162] ID: 501049 rbd: no watchers on csi-vol-b9a43413-a652-11ea-9a78-7ef490e8cee5
    I0727 17:06:53.387673    9037 rbd_attach.go:202] ID: 501049 rbd: map mon 10.254.239.237:6789,10.254.1.104:6789,10.254.229.126:6789
    I0727 17:06:53.452749    9037 nodeserver.go:147] ID: 501049 rbd image: 0001-0009-rook-ceph-0000000000000001-b9a43413-a652-11ea-9a78-7ef490e8cee5/csireplpool was successfully mapped at /dev/rbd9
    I0727 17:06:53.452874    9037 mount_linux.go:515] Attempting to determine if disk "/dev/rbd9" is formatted using blkid with args: ([-p -s TYPE -s PTTYPE -o export /dev/rbd9])
    I0727 17:06:53.462499    9037 mount_linux.go:518] Output: "DEVNAME=/dev/rbd9\nTYPE=ext4\n", err: <nil>
    I0727 17:06:53.462524    9037 mount_linux.go:441] Checking for issues with fsck on disk: /dev/rbd9
    E0727 17:06:53.487198    9037 nodeserver.go:345] ID: 501049 failed to mount device path (/dev/rbd9) to staging path (/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-3c092a35-b824-46b5-a18f-c1e5db034cfd/globalmount/0001-0009-rook-ceph-0000000000000001-b9a43413-a652-11ea-9a78-7ef490e8cee5) for volume (0001-0009-rook-ceph-0000000000000001-b9a43413-a652-11ea-9a78-7ef490e8cee5) error 'fsck' found errors on device /dev/rbd9 but could not correct them: fsck from util-linux 2.23.2
    /dev/rbd9: Superblock needs_recovery flag is clear, but journal has data.
    /dev/rbd9: Run journal anyway

    /dev/rbd9: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY. (i.e., without -a or -p options)
    E0727 17:06:53.565721    9037 utils.go:123] ID: 501049 GRPC error: rpc error: code = Internal desc = 'fsck' found errors on device /dev/rbd9 but could not correct them: fsck from util-linux 2.23.2
    /dev/rbd9: Superblock needs_recovery flag is clear, but journal has data.
    /dev/rbd9: Run journal anyway
    /dev/rbd9: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY. (i.e., without -a or -p options)



# Actual results #
The rescheduled app pod is stuck in "ContainerCreating" state, failing to mount the volume.


# Expected behavior #
1. The application pod should be in "Running" state with the PVC attached.
2. Instead, the PVC is neither mounted nor mapped, so "fsck" cannot be run manually; a workaround is needed.


# Logs #

The issue is in PVC mounting. The csi-rbdplugin container log from the plugin pod on the node where the mount is failing is attached:
[rook-logs07-27-18-45-rbdplugin.txt](https://github.com/ceph/ceph-csi/files/4985299/rook-logs07-27-18-45-rbdplugin.txt)


Madhu-1 commented 4 years ago

Can you please try with the latest ceph-csi version and see if the issue still exists?

yanchicago commented 4 years ago

@Madhu-1 Thanks for your quick response. This is in the field. Could you help us find a workaround so that a manual fsck can be run to recover the pod?

Madhu-1 commented 4 years ago

> @Madhu-1 Thanks for your quick response. This is in the field. Could you help us find a workaround so that a manual fsck can be run to recover the pod?

We do remove the mapping if the mount fails. One thing you can try is to map the image backing this PVC on another node manually and run the fsck command there.

yanchicago commented 4 years ago

@Madhu-1 Could you point to the source code so we can find the exact command for mapping the PVC?

Madhu-1 commented 4 years ago

Here it is: https://github.com/ceph/ceph-csi/blob/v1.2.1/pkg/rbd/rbd_attach.go#L199-L220

yanchicago commented 4 years ago

Could you provide an example mapping command?

Madhu-1 commented 4 years ago

rbd map <pool>/<image> -m <mon-endpoints> --user <username> --keyfile <keyfile>

yanchicago commented 4 years ago

The code reads "rbd --id cr.id -m mon:port --keyfile cr.Keyfile map pool/rbd_image". How do we retrieve cr.id and cr.Keyfile?

yanchicago commented 4 years ago

Is the key file the "ca.crt"? What should the --user or --id be?

# knc get secret rook-csi-rbd-plugin-sa-token-bqr6l -o yaml 
apiVersion: v1
data:
  ca.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURjakNDQWxxZ0F3SUJBZ0lJUmhGTlg5T01LMVV3RFFZSktvWklodmNOQVFFTEJRQXdVREVMTUFrR0ExVUUKQmhNQ1FVRXhDekFKQmdOVkJBZ01Ba0ZCTVFzd0NRWURWUVFIREFKQlFURUxNQWtHQTFVRUNnd0NRVUV4Q3pBSgpCZ05WQkFzTUFrRkJNUTB3Q3dZRFZRUUREQVJDUTAxVU1CNFhEVEl3TURReU16QXlNVGt4TVZvWERUSXlNRGN5Ck5UQXlNVGt4TVZvd1VERUxNQWtHQTFVRUJoTUNRVUV4Q3pBSkJnTlZCQWdNQWtGQk1Rc3dDUVlEVlFRSERBSkIKUVRFTE1Ba0dBMVVFQ2d3Q1FVRXhDekFKQmdOVkJBc01Ba0ZCTVEwd0N3WURWUVFEREFSQ1EwMVVNSUlCSWpBTgpCZ2txaGtpRzl3MEJBUUVGQUFPQ0FROEFNSUlCQ2dLQ0FRRUFyRFg1cDFhNS9RV0cwSlBxNHdUWnBHbURab1EzCjBrQzlCcDNheW9YcTkzRjdzY0ttK2dqZXNvUlRHU1lOZHo3THliVDlUM0FHdEI2eGdoN3NNVWppcGRCU3JaQVgKOXdwL1NiK2lSL0RLcjlSbWw3Y2Rua0ZqNFhyNzJMMTBPbmR1U05zSWFWckYwVE5MSlM0VHV5b3Vma0tkLzhxQwpqK3JDSExJUEwwWDBmcWZCeHE3WnUxZkJIQjRKOXB6V3J4RUsvQnJ3bGY3bGdERE93S3kxZlE4cWk2OVpDSHRmCmRNWFVsUDZnb2oxOVg3VlZLWGMzWk9LSVJ1NFN2ZkxURmZTVE5Uamt6UWxBc0ZHcjl0RVIzVnpLdjNxQlRoN2wKK1p2ZEpvbHlFdlI1WXMzUEludzRqVnZuSk1BTUpDS3JZOTM1YnZTY2hEVDVDUkNzYWRPREZZZmcxUUlEQVFBQgpvMUF3VGpBTUJnTlZIUk1FQlRBREFRSC9NQjBHQTFVZERnUVdCQlFzUmFRTzRUa243NThzMTlET01BbTlwaWo0CmpqQWZCZ05WSFNNRUdEQVdnQlFzUmFRTzRUa243NThzMTlET01BbTlwaWo0ampBTkJna3Foa2lHOXcwQkFRc0YKQUFPQ0FRRUFjbTBPVWtLNWZIaEczNzFrdkxSM1U3RlNnUmd5OHV5SkxqL0JYZ3Uyc3RvandwU0hDWXdwYUVHTgo2TGc5VTUveXRCVk9pS1IxeXc4d2xJNFIvN0xJVGQ1cUQzNk96TVZWZmlUbzZSdmZORWpyUVpNR3J5dnZCNjk2Cm9TVmwybmttUlgzbnhlOStlWDl5MjVXb2UweUpmUXIySGROc3ZvTG10UHEyYkw1Mlg1OFpnb0ZGU2Q4Y0o1U2EKUG03TmlSbUpmMG9tMFdqdmlPRVNoMUJzalBZNEhNQ2tiRUpvZElMalczWWp5Y3phU3VHUXdwRFFMQUJQc1VCUgpJWXFBUUNjRzE5TzlucFVyUDZPbjR5MlRERTlIb25hQ3I3T1VsOEhSQlZxRXFLYUhjdmpraStxQWpsalVWSnJ3ClJYUVBBaGxWSlZlVjE4U21xT0hwRTcwME1DdXZlQT09Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
  namespace: cm9vay1jZXBo
  token: ZXlKaGJHY2lPaUpTVXpJMU5pSXNJbXRwWkNJNklqYzBiMVV4TTFsdE1XNUdiRnBHWVRKMWVEWmxTM1pNVkRScmFtVnBWMGczZGpWWE9GUlRWVEV0TXpBaWZRLmV5SnBjM01pT2lKcmRXSmxjbTVsZEdWekwzTmxjblpwWTJWaFkyTnZkVzUwSWl3aWEzVmlaWEp1WlhSbGN5NXBieTl6WlhKMmFXTmxZV05qYjNWdWRDOXVZVzFsYzNCaFkyVWlPaUp5YjI5ckxXTmxjR2dpTENKcmRXSmxjbTVsZEdWekxtbHZMM05sY25acFkyVmhZMk52ZFc1MEwzTmxZM0psZEM1dVlXMWxJam9pY205dmF5MWpjMmt0Y21Ka0xYQnNkV2RwYmkxellTMTBiMnRsYmkxaWNYSTJiQ0lzSW10MVltVnlibVYwWlhNdWFXOHZjMlZ5ZG1salpXRmpZMjkxYm5RdmMyVnlkbWxqWlMxaFkyTnZkVzUwTG01aGJXVWlPaUp5YjI5ckxXTnphUzF5WW1RdGNHeDFaMmx1TFhOaElpd2lhM1ZpWlhKdVpYUmxjeTVwYnk5elpYSjJhV05sWVdOamIzVnVkQzl6WlhKMmFXTmxMV0ZqWTI5MWJuUXVkV2xrSWpvaU9USTFZakk1WmpjdFlUaG1NeTAwTUdJNUxXSXlOR0l0Tm1KalpUWTVZV001TWpNNElpd2ljM1ZpSWpvaWMzbHpkR1Z0T25ObGNuWnBZMlZoWTJOdmRXNTBPbkp2YjJzdFkyVndhRHB5YjI5ckxXTnphUzF5WW1RdGNHeDFaMmx1TFhOaEluMC5LMDc1VjVhbmhyVTYwb2VtaHlCakRybDZjbzNYV3RCUlFVd0I4YjN3bUEyVExPVE4xdWMzVllsWUxPa05OLWhqSUN0RkRyd0pMaFo0NkttdjNJYjNybHJIdWNBR25VekFBZDNTTlJMLUNZRTIySnRRLXpHREljTTltQUVMS1FyR29GeFNORDJUUHF5UWFtbEpxaUd0M2lCRm5XckpWdGE5dXdvNGx5MXBTZE5hQUcxcHZzM3NFd1FCRm55ME9rSUdTMlNhWFp6MGdSWWFXUkdfQ2Z2WDdWTTZ1WFFCcVNydUJVVGJSV0ZiWHBJU1hqcW94dEotNnM3dXlYREFJRF9PaXVxbUFlOXFtbVgwS2hGVl91NkV2RDRuX2FmcXRRN2x1SmpYdlBBVGxPejY3ck1CUlBMYXQwcnJxQS1wLXBCa1pCcXRWMW1MQmY0dFBfaUdLdTYzY0E=
kind: Secret
metadata:
  annotations:
    kubernetes.io/service-account.name: rook-csi-rbd-plugin-sa
    kubernetes.io/service-account.uid: 925b29f7-a8f3-40b9-b24b-6bce69ac9238
  creationTimestamp: "2020-07-27T22:26:13Z"
  name: rook-csi-rbd-plugin-sa-token-bqr6l
  namespace: rook-ceph
  resourceVersion: "56411039"
  selfLink: /api/v1/namespaces/rook-ceph/secrets/rook-csi-rbd-plugin-sa-token-bqr6l
  uid: 1845ada1-226c-43c0-a322-8e8b1db32dac
type: kubernetes.io/service-account-token
yanchicago commented 4 years ago

Tried with both the un-decoded and the decoded ca.crt; neither works.

#  rbd map csireplpool/0001-0009-rook-ceph-0000000000000001-724ee4bc-d06b-11ea-860b-3654d631dc71 -m 0.254.209.61:6789,10.254.243.208:6789,10.254.63.3:6789 --user rook-csi-rbd-plugin-sa  --keyfile ~/ca.crt
rbd: failed to get secret
2020-07-28 03:28:11.241 7fb6b7daab00 -1 auth: failed to decode key 'LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURjakNDQWxxZ0F3SUJBZ0lJUmhGTlg5T01LMVV3RFFZSktvWklodmNOQVFFTEJRQXdVREVMTUFrR0ExVUUKQmhNQ1FVRXhDekFKQmdOVkJBZ01Ba0ZCTVFzd0NRWURWUVFIREFKQlFURUxNQWtHQTFVRUNnd0NRVUV4Q3pBSgpCZ05WQkFzTUFrRkJNUTB3Q3dZRFZRUUREQVJDUTAxVU1CNFhEVEl3TURReU16QXlNVGt4TVZvWERUSXlNRGN5Ck5UQXlNVGt4TVZvd1VERUxNQWtHQTFVRUJoTUNRVUV4Q3pBSkJnTlZCQWdNQWtGQk1Rc3dDUVlEVlFRSERBSkIKUVRFTE1Ba0dBMVVFQ2d3Q1FVRXhDekFKQmdOVkJBc01Ba0ZCTVEwd0N3WURWUVFEREFSQ1EwMVVNSUlCSWpBTgpCZ2txaGtpRzl3MEJBUUVGQUFPQ0FROEFNSUlCQ2dLQ0FRRUFyRFg1cDFhNS9RV0cwSlBxNHdUWnBHbURab1EzCjBrQzlCcDNheW9YcTkzRjdzY0ttK2dqZXNvUlRHU1lOZHo3THliVDlUM0FHdEI2eGdoN3NNVWppcGRCU3JaQVgKOXdwL1NiK2lSL0RLcjlSbWw3Y2Rua0ZqNFhyNzJMMTBPbmR1U05zSWFWckYwVE5MSlM0VHV5b3Vma0tkLzhxQwpqK3JDSExJUEwwWDBmcWZCeHE3WnUxZkJIQjRKOXB6V3J4RUsvQnJ3bGY3bGdERE93S3kxZlE4cWk2OVpDSHRmCmRNWFVsUDZnb2oxOVg3VlZLWGMzWk9LSVJ1NFN2ZkxURmZTVE5Uamt6UWxBc0ZHcjl0RVIzVnpLdjNxQlRoN2wKK1p2ZEpvbHlFdlI1WXMzUEludzRqVnZuSk1BTUpDS3JZOTM1YnZTY2hEVDVDUkNzYWRPREZZZmcxUUlEQVFBQgpvMUF3VGpBTUJnTlZIUk1FQlRBREFRSC9NQjBHQTFVZERnUVdCQlFzUmFRTzRUa243NThzMTlET01BbTlwaWo0CmpqQWZCZ05WSFNNRUdEQVdnQlFzUmFRTzRUa243NThzMTlET01BbTlwaWo0ampBTkJna3Foa2lHOXcwQkFRc0YKQUFPQ0FRRUFjbTBPVWtLNWZIaEczNzFrdkxSM1U3RlNnUmd5OHV5SkxqL0JYZ3Uyc3RvandwU0hDWXdwYUVHTgo2TGc5VTUveXRCVk9pS1IxeXc4d2xJNFIvN0xJVGQ1cUQzNk96TVZWZmlUbzZSdmZORWpyUVpNR3J5dnZCNjk2Cm9TVmwybmttUlgzbnhlOStlWDl5MjVXb2UweUpmUXIySGROc3ZvTG10UHEyYkw1Mlg1OFpnb0ZGU2Q4Y0o1U2EKUG03TmlSbUpmMG9tMFdqdmlPRVNoMUJzalBZNEhNQ2tiRUpvZElMalczWWp5Y3phU3VHUXdwRFFMQUJQc1VCUgpJWXFBUUNjRzE5TzlucFVyUDZPbjR5MlRERTlIb25hQ3I3T1VsOEhSQlZxRXFLYUhjdmpraStxQWpsalVWSnJ3ClJYUVBBaGxWSlZlVjE4U21xT0hwRTcwME1DdXZlQT09Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
danielzhanghl commented 4 years ago

Try using the id "admin" and get the key file from the manager pod, at /etc/ceph/ceph.client.admin.keyring.
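
For illustration, a hedged sketch of that route (the mgr pod name below is a placeholder; note that a keyring file is passed with --keyring, whereas --keyfile expects a file containing only the bare key):

```sh
# Copy the admin keyring out of a mgr pod (pod name is illustrative)
kubectl -n rook-ceph exec rook-ceph-mgr-a-<suffix> -- \
  cat /etc/ceph/ceph.client.admin.keyring > /tmp/ceph.client.admin.keyring

# Map the image with the admin identity; pool/image and monitors are taken from the logs above
rbd map csireplpool/csi-vol-b9a43413-a652-11ea-9a78-7ef490e8cee5 \
  -m 10.254.239.237:6789,10.254.1.104:6789,10.254.229.126:6789 \
  --id admin --keyring /tmp/ceph.client.admin.keyring
```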

danielzhanghl commented 4 years ago

I guess the volume is already attached to the node; could you go to the node and repair the device?

yanchicago commented 4 years ago

@danielzhanghl The volume is not mapped; fsck failed and then the volume was unmapped, so no device exists for a manual repair. @Madhu-1 I would really appreciate a few more details on how to get the id and key file and use them in the command: 1) Is it "--id" or "--user"? 2) What is the keyfile format? Should the value be decoded or kept as it is in the secret?

Madhu-1 commented 4 years ago

I already provided that info on the Slack channel (https://rook-io.slack.com/archives/CG3HUV94J/p1595907147301200?thread_ts=1595878052.299900&cid=CG3HUV94J), or run `ceph auth ls` in the toolbox pod and use the admin creds from the Ceph cluster.
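
A minimal sketch of the toolbox route, assuming the standard Rook toolbox deployment name rook-ceph-tools:

```sh
# List all Ceph auth entities and their capabilities
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph auth ls

# Print only the admin key, ready to be written into a file for --keyfile
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph auth get-key client.admin
```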

Madhu-1 commented 4 years ago
kubectl get secrets rook-csi-rbd-node -oyaml -nrook-ceph
apiVersion: v1
data:
  userID: Y3NpLXJiZC1ub2Rl
  userKey: QVFDdWxCOWZjUzJFRVJBQUs1UHNYcDN3M1JFbUhrcnNGbDYyMXc9PQ==
kind: Secret
metadata:
  creationTimestamp: "2020-07-28T02:59:58Z"
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:data:
        .: {}
        f:userID: {}
        f:userKey: {}
      f:metadata:
        f:ownerReferences:
          .: {}
          k:{"uid":"4079b742-de47-4ce2-b091-9300c8e997ff"}:
            .: {}
            f:apiVersion: {}
            f:blockOwnerDeletion: {}
            f:controller: {}
            f:kind: {}
            f:name: {}
            f:uid: {}
      f:type: {}
    manager: rook
    operation: Update
    time: "2020-07-28T02:59:58Z"
  name: rook-csi-rbd-node
  namespace: rook-ceph
  ownerReferences:
  - apiVersion: ceph.rook.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: CephCluster
    name: rook-ceph
    uid: 4079b742-de47-4ce2-b091-9300c8e997ff
  resourceVersion: "281811"
  selfLink: /api/v1/namespaces/rook-ceph/secrets/rook-csi-rbd-node
  uid: 3380f875-bb42-41a7-b51e-ccff7e59b686
type: kubernetes.io/rook
[🎩︎]mrajanna@localhost rbd $]echo Y3NpLXJiZC1ub2Rl|base64 -d
csi-rbd-node
[🎩︎]mrajanna@localhost rbd $]echo QVFDdWxCOWZjUzJFRVJBQUs1UHNYcDN3M1JFbUhrcnNGbDYyMXc9PQ==|base64 -d
AQCulB9fcS2EERAAK5PsXp3w3REmHkrsFl621w==

Use the decoded values: the userID for --user/--id, and the userKey written to a file that you pass with --keyfile.
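
Putting the pieces from this thread together, a hedged end-to-end recovery sketch (pool, image, and monitor addresses are the ones from the logs above; the secret name rook-csi-rbd-node is the Rook default; /dev/rbdX stands for whatever device `rbd map` prints):

```sh
# 1. Extract the CSI node credentials from the Rook-managed secret
USER=$(kubectl -n rook-ceph get secret rook-csi-rbd-node -o jsonpath='{.data.userID}' | base64 -d)
kubectl -n rook-ceph get secret rook-csi-rbd-node -o jsonpath='{.data.userKey}' | base64 -d > /tmp/rbd.key

# 2. Map the image manually on a node that is not currently trying to stage it
rbd map csireplpool/csi-vol-b9a43413-a652-11ea-9a78-7ef490e8cee5 \
  -m 10.254.239.237:6789,10.254.1.104:6789,10.254.229.126:6789 \
  --id "$USER" --keyfile /tmp/rbd.key

# 3. Run the full repair that the automatic fsck refuses to do
e2fsck -fy /dev/rbdX        # use the device path printed by 'rbd map'

# 4. Unmap so the CSI driver can stage the now-clean image again
rbd unmap /dev/rbdX
```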

yanchicago commented 4 years ago

Many thanks for your support. :+1: We were able to recover the pod. Could you shed some light on how the IP address is selected for the "watcher"? We have a k8s cluster using Calico CNI in IPIP encapsulation mode. There are two subnets among all the hosts, and the watcher IP seems to be randomly allocated between the two subnets. Is the watcher IP used in any way? Do you see any issues with this type of IP config?

Madhu-1 commented 4 years ago

> Many thanks for your support. 👍 We were able to recover the pod. Could you shed some light on how the IP address is selected for the "watcher"? We have a k8s cluster using Calico CNI in IPIP encapsulation mode. There are two subnets among all the hosts, and the watcher IP seems to be randomly allocated between the two subnets. Is the watcher IP used in any way? Do you see any issues with this type of IP config?

This is not something ceph-csi can handle; it's better to check with the Ceph or Rook team. Closing this issue as it's fixed.

yanchicago commented 4 years ago

@Madhu-1 Could you please shed some light on how this can happen so frequently at our site?

Madhu-1 commented 4 years ago

@yanchicago have you tried with the latest ceph-csi?

@nixpanic @humblec any idea?

yanchicago commented 4 years ago

@Madhu-1 Unfortunately, this is in the field; we can't upgrade at will.

cl51287 commented 3 years ago

@Madhu-1 Our version is 3.2.1 and this problem still exists; it has to be fixed manually.

cl51287 commented 3 years ago

@Madhu-1 Is this problem fixed?

Madhu-1 commented 3 years ago

@cl51287 I haven't come across this problem. @nixpanic @humblec any idea? Do you have a set of steps to reproduce this one?

humblec commented 3 years ago

@cl51287 can you give more details about the issue, please? Are you also using Calico in your setup? What is the issue you are facing? Do you come across the fsck errors, and if yes, when exactly do they happen, only when rbdplugin restarts in between? The Ceph client connections are established on an IP, which is tracked for client requests and completions, etc. If the client IP changes frequently or while in use, I could expect some problems. But I would like to confirm whether host networking is enabled for the CSI pods in this setup, and whether any changes to the host IPs happen in this Calico setup.
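
One way to answer the hostNetwork question (the daemonset name csi-rbdplugin is the usual default; adjust it if your deployment differs):

```sh
# Prints "true" when the RBD plugin pods share the host network namespace
kubectl -n rook-ceph get daemonset csi-rbdplugin \
  -o jsonpath='{.spec.template.spec.hostNetwork}'
```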

cl51287 commented 3 years ago

Our error is the same as yanchicago's, as follows:

    May 14 10:27:00 k8s-test-0-114 kubelet: E0514 10:27:00.723771 3657 csi_attacher.go:320] kubernetes.io/csi: attacher.MountDevice failed: rpc error: code = Internal desc = 'fsck' found errors on device /dev/rbd3 but could not correct them: fsck from util-linux 2.32.1
    May 14 10:27:00 k8s-test-0-114 kubelet: /dev/rbd3 contains a file system with errors, check forced.
    May 14 10:27:00 k8s-test-0-114 kubelet: /dev/rbd3: Inode 23 has an invalid extent node (blk 63775, lblk 5509)
    May 14 10:27:00 k8s-test-0-114 kubelet: /dev/rbd3: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.

The above error occurred after the physical machine went down. At the same time, a hard disk on that physical machine ended up in the same situation and needed fsck to repair it. We use the kernel mode to mount; an rbdplugin restart does not cause this (at least we have not encountered it). We have only hit this once so far and have not been able to reproduce it since. We do not use Calico, and the CSI pods have hostNetwork set to true.

humblec commented 3 years ago

> Our error is the same as yanchicago's, as follows:
>
>     May 14 10:27:00 k8s-test-0-114 kubelet: E0514 10:27:00.723771 3657 csi_attacher.go:320] kubernetes.io/csi: attacher.MountDevice failed: rpc error: code = Internal desc = 'fsck' found errors on device /dev/rbd3 but could not correct them: fsck from util-linux 2.32.1
>     May 14 10:27:00 k8s-test-0-114 kubelet: /dev/rbd3 contains a file system with errors, check forced.
>     May 14 10:27:00 k8s-test-0-114 kubelet: /dev/rbd3: Inode 23 has an invalid extent node (blk 63775, lblk 5509)
>     May 14 10:27:00 k8s-test-0-114 kubelet: /dev/rbd3: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
>
> The above error occurred after the physical machine went down. At the same time, a hard disk on that physical machine ended up in the same situation and needed fsck to repair it.

This is expected! When a journalling filesystem is interrupted mid-transaction, for example by a node shutdown, this is bound to happen.

> We use the kernel mode to mount; an rbdplugin restart does not cause this (at least we have not encountered it). We have only hit this once so far and have not been able to reproduce it since. We do not use Calico, and the CSI pods have hostNetwork set to true.

This case is different: it is purely an illustration of the node-down scenario. I don't think there is anything we can do from the CSI side to cover this. More or less, the filesystem is behaving as expected/designed here.

cl51287 commented 3 years ago

Yes, if the system goes down, it is expected that an fsck repair is needed. But can this be repaired automatically by the CSI driver? I can see in the log that a repair was attempted, but it failed. If this happens in a production environment, business applications are not restored automatically, and every affected volume on that machine has to be repaired manually.

humblec commented 3 years ago

> Yes, if the system goes down, it is expected that an fsck repair is needed. But can this be repaired automatically by the CSI driver? I can see in the log that a repair was attempted, but it failed. If this happens in a production environment, business applications are not restored automatically, and every affected volume on that machine has to be repaired manually.

At mount time the kube libraries (CSI triggers the mount, though) attempt the fsck operation; why it did not go through or failed to repair here I am not sure. It is attempted in a generic way by the kube mounters, and I have not seen instances where that fsck also fails. It seems the situation here was severe filesystem corruption: it was not just a node-down scenario, the hard disk was also in trouble. The mount libraries will not perform operations beyond the defaults, as in this case; the repair may require force options, which programs like the mounters stay away from by default to avoid causing more damage.
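
For context on why the automatic attempt fails while a manual repair can succeed: the kube mount helper effectively runs fsck in "preen" mode, which only applies safe automatic fixes and bails out on anything serious, whereas an operator can force a full pass. A hedged illustration (the device name is just an example):

```sh
# Roughly what the kube mount helper runs: preen mode, safe automatic fixes only.
# On real corruption it exits with an error instead of repairing.
fsck -a /dev/rbd3

# What an operator runs by hand: force a full check and answer "yes" to all repairs.
e2fsck -fy /dev/rbd3
```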