canonical / microceph

Ceph for a one-rack cluster and appliances
https://snapcraft.io/microceph
GNU Affero General Public License v3.0

Microceph not starting after upgrade to reef/stable #342

Open usma0118 opened 2 months ago

usma0118 commented 2 months ago

Issue report

After upgrading from snap revision 707 to 975, I am getting the error "Failed to initialize trust store".

I looked at the issue reported in #336 and copied the values into the trust store as described in #269, but the daemon still fails to start.

What version of MicroCeph are you using ?

18.2.0+snap450240f5dd (Single node)

Use this section to describe the channel/revision which produces the unexpected behavior. This information can be fetched from the installed: section of sudo snap info microceph output.

What are the steps to reproduce this issue ?

Upgraded from snap version 707 to 975

What happens (observed behaviour) ?

Mar 01 20:27:02 antaresinc-cluster microceph.daemon[2178]: Error: Unable to start daemon: Daemon failed to start: Failed to initialize trust store: Failed to parse local record "". Found empty certificate
Mar 01 20:27:02 antaresinc-cluster microceph.daemon[2359]: time="2024-03-01T20:27:02+01:00" level=info msg="Daemon stopped"
Mar 01 20:27:02 antaresinc-cluster microceph.daemon[2359]: Error: Unable to start daemon: Daemon failed to start: Failed to initialize trust store: Failed to parse local record "". Found empty certificate
Mar 01 20:27:02 antaresinc-cluster systemd[1]: snap.microceph.daemon.service: Main process exited, code=exited, status=1/FAILURE
Mar 01 20:27:02 antaresinc-cluster systemd[1]: snap.microceph.daemon.service: Failed with result 'exit-code'.
Mar 01 20:27:02 antaresinc-cluster microceph.mds[1408]: starting mds.antaresinc-cluster at
Mar 01 20:27:02 antaresinc-cluster systemd[1]: snap.microceph.daemon.service: Scheduled restart job, restart counter is at 5.
Mar 01 20:27:02 antaresinc-cluster systemd[1]: Stopped snap.microceph.daemon.service - Service for snap application microceph.daemon.
Mar 01 20:27:02 antaresinc-cluster systemd[1]: snap.microceph.daemon.service: Start request repeated too quickly.
Mar 01 20:27:02 antaresinc-cluster systemd[1]: snap.microceph.daemon.service: Failed with result 'exit-code'.
Mar 01 20:27:02 antaresinc-cluster systemd[1]: Failed to start snap.microceph.daemon.service - Service for snap application microceph.daemon.

What were you expecting to happen ?

Relevant logs, error output, etc.

Mar 03 01:01:59 antaresinc-cluster audit[2422]: AVC apparmor="DENIED" operation="capable" class="cap" profile="snap.microceph.osd" pid=2422 comm="admin_socket" capability=24  capname="sys_resource"
Mar 03 01:01:59 antaresinc-cluster kernel: audit: type=1400 audit(1709424119.298:190): apparmor="DENIED" operation="capable" class="cap" profile="snap.microceph.osd" pid=2422 comm="admin_socket" capability=24  capname="sys_resource"
Mar 03 01:01:59 antaresinc-cluster audit[3590529]: AVC apparmor="DENIED" operation="exec" class="file" profile="snap.microceph.osd" name="/usr/bin/sudo" pid=3590529 comm="admin_socket" requested_mask="x" denied_mask="x" fsuid=0 ouid=0
Mar 03 01:01:59 antaresinc-cluster audit[3590529]: AVC apparmor="DENIED" operation="exec" class="file" profile="snap.microceph.osd" name="/usr/bin/sudo" pid=3590529 comm="admin_socket" requested_mask="x" denied_mask="x" fsuid=0 ouid=0
Mar 03 01:01:59 antaresinc-cluster kernel: audit: type=1400 audit(1709424119.438:191): apparmor="DENIED" operation="exec" class="file" profile="snap.microceph.osd" name="/usr/bin/sudo" pid=3590529 comm="admin_socket" requested_mask="x" denied_mask="x" fsuid=0 ouid=0
Mar 03 01:01:59 antaresinc-cluster kernel: audit: type=1400 audit(1709424119.438:192): apparmor="DENIED" operation="exec" class="file" profile="snap.microceph.osd" name="/usr/bin/sudo" pid=3590529 comm="admin_socket" requested_mask="x" denied_mask="x" fsuid=0 ouid=0
Mar 03 01:01:59 antaresinc-cluster audit[3590542]: AVC apparmor="DENIED" operation="exec" class="file" profile="snap.microceph.osd" name="/usr/bin/sudo" pid=3590542 comm="admin_socket" requested_mask="x" denied_mask="x" fsuid=0 ouid=0
Mar 03 01:01:59 antaresinc-cluster kernel: audit: type=1400 audit(1709424119.642:193): apparmor="DENIED" operation="exec" class="file" profile="snap.microceph.osd" name="/usr/bin/sudo" pid=3590542 comm="admin_socket" requested_mask="x" denied_mask="x" fsuid=0 ouid=0
Mar 03 01:01:59 antaresinc-cluster kernel: audit: type=1400 audit(1709424119.642:194): apparmor="DENIED" operation="exec" class="file" profile="snap.microceph.osd" name="/usr/bin/sudo" pid=3590542 comm="admin_socket" requested_mask="x" denied_mask="x" fsuid=0 ouid=0
Mar 03 01:01:59 antaresinc-cluster audit[3590542]: AVC apparmor="DENIED" operation="exec" class="file" profile="snap.microceph.osd" name="/usr/bin/sudo" pid=3590542 comm="admin_socket" requested_mask="x" denied_mask="x" fsuid=0 ouid=0
Mar 03 01:02:00 antaresinc-cluster audit[1410]: AVC apparmor="DENIED" operation="capable" class="cap" profile="snap.microceph.mon" pid=1410 comm="admin_socket" capability=24  capname="sys_resource"
Mar 03 01:02:00 antaresinc-cluster kernel: audit: type=1400 audit(1709424120.126:195): apparmor="DENIED" operation="capable" class="cap" profile="snap.microceph.mon" pid=1410 comm="admin_socket" capability=24  capname="sys_resource"
Mar 03 01:02:00 antaresinc-cluster audit[3590559]: AVC apparmor="DENIED" operation="exec" class="file" profile="snap.microceph.mon" name="/usr/bin/sudo" pid=3590559 comm="admin_socket" requested_mask="x" denied_mask="x" fsuid=0 ouid=0
Mar 03 01:02:00 antaresinc-cluster audit[3590559]: AVC apparmor="DENIED" operation="exec" class="file" profile="snap.microceph.mon" name="/usr/bin/sudo" pid=3590559 comm="admin_socket" requested_mask="x" denied_mask="x" fsuid=0 ouid=0
Mar 03 01:02:00 antaresinc-cluster kernel: audit: type=1400 audit(1709424120.198:196): apparmor="DENIED" operation="exec" class="file" profile="snap.microceph.mon" name="/usr/bin/sudo" pid=3590559 comm="admin_socket" requested_mask="x" denied_mask="x" fsuid=0 ouid=0
Mar 03 01:02:00 antaresinc-cluster kernel: audit: type=1400 audit(1709424120.198:197): apparmor="DENIED" operation="exec" class="file" profile="snap.microceph.mon" name="/usr/bin/sudo" pid=3590559 comm="admin_socket" requested_mask="x" denied_mask="x" fsuid=0 ouid=0
Mar 03 01:02:00 antaresinc-cluster audit[3590561]: AVC apparmor="DENIED" operation="exec" class="file" profile="snap.microceph.mon" name="/usr/bin/sudo" pid=3590561 comm="admin_socket" requested_mask="x" denied_mask="x" fsuid=0 ouid=0
Mar 03 01:02:00 antaresinc-cluster audit[3590561]: AVC apparmor="DENIED" operation="exec" class="file" profile="snap.microceph.mon" name="/usr/bin/sudo" pid=3590561 comm="admin_socket" requested_mask="x" denied_mask="x" fsuid=0 ouid=0
Mar 03 01:02:00 antaresinc-cluster kernel: audit: type=1400 audit(1709424120.374:198): apparmor="DENIED" operation="exec" class="file" profile="snap.microceph.mon" name="/usr/bin/sudo" pid=3590561 comm="admin_socket" requested_mask="x" denied_mask="x" fsuid=0 ouid=0
Mar 03 01:02:00 antaresinc-cluster kernel: audit: type=1400 audit(1709424120.374:199): apparmor="DENIED" operation="exec" class="file" profile="snap.microceph.mon" name="/usr/bin/sudo" pid=3590561 comm="admin_socket" requested_mask="x" denied_mask="x" fsuid=0 ouid=0
Mar 03 05:01:38 antaresinc-cluster microceph.mgr[1409]: 2024-03-03T05:01:38.591+0100 7f5e60947640 -1 mgr handle_mgr_map I was active but no longer am
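The AppArmor denials above repeat many times, so a summary per profile can make them easier to read. The following is a minimal sketch, assuming the journal lines have been saved to a text file or list; the helper names `parse_denial` and `summarize` are hypothetical and not part of MicroCeph or AppArmor tooling.

```python
import re
from collections import Counter

# Matches key="value" and key=value pairs as they appear in the audit lines above.
PAIR_RE = re.compile(r'(\w+)=("([^"]*)"|\S+)')


def parse_denial(line):
    """Return the key/value fields of one AppArmor DENIED line, or None."""
    if 'apparmor="DENIED"' not in line:
        return None
    fields = {}
    for key, raw, quoted in PAIR_RE.findall(line):
        # Quoted values are stored without their surrounding quotes.
        fields[key] = quoted if raw.startswith('"') else raw
    return fields


def summarize(lines):
    """Count denial log lines per (profile, operation, target)."""
    counts = Counter()
    for line in lines:
        fields = parse_denial(line)
        if fields is None:
            continue
        # File denials carry name=..., capability denials carry capname=...
        target = fields.get("name") or fields.get("capname", "?")
        key = (fields.get("profile", "?"), fields.get("operation", "?"), target)
        counts[key] += 1
    return counts
```

Each denial appears in the journal once as an `audit[...]` line and again as a `kernel: audit:` line, so the counts are log lines rather than distinct events.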

Additional comments.

usma0118 commented 2 months ago

Related #219

usma0118 commented 2 months ago

@UtkarshBhatthere any help?

UtkarshBhatthere commented 2 months ago

Thanks for sharing this issue @usma0118. We will take a look at it. I also do not think this is related to #219. That was a simpler command timeout that happened if the bootstrap process was slow.

usma0118 commented 2 months ago

AppArmor settings:

x profiles are in enforce mode.

   snap.microceph.ceph
   snap.microceph.daemon
   snap.microceph.hook.install
   snap.microceph.hook.post-refresh
   snap.microceph.mds
   snap.microceph.mgr
   snap.microceph.microceph
   snap.microceph.mon
   snap.microceph.osd
   snap.microceph.rados
   snap.microceph.radosgw-admin
   snap.microceph.rbd
   snap.microceph.rgw

Processes are in enforce mode.

   /snap/microceph/975/bin/ceph-mds (14118) snap.microceph.mds
   /snap/microceph/975/bin/ceph-mgr (14119) snap.microceph.mgr
   /usr/bin/dash (14191) snap.microceph.osd
   /snap/microceph/975/bin/ceph-osd (14222) snap.microceph.osd

usma0118 commented 2 months ago

Another observation: the truststore was empty after the upgrade, so I had to manually create cluster.yaml.
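For anyone checking for the same symptom, here is a minimal sketch that lists truststore records that look empty. It assumes each record is a small YAML file with a `certificate:` key holding a PEM block; that layout is an assumption and has not been confirmed in this thread. The function name `find_suspect_records` is hypothetical, and the check is a plain-text heuristic rather than a YAML parser.

```python
from pathlib import Path


def find_suspect_records(truststore_dir):
    """Return (name, reason) pairs for records that look empty or lack a certificate.

    Assumption: every regular file in the directory is one truststore record.
    """
    directory = Path(truststore_dir)
    if not directory.is_dir():
        return [(str(directory), "truststore directory not found")]
    records = [p for p in sorted(directory.iterdir()) if p.is_file()]
    if not records:
        return [(str(directory), "no records found")]
    suspects = []
    for path in records:
        text = path.read_text(errors="replace")
        if not text.strip():
            suspects.append((path.name, "file is empty"))
        elif "certificate:" not in text:
            suspects.append((path.name, "no certificate key"))
        elif "BEGIN CERTIFICATE" not in text:
            suspects.append((path.name, "certificate value looks empty"))
    return suspects
```

Calling `find_suspect_records` on the truststore directory (root is likely needed to read it) returns an empty list when every record contains a PEM block.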

UtkarshBhatthere commented 2 months ago

Yes, the empty truststore was an old issue, which you probably hit because you upgraded from an older revision. You should not see this going forward, since the fix has been merged in microcluster and we have refreshed our dependencies.

usma0118 commented 2 months ago

After manually fixing the truststore, I can see the microceph services running.

But ceph status times out. Any ideas?

UtkarshBhatthere commented 2 months ago

Are the required Ceph config files present under the /var/snap/microceph path?

UtkarshBhatthere commented 2 months ago

Also @usma0118 please feel free to discuss this directly in our Matrix Room if there is a bit of to and fro required.

usma0118 commented 1 month ago

I had to manually create the config files. After that, the microceph services started.

UtkarshBhatthere commented 1 month ago

It would be awesome if you could share a bit about which config files you had to create to get it working.

usma0118 commented 1 month ago

Difficult to remember, but based on my command history:

/var/snap/microceph/common/state/truststore/ (cluster file)

Symlinked /var/snap/microceph//current/conf/ceph.client.admin.keyring and /var/snap/microceph//current/conf/ceph.conf to the required places (can't recall which)
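To verify whether the two files named above are in place, a small check like the following may help. It is a minimal sketch: the helper `check_conf_dir` is hypothetical, the caller supplies the conf directory, and it does not guess where the symlinks should point.

```python
from pathlib import Path

# File names taken from the comment above; the directory is passed in by the caller.
EXPECTED_FILES = ("ceph.conf", "ceph.client.admin.keyring")


def check_conf_dir(conf_dir):
    """Map each expected file name to 'ok', 'missing', or 'dangling symlink'."""
    status = {}
    for name in EXPECTED_FILES:
        path = Path(conf_dir) / name
        if path.is_symlink() and not path.exists():
            # The link exists but its target does not.
            status[name] = "dangling symlink"
        elif path.is_file():
            status[name] = "ok"
        else:
            status[name] = "missing"
    return status
```

Running it against the conf directory from the comment above reports which of the two files still need to be created or relinked.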