canonical / microceph

Ceph for a one-rack cluster and appliances
https://snapcraft.io/microceph
GNU Affero General Public License v3.0

False Errors in enabling RGW Services #369

Closed · abasu0713 closed 2 weeks ago

abasu0713 commented 2 weeks ago

Issue report

What version of MicroCeph are you using?

alphaduriendur@hc-opi3b8-1-arkobasu-space:~$ sudo microceph --version
ceph-version: 18.2.0-0ubuntu3~cloud0; microceph-git: 556b907075

Use this section to describe the channel/revision which produces the unexpected behaviour. This information can be fetched from the installed: section of the sudo snap info microceph output.

alphaduriendur@hc-opi3b8-1-arkobasu-space:~$ sudo snap info microceph
name:      microceph
summary:   Simple clustered Ceph deployment
publisher: Canonical✓
store-url: https://snapcraft.io/microceph
contact:   https://matrix.to/#/#ubuntu-ceph:matrix.org
license:   AGPL-3.0
description: |
  MicroCeph is the easiest way to get up and running with Ceph. It is focused on providing a modern
  deployment and management experience to Ceph administrators and storage software developers.

  The below commands will set you up with a testing environment on a single machine using
  file-backed OSDs - you'll need about 15 GiB of available space on your root drive:

      sudo snap install microceph
      sudo snap refresh --hold microceph
      sudo microceph cluster bootstrap
      sudo microceph disk add loop,4G,3
      sudo ceph status

  You're done!

  You can remove everything cleanly with:

      sudo snap remove microceph

  To learn more about MicroCeph see the documentation:

  https://canonical-microceph.readthedocs-hosted.com
commands:
  - microceph.ceph
  - microceph
  - microceph.rados
  - microceph.radosgw-admin
  - microceph.rbd
services:
  microceph.daemon: simple, enabled, active
  microceph.mds:    simple, enabled, active
  microceph.mgr:    simple, enabled, active
  microceph.mon:    simple, enabled, active
  microceph.osd:    simple, enabled, active
  microceph.rgw:    simple, enabled, active
snap-id:      ct1DgtIGBaljdOwomQGMLr8EJcL5pOP1
tracking:     reef/edge
refresh-date: today at 11:08 CDT
hold:         forever
channels:
  reef/stable:      18.2.0+snap71f71782c5 2024-05-28  (982) 78MB -
  reef/candidate:   18.2.0+snapcba31e8c75 2024-06-05 (1000) 78MB -
  reef/beta:        18.2.0+snap38155f2c4e 2024-06-07 (1016) 78MB -
  reef/edge:        18.2.0+snap556b907075 2024-06-12 (1030) 77MB -
  latest/stable:    18.2.0+snap71f71782c5 2024-05-28  (982) 78MB -
  latest/candidate: 18.2.0+snapcba31e8c75 2024-06-05 (1000) 78MB -
  latest/beta:      ↑                                            
  latest/edge:      18.2.0+snap556b907075 2024-06-12 (1030) 77MB -
  quincy/stable:    0+git.4a608fc         2024-01-10  (795) 86MB -
  quincy/candidate: 0+git.4a608fc         2023-11-30  (795) 86MB -
  quincy/beta:      ↑                                            
  quincy/edge:      0+git.287ee68         2023-12-05  (809) 86MB -
installed:          18.2.0+snap556b907075            (1030) 77MB held
alphaduriendur@hc-opi3b8-1-arkobasu-space:~$ 

What are the steps to reproduce this issue?

  1. Create a multi-node Ceph cluster using the aforementioned MicroCeph version on arm64 SBCs (such as Orange Pi or Raspberry Pi).
  2. Enable the RGW service on the node on which the cluster was bootstrapped.
  3. Tear down the cluster on all nodes using snap remove --purge.
  4. Scrub the raw devices that were previously used as OSDs.
  5. Re-deploy another Ceph cluster using the exact same steps as above, then enable RGW again (see the sketch below).
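
For reference, a minimal sketch of that teardown/redeploy cycle on a small cluster (the device path, peer hostname, and the use of wipefs for scrubbing are my assumptions, not taken from the original report):

    # On every node: remove MicroCeph and purge all of its state
    sudo snap remove microceph --purge

    # On every node: scrub the raw devices previously used as OSDs
    # (/dev/sdX is a placeholder for the actual disk)
    sudo wipefs --all /dev/sdX

    # On the first node: reinstall and bootstrap a fresh cluster
    sudo snap install microceph --channel=reef/edge
    sudo snap refresh --hold microceph
    sudo microceph cluster bootstrap

    # On the first node: print a join token for each additional node,
    # then run `sudo microceph cluster join <token>` on that node
    sudo microceph cluster add <node-2-hostname>

    # Back on the bootstrap node: add the scrubbed disk and enable RGW
    sudo microceph disk add /dev/sdX
    sudo microceph enable rgw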

What happens (observed behaviour)?

It throws what appears to be a false error (I am hoping/guessing) when enabling the RGW service:

alphaduriendur@hc-opi3b8-1-arkobasu-space:~$ sudo microceph enable rgw 
[sudo] password for alphaduriendur: 
Error: failed placing service rgw: failed to add DB record for rgw: failed to record role: This "services" entry already exists
alphaduriendur@hc-opi3b8-1-arkobasu-space:~$ echo $?
1
alphaduriendur@hc-opi3b8-1-arkobasu-space:~$

But the RGW Service is enabled just fine:

alphaduriendur@hc-opi3b8-1-arkobasu-space:~$ sudo microceph.ceph status
  cluster:
    id:     200cd182-ce98-4073-b852-5cabd61f7cf6
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum hc-opi3b8-1-arkobasu-space,hc-opi3b8-2-arkobasu-space,hc-opi3b8-3ret-arkobasu-space (age 58m)
    mgr: hc-opi3b8-1-arkobasu-space(active, since 60m), standbys: hc-opi3b8-2-arkobasu-space, hc-opi3b8-3ret-arkobasu-space
    osd: 9 osds: 9 up (since 45m), 9 in (since 45m)
    rgw: 1 daemon active (1 hosts, 1 zones)

  data:
    pools:   5 pools, 129 pgs
    objects: 193 objects, 582 KiB
    usage:   273 MiB used, 1.4 TiB / 1.4 TiB avail
    pgs:     0.775% pgs not active
             128 active+clean
             1   peering

  io:
    recovery: 36 B/s, 0 objects/s

alphaduriendur@hc-opi3b8-1-arkobasu-space:~$ curl localhost:80
<?xml version="1.0" encoding="UTF-8"?><ListAllMyBucketsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>anonymous</ID><DisplayName></DisplayName></Owner><Buckets></Buckets></ListAllMyBucketsResult>alphaduriendur@hc-opi3b8-1-arkobasu-space:~$

I have also tested it from outside the cluster with the AWS CLI, and it works. …
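
The external check was along these lines (a sketch only: the user name, bucket name, keys, and endpoint host are placeholders, and creating an S3 user with radosgw-admin is just one common way to obtain credentials):

    # Create an S3 user; the access and secret keys are printed in the JSON output
    sudo microceph.radosgw-admin user create --uid=test-user --display-name="Test User"

    # Export the printed keys so the AWS CLI can sign requests
    export AWS_ACCESS_KEY_ID=<access-key-from-output>
    export AWS_SECRET_ACCESS_KEY=<secret-key-from-output>

    # Point the AWS CLI at the RGW endpoint and exercise it
    aws --endpoint-url http://<rgw-node>:80 s3 mb s3://test-bucket
    aws --endpoint-url http://<rgw-node>:80 s3 ls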

What were you expecting to happen?

Not to see the above error, which seems likely to be a false message or stale state left over from the previous clean-up. The RGW service itself seems to be fine; I am able to use it as an external store from a Kubernetes cluster with rook-ceph.

I am not entirely sure whether this happened on the first pass, but it has happened every one of the past 4 times I have used a set of nodes to completely tear down a cluster and rebuild it. …

Relevant logs, error output, etc.

Logs are provided above.

If it’s considerably long, please paste to https://gist.github.com/ and insert the link here.

Additional comments.

Just checking if this is a false negative.

UtkarshBhatthere commented 2 weeks ago

Thanks for reporting this error @abasu0713 :) I am happy to tell you that a fix is already in the pipeline and should be available in reef/edge soon.

abasu0713 commented 2 weeks ago

> Thanks for reporting this error @abasu0713 :) I am happy to tell you that a fix is already in the pipeline and should be available in reef/edge soon.

Amazing! Thank you. :)