apache / cloudstack

Apache CloudStack is an opensource Infrastructure as a Service (IaaS) cloud computing platform
https://cloudstack.apache.org/
Apache License 2.0

Unable to attach Ceph as primary storage #8405

Closed li-liwen closed 11 months ago

li-liwen commented 11 months ago
ISSUE TYPE
COMPONENT NAME
UI
Primary storage
CLOUDSTACK VERSION
4.18.1.0
CONFIGURATION

Advanced networking

OS / ENVIRONMENT

I am using Ubuntu Server 22.04 as the KVM hypervisor (libvirt 8.0.0, QEMU 6.2.0), with Ceph (18.2.1) installed on all hypervisors for a hyper-converged setup. The Ceph cluster was bootstrapped with cephadm and consists of three separate monitor nodes and five KVM+Ceph nodes; two of the five KVM+Ceph nodes also function as monitor nodes.

SUMMARY

I cannot add the Ceph cluster as primary storage through the web UI (screenshot: cloudstack-ceph-error). I am also running a Proxmox cluster and can add the same Ceph cluster there with the same user and keys; in Proxmox it works as expected.

STEPS TO REPRODUCE
  1. Install Ubuntu server 22.04 LTS on all machines.
  2. Install the CloudStack 4.18.1.0 management server per the official documentation.
  3. Bootstrap a Ceph Reef cluster with cephadm (I have tried installing cephadm with both apt and curl; same result).
  4. Install KVM and prepare networking on all KVM hosts.
  5. Create cloudstack pool and init rbd
    ceph osd pool create cloudstack
    rbd pool init cloudstack
  6. Create cloudstack user
    ceph auth get-or-create client.cloudstack mon 'profile rbd' osd 'profile rbd pool=cloudstack'
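
To sanity-check that step 5 and step 6 produced what CloudStack expects, the pool and user can be read back before touching the UI (a sketch; it assumes /etc/ceph/ceph.conf and the client.cloudstack keyring are in place on the machine running the commands):

```shell
# Show the capabilities granted to the CloudStack user
# (should list 'profile rbd' on mon and 'profile rbd pool=cloudstack' on osd)
ceph auth get client.cloudstack

# List images in the pool as that user; this should succeed
# (and return nothing, since the pool is still empty)
rbd ls --pool cloudstack --id cloudstack
```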

Additionally, I also tried propagating the ceph.conf file and keyrings to each host, but that did not work either (even with the admin keys).

I have read thread #6463 and tried the following Ceph configuration commands, but still no progress:

ceph config set mon auth_expose_insecure_global_id_reclaim false
ceph config set mon mon_warn_on_insecure_global_id_reclaim_allowed false
ceph config set mon auth_allow_insecure_global_id_reclaim false
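
To confirm those settings actually took effect on the monitors, they can be read back (a sketch, run against a live cluster):

```shell
# Each of these should print 'false' after the 'ceph config set' commands above
ceph config get mon auth_allow_insecure_global_id_reclaim
ceph config get mon auth_expose_insecure_global_id_reclaim
ceph config get mon mon_warn_on_insecure_global_id_reclaim_allowed
```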

The KVM hosts can use the Ceph cluster directly when provided with the admin keyring. I used the following command:

qemu-img create -f rbd rbd:cloudstack/new-libvirt-image 2G
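
The command above authenticates implicitly as client.admin. Since CloudStack will connect as client.cloudstack, it may also be worth exercising the same path with that user by passing the id and config explicitly in the rbd URI (a sketch; the image name is hypothetical and the user's keyring must be readable):

```shell
# Create a test image in the cloudstack pool as the cloudstack user,
# bypassing the admin keyring entirely
qemu-img create -f raw \
  "rbd:cloudstack/test-image:id=cloudstack:conf=/etc/ceph/ceph.conf" 2G
```

If this fails where the admin variant succeeds, the problem is the user's capabilities rather than connectivity.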
EXPECTED RESULTS
The ceph cluster is added to cloudstack successfully.
ACTUAL RESULTS

This is the management server log immediately after I click the OK button in the add-storage dialog:

2023-12-24 16:13:16,511 DEBUG [c.c.a.t.Request] (AgentManager-Handler-11:null) (logid:) Seq 7-789818784650103765: Processing:  { Ans: , MgmtId: 206863094307331, via: 7, Ver: v1, Flags: 10, [{"com.cloud.agent.api.Answer":{"result":"false","details":"com.cloud.utils.exception.CloudRuntimeException: Failed to create storage pool: bb8473b7-f815-3d24-9bfd-885edcfd229b
2023-12-24 16:13:16,511 DEBUG [c.c.a.m.AgentManagerImpl] (qtp1239807799-4738:ctx-491707c7 ctx-ddc41096) (logid:8a783143) Details from executing class com.cloud.agent.api.ModifyStoragePoolCommand: com.cloud.utils.exception.CloudRuntimeException: Failed to create storage pool: bb8473b7-f815-3d24-9bfd-885edcfd229b
2023-12-24 16:13:16,512 WARN  [c.c.a.AlertManagerImpl] (qtp1239807799-4738:ctx-491707c7 ctx-ddc41096) (logid:8a783143) alertType=[7] dataCenterId=[1] podId=[1] clusterId=[null] message=[Unable to attach storage pool33 to the host7].
2023-12-24 16:13:16,517 WARN  [c.c.a.AlertManagerImpl] (qtp1239807799-4738:ctx-491707c7 ctx-ddc41096) (logid:8a783143) No recipients set in global setting 'alert.email.addresses', skipping sending alert with subject [Unable to attach storage pool33 to the host7] and content [Unable to attach storage pool33 to the host7].
2023-12-24 16:13:16,518 WARN  [o.a.c.s.d.l.CloudStackPrimaryDataStoreLifeCycleImpl] (qtp1239807799-4738:ctx-491707c7 ctx-ddc41096) (logid:8a783143) Unable to establish a connection between Host {"id":7,"name":"kvm-192-168-13-13","type":"Routing","uuid":"ef3f53cb-14fc-4eb4-a8ea-23a4896e973f"} and {"name":"ceph","uuid":"bb8473b7-f815-3d24-9bfd-885edcfd229b"}
com.cloud.utils.exception.CloudRuntimeException: Unable establish connection from storage head to storage pool 33 due to com.cloud.utils.exception.CloudRuntimeException: Failed to create storage pool: bb8473b7-f815-3d24-9bfd-885edcfd229b
2023-12-24 16:13:16,606 DEBUG [c.c.a.t.Request] (AgentManager-Handler-6:null) (logid:) Seq 10-4530902700111432125: Processing:  { Ans: , MgmtId: 206863094307331, via: 10, Ver: v1, Flags: 10, [{"com.cloud.agent.api.Answer":{"result":"false","details":"com.cloud.utils.exception.CloudRuntimeException: Failed to create storage pool: bb8473b7-f815-3d24-9bfd-885edcfd229b
2023-12-24 16:13:16,606 DEBUG [c.c.a.m.AgentManagerImpl] (qtp1239807799-4738:ctx-491707c7 ctx-ddc41096) (logid:8a783143) Details from executing class com.cloud.agent.api.ModifyStoragePoolCommand: com.cloud.utils.exception.CloudRuntimeException: Failed to create storage pool: bb8473b7-f815-3d24-9bfd-885edcfd229b
2023-12-24 16:13:16,607 WARN  [c.c.a.AlertManagerImpl] (qtp1239807799-4738:ctx-491707c7 ctx-ddc41096) (logid:8a783143) alertType=[7] dataCenterId=[1] podId=[1] clusterId=[null] message=[Unable to attach storage pool33 to the host10].
2023-12-24 16:13:16,611 WARN  [c.c.a.AlertManagerImpl] (qtp1239807799-4738:ctx-491707c7 ctx-ddc41096) (logid:8a783143) No recipients set in global setting 'alert.email.addresses', skipping sending alert with subject [Unable to attach storage pool33 to the host10] and content [Unable to attach storage pool33 to the host10].
2023-12-24 16:13:16,612 WARN  [o.a.c.s.d.l.CloudStackPrimaryDataStoreLifeCycleImpl] (qtp1239807799-4738:ctx-491707c7 ctx-ddc41096) (logid:8a783143) Unable to establish a connection between Host {"id":10,"name":"kvm-192-168-13-14","type":"Routing","uuid":"9f9f655a-fef4-49d7-aafa-900cabd51315"} and {"name":"ceph","uuid":"bb8473b7-f815-3d24-9bfd-885edcfd229b"}
com.cloud.utils.exception.CloudRuntimeException: Unable establish connection from storage head to storage pool 33 due to com.cloud.utils.exception.CloudRuntimeException: Failed to create storage pool: bb8473b7-f815-3d24-9bfd-885edcfd229b
2023-12-24 16:13:16,698 DEBUG [c.c.a.t.Request] (AgentManager-Handler-12:null) (logid:) Seq 11-7044474242137589771: Processing:  { Ans: , MgmtId: 206863094307331, via: 11, Ver: v1, Flags: 10, [{"com.cloud.agent.api.Answer":{"result":"false","details":"com.cloud.utils.exception.CloudRuntimeException: Failed to create storage pool: bb8473b7-f815-3d24-9bfd-885edcfd229b
2023-12-24 16:13:16,698 DEBUG [c.c.a.m.AgentManagerImpl] (qtp1239807799-4738:ctx-491707c7 ctx-ddc41096) (logid:8a783143) Details from executing class com.cloud.agent.api.ModifyStoragePoolCommand: com.cloud.utils.exception.CloudRuntimeException: Failed to create storage pool: bb8473b7-f815-3d24-9bfd-885edcfd229b
2023-12-24 16:13:16,699 WARN  [c.c.a.AlertManagerImpl] (qtp1239807799-4738:ctx-491707c7 ctx-ddc41096) (logid:8a783143) alertType=[7] dataCenterId=[1] podId=[1] clusterId=[null] message=[Unable to attach storage pool33 to the host11].
2023-12-24 16:13:16,703 WARN  [c.c.a.AlertManagerImpl] (qtp1239807799-4738:ctx-491707c7 ctx-ddc41096) (logid:8a783143) No recipients set in global setting 'alert.email.addresses', skipping sending alert with subject [Unable to attach storage pool33 to the host11] and content [Unable to attach storage pool33 to the host11].
2023-12-24 16:13:16,704 WARN  [o.a.c.s.d.l.CloudStackPrimaryDataStoreLifeCycleImpl] (qtp1239807799-4738:ctx-491707c7 ctx-ddc41096) (logid:8a783143) Unable to establish a connection between Host {"id":11,"name":"kvm-192-168-13-15","type":"Routing","uuid":"f6018446-c9dd-428e-95db-f5b397ce9e37"} and {"name":"ceph","uuid":"bb8473b7-f815-3d24-9bfd-885edcfd229b"}
com.cloud.utils.exception.CloudRuntimeException: Unable establish connection from storage head to storage pool 33 due to com.cloud.utils.exception.CloudRuntimeException: Failed to create storage pool: bb8473b7-f815-3d24-9bfd-885edcfd229b
2023-12-24 16:13:16,704 WARN  [o.a.c.s.d.l.CloudStackPrimaryDataStoreLifeCycleImpl] (qtp1239807799-4738:ctx-491707c7 ctx-ddc41096) (logid:8a783143) No host can access storage pool {"name":"ceph","uuid":"bb8473b7-f815-3d24-9bfd-885edcfd229b"} on cluster 1
2023-12-24 16:13:16,708 DEBUG [c.c.s.StorageManagerImpl] (qtp1239807799-4738:ctx-491707c7 ctx-ddc41096) (logid:8a783143) Failed to add data store: Failed to access storage pool
com.cloud.utils.exception.CloudRuntimeException: Failed to access storage pool
2023-12-24 16:13:16,710 DEBUG [c.c.s.StorageManagerImpl] (qtp1239807799-4738:ctx-491707c7 ctx-ddc41096) (logid:8a783143) Failed to clean up storage pool: null
2023-12-24 16:13:16,710 INFO  [c.c.a.ApiServer] (qtp1239807799-4738:ctx-491707c7 ctx-ddc41096) (logid:8a783143) Failed to add data store: Failed to access storage pool
boring-cyborg[bot] commented 11 months ago

Thanks for opening your first issue here! Be sure to follow the issue template!

kiwiflyer commented 11 months ago

Can your ACS management servers ping your Ceph monitors? If I recall correctly, when you first add a new Ceph cluster, the ACS management servers are involved directly; after that initial primary storage provisioning, the ACS agent on the KVM host creates all the images. So, if your management servers can't reach your Ceph mons, you might need to temporarily route the networks to establish the new primary storage.

li-liwen commented 11 months ago

Thanks for the quick reply! However, the management server does have connectivity to the Ceph cluster. I have tried installing the ceph-common package, ceph.conf, and the keyring on the management server as well, but still no progress. Here is the output from the management server while diagnosing the connection (192.168.13.251 is one of the Ceph monitors):

username@cloudstack:~$ ping 192.168.13.251
PING 192.168.13.251 (192.168.13.251) 56(84) bytes of data.
64 bytes from 192.168.13.251: icmp_seq=1 ttl=63 time=0.255 ms
64 bytes from 192.168.13.251: icmp_seq=2 ttl=63 time=0.239 ms
^C
--- 192.168.13.251 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1005ms
rtt min/avg/max/mdev = 0.239/0.247/0.255/0.008 ms
username@cloudstack:~$ telnet 192.168.13.251 6789
Trying 192.168.13.251...
Connected to 192.168.13.251.
Escape character is '^]'.
??H??v027???

     ^]
telnet> Connection closed.
username@cloudstack:~$ telnet 192.168.13.251 3300
Trying 192.168.13.251...
Connected to 192.168.13.251.
Escape character is '^]'.
ceph v2
^]
telnet> Connection closed.
li-liwen commented 11 months ago

It turns out that I didn't install the RBD storage driver for libvirt. I was able to resolve the problem by installing it:

sudo apt-get install libvirt-daemon-driver-storage-rbd
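
After installing the package, restarting libvirtd and checking that the rbd backend is registered confirms the fix (a sketch; requires libvirt new enough to support `virsh pool-capabilities`):

```shell
# Pick up the newly installed storage driver
sudo systemctl restart libvirtd

# The supported storage pool types should now include 'rbd'
virsh pool-capabilities | grep rbd
```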

Closing this issue...