Kasten looks like a good product that connects with Backblaze. I can use it to set backup policies for my Persistent Volumes.
Velero also looks really good, but I'm not sure whether it can do persistent volume backups.
You need to break a few eggs to make an omelette. I have saved the Factorio configuration and game saves, so let's delete everything on Proxmox and get started on the Ceph cluster.
First up was the manager dashboard:
apt install ceph-mgr-dashboard
ceph mgr module enable dashboard
Then I hit this error:
Error ENOENT: all mgr daemons do not support module 'dashboard', pass --force to force enablement
So I installed ceph-mgr-dashboard on the other nodes as well and ran the enable command again, which worked.
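Something like this, assuming the other nodes are reachable over SSH (the hostnames and node count here are placeholders):
for node in pve2 pve3; do
    ssh root@"$node" apt install -y ceph-mgr-dashboard
done
ceph mgr module enable dashboard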
I made a certificate:
ceph dashboard create-self-signed-cert
I made a user (the password lives in a local file that gets passed with -i):
vi password
ceph dashboard ac-user-create clincha -i password administrator
The dashboard is then available over HTTPS on port 8443.
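If the port or the active manager needs confirming, the manager reports its service URLs:
ceph mgr services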
I've created 4 pools (rough creation commands are sketched after the list):
ERASURE_HDD
ERASURE_SSD
REPLICATED_HDD
REPLICATED_SSD
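Something along these lines would create them; the erasure-code profile, k/m values and placement-group counts here are assumptions rather than the exact values I used:
# SSD pools: an erasure-coded data pool plus a replicated pool for RBD metadata
ceph osd erasure-code-profile set ec-ssd k=2 m=1 crush-device-class=ssd
ceph osd pool create ERASURE_SSD 32 32 erasure ec-ssd
ceph osd pool set ERASURE_SSD allow_ec_overwrites true    # needed before RBD can write to an EC pool
ceph osd crush rule create-replicated replicated-ssd default host ssd
ceph osd pool create REPLICATED_SSD 32 32 replicated replicated-ssd
rbd pool init REPLICATED_SSD
# repeat with crush-device-class=hdd for ERASURE_HDD and REPLICATED_HDD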
I needed to do some magic to make it work correctly in Proxmox. First of all I tried using ERASURE_SSD directly as new storage in Proxmox, but it threw this error:
TASK ERROR: unable to create VM 100 - rbd create 'vm-100-disk-0' error: 2023-05-04T16:33:25.699+0100 7f2f4bfff700 -1 librbd::image::CreateRequest: 0x5608a8ba58e0 handle_add_image_to_directory: error adding image to directory: (95) Operation not supported
This happens because an erasure-coded pool can't hold the RBD image metadata (it needs omap operations, which EC pools don't support), so the metadata has to sit in a replicated pool while the data objects go to the erasure-coded one. I found a guide which pointed me at the /etc/pve/storage.cfg file that needed to be changed. I set up the pool in Proxmox normally, then edited the file so that data-pool is the erasure-coded pool and pool is the replicated pool.
dir: local
        path /var/lib/vz
        content iso,vztmpl,backup

lvmthin: local-lvm
        thinpool data
        vgname pve
        content images,rootdir

rbd: Hot
        content images
        data-pool ERASURE_SSD
        krbd 0
        pool REPLICATED_SSD

rbd: Cold
        content images
        data-pool ERASURE_HDD
        krbd 0
        pool REPLICATED_HDD
I guess there is still the Kubernetes storage to sort out.
It's nice to see the Kubernetes cluster back up and running:
bash-4.4$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
bri-kubeworker-1 Ready <none> 2m57s v1.26.4 192.168.1.21 <none> Rocky Linux 8.7 (Green Obsidian) 4.18.0-425.19.2.el8_7.x86_64 cri-o://1.21.7
bri-kubeworker-2 Ready <none> 2m57s v1.26.4 192.168.1.22 <none> Rocky Linux 8.7 (Green Obsidian) 4.18.0-425.19.2.el8_7.x86_64 cri-o://1.21.7
bri-kubeworker-3 Ready <none> 2m57s v1.26.4 192.168.1.23 <none> Rocky Linux 8.7 (Green Obsidian) 4.18.0-425.19.2.el8_7.x86_64 cri-o://1.21.7
bri-master-1 Ready control-plane 10m v1.26.4 192.168.1.20 <none> Rocky Linux 8.7 (Green Obsidian) 4.18.0-425.19.2.el8_7.x86_64 cri-o://1.21.7
This guy has done a good walkthrough of it: https://itnext.io/provision-volumes-from-external-ceph-storage-on-kubernetes-and-nomad-using-ceph-csi-7ad9b15e9809
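On the Ceph side, the CSI driver wants a dedicated pool and a restricted client key. Following the upstream docs it looks roughly like this; the pool and client names are placeholders, not necessarily what I used:
ceph osd pool create kubernetes 32 32 replicated
rbd pool init kubernetes
ceph auth get-or-create client.kubernetes \
    mon 'profile rbd' \
    osd 'profile rbd pool=kubernetes' \
    mgr 'profile rbd pool=kubernetes'
ceph mon dump    # the fsid and monitor addresses go into the ceph-csi config map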
I'm creating a test deployment so that I know the Ceph volumes work. I can't seem to get logs from the pods, which might be a firewall issue with the hosts:
Error from server: Get "https://192.168.1.23:10250/containerLogs/default/radarr-847987fd9-lh49p/radarr": dial tcp 192.168.1.23:10250: connect: no route to host
Fixing the firewall rules in #88 resolved it.
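The error is the API server failing to reach the kubelet on port 10250, so on the Rocky Linux workers the change amounts to something like this (assuming firewalld; the real change lives in #88):
sudo firewall-cmd --permanent --add-port=10250/tcp    # kubelet API, used for kubectl logs/exec
sudo firewall-cmd --reload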
I need to be able to iterate quickly. Creating a way to initialise the cluster and tear down everything would be useful.
Initialise Cluster
The above document didn't work as easily as I thought. I've done a rebuild and I'm going to try with the examples in the ceph-csi repository.
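Roughly what that looks like, assuming the example manifests from the repo are used as-is (the secret, config map and pool details described in its documentation still need filling in first):
git clone https://github.com/ceph/ceph-csi.git
kubectl apply -f ceph-csi/examples/rbd/storageclass.yaml
kubectl apply -f ceph-csi/examples/rbd/pvc.yaml
kubectl apply -f ceph-csi/examples/rbd/pod.yaml
kubectl get pvc    # the claim should go Bound against csi-rbd-sc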
Yay!
[kubernetes@bri-master-1 ~]$ k get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
factorio-claim Bound pvc-e6afecbb-ad5c-4df6-8310-0508bedbe2f7 15Gi RWO csi-rbd-sc 10m
nginx-pvc Bound pvc-22860303-8326-4c4e-a42d-4ecb747a3169 1Gi RWO csi-rbd-sc 30m
You're looking at an RBD block device from my Ceph cluster being mounted in a pod. Now to tidy things up and then add Velero support.
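If Velero ends up pointing at Backblaze B2 (the idea at the top of this issue), the install would go something like this. Only a sketch: the bucket, region, endpoint, credentials file and plugin version are all placeholders, and it uses Velero's file-system backup rather than CSI snapshots:
velero install \
    --provider aws \
    --plugins velero/velero-plugin-for-aws:v1.7.0 \
    --bucket my-velero-backups \
    --secret-file ./credentials-velero \
    --backup-location-config region=us-west-002,s3ForcePathStyle=true,s3Url=https://s3.us-west-002.backblazeb2.com \
    --use-node-agent \
    --default-volumes-to-fs-backup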
Ansible code has now been written up and I can deploy a Kubernetes cluster with the storage classes defined.
When I use Persistent Volumes on my local Kubernetes cluster