clincha-org / clincha

Configuration and monitoring of clinch-home infrastructure
https://clinch-home.com

Protected storage #80

Closed clincha closed 1 year ago

clincha commented 1 year ago

When I use Persistent Volumes on my local Kubernetes cluster, the data in them isn't backed up anywhere. I want a way to protect that storage.

clincha commented 1 year ago

Kasten looks like a good product that connects with Backblaze. I can use it to set backup policies for my Persistent Volumes.

clincha commented 1 year ago

Velero also looks really good. Not sure if it can do persistent volume backups, though.

clincha commented 1 year ago

You need to break a few eggs to make an omelette. I have saved the Factorio configuration and game saves. Let's delete everything on Proxmox and then get started on the Ceph cluster.

clincha commented 1 year ago

Installing the dashboard

  1. Install the Ceph Dashboard package. I ran this command on one node:
apt install ceph-mgr-dashboard
  2. Then run this to enable the dashboard:
ceph mgr module enable dashboard

Then I hit this error:

Error ENOENT: all mgr daemons do not support module 'dashboard', pass --force to force enablement

The dashboard module can only be enabled once every manager daemon has the package installed, so I repeated step 1 on the other nodes as well and then ran step 2 again.
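
To repeat that across every node in one pass, a minimal sketch, assuming the hosts are reachable over SSH as root and using node1-node3 as stand-in names for the real machines:

# Hypothetical host names -- substitute the actual Proxmox/Ceph nodes.
for node in node1 node2 node3; do
  ssh root@"$node" apt install -y ceph-mgr-dashboard
done

# Once every mgr host has the package, the module enables cleanly.
ceph mgr module enable dashboard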


Logging in

I made a certificate:

ceph dashboard create-self-signed-cert

I made a user:

vi password
ceph dashboard ac-user-create clincha -i password administrator

The dashboard is then available over HTTPS on port 8443.
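
The exact address depends on which mgr is currently active; ceph mgr services lists the endpoints the managers are serving, including the dashboard URL:

# Prints a JSON map of mgr service endpoints, e.g. a "dashboard" entry
# of the form https://<active-mgr-host>:8443/
ceph mgr services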

clincha commented 1 year ago

I've created 4 pools:

ERASURE_HDD
ERASURE_SSD
REPLICATED_HDD
REPLICATED_SSD

(screenshot of the four pools)
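
For reference, a sketch of roughly how the SSD pair could have been built from the CLI instead; the k/m values, PG counts and rule/profile names below are assumptions, not necessarily what I used:

# Erasure-code profile pinned to SSDs (k/m values are assumptions)
ceph osd erasure-code-profile set ec-ssd k=2 m=1 crush-failure-domain=host crush-device-class=ssd

# Erasure-coded data pool; RBD needs overwrites enabled on EC pools
ceph osd pool create ERASURE_SSD 32 32 erasure ec-ssd
ceph osd pool set ERASURE_SSD allow_ec_overwrites true

# Replicated pool on SSDs for the RBD metadata
ceph osd crush rule create-replicated replicated-ssd default host ssd
ceph osd pool create REPLICATED_SSD 32 32 replicated replicated-ssd

# Tag both pools for RBD use
ceph osd pool application enable ERASURE_SSD rbd
ceph osd pool application enable REPLICATED_SSD rbd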

I needed to do some magic to make it work correctly in Proxmox. First of all, I tried using ERASURE_SSD as new storage in Proxmox, but it threw this error:

TASK ERROR: unable to create VM 100 - rbd create 'vm-100-disk-0' error: 2023-05-04T16:33:25.699+0100 7f2f4bfff700 -1 librbd::image::CreateRequest: 0x5608a8ba58e0 handle_add_image_to_directory: error adding image to directory: (95) Operation not supported

I found a guide which pointed me at the /etc/pve/storage.cfg file that needed to be changed. Erasure-coded pools can't store the omap metadata that RBD needs for its image directory, which is why the image creation failed; the metadata has to live in a replicated pool while the data objects go to the erasure-coded one. I set up the pool in Proxmox normally, then edited the file so that data-pool is the erasure-coded pool and pool is the replicated pool.

dir: local
    path /var/lib/vz
    content iso,vztmpl,backup

lvmthin: local-lvm
    thinpool data
    vgname pve
    content images,rootdir

rbd: Hot
    content images
    data-pool ERASURE_SSD
    krbd 0
    pool REPLICATED_SSD

rbd: Cold
    content images
    data-pool ERASURE_HDD
    krbd 0
    pool REPLICATED_HDD
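
This is the same split that rbd itself exposes with --data-pool: metadata in a replicated pool, data objects in the erasure-coded one. Creating a disk by hand against these pools would look something like this (test-image is just an illustrative name):

# The image entry and its metadata go to the replicated pool; the
# actual data objects land in the erasure-coded data pool.
rbd create --size 10G --data-pool ERASURE_SSD REPLICATED_SSD/test-image
rbd info REPLICATED_SSD/test-image   # shows "data_pool: ERASURE_SSD"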

clincha commented 1 year ago

I guess there is still the Kubernetes storage to sort out.

clincha commented 1 year ago

It's nice to see the Kubernetes cluster back up and running:

bash-4.4$ kubectl get nodes -o wide
NAME               STATUS   ROLES           AGE     VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                           KERNEL-VERSION                 CONTAINER-RUNTIME
bri-kubeworker-1   Ready    <none>          2m57s   v1.26.4   192.168.1.21   <none>        Rocky Linux 8.7 (Green Obsidian)   4.18.0-425.19.2.el8_7.x86_64   cri-o://1.21.7
bri-kubeworker-2   Ready    <none>          2m57s   v1.26.4   192.168.1.22   <none>        Rocky Linux 8.7 (Green Obsidian)   4.18.0-425.19.2.el8_7.x86_64   cri-o://1.21.7
bri-kubeworker-3   Ready    <none>          2m57s   v1.26.4   192.168.1.23   <none>        Rocky Linux 8.7 (Green Obsidian)   4.18.0-425.19.2.el8_7.x86_64   cri-o://1.21.7
bri-master-1       Ready    control-plane   10m     v1.26.4   192.168.1.20   <none>        Rocky Linux 8.7 (Green Obsidian)   4.18.0-425.19.2.el8_7.x86_64   cri-o://1.21.7
clincha commented 1 year ago

This guy has done a good walkthrough of it: https://itnext.io/provision-volumes-from-external-ceph-storage-on-kubernetes-and-nomad-using-ceph-csi-7ad9b15e9809

clincha commented 1 year ago

I'm creating a test deployment so that I know the Ceph volumes work. I can't seem to get logs from the pods, though. It might be a firewall issue on the hosts.

Error from server: Get "https://192.168.1.23:10250/containerLogs/default/radarr-847987fd9-lh49p/radarr": dial tcp 192.168.1.23:10250: connect: no route to host

I fixed the firewall rules in #88, which resolved the issue.
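
The error is kubectl failing to reach the kubelet API on port 10250 of the worker. A sketch of the kind of rule that opens it, assuming firewalld on the Rocky Linux hosts (the actual change is in #88):

# Run on each node: open the kubelet API port that the API server
# uses for logs/exec, then reload firewalld.
firewall-cmd --permanent --add-port=10250/tcp
firewall-cmd --reload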

clincha commented 1 year ago

I need to be able to iterate quickly, so creating a way to initialise the cluster and tear everything down again would be useful.

Initialise Cluster
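
A rough sketch of what that reset/initialise cycle could look like with kubeadm; the pod CIDR here is an assumption, and the real version belongs in the Ansible code mentioned further down:

# Tear down: wipe kubeadm state on every node.
for node in bri-master-1 bri-kubeworker-1 bri-kubeworker-2 bri-kubeworker-3; do
  ssh root@"$node" kubeadm reset -f
done

# Initialise: bootstrap the control plane again (pod CIDR is an assumption),
# then join the workers with the "kubeadm join ..." command it prints.
ssh root@bri-master-1 kubeadm init --pod-network-cidr=10.244.0.0/16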

clincha commented 1 year ago

https://docs.ceph.com/en/latest/rbd/rbd-kubernetes/
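
The Ceph side of that document boils down to creating a pool for Kubernetes and a restricted client for ceph-csi; the pool and client names below are the ones the doc uses, not necessarily mine:

# Create and initialise an RBD pool for Kubernetes volumes
ceph osd pool create kubernetes
rbd pool init kubernetes

# A client that ceph-csi can use, limited to that pool
ceph auth get-or-create client.kubernetes mon 'profile rbd' osd 'profile rbd pool=kubernetes' mgr 'profile rbd pool=kubernetes'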

clincha commented 1 year ago

The above document didn't work as easily as I thought it would. I've done a rebuild and I'm going to try the examples in the ceph-csi repository instead.
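
The rough shape of that approach, as far as I understand the ceph-csi repo layout (the directory and file names here are from memory, and the manifests need the cluster ID, monitor addresses, pool and credentials filled in before applying):

git clone https://github.com/ceph/ceph-csi.git
cd ceph-csi

# RBD provisioner and node-plugin manifests
kubectl apply -f deploy/rbd/kubernetes/

# Example secret, StorageClass and PVC -- edit them first
kubectl apply -f examples/rbd/secret.yaml
kubectl apply -f examples/rbd/storageclass.yaml
kubectl apply -f examples/rbd/pvc.yaml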

clincha commented 1 year ago

Yay!

[kubernetes@bri-master-1 ~]$ k get pvc
NAME             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
factorio-claim   Bound    pvc-e6afecbb-ad5c-4df6-8310-0508bedbe2f7   15Gi       RWO            csi-rbd-sc     10m
nginx-pvc        Bound    pvc-22860303-8326-4c4e-a42d-4ecb747a3169   1Gi        RWO            csi-rbd-sc     30m

You're looking at an RBD block device from my Ceph cluster mounted in a pod. Now to tidy things up and then add Velero support.
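
Each of those PVCs is backed by a csi-vol-* image in whichever RBD pool the csi-rbd-sc StorageClass points at, which can be checked from the Ceph side (<pool> is a placeholder):

# List the CSI-provisioned images and inspect one of them
rbd ls -p <pool>
rbd info <pool>/csi-vol-<uuid>   # size, data pool and features of one volume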

clincha commented 1 year ago

The Ansible code has now been written up, and I can deploy a k8s cluster with the storage classes defined.