clincha-org / clincha

Configuration and monitoring of clinch-home infrastructure
https://clinch-home.com

Protected storage #80

Closed clincha closed 1 year ago

clincha commented 1 year ago

When I use Persistent Volumes on my local Kubernetes cluster, the data in them isn't backed up anywhere. I want a way to protect that storage.

clincha commented 1 year ago

Kasten looks like a good product that connects with Backblaze. I can use it to set backup policies for my Persistent Volumes.

clincha commented 1 year ago

Velero also looks really good. Not sure if it can do persistent volume backups, though.

clincha commented 1 year ago

You need to break a few eggs to make an omelette. I have saved the Factorio configuration and game saves. Let's delete everything on Proxmox and then get started on the Ceph cluster.

clincha commented 1 year ago

Installing the dashboard

  1. Install the Ceph Dashboard package. I ran this command on one node:
apt install ceph-mgr-dashboard
  2. Then run this to enable the dashboard:
ceph mgr module enable dashboard

Then I hit this error:

Error ENOENT: all mgr daemons do not support module 'dashboard', pass --force to force enablement

The dashboard module can only be enabled once every manager daemon has the package installed, so I repeated step 1 on the other nodes as well and then ran step 2 again.
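
To repeat that across every node in one pass, a minimal sketch, assuming the hosts are reachable over SSH as root and using node1-node3 as stand-in names for the real machines:

# Hypothetical host names -- substitute the actual Proxmox/Ceph nodes.
for node in node1 node2 node3; do
  ssh root@"$node" apt install -y ceph-mgr-dashboard
done

# Once every mgr host has the package, the module enables cleanly.
ceph mgr module enable dashboard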


Logging in

I made a certificate:

ceph dashboard create-self-signed-cert

I made a user:

vi password
ceph dashboard ac-user-create clincha -i password administrator

The dashboard is then available over HTTPS on port 8443.
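
The exact address depends on which mgr is currently active; ceph mgr services lists the endpoints the managers are serving, including the dashboard URL:

# Prints a JSON map of mgr service endpoints, e.g. a "dashboard" entry
# of the form https://<active-mgr-host>:8443/
ceph mgr services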

clincha commented 1 year ago

I've created 4 pools:

ERASURE_HDD
ERASURE_SSD
REPLICATED_HDD
REPLICATED_SSD

(screenshot of the four pools)
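
For reference, a sketch of roughly how the SSD pair could have been built from the CLI instead; the k/m values, PG counts and rule/profile names below are assumptions, not necessarily what I used:

# Erasure-code profile pinned to SSDs (k/m values are assumptions)
ceph osd erasure-code-profile set ec-ssd k=2 m=1 crush-failure-domain=host crush-device-class=ssd

# Erasure-coded data pool; RBD needs overwrites enabled on EC pools
ceph osd pool create ERASURE_SSD 32 32 erasure ec-ssd
ceph osd pool set ERASURE_SSD allow_ec_overwrites true

# Replicated pool on SSDs for the RBD metadata
ceph osd crush rule create-replicated replicated-ssd default host ssd
ceph osd pool create REPLICATED_SSD 32 32 replicated replicated-ssd

# Tag both pools for RBD use
ceph osd pool application enable ERASURE_SSD rbd
ceph osd pool application enable REPLICATED_SSD rbd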

I needed to do some magic to make it work correctly in Proxmox. First of all, I tried using ERASURE_SSD as new storage in Proxmox, but it threw this error:

TASK ERROR: unable to create VM 100 - rbd create 'vm-100-disk-0' error: 2023-05-04T16:33:25.699+0100 7f2f4bfff700 -1 librbd::image::CreateRequest: 0x5608a8ba58e0 handle_add_image_to_directory: error adding image to directory: (95) Operation not supported

I found a guide which pointed me at the /etc/pve/storage.cfg file that needed to be changed. Erasure-coded pools can't store the omap metadata that RBD needs for its image directory, which is why the image creation failed; the metadata has to live in a replicated pool while the data objects go to the erasure-coded one. I set up the pool in Proxmox normally, then edited the file so that data-pool is the erasure-coded pool and pool is the replicated pool.

dir: local
    path /var/lib/vz
    content iso,vztmpl,backup

lvmthin: local-lvm
    thinpool data
    vgname pve
    content images,rootdir

rbd: Hot
    content images
    data-pool ERASURE_SSD
    krbd 0
    pool REPLICATED_SSD

rbd: Cold
    content images
    data-pool ERASURE_HDD
    krbd 0
    pool REPLICATED_HDD
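
This is the same split that rbd itself exposes with --data-pool: metadata in a replicated pool, data objects in the erasure-coded one. Creating a disk by hand against these pools would look something like this (test-image is just an illustrative name):

# The image entry and its metadata go to the replicated pool; the
# actual data objects land in the erasure-coded data pool.
rbd create --size 10G --data-pool ERASURE_SSD REPLICATED_SSD/test-image
rbd info REPLICATED_SSD/test-image   # shows "data_pool: ERASURE_SSD"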

clincha commented 1 year ago

I guess there is still the Kubernetes storage to sort out.

clincha commented 1 year ago

It's nice to see the Kubernetes cluster back up and running:

bash-4.4$ kubectl get nodes -o wide
NAME               STATUS   ROLES           AGE     VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                           KERNEL-VERSION                 CONTAINER-RUNTIME
bri-kubeworker-1   Ready    <none>          2m57s   v1.26.4   192.168.1.21   <none>        Rocky Linux 8.7 (Green Obsidian)   4.18.0-425.19.2.el8_7.x86_64   cri-o://1.21.7
bri-kubeworker-2   Ready    <none>          2m57s   v1.26.4   192.168.1.22   <none>        Rocky Linux 8.7 (Green Obsidian)   4.18.0-425.19.2.el8_7.x86_64   cri-o://1.21.7
bri-kubeworker-3   Ready    <none>          2m57s   v1.26.4   192.168.1.23   <none>        Rocky Linux 8.7 (Green Obsidian)   4.18.0-425.19.2.el8_7.x86_64   cri-o://1.21.7
bri-master-1       Ready    control-plane   10m     v1.26.4   192.168.1.20   <none>        Rocky Linux 8.7 (Green Obsidian)   4.18.0-425.19.2.el8_7.x86_64   cri-o://1.21.7
clincha commented 1 year ago

This guy has done a good walkthrough of it: https://itnext.io/provision-volumes-from-external-ceph-storage-on-kubernetes-and-nomad-using-ceph-csi-7ad9b15e9809

clincha commented 1 year ago

I'm creating a test deployment so that I know the Ceph volumes work. I can't seem to get logs from the pods, though. It might be a firewall issue on the hosts.

Error from server: Get "https://192.168.1.23:10250/containerLogs/default/radarr-847987fd9-lh49p/radarr": dial tcp 192.168.1.23:10250: connect: no route to host

I fixed the firewall rules in #88, which resolved the issue.
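
The error is kubectl failing to reach the kubelet API on port 10250 of the worker. A sketch of the kind of rule that opens it, assuming firewalld on the Rocky Linux hosts (the actual change is in #88):

# Run on each node: open the kubelet API port that the API server
# uses for logs/exec, then reload firewalld.
firewall-cmd --permanent --add-port=10250/tcp
firewall-cmd --reload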

clincha commented 1 year ago

I need to be able to iterate quickly, so creating a way to initialise the cluster and tear everything down again would be useful.

Initialise Cluster
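
A rough sketch of what that reset/initialise cycle could look like with kubeadm; the pod CIDR here is an assumption, and the real version belongs in the Ansible code mentioned further down:

# Tear down: wipe kubeadm state on every node.
for node in bri-master-1 bri-kubeworker-1 bri-kubeworker-2 bri-kubeworker-3; do
  ssh root@"$node" kubeadm reset -f
done

# Initialise: bootstrap the control plane again (pod CIDR is an assumption),
# then join the workers with the "kubeadm join ..." command it prints.
ssh root@bri-master-1 kubeadm init --pod-network-cidr=10.244.0.0/16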

clincha commented 1 year ago

https://docs.ceph.com/en/latest/rbd/rbd-kubernetes/
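
The Ceph side of that document boils down to creating a pool for Kubernetes and a restricted client for ceph-csi; the pool and client names below are the ones the doc uses, not necessarily mine:

# Create and initialise an RBD pool for Kubernetes volumes
ceph osd pool create kubernetes
rbd pool init kubernetes

# A client that ceph-csi can use, limited to that pool
ceph auth get-or-create client.kubernetes mon 'profile rbd' osd 'profile rbd pool=kubernetes' mgr 'profile rbd pool=kubernetes'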

clincha commented 1 year ago

The above document didn't work as easily as I thought it would. I've done a rebuild and I'm going to try the examples in the ceph-csi repository instead.
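
The rough shape of that approach, as far as I understand the ceph-csi repo layout (the directory and file names here are from memory, and the manifests need the cluster ID, monitor addresses, pool and credentials filled in before applying):

git clone https://github.com/ceph/ceph-csi.git
cd ceph-csi

# RBD provisioner and node-plugin manifests
kubectl apply -f deploy/rbd/kubernetes/

# Example secret, StorageClass and PVC -- edit them first
kubectl apply -f examples/rbd/secret.yaml
kubectl apply -f examples/rbd/storageclass.yaml
kubectl apply -f examples/rbd/pvc.yaml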

clincha commented 1 year ago

Yay!

[kubernetes@bri-master-1 ~]$ k get pvc
NAME             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
factorio-claim   Bound    pvc-e6afecbb-ad5c-4df6-8310-0508bedbe2f7   15Gi       RWO            csi-rbd-sc     10m
nginx-pvc        Bound    pvc-22860303-8326-4c4e-a42d-4ecb747a3169   1Gi        RWO            csi-rbd-sc     30m

You're looking at an RBD block device from my Ceph cluster mounted in a pod. Now to tidy things up and then add Velero support.
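
Each of those PVCs is backed by a csi-vol-* image in whichever RBD pool the csi-rbd-sc StorageClass points at, which can be checked from the Ceph side (<pool> is a placeholder):

# List the CSI-provisioned images and inspect one of them
rbd ls -p <pool>
rbd info <pool>/csi-vol-<uuid>   # size, data pool and features of one volume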

clincha commented 1 year ago

The Ansible code has now been written up, and I can deploy a k8s cluster with the storage classes defined.