Closed by JohnStrunk 3 years ago
With this, are we expecting the external storage to be a StorageClass within a cluster or completely external? @JohnStrunk @screeley44
I think external, to show how Scribe can help get your data into a kube environment.
Completely external.
Consider: The IT department has a project to move application X from their legacy infrastructure into their shiny new Kubernetes environment.
We should be able to run periodic syncs (for staging/testing) prior to the final switchover. (I'm thinking of a cron entry on the external infra to drive this)
Potential spec I'm thinking of, @backube/scribemaintainers:
```yaml
---
apiVersion: scribe.backube/v1alpha1
kind: ReplicationSource
metadata:
  name: database-source
  namespace: source
spec:
  trigger:
    schedule: "*/3 * * * *"
  external:
    address: my.host.com
    sshKey: scribe-rsync-dest-src-database-destination
    storageSecret: secret  # optional TLS values
    sourceType: gluster
    path: /brick1
    storageAddress: xxx.xxx.xxx.xxx
```
```yaml
---
apiVersion: scribe.backube/v1alpha1
kind: ReplicationSource
metadata:
  name: database-source
  namespace: source
spec:
  sourcePVC: mysql-pv-claim
  trigger:
    schedule: "*/3 * * * *"
  external:
    address: my.host.com
    sshKey: scribe-rsync-dest-src-database-destination
    storageSecret: secret  # ceph.conf + keyring
    sourceType: cephrbd  # or cephfs
    storageAddress: xxx.xxx.xxx.xxx
    path: /cephrbd
```
```yaml
# Stretch Goal
---
apiVersion: scribe.backube/v1alpha1
kind: ReplicationSource
metadata:
  name: database-source
  namespace: source
spec:
  trigger:
    schedule: "*/3 * * * *"
  external:
    sshKeys: scribe-rsync-dest-src-database-destination
    storageAddress: xxx.xxx.xxx.xxx
    sourceType: SSH
    path: /var/www/html
    address: my.host.com
```
For the destination, we most likely could get away with almost all of the same parameters that rsync operates with today.
I initially thought about having the source perform all of the work, but I worry it gets away from our current source and destination models and would potentially require some code rework.
I was hoping this wouldn't require any changes to the CR or operator... We could have a script/binary that runs on the external infrastructure and plays the role of the Source or Destination. Since it's just rsync over an ssh connection, those are commonly available across most platforms. Imagine a script like:
```shell
./scribe-source --source /my/local/data --destination elb.cluster.com:22 --local-key my-ssh-key --remote-key other-key.pub
```
... that would connect to the Service created by a ReplicationDestination.
This script could be triggered via `at` or `cron` to create periodic syncs.
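As a sketch, the cron entry on the external host might look like the following (the install path, key locations, and log file are illustrative, not part of any actual design):

```
# Hourly sync into the cluster via the scribe-source helper script
0 * * * * /usr/local/bin/scribe-source --source /my/local/data --destination elb.cluster.com:22 --local-key /root/.ssh/my-ssh-key --remote-key /root/.ssh/other-key.pub >> /var/log/scribe-sync.log 2>&1
```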
The local/remote keys would need to agree with the secretRef in the corresponding ReplicationDestination (either autogenerated or manually created would both work).
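For context, a hedged sketch of what the corresponding ReplicationDestination might look like; the field names reflect my understanding of the rsync mover's API and may not be exact, and the secret name `scribe-rsync-keys` is made up:

```yaml
apiVersion: scribe.backube/v1alpha1
kind: ReplicationDestination
metadata:
  name: database-destination
  namespace: dest
spec:
  rsync:
    serviceType: LoadBalancer   # exposes an external address the script can reach
    copyMethod: Snapshot        # preserve each sync as a snapshot
    capacity: 10Gi
    accessModes: [ReadWriteOnce]
    sshKeys: scribe-rsync-keys  # must agree with the keys the external script uses
```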
Moving data out may be even easier given the ReplicationSource for rsync. Assuming the external system has an ssh server:
```yaml
apiVersion: scribe.backube/v1alpha1
kind: ReplicationSource
metadata:
  name: replicationsource-sample
spec:
  sourcePVC: pvcname
  trigger:
    schedule: "0 * * * *"  # hourly
  rsync:
    copyMethod: Clone
    sshKeys: secretRef
    address: my.external-system.com  # the external host that mounts the storage
    port: 22  # port that runs sshd
    sshUser: myusername  # username to use when connecting to the remote system
```
The above is generic, so we don't need to care what storage we're migrating from/to.
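To illustrate that genericness, the sync boils down to an rsync-over-ssh invocation like the one sketched below; the flags, host, user, and key path are illustrative, not the mover's exact command line:

```shell
# Build the kind of command the rsync mover would run (host/user/key are made up)
SRC_DIR=/data/
REMOTE="myusername@my.external-system.com:/data/"
SSH_OPTS="ssh -p 22 -i /keys/source"
# -a: archive mode, --delete: mirror deletions; echoed here rather than executed
echo rsync -a --delete -e "$SSH_OPTS" "$SRC_DIR" "$REMOTE"
```

Because rsync only transfers differences, repeated runs against a fresh clone of the source PVC stay cheap.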
The binary would be interesting
@JohnStrunk with the YAML above, though, how would we handle landing the data and retaining it, given the current way items are cleaned up after the source run?
I'm not sure I understand... The Source would work just like it always does: clone => rsync => delete. The next iteration would do the same. Rsync has no problem diffing even though it's a different source PV.
If you're referring to the lack of snapshot ability on external systems, I think there's room for some improvement there:
With the storage being external, a sourcePVC wouldn't exist, which would stop us from cloning or snapshotting on the ReplicationSource side.
I do like the strategy of the binary, but I don't know how we can use the ReplicationSource, or whether we actually need it at all, since the ReplicationDestination would give us our ELB and create the PVC or Snapshot.
**Describe the feature you'd like to have.**
We should put together a demo showing how to use Scribe to sync data into a kube cluster.
**What is the value to the end user? (why is it a priority?)**
**How will we know we have a good solution? (acceptance criteria)**
**Additional context**