allegroai / clearml-server-helm

ClearML Server for Kubernetes Clusters Using Helm
https://allegroai.github.io/clearml-server-helm/

Question regarding data and backups #2

Closed: Shaked closed this issue 4 years ago

Shaked commented 4 years ago

Hello,

I have installed trains-server using helm on my k8s cluster (AKS).

I have been wondering: how can I back up the data from trains-server?

Is it possible to set some sort of a cron job that will export the data and save it somewhere on Azure Files?

Also, I have noticed that upgrading might require a --purge. If I do that, I assume all my data will be gone, won't it?

Readme.md

Important:

If you previously deployed a trains-server, you may encounter errors. If so, you must first delete the old deployment using the following command:

  helm delete --purge trains-server
After running the helm delete command, you can run the helm install command.

What would be the best practice for backups and restoring data?

Thank you! Shaked

bmartinn commented 4 years ago

Hi @Shaked

The trains-server setup (either with docker or on k8s) is configured by default to store all data in an externally mapped folder, usually /opt/trains.

This means that even deleting a deployed trains-server will not affect the data itself, as it is stored outside the containers.

If you take a look at the trains-server upgrade process, section 2 explains how to back up the entire server.

You could create a cron job doing just that, but I would opt for a per-database (MongoDB, Elasticsearch) backup script, together with archiving /opt/trains/config and /opt/trains/data/fileserver.

This way you do not have to spin down the trains-server; the cron job runs independently.
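A minimal sketch of such a cron script, based on the folders mentioned above. The ports, the Elasticsearch snapshot repository name ("backup"), and the output location are assumptions for illustration, not official trains-server tooling; adjust them to your deployment:

```shell
#!/usr/bin/env bash
# Hypothetical nightly backup sketch for a trains-server host.
# Hosts, ports, and the snapshot repository name are assumptions.
set -euo pipefail

DATA_ROOT="${DATA_ROOT:-/opt/trains}"
STAMP="$(date +%Y%m%d)"
OUT_DIR="${OUT_DIR:-/tmp/trains-backup-$STAMP}"   # swap for your Azure Files mount
mkdir -p "$OUT_DIR"

# 1) Dump MongoDB (skipped here if mongodump is not installed)
if command -v mongodump >/dev/null 2>&1; then
  mongodump --host localhost:27017 --out "$OUT_DIR/mongo"
fi

# 2) Trigger an Elasticsearch snapshot; assumes a snapshot repository
#    named "backup" was registered beforehand via the _snapshot API
curl -fsS -X PUT \
  "http://localhost:9200/_snapshot/backup/snap_$STAMP?wait_for_completion=true" \
  || echo "elasticsearch snapshot skipped"

# 3) Archive the config and fileserver data folders
for d in config data/fileserver; do
  if [ -d "$DATA_ROOT/$d" ]; then
    tar czf "$OUT_DIR/$(basename "$d").tar.gz" -C "$DATA_ROOT" "$d"
  fi
done

echo "backup written to $OUT_DIR"
```

The script deliberately skips steps whose services or folders are absent, so a partial environment degrades gracefully instead of failing mid-backup.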

Shaked commented 4 years ago

Hey @bmartinn

So the only way to lose data would be if the specific k8s node fails for whatever reason, right?

I hope you don't mind me asking, what was the reason to save the data on a specific node?

You could create a cron job doing just that, but I would opt for per database (mongodb, elastic-search) backup script, together with zipping /opt/trains/config and /opt/trains/data/fileserver

Do you know if I have to manually connect to the labeled node and set a cron job there, or is there another way to do that? My fear is that it would become some unknown part of the entire setup process, kind of like the Elasticsearch setup, which, although it needs to be done only once, is quite hard to automate.

Thank you for your help and patience! Shaked

bmartinn commented 4 years ago

So the only way to lose data would be if the specific k8s node fails for whatever reason, right?

Yes, only if the k8s node's data volume is lost (which by default lives on the node itself).

I hope you don't mind me asking, what was the reason to save the data on a specific node?

If you mean from a setup point of view, the idea was to make it as easy as possible to set up on k8s. Scaling Elasticsearch is an art of its own, and trains-server is just another Elasticsearch setup; the data volume is only one aspect out of many. The same goes for the MongoDB setup: the idea is to set it up as easily as possible.

Your point on the trains-server "elastic-search setup" is exactly that: our setup is nothing special, but there are some ingredients (see any ELK cookbook) you have to configure to get a stable Elasticsearch up and running...

Do you know if I have to manually connect to the labeled node and set a cron job there or is there ...

Hmm, I think I would maybe map the same data volume to an additional container and have that container run the cron job and back everything up to external object storage. Notice that this is a regular ELK/MongoDB backup setup on k8s; there is nothing special here. The only addition is /opt/trains/data/fileserver, which is just a regular file backup.
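As a sketch, that additional container could be a Kubernetes CronJob mounting the same hostPath volume. The names, schedule, node-selector label, and backup image below are placeholders, not part of the chart:

```yaml
# Hypothetical CronJob mounting the same /opt/trains hostPath as trains-server.
apiVersion: batch/v1beta1        # batch/v1 on Kubernetes >= 1.21
kind: CronJob
metadata:
  name: trains-backup
spec:
  schedule: "0 3 * * *"          # nightly at 03:00
  jobTemplate:
    spec:
      template:
        spec:
          nodeSelector:
            app: trains          # placeholder: whatever label pins your trains-server node
          restartPolicy: OnFailure
          containers:
            - name: backup
              image: my-backup-image:latest   # placeholder image carrying your backup script
              command: ["/bin/sh", "-c", "/scripts/backup.sh"]
              volumeMounts:
                - name: trains-data
                  mountPath: /opt/trains
                  readOnly: true
          volumes:
            - name: trains-data
              hostPath:
                path: /opt/trains
```

Mounting the data read-only keeps the backup job from ever modifying the live server's state, and the upload-to-object-storage step stays inside the backup image.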

Shaked commented 4 years ago

I see.

In AKS there's a way to deploy Elasticsearch using a Persistent Volume, which is supposed to be connected to Azure Files. I wonder how hard it would be to adjust trains-server for that, because it would mean that Azure (or other cloud providers) could automate the entire setup process (including upgrades).

Hmm, I think I would maybe map the same data volume to an additional container and have that container run the cron job and back everything up to external object storage. Notice that this is a regular ELK/MongoDB backup setup on k8s; there is nothing special here. The only addition is /opt/trains/data/fileserver, which is just a regular file backup.

That's a great idea actually. Definitely going to go with this approach.

bmartinn commented 4 years ago

In AKS there's a way to deploy Elasticsearch using a Persistent Volume, which is supposed to be connected to Azure Files. I wonder how hard it would be to adjust trains-server for that, because it would mean that Azure (or other cloud providers) could automate the entire setup process (including upgrades).

It should not be complicated to integrate; the Elasticsearch container is a standard ELK container, so an off-the-shelf setup should work. Obviously this is cloud-specific, hence not part of the default setup :)
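For reference, a cloud-specific variant could replace the hostPath volume with a PersistentVolumeClaim; on AKS that might look like the sketch below, assuming the built-in azurefile StorageClass. The claim name and size are placeholders:

```yaml
# Hypothetical PVC backed by Azure Files (AKS ships an "azurefile" StorageClass).
# The chart's pods would then mount this claim instead of the /opt/trains hostPath.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: trains-data
spec:
  accessModes:
    - ReadWriteMany          # Azure Files supports RWX, so multiple pods can mount it
  storageClassName: azurefile
  resources:
    requests:
      storage: 100Gi
```

One caveat: Elasticsearch is sensitive to storage latency, so a disk-backed class (e.g. managed-premium on AKS) may be a better fit for its volume, with Azure Files reserved for the config and fileserver data.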