
[Discussion] Swarm rebuild and best way to retain data #52

Open djeeg opened 6 years ago

djeeg commented 6 years ago

Hi,

Over the last 6 months I have encountered more than a few situations where the most stable solution available seems to be "rebuild the swarm".


I don't mind rebuilding the swarm, as it means I can review, refactor, and clean up my configuration.

Almost all of my configuration is scripted/documented, so it's not too much effort; steps 2-4 map to plain docker CLI commands, as sketched after this list:

  1. Create a new swarm from the template
  2. Assign some node metadata
  3. Create networks/volumes/secrets
  4. Deploy the stacks
  5. Update DNS to the new swarm public IP
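
For reference, a minimal shell sketch of steps 2-4 (the node, network, volume, secret, and stack names here are placeholders, not from my actual setup):

# 2. Assign node metadata (labels); "worker1" is a placeholder node hostname
docker node update --label-add storage=ssd worker1

# 3. Create networks/volumes/secrets; names are placeholders
docker network create --driver overlay appnet
docker volume create --driver "cloudstor:azure" appdata
printf 'changeme' | docker secret create app_secret -

# 4. Deploy the stacks from their compose files
docker stack deploy -c stack.yml mystack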

The sticking point with a rebuild would be relinking the data from the first swarm to the second swarm. I could not see any guidance on how best to configure Azure to handle a swarm rebuild (rather than a swarm upgrade).

My naive setup of the swarm was Swarm v17.09, which used the defaults provided by the template, where the resources all live in the same resource group.

When I rebuild, I would need to preserve the data contained within the storage account RANDOMSTRING123.
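
For anyone else hunting for that account, it can be located by listing the resources the template created; a sketch using the az CLI (the resource group name docker4azure is a placeholder):

# List everything the template created, including the RANDOMSTRING123 storage account
az resource list --resource-group docker4azure --output table

# Or list only the storage accounts in that resource group
az storage account list --resource-group docker4azure --output table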

Azure Storage Explorer

My first thought would be to create the new swarm and copy the data using Azure Storage Explorer (or scripted, as sketched below).

  - Transfers should be free within the same region.
  - Storage requirements would be doubled for a short time.
  - This may only work while the data size is small.
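
A scripted alternative to the GUI: assuming cloudstor keeps its volume data in an Azure file share inside that storage account, the same copy could be done with azcopy. The account names, share name, and SAS tokens below are placeholders:

# Copy the cloudstor file share from the old swarm's storage account to the new one
# (azcopy v10 syntax; both URLs need SAS tokens with list/read and write permissions)
azcopy copy \
  "https://OLDACCOUNT.file.core.windows.net/SHARENAME?SASTOKEN" \
  "https://NEWACCOUNT.file.core.windows.net/SHARENAME?SASTOKEN" \
  --recursive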

Override cloudstor:azure

My second thought would be to create the new swarm and override the default cloudstor:azure plugin with my own, using https://docs.docker.com/docker-for-azure/persistent-data-volumes/#use-a-different-storage-endpoint (I have used a separate cloudstor:azure instance/storage for backups and that seems to work okay for short-lived commands).

  - Not sure if overriding the default plugin instance is possible/stable/recommended.
  - There are a few issues on the forums where users are unable to re/create the plugin (the error message is similar to "offer expired").
  - I am hesitant about overriding anything "default", e.g. what happens if the default plugin instance needs to be changed/reset/locked-down as part of a future upgrade?
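
If that route were taken, my reading of that docs page is that reconfiguring the installed plugin would look roughly like this. The AZURE_STORAGE_ACCOUNT / AZURE_STORAGE_ACCOUNT_KEY setting names are from the cloudstor docs, the key value is a placeholder, and I have not verified this survives a template upgrade:

# Disable the default plugin instance before changing its settings
docker plugin disable cloudstor:azure

# Point it at the old swarm's storage account instead of the generated one
docker plugin set cloudstor:azure AZURE_STORAGE_ACCOUNT=RANDOMSTRING123
docker plugin set cloudstor:azure AZURE_STORAGE_ACCOUNT_KEY=PLACEHOLDERKEY

docker plugin enable cloudstor:azure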

Separate cloudstor:azure

My third thought would be to store the swarm data in a separate 'named/aliased' cloudstor:azure instance (installed as sketched below), either in the same or possibly a completely separate resource group. A separate resource group feels better from an isolation perspective, as it would allow me to completely purge the swarm resource group without data loss, no matter what future deployment restrictions are made on the docker swarm template/resource group, as long as the custom cloudstor:azure plugin instance can always reach into the other storage account. Considering how quickly the platform changes, this third option seems the best.
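
Installing a second plugin instance under an alias would presumably look something like this (the docker4x/cloudstor tag, storage account name, and key are placeholders; I have only tested this pattern for short-lived backup commands):

# Install a second cloudstor instance under the alias referenced in the stack files
docker plugin install --alias "cloudstor:safeazure" --grant-all-permissions \
  docker4x/cloudstor:17.12.0-ce-azure1 \
  CLOUD_PLATFORM=AZURE \
  AZURE_STORAGE_ACCOUNT=SAFESTORAGEACCOUNT \
  AZURE_STORAGE_ACCOUNT_KEY=PLACEHOLDERKEY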

I would then configure the swarm like this: Swarm v17.12, but without allocating any volumes on the default cloudstor:azure plugin instance.

Updating all my stack templates to use the new plugin instance should be straightforward:

volumes:
  volsomename:
    name: 'somename'
    driver: cloudstor:safeazure
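
And the matching imperative form, for volumes created outside a stack file (the volume name is a placeholder):

# Pre-create the named volume on the aliased plugin instance
docker volume create --driver "cloudstor:safeazure" somename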

Some questions would be:

  1. Are there any online resources recommending the best approach?
  2. Are there downsides to the third approach?
  3. Would this make upgrading harder in the future?
  4. With the upcoming changes for virtual machine scale sets and attached storage, would a separate resource group be better or worse?
  5. Would there be an extra performance hit for using a storage account in a separate resource group?