ArmyCyberInstitute / cmgr

CTF Challenge Manager
Apache License 2.0

Proposal: unlink network names from instance IDs #49

Closed dmartin closed 1 year ago

dmartin commented 1 year ago

When an instance of a build is started, two types of Docker resources are created: containers and a network.

Currently cmgr explicitly tracks the containers associated with an instance in a containers table, which looks like:

instance         | id
instance id (fk) | Docker container id

however, the network is only implicitly associated with the instance, by naming it cmgr-{instance_id}.

This is usually fine, but can lead to issues when cmgr's state becomes out of sync with Docker's.

For example, say a user "forcibly resets" their cmgr state by deleting cmgr.db. Afterwards, attempts to start new instances will fail, because the generated network names conflict with ones left over from before, unless the user manually runs docker network prune or docker network rm.

However, even that may not work, as Docker will sometimes fail to delete a network even when it does not actually have any remaining containers attached (this can also prevent stopping an instance in typical usage).
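To make the failure mode concrete, here is a minimal sketch of the name collision. It does not use cmgr or Docker at all: fakeEngine and fakeState are hypothetical in-memory stand-ins, and it assumes that instance IDs start over from the same value once cmgr.db is deleted.

```go
package main

import (
	"errors"
	"fmt"
)

// errNameInUse mimics the engine rejecting a duplicate network name.
var errNameInUse = errors.New("network name already in use")

// fakeEngine is a hypothetical in-memory stand-in for the Docker engine;
// it only remembers which network names currently exist.
type fakeEngine struct {
	networks map[string]bool
}

func (e *fakeEngine) createNetwork(name string) error {
	if e.networks[name] {
		return fmt.Errorf("%s: %w", name, errNameInUse)
	}
	e.networks[name] = true
	return nil
}

// fakeState stands in for cmgr.db: all it holds is the last instance ID.
// Assumption: IDs restart from the same value after the file is deleted.
type fakeState struct {
	lastInstance int
}

// startInstance allocates an instance ID and creates a network whose
// name is derived from it, following the cmgr-{instance_id} scheme.
func startInstance(s *fakeState, e *fakeEngine) (int, error) {
	s.lastInstance++
	id := s.lastInstance
	return id, e.createNetwork(fmt.Sprintf("cmgr-%d", id))
}

func main() {
	engine := &fakeEngine{networks: map[string]bool{}}
	state := &fakeState{}

	_, err := startInstance(state, engine)
	fmt.Println("first start:", err)

	// "Forcibly reset" the state: IDs restart, but the engine still
	// holds the cmgr-1 network created before the reset.
	state = &fakeState{}
	_, err = startInstance(state, engine)
	fmt.Println("after reset:", err)
}
```

The second start fails only because the name is derived from state that cmgr no longer remembers, which is the coupling this proposal aims to remove.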

I have two proposed changes to make cmgr more resilient to mismatches with Docker engine state and networking bugs:

  1. Allow Docker to generate network IDs, and explicitly store them in a networks table:

    instance         | id
    instance id (fk) | Docker network id

    which would then be used when calling Manager.stopNetwork.

  2. Downgrade network removal errors. Primarily as a workaround for the libnetwork bug linked previously, upon any network removal error, print a warning and proceed to deleting instance metadata, rather than returning early with an error from Manager.stopInstance. I think this change would be justifiable, as failure to remove a network would no longer lead to name conflicts.
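A rough sketch of how the two changes might fit together follows. None of this is cmgr's actual code: the engine interface, store, manager, startNetwork, and stuckEngine are hypothetical names, the map stands in for the proposed networks table, and how the real Docker network would be named is left out.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// engine is a hypothetical slice of the Docker API; the real client
// has a different shape.
type engine interface {
	CreateNetwork() (id string, err error)
	RemoveNetwork(id string) error
}

// store stands in for the proposed networks table:
// instance ID (fk) -> Docker network ID.
type store struct {
	networks map[int]string
}

type manager struct {
	eng engine
	db  *store
}

// startNetwork records whatever ID the engine hands back instead of
// deriving a name from the instance ID (change 1).
func (m *manager) startNetwork(instance int) error {
	id, err := m.eng.CreateNetwork()
	if err != nil {
		return err
	}
	m.db.networks[instance] = id
	return nil
}

// stopInstance logs a warning when network removal fails and still
// deletes the instance metadata (change 2).
func (m *manager) stopInstance(instance int) error {
	if id, ok := m.db.networks[instance]; ok {
		if err := m.eng.RemoveNetwork(id); err != nil {
			log.Printf("warning: could not remove network %s: %v", id, err)
		}
	}
	delete(m.db.networks, instance)
	return nil
}

// stuckEngine is a fake whose removals always fail, simulating a
// network that the engine refuses to delete.
type stuckEngine struct {
	created int
}

func (e *stuckEngine) CreateNetwork() (string, error) {
	e.created++
	return fmt.Sprintf("net-%d", e.created), nil
}

func (e *stuckEngine) RemoveNetwork(id string) error {
	return errors.New("simulated removal failure")
}

func main() {
	m := &manager{eng: &stuckEngine{}, db: &store{networks: map[int]string{}}}
	if err := m.startNetwork(1); err != nil {
		log.Fatal(err)
	}
	// Removal fails inside the fake engine, yet stopInstance succeeds.
	fmt.Println("stop error:", m.stopInstance(1))
	fmt.Println("remaining records:", len(m.db.networks))
}
```

The trade-off is the one described in the list above: a network that cannot be removed is orphaned in Docker, but it no longer blocks stopping the instance or starting new ones.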

As a note, a similar unlinking-of-IDs might be useful to avoid state-mismatch issues at the build level as well (using database-stored hashes for image tags and artifact directory names). In the worst case, this might lead to orphaned images and artifact files on disk, but would prevent the possibility of serving outdated content from prior cmgr installations.

dmartin commented 1 year ago

I'd like to rethink this a bit, based on recent experiences with Docker network behavior. Will possibly re-propose something later on in PR form.