SumoLogic / sumologic-collector-docker

A Sumo Logic collector for Docker.
Apache License 2.0

sumologic-collector-docker creates duplicate collectors upon every restart #56

Open. arunderwood opened this issue 7 years ago.

arunderwood commented 7 years ago

Currently, every time I run `docker rm sumologic` and then `docker run sumologic/collector:latest-no-source`, the collector resends all my LocalFile sources up to Sumo Logic, creating a bunch of duplicate logs.

Is there a location in the container where the collector tracks which messages have already been sent, so that I could mount it as a volume and persist that state between container instances?
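One way to locate that state yourself is `docker diff <container>`, which lists the paths that were added (A), changed (C), or deleted (D) in a container's filesystem relative to its image. The filter below is a hypothetical helper, not part of the collector or its image; it assumes the collector is installed under /opt/SumoCollector, the directory the maintainer names later in this thread.

```shell
#!/bin/bash
# Hypothetical helper: keep only the added (A) or changed (C) entries of
# `docker diff` output that are /opt/SumoCollector itself or lie beneath it.
# Usage: docker diff sumologic | collector_changes
collector_changes() {
  awk -v root=/opt/SumoCollector \
    '($1 == "A" || $1 == "C") && ($2 == root || index($2, root "/") == 1) { print $2 }'
}
```

Run against a collector that has been up for a while, `docker diff sumologic | collector_changes` should show which files under /opt/SumoCollector it writes to.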

maimaisie commented 7 years ago

Hi. The collector keeps internal state that tracks what has already been collected. That state persists across container restarts (`docker restart`) but not across redeploys (`docker rm` followed by `docker run`), because removing a container discards its filesystem. We therefore recommend not redeploying the collector container unless you need a newer collector version (we don't release new versions very often, though).

Another option is to install and run the collector as a service directly on your Docker host. Unless you uninstall the service, its state persists through shutdowns, restarts, and upgrades.

arunderwood commented 7 years ago

My goal is to make the collector state persist through redeploys. Is there a specific directory in the container that holds the state, so I can put it on a Docker volume?

maimaisie commented 7 years ago

The collector directory is /opt/SumoCollector/; it contains the collector and source configuration as well as their state. However, we have not officially verified that the collector keeps its state through redeploys when this directory is placed on a volume.
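A minimal sketch of that setup, assuming the state lives under /opt/SumoCollector/config (the subdirectory the wrapper script later in this thread persists). The volume name `sumo-collector-state` and the function name `redeploy_collector` are made up here, and access credentials and other `docker run` options are omitted, as in the original report. As the maintainer says, this is not an officially verified configuration.

```shell
#!/bin/bash
# Sketch only: redeploy the collector with its config/state directory on a
# named volume. Names below are hypothetical; credentials are omitted.
set -euo pipefail

IMAGE=sumologic/collector:latest-no-source
NAME=sumologic
STATE_VOLUME=sumo-collector-state   # hypothetical volume name

redeploy_collector() {
  # Removing the container discards its writable layer...
  docker rm -f "$NAME" >/dev/null 2>&1 || true
  # ...but the named volume survives. When an empty named volume is first
  # mounted, Docker copies the image's contents at the mount point into it.
  docker run -d --name "$NAME" \
    -v "$STATE_VOLUME:/opt/SumoCollector/config" \
    "$IMAGE"
}
```

Mounting only the config subdirectory, rather than all of /opt/SumoCollector/, is deliberate: a named volume is seeded from the image only while it is empty, so a volume over the whole directory would keep serving the old collector binaries after you pull a newer image.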

nhoughto commented 6 years ago

@arunderwood Did you try this, and did it work? We are seeing the same behaviour. It's very annoying and should probably be mentioned in the README.

It is especially a problem near Sumo usage limits: the Sumo service itself causes the collector to shut down, which triggers a restart, which re-ships all the local files, causing even more usage, and around and around it goes.

arunderwood commented 6 years ago

No, sorry, I never dug into this. I just stopped using the Docker collector.

ragnarkurm commented 4 years ago

SUCCESS, got it working (with a couple of downsides).

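How does it work? A wrapper entrypoint script moves the collector's config directory, /opt/SumoCollector/config, onto persistent storage the first time it runs, symlinks the persisted copy back into place in every new container, and holds a lock with flock so that only one collector uses the shared state at a time.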

Downsides

The Dockerfile:

...
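COPY my-entrypoint.sh /
# COPY keeps the file mode from the build context; make sure the wrapper is executable.
RUN chmod +x /my-entrypoint.sh
ENTRYPOINT ["/my-entrypoint.sh"]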

The my-entrypoint.sh script:

#!/bin/bash

# This is a wrapper script around the collector run script.
# 1. It persists state.
# 2. Prevents parallelism by locking.

set -xeuo pipefail
export PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin

# Config
sumostate_shared=/storage/logs/sumologic.state
sumostate_conf=/opt/SumoCollector/config
lock=/storage/logs/sumologic.lock

# Make sure the persistent storage directory exists.
mkdir -p "$(dirname "$sumostate_shared")"

# Initialize the state on persistent storage.
# This is performed only on first execution of the script.
if [[ ! -d "$sumostate_shared" ]]; then
  mv -v "$sumostate_conf" "$sumostate_shared"
fi

# Move away container state.
# On a restart of the same container the config path is already our symlink
# (and [[ -d ]] is also true for a symlink to a directory), so just drop it.
if [[ -L "$sumostate_conf" ]]; then
  rm -v "$sumostate_conf"
elif [[ -d "$sumostate_conf" ]]; then
  mv -v "$sumostate_conf" "$sumostate_conf.orig"
fi

# Make the persisted state available to the current container.
# -n: never follow an existing symlink to a directory.
ln -svfn "$sumostate_shared" "$sumostate_conf"

# Make sure we don't run the collector in parallel.
# The collector is not designed to handle a shared state.
# This locking works nicely over NFS or a Docker volume.
flock --verbose "$lock" /run.sh
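The move-and-symlink steps and the lock can be exercised without Docker. The sketch below copies the wrapper's state handling into a hypothetical `persist_state` function (including an `[[ -L ]]` check for the restart case), runs it against temporary directories standing in for /storage/logs and /opt/SumoCollector/config, simulates a first run, a restart and one redeploy, and then checks that `flock -n` cannot take the lock while another process holds it. It assumes bash and util-linux `flock` are installed.

```shell
#!/bin/bash
# Sketch: simulate the wrapper's state handling and locking in a temp dir.
# persist_state is a hypothetical function mirroring the entrypoint's steps.
set -euo pipefail

persist_state() {
  local shared=$1 conf=$2
  mkdir -p "$(dirname "$shared")"
  # First run ever: seed persistent storage with the container's config.
  if [[ ! -d "$shared" ]]; then
    mv "$conf" "$shared"
  fi
  # Restart: conf is already our symlink. New container: set the image copy aside.
  if [[ -L "$conf" ]]; then
    rm "$conf"
  elif [[ -d "$conf" ]]; then
    mv "$conf" "$conf.orig"
  fi
  # Point the config path at the persisted state (-n: don't follow a dir symlink).
  ln -sfn "$shared" "$conf"
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
shared="$tmp/storage/sumologic.state"

# Container 1: the image ships a config dir; the collector then records state.
mkdir -p "$tmp/c1/config"
persist_state "$shared" "$tmp/c1/config"
echo "offset=42" > "$tmp/c1/config/state"

# Container 1 restarted: the wrapper runs again on the same filesystem.
persist_state "$shared" "$tmp/c1/config"

# Container 2 (a redeploy): a fresh, empty config dir from the image.
mkdir -p "$tmp/c2/config"
persist_state "$shared" "$tmp/c2/config"
restored=$(cat "$tmp/c2/config/state")
echo "restored: $restored"   # prints: restored: offset=42

# Locking: while one process holds the lock, a non-blocking attempt fails.
lock="$tmp/sumologic.lock"
flock "$lock" sleep 3 &
sleep 1
if flock -n "$lock" true; then second=acquired; else second=blocked; fi
wait
echo "second attempt: $second"   # expected: second attempt: blocked
```

The lock check is timing-based (the background holder is given one second to take the lock), which is fine for a demonstration but not a rigorous concurrency test.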