SumoLogic / sumologic-collector-docker

A Sumo Logic collector for Docker.
Apache License 2.0

Sumo Collector spams NoSuchContainer to logs #96

Open cptera opened 2 years ago

cptera commented 2 years ago

We deploy the sumologic/collector:latest image in our customers' production environments to collect logs for our product (a set of running containers) and send them to our Sumo Logic account. Over the last few months we've noticed a lot of log spam caused by the Sumo Logic collector throwing errors like

jvm 1 | Exception in thread "onError:DockerLogInput:000000005BF8C7E3:'Docker-logs':cee113241350c24c5fbc416b8e953dd441c3aa957fb058d0597a554fea98c2db:connector_healthcheck.1.r6cze5gx0o50en9a713u7h860:1040812" java.lang.RuntimeException: com.github.dockerjava.api.exception.NotFoundException: No such container: cee113241350c24c5fbc416b8e953dd441c3aa957fb058d0597a554fea98c2db

From our investigation, this happens mostly with collector 19.351-4, but it also happens with 19.361-4. If I had to guess, it may be due to the switch from the forked docker-java dependency to the open-source one, as listed in the release notes for https://github.com/SumoLogic/sumologic-collector-docker/releases/tag/v19.351-4

Log collection itself seems to be functioning fine; our main issue is that these errors use up a lot of our Sumo quota.
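To gauge how much quota these errors consume, the NotFoundException lines quoted above can be counted per Swarm service. The following is a minimal Python sketch, not part of the collector. It assumes the line format matches the single sample in this issue and that container names follow Docker Swarm's service.slot.task-id scheme. The names NOT_FOUND_RE, parse_not_found and count_not_found are hypothetical.

```python
import re
from collections import Counter

# Hypothetical pattern inferred from the one sample error line in this issue;
# other collector versions may format the thread name differently.
NOT_FOUND_RE = re.compile(
    r"DockerLogInput:[0-9A-F]+:'[^']*':"        # input id and source name
    r'[0-9a-f]{64}:(?P<task>[^:]+):\d+"'        # container id, Swarm task name, counter
    r".*NotFoundException: No such container: "
    r"(?P<container_id>[0-9a-f]{64})"
)


def parse_not_found(line):
    """Return (container_id, service) for a NotFoundException line, else None."""
    match = NOT_FOUND_RE.search(line)
    if match is None:
        return None
    # Assumes the Swarm container naming scheme <service>.<slot>.<task-id>.
    service = match.group("task").rsplit(".", 2)[0]
    return match.group("container_id"), service


def count_not_found(lines):
    """Count NotFoundException lines per Swarm service name."""
    counts = Counter()
    for line in lines:
        parsed = parse_not_found(line)
        if parsed is not None:
            counts[parsed[1]] += 1
    return counts
```

Grouping by service rather than by container ID keeps a restarted task (which gets a new task ID and container) under the same key.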

maimaisie commented 2 years ago

@yuting-liu can you take a look as this is suspected to be related to the forked docker-java removal? Thanks

yuting-liu commented 2 years ago

@cptera the error itself seems to be related to the container not being restarted properly. Did you see any errors in the container log? It might include more details about the error.

cptera commented 2 years ago

Hmm, you're right: it looks like the container connector_healthcheck.1.r6cze5gx0o50en9a713u7h860 restarted. But logs are still being collected after the restart, so I don't think the issue is Docker or the container itself failing to restart.

cptera commented 2 years ago

For context, we deploy everything in a Docker Swarm, and our containers are designed to restart on certain errors; in this case it restarted on an error where it was supposed to. We've been running it this way for several years and rarely make major changes.

cptera commented 2 years ago

With the Log4j bug, our workaround of "just use an older version" is no longer viable.