Closed jspawar closed 5 years ago
We have created an issue in Pivotal Tracker to manage this:
https://www.pivotaltracker.com/story/show/165891137
The labels on this github issue will be updated when the story is started.
@jspawar We think that you copypasted the wrong log in What actually happened and it should be:
2019-05-08T10:09:29.00-0700 [APP/PROC/WEB/0] OUT 2019-05-08T17:09:29.203256377+00:00 line 1 2019-05-08T10:09:29.00-0700 [APP/PROC/WEB/0] OUT 2019-05-08T17:09:29.203203397+00:00 line 0
We can't reproduce it in our environment. Please, check if this is related to your setup.
Hey @alex-slynko,
Thanks for looking into this. I think there are two symptoms we've seen related to the logging not being precise enough:
When you reproduce this, do you see sub-second precision in your log lines? I assume the no-repro was because you didn't the see the mis-ordering of logs? The log lines from the dora app have two timestamps in them: one from the app itself and one from loggregator. When we were seeing the mis-ordering, only the loggregator-generated timestamp was the problematic one. (If any of this is unclear, happy to chat on slack or jump into a zoom call.)
It's also possible this is due to something in our configuration.
Hey @jenspinney & @jspawar,
We were able to reproduce the subsecond precision problem you see. We found out that it was caused by fluentd. We were using v0.12
which didn't return subsecond precision. After bumping fluentd to v1.4.2
we can now see subsecond precision in the logs.
We were not actually able to reproduce the ordering problem you see -- for us it always shows the logs in the right order.
We are currently waiting for pipeline to pass before it gets on master
, then we'll let you know so you can verify if it works on your environment.
Best regards, Georgi && @akshaymankar
Hi @gdankov and @akshaymankar,
Thanks for looking into this! It looks like the commit to bump fluentd is now on master, and there's a new "latest" tag of eirini/loggregator-fluentd. I was wondering if there will be a specific tag like 0.4.0 that would include these changes. If we reference the "latest" tag on the docker image, it could bring some surprises for us down the line, and we'd rather try it out on a tag we can pin to.
In regards to not seeing the misordering: this is just a random behavior that you may or may not see. We talked to the loggregator team when digging into this, and they said that if the logs the loggregator system receives don't have the subsecond precision, loggregator itself won't know how to order them. They may be ordered correctly sometimes, but it's not deterministic.
Thanks again, Jen
Hi @jenspinney,
We recently decided to switch from using specific tags like x.y.z
to using image digests because of their immutability and predictability. Currently, we use a digest only for our loggregator-fluentd
image, but the plan is to eventually use it for all of our images.
You can find the loggregator-fluentd digest here.
Would this solution work for you?
Thanks, Georgi
thank you again for reporting it @jenspinney & @jspawar The change is now in master branch.
FYI, we just had a chance to validate this, and it fixes the bug for us!
And yes, @gdankov, the image digest approach is great. I just didn't realize you had moved to that. Thanks!
Description
We recently noticed that after enabling app logs via: https://github.com/cloudfoundry-incubator/eirini-release/blob/master/helm/eirini/templates/daemonset-fluentd.yaml that we do successfully receive all logs from our apps; however, they are missing precision.
When tailing/following logs this is not an issue in terms of log message order, but when looking back at recent logs via
cf logs --recent
we see that order is no longer preserved which we suspect to be the result of this lack of precision for the log messages.Steps to reproduce
cf logs my-app --recent
What was expected to happen
NOTE: The second set of timestamps on each of the above lines (which have correct precision) are logged by the app itself from its STDOUT
What actually happened
NOTE: The second set of timestamps on each of the above lines (which have correct precision) are logged by the app itself from its STDOUT
Additional information (optional)
We are deploying CF using
cf-deployment
andeirini-bosh-release
with BOSH.We then manually apply the fluentd daemonset to our k8s cluster