caddyserver / caddy

Fast and extensible multi-platform HTTP/1-2-3 web server with automatic HTTPS
https://caddyserver.com
Apache License 2.0
57.96k stars 4.02k forks source link

Log rotate issues #1363

Closed Tester98 closed 7 years ago

Tester98 commented 7 years ago

1. What version of Caddy are you running (caddy -version)?

Current release

2. What are you trying to do?

Access logs rotation

3. What is your entire Caddyfile?

log / access.log "{remote} {when} {method} {uri} {proto} {status} {size} {>Referer} {>User-Agent} {latency}" {
        rotate {
                size 100 # Rotate after 100 MB
                age  30  # Keep log files for 14 days
                keep 10  # Keep at most 10 log files
        }
}

4. How did you run Caddy (give the full command and describe the execution environment)?

5. What did you expect to see?

Log file created, trimmed at proper size and new log files created. however logfiles of greater size was created, then a small file was created. If this bug triggers in production, then randomly previous access files could get deleted as keep limit would touch

6. What did you see instead (give full error messages and/or log)?

wrong sized file, see ls -lh access* -rw-r--r-- 1 root wheel 100M Jan 18 06:56 access-2017-01-17T09-01-49.164.log -rw-r--r-- 1 root wheel 106M Jan 18 17:58 access-2017-01-18T17-39-41.106.log -rw-r--r-- 1 root wheel 344B Jan 18 17:40 access-2017-01-18T17-45-42.329.log -rw-r--r-- 1 root wheel 70K Jan 18 17:41 access.log

7. How can someone who is starting from scratch reproduce this behavior as minimally as possible?

Firing lot of requests

mholt commented 7 years ago

That does look like a "bug" of sorts, although it would probably be in the lumberjack library: https://github.com/natefinch/lumberjack

I haven't had a chance to look into the code but my guess is that it's not properly synchronizing rotations if concurrent requests hit at just the right moment. /cc @natefinch - have you seen this before?

natefinch commented 7 years ago

Bugs in lumberjack!? unpossible! :)

But seriously... lumberjack has a lock around every call to write, so multiple goroutines isn't a problem.

The only way this could happen is if there are two processes writing to the same log file. Since keeping track of the log file size is done in-process (by just counting the number of bytes written to it), if you had two processes writing to the same log file, then they wouldn't know about each other's writes, and could go over the max.

Tester98 commented 7 years ago

A single copy of caddy was running on the node. NOTE: tail should not be affecting it AFAIK

mholt commented 7 years ago

@natefinch What about multiple instances of lumberjack.Logger? @Tester98 didn't post the entire Caddyfile (tsk, tsk) so I'm just guessing... would multiple instances with the same output file configuration act the same as running it from two processes?

Tester98 commented 7 years ago

a separate error log is there, but it writes to different file errors { log error.log { size 100 # Rotate after 100 MB age 30 # Keep rotated files for 30 days keep 10 # Keep at most 5 log files } }

natefinch commented 7 years ago

Yes, multiple lumberjack.loggers pointing at the same log file would have the same problem, since it's internal to the value

mholt commented 7 years ago

@Tester98 Do you only have those two logs in your whole Caddyfile? Do you have multiple site addresses for the server block? Can I see your entire Caddyfile please?

natefinch commented 7 years ago

two Loggers pointing at different logs should be no problem... there's no global state in the package.

Tester98 commented 7 years ago

Yes, only 2 log lines are there. Cannot give the entire config file as i am just the webserver admin (with 2 more web admins in different shifts) and is maintaining the thing, anything that i share goes through atleast 2 managers approval.Hope you can understand. btw, a tail process was running

natefinch commented 7 years ago

The only way lumberjack should go over its quota is if something else writes to the file. Tail would be fine, since it doesn't write.

Tester98 commented 7 years ago

the server config was also reloaded by sending usr1 signal, if that would help.

mholt commented 7 years ago

How many site addresses are defined for the Caddyfile, that uses the log directive?

Tester98 commented 7 years ago

4 site address in 1 block of the xyz.com, www .xyz.com, https: //xyz.com and https:// www .xyz.om

mholt commented 7 years ago

@Tester98 Bingo. Strip that down to just 1 address and hit it again (the https and http variants are redundant anyway if you're using auto HTTPS). (See, this is why the whole Caddyfile is necessary.)

anything that i share goes through atleast 2 managers approval.

Sorry about that. :-/ This is one reason why I'm considering a paid option for companies who want private support so you won't have to jump through these hoops.

mholt commented 7 years ago

By the way, the point is that if stripping the Caddyfile down to just one site address (just for your test -- hit it with a lot of requests in this timeframe) fixes the problem consistently, then I have a pretty good idea for a fix.

@natefinch Thanks for chiming in. :+1:

Tester98 commented 7 years ago

This is a production system, can't strip down.

mholt commented 7 years ago

@Tester98 You could test it locally? ;)

Tester98 commented 7 years ago

Shift change is underway, i had the time to talk to other admin and he has informed me that goaccess https://goaccess.io/ was also running during that point in time. So it may not be caddys fault. i have gone through goaccess source code at https://github.com/allinurl/goaccess/blob/e42ac72965b419dafe736d7a4c75226471209a47/src/error.c line 100 and it does seem to open the log file in append or write mode, so 2 writers are now confirmed. so this can be closed. I will also report it to goaccess so they can fix the write permissions, since goaccess is analysis tool and should never write to access log and thus never open file in write mode

mholt commented 7 years ago

Nice detective work. That could certainly be the problem. Thanks for looking into it!

allinurl commented 7 years ago

@Tester98 goaccess does not open the log in write mode nor append mode. It does it in read mode. Here's where it attempts to open it.

mholt commented 7 years ago

I have a hunch if it is a Caddy issue, but I need @Tester98 to try an experiment locally before I invest the time and complexity for a fix.

Tester98 commented 7 years ago

Since we are out of ideas, we can keep it closed but will reopen if it occurs again or someone else reports a similar issue.We will also test it internally ( by setting log file to rotate at 1 MB and firing queries ) and will report back if something fails.

mholt commented 7 years ago

Since we are out of ideas

What? I'm not out of ideas! Please try it locally (slam it with requests, you can use a tool like vegeta or wrk) and then remove all but one site address from the Caddyfile. Then do it again. If it doesn't happen, I know a fix. I know a fix. I know a fix.

Tester98 commented 7 years ago

We use sniper https://github.com/btfak/sniper for quick load testing, however the bug never occurred. However, discussion with the webmaster gave this information. He was on console (as server is located in-premises), and monitoring access.log. Suddenly access log stopped scrolling, admin found log file rotated. He tailed the new file. but found no logs scrolling for roughly a minute. surprised he saw another access.log created. I have dumped all this so the authors can get into the event information. Maybe the logs could not be added to newly created file instead appended to old file. i hope devs can figure out something.

mholt commented 7 years ago

@Tester98 I've pushed a possible fix in 04da9c7374ea07dc19f9a42cd2696ed5c45cfffb to the combinedrollers branch. Can you please build from that branch and see if it resolves the rotation issue?

mholt commented 7 years ago

@Tester98 Ping. :) Does the patch I submitted work for you?

Tester98 commented 7 years ago

We tested the patch. The bug has not repeated itself in the last 10 days. so the bug can be closed

mholt commented 7 years ago

Excellent, thank you! We'll merge it in for the next release.