caddyserver / caddy

Fast and extensible multi-platform HTTP/1-2-3 web server with automatic HTTPS
https://caddyserver.com
Apache License 2.0

Feature request: Log rotation based on date #1096

Closed itskenny0 closed 8 years ago

itskenny0 commented 8 years ago

Simply put: I'd like to rotate my log files once every day, e.g. access.log-20160908, access.log-20160909 ... This does not seem to be possible as of now.

I could configure this using the normal logrotate cron of my system. Does Caddy handle the file being moved away correctly?

//EDIT: I tested this. Moving the file away isn't a problem for Caddy, but it keeps writing to the same file even after it has been moved :)
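The behavior observed here is what POSIX file handles generally do: a rename changes the directory entry, not the already-open handle. A minimal Python sketch of that effect follows, assuming a POSIX filesystem. The function name `write_after_rename` and the file names are illustrative only, and this is not Caddy's code.

```python
import os
import tempfile


def write_after_rename(directory):
    """Write to a log, rename it while it is still open, then write again.

    Returns the lines found in the renamed file and whether a file
    still exists at the original path.
    """
    live = os.path.join(directory, "access.log")
    moved = os.path.join(directory, "access.log-20160908")

    with open(live, "a") as handle:
        handle.write("before rename\n")
        handle.flush()

        # This is roughly what an external rotation tool does.
        os.rename(live, moved)

        # The handle still points at the same underlying file,
        # so this line lands in the renamed file.
        handle.write("after rename\n")
        handle.flush()

    with open(moved) as f:
        moved_lines = f.read().splitlines()
    return moved_lines, os.path.exists(live)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(write_after_rename(tmp))
```

Both lines end up in the renamed file and nothing is recreated at the original path, which is why an external rename alone does not start a new log.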

mholt commented 8 years ago

Hey Kenny, thanks for your question. Ultimately, our roller library would need to support this. However, after reading this, I agree with them that this isn't a very good idea: https://github.com/natefinch/lumberjack/issues/17

Isolus commented 6 years ago

I think this feature is needed to handle some legal problems.

Due to privacy laws in some jurisdictions, you are only allowed to store IP addresses for a very short time (e.g. 7 days) to resolve technical issues or to investigate or mitigate an attack.

On a low traffic server it could take months before a log is rolled (and finally deleted) and the rolling is non-deterministic (not in space consumption but in time).

tobya commented 6 years ago

If it is of any use, the ipmask subdirective lets you obfuscate IP addresses in your logs.

https://caddyserver.com/docs/log

Isolus commented 6 years ago

I'm currently using this directive to comply with the laws, but it makes troubleshooting a lot harder and tools like fail2ban useless.

aeris commented 11 months ago

Hello here!

I'm running into a compliance problem with Caddy too because of this. Since May 2018, people in Europe have had to comply with GDPR, which requires logging the full IP address (no anonymization/obfuscation possible here) for legal purposes, but for no longer (and this must be guaranteed) than a given number of days (usually 15 days, because of the Tele2 CJEU case). In France, our DPA issued https://www.legifrance.gouv.fr/jorf/id/JORFTEXT000044272396 (6 to 12 months retention) and our law is https://www.legifrance.gouv.fr/jorf/article_jo/JORFARTI000044228932 (1 year retention), but both are overruled by the Tele2 CJEU case (1 year is too long; a more acceptable maximum is 15-30 days).

Caddy's current behavior on log roll leaves no way to comply with the law.

Splitting is done on size, so even with logrotate outside Caddy, you can't be sure that not a single log line is older than 15 days. If a service was not very active, the current log can contain content that is 16+ days old, and logrotate can't purge it without losing logs younger than those 15 days. The same problem exists even with roll_keep_for 15d: it is the age of the log file before deletion, not the age of the log lines inside it, so in practice you can keep logs for more than 15 days.

Changing roll_keep_for to a lower value doesn't remove now-too-old content immediately; it seems to happen only on the next roll. Given that splitting is based on size and not on date, cleanup is quite hard: you have to uncompress, split, and recompress the existing log files to filter them on a given date.

A small tolerance (a few minutes) can exist for the previous day's log lines at the start of the current day's log file (like the current behavior with logrotate/nginx/apache), but it must be clearly time-bounded. Caddy's current behavior can't be.

For GDPR compliance we MUST have one log file per day, and Caddy SHOULD delete any log that is now too old according to roll_keep_for (if not, this can be achieved with logrotate configuration).

We are currently not able to emulate the right behavior by disabling roll in Caddy and using logrotate externally, because of #5316, which requires a full Caddy restart for that.
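To put numbers on the retention problem described above, here is a minimal Python sketch of a simplified model. It assumes a rolled file becomes eligible for deletion once the keep-for period has elapsed since the roll. The function name and the dates are hypothetical, and this is not how Caddy or its roller library is implemented.

```python
from datetime import datetime, timedelta


def oldest_line_age_at_deletion(first_line, rolled_at, keep_for):
    """Age of the oldest line when the rolled file is deleted.

    Simplified model: the file is deleted exactly keep_for after it was
    rolled. Real deletion may happen later, which only makes it worse.
    """
    deleted_at = rolled_at + keep_for
    return deleted_at - first_line


if __name__ == "__main__":
    # A low-traffic site: the file took 20 days to reach the size limit.
    first_line = datetime(2023, 1, 1)
    rolled_at = datetime(2023, 1, 21)
    age = oldest_line_age_at_deletion(first_line, rolled_at, timedelta(days=15))
    print(age.days)
```

Under these assumptions the oldest line is 35 days old when the file goes away, even though the keep-for setting is 15 days. With one file per day, the gap between the first line and the roll is bounded by 24 hours.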
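The requested behavior (one file per day, old files removed by date) can be sketched in Python as follows. The helper names `daily_log_name` and `expired_logs` are hypothetical, the file name pattern follows the `access.log-20160908` example from the top of this thread, and compressed suffixes such as `.gz` are not handled. This is a sketch under those assumptions, not a proposal for Caddy's actual implementation.

```python
from datetime import date, datetime, timedelta


def daily_log_name(day, base="access.log"):
    """Build a per-day file name such as access.log-20160908."""
    return f"{base}-{day:%Y%m%d}"


def expired_logs(names, today, keep_days, base="access.log"):
    """Return the names whose date suffix is strictly before
    today minus keep_days. Names that do not match are ignored."""
    cutoff = today - timedelta(days=keep_days)
    prefix = base + "-"
    expired = []
    for name in names:
        if not name.startswith(prefix):
            continue
        try:
            day = datetime.strptime(name[len(prefix):], "%Y%m%d").date()
        except ValueError:
            continue  # suffix is not a plain YYYYMMDD date
        if day < cutoff:
            expired.append(name)
    return expired


if __name__ == "__main__":
    names = [daily_log_name(date(2016, 9, d)) for d in (1, 8, 9)]
    print(expired_logs(names, date(2016, 9, 23), keep_days=15))
```

Because the date is in the file name, the purge decision depends only on the calendar and not on how much traffic the site received. Whether the boundary day is kept or deleted is a policy choice that the retention rule would have to settle.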

mholt commented 11 months ago

Thank you for explaining the situation in your region. I'll be happy to work on this right away with a sufficient sponsorship, even if it means patching the upstream lib or forking it or something like that. There's just a lot going on right now.

If you'd like to book a call to discuss a plan, feel free to do so at https://matt.chat and we'll get it taken care of 👍

aeris commented 11 months ago

I would be glad to contribute financially to such a project, but I'm not a professional at all, just a private person who also has to comply with the law 🤣 I also face this problem for my GDPR-related association (because our national DPA is dysfunctional 🙄). Typically for PURR, it seems we won't be able to deploy Caddy at all because of this bug, and I now have to refactor all my infrastructure to move back to nginx, even for my personal website.

mholt commented 11 months ago

Are you sure your understanding of the law is correct (that it applies to your non-business use case, etc)? We have lots of personal (and business!) users in France with no problems as far as I know. Further, are you sure that journald doesn't do what you need? https://tecadmin.net/clear-systemd-journal-logs/

I would recommend seeking funding from your organization to meet the requirements you're after, rather than re-architecting your entire infrastructure to another stack. That will be more expensive and error-prone.

aeris commented 11 months ago

Yes, I'm sure of my understanding. GDPR contains no "non-business" exception, and there are currently 152 DPA decisions against private individuals in Europe. Article 2(2)c would have no purpose if private individuals were not subject to GDPR at all (only purely personal or household activities are excluded), and CJEU - C-25/17 - Jehovan todistajat rules that communication directed to unknown people or an unknown number of people can't be seen as a purely household activity, which is obviously the case for many if not all uses of Caddy (blog, forum, social network…). With my DPA, I personally have one winning case against a non-business association and one against a private individual.

Given the nature of this association and my personal "hobby" (one of the most active GDPR people in Europe with more than 120 open cases, EDPB expert member…), even the tiniest GDPR violation is not acceptable, and even our current Caddy usage will be reported as a GDPR violation (we already store data we are not supposed to). We must ensure perfection because we will challenge pretty much anything else on this continent, given how poorly GDPR-compliant the ecosystem is (and even our own DPA, 3 winning cases at this point 🤣).

Our association is currently 7 days old and so has no easy access to funding. The infra rework will be quite easy, since not many services are deployed given the creation date, but yeah, it is purely based on volunteer time. Mine will take longer, but it's nothing impossible.

(And yeah, I guess pretty much any company in Europe already has very big GDPR trouble, of which Caddy usage will be one of the smallest 🙄 🤣)

mholt commented 11 months ago

And journald pruning won't suffice?

$ sudo journalctl --vacuum-time=14d

aeris commented 11 months ago

journalctl can solve the GDPR data retention problem, but it complicates data access when responding to a legal warrant (LCEN in France). It's not very easy to extract only the data for a single vhost/URL/IP or for a date range (by request date, not logged date). And for legal purposes, I prefer simple logs on the FS over logs in a complex system like systemd, which has no real/simple way to purge only the HTTPd logs after 15 days but not the whole system log (SSH/auditd logs, for example, are not PII logs, so GDPR doesn't apply and they MUST be stored for more than 15 days).

mholt commented 11 months ago

What solution do you currently use that isn't compatible with Caddy? And what is its configuration like?

aeris commented 11 months ago

With nginx/apache, you can rename the log file daily from access.log to access.log.1 on the FS. This doesn't change the open file handle, so nginx/apache continues to log to it. You then reload the process (SIGHUP/systemd reload), which creates a new access.log and starts logging the new day. You don't force-close open connections or lose incoming requests, because the process is only reloaded and not restarted.

For example, for logrotate and nginx:

/var/log/nginx/*.log {
        daily
        missingok
        rotate 14
        compress
        delaycompress
        notifempty
        create 0640 www-data adm
        sharedscripts
        postrotate
                invoke-rc.d nginx rotate >/dev/null 2>&1
        endscript
}

This behavior is currently not possible with Caddy: because of #5316, there is no way to force the file handle to be reopened without a full restart (which force-closes open connections and loses incoming ones).
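The rename-then-reopen pattern described above can be sketched in Python as follows. The class `ReopenableLog` and the function `rotate_once` are hypothetical names used for illustration. A real server would typically call `reopen()` from a reload or signal handler instead of calling it directly, and this is neither nginx's nor Caddy's implementation.

```python
import os
import tempfile


class ReopenableLog:
    """A log writer whose file handle can be reopened on demand."""

    def __init__(self, path):
        self.path = path
        self._handle = open(path, "a")

    def write(self, line):
        self._handle.write(line + "\n")
        self._handle.flush()

    def reopen(self):
        # Close the old handle (which may now point at a renamed file)
        # and open the configured path again, creating it if needed.
        self._handle.close()
        self._handle = open(self.path, "a")

    def close(self):
        self._handle.close()


def rotate_once(directory):
    """Simulate one daily rotation: rename the file, then reopen."""
    live = os.path.join(directory, "access.log")
    rotated = live + ".1"

    log = ReopenableLog(live)
    log.write("day 1")
    os.rename(live, rotated)  # done by the rotation tool
    log.reopen()              # done by the server on reload
    log.write("day 2")
    log.close()

    with open(rotated) as f:
        old_lines = f.read().splitlines()
    with open(live) as f:
        new_lines = f.read().splitlines()
    return old_lines, new_lines


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(rotate_once(tmp))
```

The writer never stops accepting lines during the rotation. Only the destination file changes, which is the property that a full restart cannot offer.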

francislavoie commented 11 months ago

Ultimately, our hands are tied until work is done on Lumberjack to allow these things to work, or until someone forks that library and adds the functionality themselves.

We considered forking it ourselves, but we don't realistically have the time to perform that refactor and maintain it on top of maintaining Caddy right now. So help is welcome.

Or sponsor the work so that it becomes worth the time investment.

aeris commented 11 months ago

I will take a look at whether I can patch something myself. Caddy is definitely a really nice-to-have for some features, but given my targeted usage, it's also clearly not usable as it is. I expect to have thousands of swords of Damocles above my head from the companies I usually get fined 2.5 million euros for (not so) tiny GDPR violations 🤣