joubertredrat opened 8 years ago
I like this idea: being able to track down access and/or modification beyond what git provides. Nice proposal :)
I've been meaning to implement this for a while, but I don't know when I'll have time to do it since it'll be a big change :unamused:
So that your boss could know how much work you have done. :(
Nobody is working on this.
I would really like to implement this, but I don't think I can; I'm still not good with golang.
Step by step: start with a simple PR! You can do it.
I am willing to work on this.
@lunny I have a few questions:
Please see GitHub's audit log. I think the two choices are probably the database or bleve.
In this case, a table like the following:

- `id`: primary key
- `user_id`: foreign key to `user`
- `action`: actions by the user (`user.*`, `repo.*`, `admin.*`, `org.*`, etc.)
- `description`: complementary text for the action (`%user created %repo`)

As clearly stated on Gitter, I also prefer the database to store the audit log.
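For illustration, the proposed table might map to a Go model along these lines. This is a minimal sketch, not Gitea code: the `AuditLog` type, its field names, the example action name and the `Render` helper are all hypothetical. The `xorm` struct tags assume the model would be registered with xorm, the ORM Gitea uses. Without that registration they are inert tags.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// AuditLog is a hypothetical model mirroring the proposed table.
// The xorm tags only take effect if the type is registered with an xorm engine.
type AuditLog struct {
	ID          int64     `xorm:"pk autoincr"`
	UserID      int64     `xorm:"INDEX NOT NULL"` // references user.id
	Action      string    `xorm:"INDEX NOT NULL"` // e.g. "repo.create" (illustrative name)
	Description string    `xorm:"TEXT"`           // template such as "%user created %repo"
	Created     time.Time `xorm:"created"`
}

// Render fills the %user and %repo placeholders of the stored description.
// strings.Replacer does a single pass, so substituted text is not re-scanned.
func (a AuditLog) Render(user, repo string) string {
	r := strings.NewReplacer("%user", user, "%repo", repo)
	return r.Replace(a.Description)
}

func main() {
	entry := AuditLog{UserID: 1, Action: "repo.create", Description: "%user created %repo"}
	fmt.Println(entry.Render("joubertredrat", "joubertredrat/test"))
	// prints: joubertredrat created joubertredrat/test
}
```

Keeping the description as a template, rather than pre-rendered text, means renamed users or repositories could be substituted at display time. That is a design choice this sketch assumes, not something decided in the thread.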
Is it at all feasible to have the audit logs sent to a file, syslog, or the database, or a combination of them?
@go-gitea/maintainers vote with the Hooray reaction for file logging and the Heart reaction for database logging. Also, please look at https://github.com/go-gitea/gitea/issues/8#issuecomment-286463807

| Hooray | Heart |
|---|---|
| File logging | Database logging |
@ptman I think you want to know where you put the logs because you want to read them back, where "you" is the Gitea codebase (to show the logs in the UI).
@geek1011 we have a "proposals" repository which hasn't been used much so far but could be good to start using. You'd be writing a formal proposal about what you plan to implement and maintainers would vote on it. If accepted, it'll be an official plan. At least that's how I think it was envisioned to work.
I think one of the most important requirements for logs (audit or otherwise) is the possibility of sending them somewhere they are append-only and safe from modification. A database seldom meets this requirement.
For the record: the discussion is going on in the #gitea IRC channel on the Freenode network.
<geek1011> I am working on the audit log feature.
<geek1011> Should the log be in a file or the database?
<geek1011> In the issue, database has been suggested, but I am worried about it filling the database up
<geek1011> @lunny
<geek1011> @joubertredrat
<joubertredrat> Hi
<gyula21> yes my RPI needs ssh password when I want to connect
<joubertredrat> @geek1011 what doubt?
<joubertredrat> [edit] @geek1011 what your doubt?
<joubertredrat> you talk about how many rows the table can have?
<geek1011> Yes
<geek1011> Also, the disk space
<geek1011> And I am thinking it may be easier to manage in a file.
<joubertredrat> I think that with high usage, this table can have 300, 500Mb size
<geek1011> Yes, but the user might not want a database that big
<joubertredrat> @geek1011 I dont know in Golang, in PHP is hard to make pagination on file
<geek1011> There is a built in logging framework I think
<joubertredrat> we can consider as optional set rows limit on table, what you think?
<gitter> [Github] geek1011 commented in go-gitea/gitea on issue: audit logs https://github.com/go-gitea/gitea/issues/8#issuecomment-286438582
<joubertredrat> if not set, will put always
<joubertredrat> if set 10k rows, a crontab remove oldest rows
<geek1011> It may be better in a file with log rotation
<geek1011> It is also easier to manually parse that way
<geek1011> It would also be easier to manually inspect that way
<joubertredrat> @geek1011 but and for query? on my dashboard as example I need to see only mine activities
<geek1011> According to the github logging: All audited system events—including all pushes and pulls—are logged to /var/log/github/audit.log. Logs are automatically rotated every 24 hours and are retained for 7 days.
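joubertredrat's optional row limit (keep everything unless a limit such as 10k rows is set, then let a scheduled job drop the oldest rows) could work roughly as follows. In SQL it would be a single `DELETE`; this sketch only shows which rows such a job would select, assuming auto-increment IDs so that smaller IDs are older. `idsToPrune` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sort"
)

// idsToPrune returns the IDs a periodic cleanup job would delete so that at
// most limit rows remain. A limit of 0 or less means "keep everything",
// matching the idea that the row limit is optional.
// Assumption: IDs are auto-increment, so smaller IDs are older rows.
func idsToPrune(ids []int64, limit int) []int64 {
	if limit <= 0 || len(ids) <= limit {
		return nil
	}
	sorted := append([]int64(nil), ids...) // copy so the caller's slice is untouched
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	return sorted[:len(sorted)-limit]
}

func main() {
	ids := []int64{5, 1, 4, 2, 3}
	fmt.Println(idsToPrune(ids, 3)) // [1 2]
	fmt.Println(idsToPrune(ids, 0)) // []
}
```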
<geek1011> The files can be filtered
<geek1011> Also, the audit log should be kept standalone
<geek1011> Not mixed with other features
<joubertredrat> for me I prefer on a database, but if is easy to implement this on a file in golang
<joubertredrat> then is good idea to use a file for this
<geek1011> It is easy I think
<geek1011> So, unless I cannot do a file, I will do the database
<joubertredrat> Okay, lets to see what @lunny will say about this :D
<joubertredrat> If I can help Im here :+1:
<gitter> [Github] geek1011 commented in go-gitea/gitea on issue: audit logs https://github.com/go-gitea/gitea/issues/8#issuecomment-286440065
<tboerger> please write the audit log to the database.
<geek1011> @tboerger how do I manage the size?
<geek1011> It may get too big
<geek1011> Also on a raspberry pi for example, it could wear the sd card out
<joubertredrat> isnt better to talk about this on develop channel?
<geek1011> OK, sorry, my mistake
<geek1011> @tboerger @joubertredrat
<geek1011> So for the logging
<geek1011> Also github logs it to a file
<joubertredrat> Error 404, @joubertredrat not found
<joubertredrat> HAHA
<geek1011> IRC does not give 404s
<joubertredrat> @tboerger why you think that is better on database?
<lunny> I prefer store it in database.
<geek1011> @lunny Github does not
<lunny> Are you sure?
<geek1011> Yes
<lunny> Where github stored?
<geek1011> In a file
<geek1011> According to the github logging: All audited system events—including all pushes and pulls—are logged to /var/log/github/audit.log. Logs are automatically rotated every 24 hours and are retained for 7 days.
<tboerger> i want to have it in the db. pagination, search and so on.
<geek1011> I think the audit logs should be kept separate
<tboerger> storage wise it doesn’t matter if it’s a file or in the db.
<geek1011> Yes, because a file can be stored on tmpfs or on sshfs
<geek1011> but the database is one big file
<geek1011> Audit logs are for security reasons, so they should not be used for the ui
<tboerger> and also if i think about being scalable at some point i don’t want to manage even more files via some storage system instead of relying on a clustered database.
<geek1011> Also, the format is simpler if it does not need to be readable by gitea
<tboerger> the audit log should be displayed on the ui for admins.
<geek1011> As in, what happens if we have a new event which needs more fields
<geek1011> Yes, then it can show the file, but nicely formatted
<lunny> Why the audit log is so big?
<tboerger> i have done audit logs for various systems and i have always stored it in a database.
<geek1011> @lunny it will store everything
<lunny> Are there any example repository?
<tboerger> millions of db records are better manageable than files with millions of lines.
<geek1011> https://help.github.com/enterprise/2.9/admin/articles/audit-logging/#system-events
<geek1011> @tboerger files will be rotated according to the logging settings
<lunny> I agree and database also supports rotate
<tboerger> > retained for 7 days
<tboerger> so these records are not so much if we also prune them at some point
<joubertredrat> @tboerger @lunny rotate as optional
<tboerger> i don’t know what the other maintainers say, but now there are 2 people who say let’s store it in the database.
<joubertredrat> I'm as example want to disable this, I want to have audit forever
<geek1011> And 2 for files
<strk> we should really define how decisions are taken
<geek1011> Yes
<lunny> interesting
<strk> ie: a proper voting mechanism - I'm used to mailing lists where a thread is started, a week is given and +1 / -1 are counted
<joubertredrat> @lunny @tboerger I think that is good idea to set limit by date or by lines
<strk> not having a mailing list requires having a different way to announce a vote
<strk> only thing that comes to my mind at the moment would be an issue
<lunny> Why we need audit log? how should we use the audit log.
<strk> and votes could be +1/-1 as reaction
<lunny> I have to ask the two questions.
<geek1011> For viewing things like who did what when
<strk> but reactions are not limited to maintainers, so can't be done that way
<geek1011> and things such as who force pushed over the repo
<lunny> how to view, line by line?
<geek1011> Yes
<strk> it'd take a votebot to only count votes from people in MAINTAINERS file
<lunny> That’s maybe no meaning.
<lunny> I think we should support search
<lunny> if we support audit log
<geek1011> We can parse the log
<lunny> line by line is very low-efficient
<geek1011> or an admin can use grep
<geek1011> grep is very efficient
<geek1011> I use it to look through my system logs every day
<strk> I think logs storage should be up to system admin
<joubertredrat> guys
<joubertredrat> https://github.com/settings/security
<geek1011> Also, a file can be integrated with fail2ban
<lunny> But windows has no grep
<lunny> :smile:
<joubertredrat> on this link is possible to see one audit example on botton
<joubertredrat> [edit] on this link is possible to see one audit example on bottom
<geek1011> cygwin, gnuwin32, so on
<geek1011> for windows
<geek1011> I was thinking of using a format for the file like: 2017-03-14 10:44:00 EST: action:repo.push user:geek1011 repo:test/test ip:127.0.0.1 commit:aaaabbbb
<lunny> `adding or removing an SSH key` found a typo of github
<geek1011> My format is easy to search
<geek1011> cat audit.log|grep "user:geek1011" or something more complex like cat audit.log|grep "action:repo.push"
<lunny> an audit log is one per user?
<geek1011> nope
<geek1011> it contains everything
<strk> if it's to be controlled by Gitea to do extracts, I'm also for DB
<strk> then Gitea can also set window time to keep, for example
<geek1011> for example to find all usernames who have force pushed, something like: `|grep "action:repo.force_push"|cut -d" " -f 6|cut -d":" -f 2`
<strk> and SQL power users can do easier analysis
<geek1011> not everyone is a sql power user
<geek1011> most admins would be bash power users
<strk> indexes could speed up searches by users for example
<geek1011> grep is pretty fast
<geek1011> I have used it on 400mb log files
<strk> geek1011: the thing is, do you want this log to be read directly or via Gitea ?
<strk> I know grep is fast
<tboerger> i still don’t like to write it to file. this should be integrated into the admin view. and there the database is the only option
<geek1011> for example if i wanted to search the audit logs in my format for users who have forced push to geek1011/dont_force_push, I would do `|grep "action:repo.push"|grep "repo:geek1011/dont_force_push"|cut -d" " -f 6|cut -d":" -f 2`
<strk> is that log part of a Gitea's instance "data set" ? (ie: needs to be moved around while dumping data ?)
<geek1011> exactly @strk, that is another reason why to put it in a file
<geek1011> And see how simple my examples are
<strk> well, exactly what ? mine was a question ...what's your answer ?
<geek1011> No
<geek1011> Also, every other log is in a file
<strk> I've no problem working with files (if format is good, which is something the gitea log could improve)
<geek1011> Yes
<geek1011> see my example above
<geek1011> for example if i wanted to search the audit logs in my format for users who have forced push to geek1011/dont_force_push, I would do `|grep "action:repo.push"|grep "repo:geek1011/dont_force_push"|cut -d" " -f 6|cut -d":" -f 2`
<tboerger> store it in the database and add a cli command to gitea which streams the log. then both sides can use the favorite tools
<strk> tboerger: why do you want it in the database ?
<geek1011> github stores it in a file
<strk> tboerger: do you think it's part of the "instance data" ?
<tboerger> @strk i said it above
<strk> would you want it in your dumps and backups ?
<tboerger> it should be displayed in the ui. and you don’t want to paginate a log file within the ui.
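The grep/cut pipelines above depend on column positions. With the sample line as written, the timestamp takes three space-separated fields, so `user:` is field 5 and `cut -d" " -f 6` would return the `repo:` value instead. Looking values up by key avoids that. The sketch below is a hypothetical parser for the proposed `<timestamp>: key:value ...` format. It assumes values never contain spaces and needs Go 1.18+ for `strings.Cut`.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseLine splits a line in the proposed "<timestamp>: key:value key:value"
// format into its timestamp and fields. The timestamp ends at the first ": "
// (the colons inside "10:44:00" are not followed by a space).
// Assumption: values never contain spaces.
func parseLine(line string) (ts string, fields map[string]string, ok bool) {
	ts, rest, found := strings.Cut(line, ": ")
	if !found {
		return "", nil, false
	}
	fields = make(map[string]string)
	for _, tok := range strings.Fields(rest) {
		// Split on the first ':' only, so "ip:127.0.0.1" keeps its dots intact.
		if k, v, hasColon := strings.Cut(tok, ":"); hasColon {
			fields[k] = v
		}
	}
	return ts, fields, true
}

// usersForAction is a keyed equivalent of grep "action:..." | cut ...:
// it returns the user of every line whose action matches exactly.
func usersForAction(log, action string) []string {
	var users []string
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		if _, f, ok := parseLine(sc.Text()); ok && f["action"] == action {
			users = append(users, f["user"])
		}
	}
	return users
}

func main() {
	log := "2017-03-14 10:44:00 EST: action:repo.push user:geek1011 repo:test/test ip:127.0.0.1 commit:aaaabbbb\n" +
		"2017-03-14 11:38:00 EST: action:repo.force_push user:geek1011 repo:user/repo remote_ip:10.0.0.10 commit:aaaabb\n"
	fmt.Println(usersForAction(log, "repo.force_push")) // [geek1011]
}
```

Matching the whole `action` value also sidesteps substring surprises: a plain `grep "action:repo.push"` would not match `repo.force_push` lines, which is easy to miss when filtering for force pushes.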
<strk> my scrollback area is too small (would prefer a mailing list)
<strk> ok
<geek1011> so does bitbucket
<geek1011> https://confluence.atlassian.com/bitbucketserver/audit-logging-in-bitbucket-server-776640417.html
<tboerger> and for the backup you can also add a flag that includes or excludes the audit log
<strk> so tboerger wants DB because it's easier for Gitea to show it on the UI
<geek1011> bitbucket stores it in <Bitbucket Server home directory >/log/audit
<geek1011> It will then parse it for the admin ui
<strk> sounds doable to me
<strk> the only meter I see they could be compared is speed (of write and read)
<strk> write seems very easy to a file
<geek1011> yes
<strk> oh, and rotate
<geek1011> I would use the built in logging framework
<strk> ie: re-open the file for writing after moving it
<tboerger> and within the database the level of pruning is much easier to integrate
<strk> that'd do for write and rotate (also needed for other logs)
<strk> tboerger: what do you mean ?
<strk> what kind of "level of pruning" do you think would be harder with files ?
<geek1011> @strk and @joubertredrat: so do you think we should use a file (or do you think we need to talk about it more)?
<tboerger> it’s much easier to say in a database „prune everything older than 7 days"
<geek1011> Not really, look at the existing logging in gitea
<tboerger> of course it is
<geek1011> ````
<geek1011> {
<geek1011> "level": 1,
<geek1011> "filename": "/home/patrick/gitserver/gitea-data/log/gitea.log",
<geek1011> "rotate": true,
<geek1011> "maxlines": 1000000,
<geek1011> "maxsize": 268435456,
<geek1011> "daily": true,
<geek1011> "maxdays": 14
<geek1011> }````
<tboerger> there you have to configure a totally separate service...
<geek1011> What do you mean?
<tboerger> i’ve got to work. i don’t want to waste time now for that.
<gitter> [Github] tboerger commented in go-gitea/gitea on issue: audit logs https://github.com/go-gitea/gitea/issues/8#issuecomment-286451494
<gitter> [Github] geek1011 commented in go-gitea/gitea on issue: audit logs https://github.com/go-gitea/gitea/issues/8#issuecomment-286451753
<geek1011> Let's vote on this now (unless you do not want to)
<matrixbot> `strk` Patrick G (Gitter): I vote file
<tboerger> there is not any benefit to store it on the file...
<geek1011> There is a lot of benefit
<tboerger> most admins don’t want to connect via ssh if they are anyway already on the web ui
<gitter> [Github] ptman commented in go-gitea/gitea on issue: audit logs https://github.com/go-gitea/gitea/issues/8#issuecomment-286452558
<geek1011> we can show the log in the ui as well
<tboerger> lol, not really
<geek1011> What do you mean?
<geek1011> We just take the file contents, put them into a striped table, split the date, and there
<tboerger> pagination with log files that have to be parsed beside that is a PITA!!!!!!!
<geek1011> I will list the benefits: storing logs on tmpfs on a pi, grep and cut and other command line utilities, easier viewing in a text editor, smaller database size, easier to extend, easy log rotation, fast offline queries, and more
<strk> tboerger: the same cost is on the database side (CPU cost)
<strk> it's just about having or not a pre-written Go facility to do it
<geek1011> We can copy and modify the existing logger
<strk> geek1011: I think if you try to do that paginator things might be easier to "vote" on (I mean, voting on a theoretical thing isn't always easy)
<geek1011> what?
<strk> geek1011: I think the scenario is this, from the UI:
<tboerger> I gave up
<strk> - You want to get a list of logs matching a given filter
<geek1011> Should I just abstract the audit logging so we can change it at our whims?
<tboerger> Do whatever you want. It doesn't work with clear statements.
<strk> - You want to be able to navigate (next page, prev page) that result
<strk> tboerger: is the above clear enough ? does it contain your concern ?
<gitter> [Github] geek1011 commented in go-gitea/gitea on issue: audit logs https://github.com/go-gitea/gitea/issues/8#issuecomment-286454283
<geek1011> We can just send some entries from the log file with xmlhttprequest (it shouldn't get too large), and parse it on the client side with js.
<strk> geek1011: note that the "result" of the "query" (filtered out portion of the log) is specific for a user session
<geek1011> Also, if you want to vote, I have a comment here: https://github.com/go-gitea/gitea/issues/8#issuecomment-286454283
<strk> and note that "pagination" is done specifically to reduce the amount of data sent from backend to frontend...
<geek1011> @strk, but the audit logs would be for the admin
<geek1011> not the users
<geek1011> user activity should be separate
<gitter> [Github] strk commented in go-gitea/gitea on issue: audit logs https://github.com/go-gitea/gitea/issues/8#issuecomment-286455129
<gitter> [Github] strk commented in go-gitea/gitea on issue: audit logs https://github.com/go-gitea/gitea/issues/8#issuecomment-286455612
<strk> did you check the link sent before ?
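strk's UI scenario (filter the log, then move between pages) can be sketched as below. This in-memory version scans every record before slicing out one page, which is roughly what a file-backed store would do on each request. A database could instead push the filter and the page window into an indexed query. `Event` and `Page` are hypothetical names.

```go
package main

import "fmt"

// Event is a hypothetical audit record.
type Event struct {
	User   string
	Action string
}

// Page applies keep as a filter and returns one page of the matches plus
// whether a further page exists. pageNum is 1-based. Only pageSize events are
// returned, which is the point of pagination: the backend sends a bounded slice.
func Page(events []Event, keep func(Event) bool, pageNum, pageSize int) ([]Event, bool) {
	if pageNum < 1 || pageSize < 1 {
		return nil, false
	}
	var matches []Event
	for _, e := range events {
		if keep(e) {
			matches = append(matches, e)
		}
	}
	start := (pageNum - 1) * pageSize
	if start >= len(matches) {
		return nil, false
	}
	end := start + pageSize
	if end > len(matches) {
		end = len(matches)
	}
	return matches[start:end], end < len(matches)
}

func main() {
	events := []Event{{"alice", "repo.push"}, {"bob", "repo.push"}, {"alice", "repo.force_push"}, {"alice", "repo.push"}}
	byAlice := func(e Event) bool { return e.User == "alice" }
	page, hasNext := Page(events, byAlice, 1, 2)
	fmt.Println(len(page), hasNext) // 2 true
}
```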
<strk> 15:48 < joubertredrat> 1440:https://github.com/settings/security
<strk> that's showing *my* activity here (while logged in)
<strk> where *my* includes some events that are also of interest for others (ie: team events)
<gitter> [Github] ptman commented in go-gitea/gitea on issue: audit logs https://github.com/go-gitea/gitea/issues/8#issuecomment-286456126
<strk> Security history
<strk> bottom of the page, "
<strk> not those many lines for me, and no pagination support
<gitter> [Github] geek1011 commented in go-gitea/gitea on issue: audit logs https://github.com/go-gitea/gitea/issues/8#issuecomment-286456357
<strk> oldest on Dec 24, 2016
<strk> newest 3 days ago
<geek1011> That only shows security events
<strk> I dunno what other mean by "audit log"
<geek1011> strk: see https://help.github.com/enterprise/2.9/admin/articles/audited-actions/
<strk> to me, having a proper log (with syslog like facility/priority and timestamps) would be a great start already :)
<geek1011> Yes, I agree
<strk> then I could grep for "audit" priority, for example, and be done :)
<geek1011> strk: should I do it that way
<geek1011> by putting it into the normal logs but as a new log level or with a prefix?
<strk> you could start by checking what's wrong with the "normal logs", and make them behave
<geek1011> what do you mean
<strk> for example by having them all in a single file ?
<strk> rather than 3 different files and console output
<geek1011> They are separate for a reason
<geek1011> database gets a lot of entries
<geek1011> it fills fast
<strk> proper syslog-like behavior is you set priority/facility and then you can configure centrally which combination of them goes where
<strk> then you'll decide that facility=db goes to a separate file
<strk> (for example)
<strk> I didn't look at how those logs are configured
<geek1011> I do not have time to rewrite the whole logging system atm, it would also be a breaking change
<geek1011> I think the audit logs should be separate for now
<strk> I don't know what you were planning to do, I'm just saying what problem I currently have with those logs
<geek1011> me too
<strk> you're saying you want a new kind of logging which has a well-defined structure, is that what you mean by "audit logs" ?
<geek1011> yes, and it would log a lot more
<tboerger> discuss it on irc or the issue. the discussion is already long enough and i’m always getting notified
<strk> geek1011, mosez : I still think (beside the audit log) that we should define a proper voting system
<geek1011> strk
<strk> geek1011: if I open conf/app.ini I find these log-related sections:
<strk> [log]
<strk> [log.console]
<geek1011> Yes
<strk> [log.file]
<geek1011> but that is log destinations
<strk> [log.conn]
<geek1011> and the log format is also quite random too
<strk> [log.smtp]
<geek1011> and [log.database]
<strk> [log.database]
<strk> end
<strk> that, alone, gives me headache :)
<geek1011> yes
* geek1011 wants to scrap the whole existing logging system
<strk> same here
* geek1011 wants to standardize the logs
<strk> it's like re-doing syslog
* geek1011 thinks this would be a main reason for companies not using gitea
<strk> but on the other hand, this is Go
<strk> everything is re-done :)
<geek1011> yes
<strk> *and* people insist in wanting to run it on Windows too
<strk> and is self-contained and such
<geek1011> yes
<strk> so, to me it'd be enough to be able to say to Gitea: just log everything to stdout
<geek1011> it would be better to distribute qemu with alpine linux
<ptman[m]> there's been a bit talk in go-land about how logging should be done sanely
<strk> or: just log everything to syslog (better)
<strk> only concern is about the format of those stdout or syslog lines
<strk> 16:32 * geek1011 wants to standardize the logs
<strk> ^^^ do that please ! :)
<strk> would be a great step in the proper direction
* geek1011 does not have enough time
<strk> BUT, as long as the system allows putting logs in database or file
<strk> what would it mean to "format" the logs ?
<geek1011> yes
<strk> as they would have a different format in DB and in files
<strk> what is currently done ?
<geek1011> the database could be 1 entry which contains the log :P it would make tboerger happy!
<strk> do you have time to understand the current code ?
<geek1011> yes
<strk> that's also a good start
<geek1011> I am still learning go right now though
<geek1011> But I think i understand the logging so far
<strk> we're not here to make tboerger happy, but to maintain a usable pretty little code hoster :)
<geek1011> yes
<geek1011> and a trackable one
<strk> we might be all switching to TMM's one once it's ready :P
<geek1011> companies need to be able to track events (and so do security freaks)
<strk> agreed
<geek1011> I think the log file should be like this:
<strk> and also need to find out what went wrong in some cases, with different levels of details (so log levels)
<ptman[m]> https://groups.google.com/forum/#!topic/golang-dev/F3l9Iz1JX4g
<ptman[m]> https://dave.cheney.net/2015/11/05/lets-talk-about-logging
<ptman[m]> https://dave.cheney.net/2017/01/23/the-package-level-logger-anti-pattern
<geek1011> YYYY-MM-DD HH:MM:SSTZ: category.action: property:value property:value
<ptman[m]> why is category.action not just another property:value pair?
<ptman[m]> graylog http://docs.graylog.org/en/2.2/pages/gelf.html
<geek1011> 2017-03-14 11:38:00EST: repo.force_push user:geek1011 repo:user/repo commit:aaaabb remote_ip:10.0.0.10
<geek1011> then we can do cool stuff using grep and cut and sed
<geek1011> A database would be harder to do key value stuff
<ptman[m]> ah, timestamp should also just be property:value
<geek1011> then it is harder to sort
<geek1011> and read
<geek1011> but we could do that
<ptman[m]> well, you could suppress the label in text output
<geek1011> what do you mean
<ptman[m]> and do some tricky sorting as well
<ptman[m]> sort timestamp first, the rest in alphabetical order
<ptman[m]> or somesuch
<ptman[m]> event = { ts: '2017...', user: 'geek1011', repo: 'user/repo', remote_ip: '10.0.0.10', commit: 'aaaabb', action: 'repo.force_push'}
<ptman[m]> log(event)
<ptman[m]> if output_console || output_textfile {
<ptman[m]> format ....
<ptman[m]> ->
<geek1011> thats good
<geek1011> what does strk think?
<ptman[m]> 2017-03-14 11:38:00+0x00 action:repo.force_push commit:aaaabb remote_ip:10.0.0.10 repo:user/repo user:geek1011
* strk stops reading https://dave.cheney.net/2015/11/05/lets-talk-about-logging on the "nobody needs Warning level" line
* geek1011 is posting the irc and gitter logs on the issue
<ptman[m]> well, maybe sort timestamp first and action second, rest in alphabetical
<ptman[m]> if you can make sure that all contain ts and action
<ptman[m]> well, warnings aren't very actionable
<ptman[m]> I saw
<geek1011> so how should we do this?
<ptman[m]> but I think absolute opinions aren't very good
<ptman[m]> I'd make the level a property as well
<geek1011> strk: how are we going to discuss this with tboerger (mosez), lunny, appleboy, you, and I and come to a conclusion so I can start working on this?
* strk reads the backlog here
<geek1011> strk: it's all on the issue
<geek1011> strk: how about https://astaxie.gitbooks.io/build-web-application-with-golang/content/en/12.1.html
<geek1011> https://github.com/cihub/seelog
<geek1011> https://github.com/cihub/seelog/wiki/Format-reference
* geek1011 can't wait to get started
<strk> with seelog, you want to start ?
<geek1011> with anything
<ptman[m]> seelog doesn't seem to align with the proposed go standard logging interface: https://docs.google.com/document/d/1shW9DZJXOeGbG9Mr9Us9MiaPqmlcVatD_D8lrOXRNMU/edit#
<strk> the purely tag-based logging seems interesting, but I dunno how good it would work for humans to read
<strk> also there's the problem that was raised about enforcing some labels
<strk> (like timestamp)
<strk> and also sometime you want some values to be available w/out the developer writing it
<strk> like for example filename/linenumber/timestamp
<geek1011> I am starting to think I need to have a function to abstract all the logging formats so I can start putting the audit logging in, and leave the hard part of choosing where it should go for later.
<strk> any chance to get that with Go ?
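ptman's suggestion (timestamp first with its label suppressed, action second, remaining keys in alphabetical order) can be sketched as a small Go formatter. The function name is hypothetical, the timestamp is assumed to be RFC 3339, and the sketch assumes every event carries `ts` and `action` (a missing one renders as empty). A log level would simply be one more key.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// formatEvent renders an event as one text line: the bare "ts" value first,
// then "action", then every other key in alphabetical order as key:value.
// Assumption: ts and action are always present.
func formatEvent(event map[string]string) string {
	parts := []string{event["ts"], "action:" + event["action"]}
	var keys []string
	for k := range event {
		if k != "ts" && k != "action" {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys) // map iteration order is random, so sort for stable output
	for _, k := range keys {
		parts = append(parts, k+":"+event[k])
	}
	return strings.Join(parts, " ")
}

func main() {
	event := map[string]string{
		"ts": "2017-03-14T11:38:00-05:00", "user": "geek1011", "repo": "user/repo",
		"remote_ip": "10.0.0.10", "commit": "aaaabb", "action": "repo.force_push",
	}
	fmt.Println(formatEvent(event))
	// 2017-03-14T11:38:00-05:00 action:repo.force_push commit:aaaabb remote_ip:10.0.0.10 repo:user/repo user:geek1011
}
```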
<strk> but anyway that's another "kind" of logging, one targeted at *debugging*, not at *auditing*
<strk> geek1011: +1 on that (choosing where it goes later)
<geek1011> yes
<strk> that's why syslog is so popular, isn't it ? :)
<geek1011> yes
<strk> apps just need to declare the log level and facility and format at their wish the content
<geek1011> I would just have a function which is called with an object of keys and values and would put them somewhere
<geek1011> like date, message, and whatever else
<geek1011> it would be the easiest to get started with
<geek1011> We are all spending too long on choosing the logging format
<strk> about logging, isn't this whole thing about standardizing a format ?
<geek1011> kind of
<strk> that was the start
<geek1011> and also, I do not see the use of putting it in the database
<strk> so, it's not too long, as it's the actual work to be done :)
<geek1011> it is not what databases are meant for
<strk> I think it's irrelevant
<geek1011> what do you mean?
<strk> as long as I can have it on a file :)
<geek1011> yes
<strk> but to be honest
<geek1011> yes
<strk> logs, usually, are really just append-only things
<geek1011> yes
<strk> in this case the reason why mosez wants them in the DB
<strk> is because he want Gitea to also read them
<geek1011> yes, and audit logs are meant for only auditing
<strk> so the generic logs approach of "decide later where to put them"
<strk> doesn't work here
<strk> because you really want to know *exactly* where it goes
<strk> because it is where you'll go reading it
<strk> and where you'll want to parse it too
<strk> that's also the reason why you want to carefully define the format, because you'll have to parse it
<geek1011> we should have it stored as some type of key value entry
<strk> that said, at that point DB or File choice has to be decided once and stick
<geek1011> with at the minimum date and action
<strk> some keys you always want defined: timestamp
<geek1011> strk: can you vote on the voting comment I made?
<strk> and action, yes
<strk> then depending on action you can have a different sub-structure
<strk> but would always be the same
<strk> not arbitrary
<strk> for action X you'd always have a given set of keys
<geek1011> yes
<strk> Go types
<strk> well-defined structs
<strk> then you'd be sending those structs to the audit logger
<geek1011> I think for everything, we need at the minimum: timestamp, action, user, message
<geek1011> strk: yes, exactly
<strk> each struct would have a serializer to write them and a deserializer to parse them
<geek1011> strk: go has one built in
<geek1011> and it can go to json and xml!
<strk> at that point you can see file doesn't necessarily cut it anymore
<strk> I mean, you could be lost with grep...
<geek1011> what do you mean "lost with grep"
<strk> if the built outputs json and xml
<geek1011> we can have a custom format as well
<strk> you're losing the line-based nature of a greppable log
<geek1011> as just timestamp key:value key:value
<strk> sure, but it's more work
<geek1011> not that much more
<geek1011> and I'll be the one doing it
<strk> a commandline to search in database might as well be easy, if the data *belongs* there
<strk> that's what we have to decide, if it belongs there or not
<strk> right, if you do it, I think it's easier to vote *after* :)
<geek1011> yes, but I have said why I do not think it should be in the database
<strk> at least nobody can say "it would be too hard to implement"
<geek1011> 1. Logs are almost never stored in a database
<strk> because it would be already implemented at that point
<strk> these are not common logs
<geek1011> 2. The database would have a lot of writes this way
<strk> are events
<strk> historical events
<geek1011> 3. Not everyone knows sql
<strk> strictly structured
<geek1011> 4. Audit logs should be append only
<strk> gitea audit # command would give you something to grep, if you want
<geek1011> 5. Audit logs should be separate in case of database failure
<strk> 4: agreed
<strk> 5: agreed
<strk> well, actually, not sure
<geek1011> 6. Audit logs should be stored in separate filesystems if possible
<strk> it depends on what you include in "audit"
<strk> is database failure more about "debugging" log ?
<strk> or why should they be separated ?
<strk> security measure ?
<geek1011> strk: audit would be actions which are useful for figuring out who did what
<geek1011> strk: yes
<strk> yep, makes sense
<geek1011> strk: audit logs are for things like finding abuse and misbehaviour
<strk> how about 2 channels ?
<strk> some "buffer" is kept in DB and a file is always streamed into
<strk> so DB does the purging and UI
<strk> and external (file) is for security/integrity/do-what-you-want
<geek1011> strk: +1
<geek1011> strk: but that should be implemented after to reduce merge conflicts
<strk> then you don't need to know where those logs go (could go to a different machine, via rsyslog)
<geek1011> yes
<strk> now this approach seems to match with the different [log] sections :)
<geek1011> yes
<ptman[m]> definitely better to have multiple destinations for logs which can be selected in config
<geek1011> ptman then there is more complexity
<ptman[m]> true
<ptman[m]> but that way you get both searchable logs in db and append-only logs via e.g. syslog
<strk> ptman[m]: but you cannot allow configuring the format of the in-db ones
<strk> at least for this audit task
<geek1011> strk: look at my comment at the bottom of the issue so far.
<geek1011> Tell me what you think of it
<strk> it's probably just the audit event saver that would log via generic logging and well-structured format
<strk> right ? ^
<ptman[m]> strk, that's a problem, yes
<strk> ptman[m]: so my idea is audit would use the Db and have admin-configured "timespan" of lifetime
<strk> while the audit Db writer would log via generic log framework, trying to be as easily greppable as possible
<strk> would that work for you geek1011 ?
<ptman[m]> I'm definitely for syslog, not against
<strk> ptman[m]: that'd be a possible output for generic logs, right ?
<geek1011> yes, but I do not know how to implement that. We should start with a file, and maybe add the database as a separate pr. This pr is already going to be quite large, and we want there to be as little possibility of merge conflicts as possible.
<ptman[m]> gelf could be added
<ptman[m]> as an option
<ptman[m]> gelf documentation also has cons about syslog
<strk> geek1011: I think it's best if you start with Db, store Json in there
<strk> well, ok I'd have to think more about it
<strk> (DB format)
<ptman[m]> generic? as opposed to audit? well, I'd like any logs to be directed to any location
<ptman[m]> at work we do store logs as json in db
<geek1011> strk: a file would be simpler to start with with less chance of bugs
<strk> ptman[m]: "audit", in my dictionary, is the operation of *reading* logs
<ptman[m]> works quite well, at least with postgresql
<ptman[m]> strk, ok, well action log then, as opposed to error log?
<ptman[m]> what did you mean by generic?
<strk> see the issue topic: https://github.com/go-gitea/gitea/issues/8#issue-186891111
<strk> it's about "displaying" it
<geek1011> also see the last comment (mine)
<geek1011> what does ptman and strk think of it so far
<strk> so, go with file if you want, but don't let app.ini break your parser :)
<strk> ie: don't use the generic logging
<ptman[m]> I was reading the summary comment
<geek1011> what summary content
<ptman[m]> geek1011 (IRC), your summary comment
<ptman[m]> the one that summarized the discussion
<geek1011> yes, it's not done
<strk> the summary content is about "logging storage location" but if you want to read them back "syslog" is out of control
<strk> if you want to read logs you MUST be in control of the location
<geek1011> good poine
<geek1011> point
<strk> so either DB (the Gitea DB) or a Gitea-controlled file
<strk> so under data/
<strk> would be effectively the same as going in the DB at that point
<ptman[m]> strk, well, I'd be fine with only being able to display logs in the web-ui _if_ the database backend was selected
<strk> in terms of where it goes when you "dump" it
<ptman[m]> while also allowing the database backend to not be selected for storing logs
<strk> ie: when you "dump the Gitea instance data"
<strk> ptman[m]: that'd make the people who want file-based logs class B citizens :)
<strk> as they'd have to give up reading some logs from the UI
<ptman[m]> but it would be by choice
<ptman[m]> not by force
<strk> or well I guess you could still keep your audit.log where other logs are and have Gitea read them (but won't work with syslog)
<geek1011> yes, but what about on windows?
<strk> so my vote is: code some more and see where you land :)
<geek1011> strk I kind of need to start somewhere
<strk> start where you feel more confident !
<geek1011> We should get some votes based on my summary comment from all the maintainers
<geek1011> remember what you said about getting a good logging format strk
<geek1011> it cannot be changed easily once we decide on one
<geek1011> for the record, any ideas about the format for the database?
<geek1011> if we use it
<ptman[m]> elsewhere I'm using: CREATE TABLE logs (id serial PRIMARY KEY, created timestamp with time zone NOT NULL DEFAULT now(), user_id int REFERENCES users(id), content jsonb);
<ptman[m]> not sure if it suits gitea
<strk> if you think the data won't be much it could as well be just jsonb, right
<ptman[m]> the id for example isn't very nice, but the orm would have trouble without it
<strk> but remember Gitea has an abstracted DB, so jsonb.. ?
<ptman[m]> there's nothing to guarantee that two logs cannot have exactly the same timestamp
<strk> is it supported by all backends ?
<ptman[m]> I doubt it
<ptman[m]> newer mysql has json support
<ptman[m]> and sqlite has json functions
<strk> so text be it
<ptman[m]> but sqlite doesn't really care about types
<geek1011> the parsing may be slow
<strk> then if you want to filter/search you'll have to decode each and look inside (or do text filtering)
<ptman[m]> and I'm not sure about mariadb and json
<strk> "parsing may be slow"
<strk> you'll need to define the kind of access to those tables
<strk> which logs showing UIs will you have ?
<strk> per-user and per-system ?
<strk> how many events will you need to parse *per_page* ?
* geek1011 is being driven nuts trying to keep the chat log comment up to date in time with the chat :D
<strk> don't !
<strk> are you using irssi, btw ?
* geek1011 likes to keep records of everything
<geek1011> strk: I use hexchat on my computer and issri on my phone
<strk> some clients have a nice /log /tmp/gitea.log
<strk> (like mine)
<ptman[m]> I finally left irssi for weechat (there's a nice weechat android app that communicates with your weechat instance), but now I'm switching over to matrix slowly
<strk> ptman[m]: I've been there too (for the app) but then switched to matrix and removed weechat (slower than irssi for waht I needed)
<strk> I'd still considered weechat as a Matrix console interface, but that script was too slow when I tried it
<geek1011> strk: Can I include you as part of we for the description of the file loggging location in my summary comment?
<strk> I don't see that comment so can't tell
<strk> anyway, do your proposal and will be reviewed :)
<geek1011> I will tell you when I have written it
<geek1011> I am putting the stuff in the summary first before the proposal because it is easier to edit than a pr
<ptman[m]> I'm now chatting through the script, seems to work well
<ptman[m]> nice to have something other than a web interface to matrix
<strk> geek1011: even easier could be a wiki page :)
<strk> this is really issues abuse :D
<geek1011> strk: I will move it there as soon as I finish it for now
<strk> ptman[m]: take a look at your top output
<strk> (maybe it was recently fixed, or you'll see high CPU usage periodically)
<ptman[m]> by cpu? by mem?
<strk> ptman[m]: by cpu
<geek1011> htop is better
<ptman[m]> strk, well, maybe it is high, but now too high for me
<strk> ptman[m]: I'm using my console client on a remote server and when I hit a key I'm used to see it immediately, but with that weechat matrix plugin I had to wait to see the echo of my typing
<geek1011> How's this strk: We (@geek1011 and @strk) are proposing using a file for storing the audit logs. The file would be named audit.log, and the log entries would be appended to the file. The file would most likely be stored with the rest of the Gitea logs. The pros and cons are listed below.
<strk> top showed 100% CPU when that happened
<strk> geek1011: I'm not sure I'd call that a "store" either
<strk> it's something you want to prune, right ?
<geek1011> what?
<geek1011> I do not understand
<strk> we want a circular limited store for the UI and an append only (stream-like) output for logs
<strk> I think I want two distinct things
<strk> or, do people want to keep it infinite ?
<strk> think about purging too
<ptman[m]> strk, ah, I also use mosh, which predicts typing
<geek1011> we could have optional log rotation
<ptman[m]> strk, maybe that helps?
<geek1011> Actually, I think I'll get rid of the first sentence.
<geek1011> ptman, strk: any other things I should add to the summary comment?
<geek1011> After I finish the comment, I am done working on this for today.
<ptman[m]> well, against the not everyone know sql comment
<ptman[m]> there could be a gitea admin log -cmdline interface
<ptman[m]> syslog can be used to put the logs into elk or graylog
<ptman[m]> which then again _could_ be queried, but that would take a lot of implementing
<geek1011> ptman: added that stuff
<geek1011> ptman, strk : any other requests?
<geek1011> ok, I'm gone in a few minutes
<ptman[m]> I think that is a quite good summary
<geek1011> thanks
<strk> ptman[m]: predicting typing as a countermeasure to crazy CPU activity ? nah :)
<geek1011> strk: we should use meetbot
<geek1011> strk: it would make everything so much easier
<strk> geek1011: I'm going out with the dog. Please use the wiki more :)
<geek1011> what dog?
<geek1011> nevermind
<ptman[m]> strk, usually against network latency, but doesn't discriminate ;)
<geek1011> strk: I do not like wikis much
<ptman[m]> geek1011, if you could, maybe redacting the irc logs in preference to the summary would make it easier for people to read
<geek1011> then people would not know if the summary is biased
<geek1011> I like transparency and fairness
* geek1011 is finishing the summary and then leaving
------ A while later ------
<ptman[m]> well they can't be sure the logs aren't doctored either
<geek1011> that's why we need something like irccloud
<geek1011> for logging the irc messages
<ptman[m]> well, matrix does it for me
<ptman[m]> but maybe a logging bot for this project/channel
<strk[m]> For me too (but no public url to show log)
<strk[m]> Logging bot +1
<geek1011> strk: I put it in the misc section of https://github.com/go-gitea/gitea/issues/8#issuecomment-286474772
<geek1011> strk: I am finished the summary. Please feel free to suggest or edit(if you can) it.
<MTecknology> meetbot is fun stuff
<geek1011> yes
<MTecknology> $client is aggravating stuff
<strk> "Needs a format to be defined" as a Cons for the File section doesn't really make sense, also the DB structure needs to be defined
<strk> geek1011: ^
<geek1011> strk: OK, i'll fix that
<geek1011> also, have a look at what i did to https://github.com/go-gitea/gitea/issues/8#issuecomment-286463807
<geek1011> tell me what you think
<strk> I'd add a Pro: isn't considered "data" for the installation
<strk> ie: isn't dumped with the rest of the data
<strk> (as mosez said, there could be code logic to skip it form a dump, but a simple <gitea-agnostic-db-dump> would not do that)
<geek1011> done
<strk> Pro 4 of Database : ^does^do
<geek1011> strk: what do you think i did to the chat logs?
<strk> anyway I'm not clear what he meant by "manageable"
<geek1011> yes
<geek1011> strk have you figured out what i did yet?
<strk> Con 6 of DB is invalid to me (it's not harder to extend, depending on format)
<strk> point 7 again is invalid (as it was for the File one".... anything needs a format to be defined, if it has to be parsed :)
<strk> add me to the fail2ban point :)
<geek1011> strk: you said point 7 of the db yourself
This comment summarizes today's chat.
The first name inside brackets is the person who said the thing, and the other names are people who agree. Bold items are ones which are particularly important.
Prepared by @geek1011 with @ptman's and @strk's help.
If anyone wants something added to this or has any comments or votes (specifically @go-gitea/maintainers), please comment on this issue.
Based on:
@geek1011 and @strk and @ptman and maybe @joubertredrat
The audit logs could be stored in a file. The file would be named audit.log, and the log entries would be appended to the file. The file would most likely be stored with the rest of the Gitea logs. There would be optional log rotation and purging. The pros and cons discussed in the chat are listed below.
Can be searched with `grep` and `cut` (@geek1011 and @strk)
Possible formats:
"event = { ts: '2017...', user: 'geek1011', repo: 'user/repo', remote_ip: '10.0.0.10', commit: 'aaaabb', action: 'repo.force_push'}" (@ptman)
"2017-03-14 11:38:00EST: repo.force_push user:geek1011 repo:user/repo commit:aaaabb remote_ip:10.0.0.10" (@geek1011)
"Go types, well-defined structs, then you'd be sending those structs to the audit logger" (@strk)
@lunny and @tboerger and @andreynering
TODO
@geek1011 and @strk
The log entries would be sent to syslog for syslog to manage.
@geek1011 Done, it's in the references.
I can start working on this once we decide:
I prefer storing in database.
For disk space reasons, I would try not to store much information as strings, but as int enums instead:
User | Repo | Action |
---|---|---|
1 | 1 | 1 |
1 | 1 | 3 |
5 | 8 | 5 |
5 | 2 | 4 |
Action would be an enum like this:
1 - Clone
2 - Push
3 - Merge
4 - ...
@andreynering How about more complex actions and metadata such as IP address? Also, IMHO databases are not meant for logging.
Yes, more data would be needed than I listed, like the IP. That was an example.
Maybe we can have a JSON column to store extra data for some actions (`{"branch":"foo"}`). But that would not be necessary for all actions.
@andreynering, do you still see the need to log those actions in a parseable format so that admins can keep an event history for accounting? I think that's important, even if a portion of it is kept in the DB for easier management (you do want the DB audit log to be purged, right?). Also, @andreynering, would you include such an audit log table in Gitea dumps (external DB management tools would)?
I'd include it in the backup dump. I'd also not enable any automatic purging by default.
I think it's up to the sysadmin to choose if and how often these logs will be purged.
@andreynering what happens to users who don't purge logs because they don't know about them, and end up with a full disk?
@geek1011 SysOps who don't purge logs should not be sysops 🙂
As for format, if it isn't in the database, there's no way you can show it correctly in the admin panel (pagination, sorting, searching, etc.). That cannot be implemented correctly using plain files.
As for immutability, with PostgreSQL, MySQL and MariaDB you can simply not grant the DB user the `UPDATE` right on the audit-log table. For SQLite that isn't possible AFAIK, but people who need audit logs shouldn't run on SQLite anyhow 😉
@bkcsoft Not every user of gitea is a sysop, and either way, the more automated, the better.
I was thinking that it should start by going into a file, then we can make it also go in the db when we have decided on a structure for it.
Do not discriminate against users of sqlite (like me). The audit log can go into an append only file.
@geek1011 as you said, not every user is a sysop. So a log file is difficult to manage. I would like a log without UI. If there is a UI, please don't store it in a file.
Looking at the format in the OP, it seems like it'll be hard to do any queries anyhow.
If one splits information into `old` & `new`, you'll end up with this:
user | operation | old | new |
---|---|---|---|
joubertredrat | repo.create | nil | myorg/MyProjectData |
joubertredrat | user.settings | password | password |
tboerger | repo.fork | myorg/MyProjectData | tboerger/MyProjectData |
bkcsoft | repo.remove | myorg/MySource | nil |
tboerger | admin.auth | ldap.ou=Users | ldap.ou=Developers |
(`password` obviously isn't the actual password...)
Makes for a lot of `nil`s, but at least it is searchable.
As for storage, we need to store it in a DB of sorts for searchability. But we don't have to store it exclusively there. If we have a worker that pulls audit entries into a multiplexer, we can store it basically anywhere.
Something as simple as this could work
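```go
import (
	"fmt"
	"time"
)

type AuditEntry struct {
	When      time.Time
	Who       *User  // Gitea's user model, assumed to be in scope
	Operation string // (or some enum)
	Old       string
	New       string
}

// Print renders the entry as a single human-readable line.
func (ae *AuditEntry) Print() string {
	// pretty print me :D
	return fmt.Sprintf("%s %s old=%q new=%q",
		ae.When.Format(time.RFC3339), ae.Operation, ae.Old, ae.New)
}

type AuditStorage interface {
	Store(*AuditEntry) error
	Load(limit int, after time.Time, before time.Time) ([]*AuditEntry, error)
	Search(q *AuditEntry, limit, page int) ([]*AuditEntry, error)
}
```
Then implementing `FileAuditStorage` would be simple enough:
```go
import (
	"fmt"
	"os"
)

// FileAuditStorage appends one printed entry per line to an open file.
// It embeds *os.File, since an os.File value must not be copied.
type FileAuditStorage struct {
	*os.File
}

func (fas *FileAuditStorage) Store(ae *AuditEntry) error {
	_, err := fmt.Fprintln(fas, ae.Print())
	return err
}
```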
(Loading/Search obviously becomes harder for Files...)
The actual multiplexer would just implement `AuditStorage` and hold multiple `AuditStorage`s.
Actually, given the NotificationSystem by @andreynering, we could just extend that to store audit logs as well.
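The "worker that pulls audit entries into a multiplexer" idea from this comment can be sketched with a channel and a background goroutine, assuming every entry should reach every configured sink. `Entry`, `Sink`, `RunAuditWorker` and `memSink` are hypothetical names for illustration, not Gitea's notification API.

```go
package main

import (
	"log"
	"sync"
)

// Entry is a simplified audit entry for this sketch.
type Entry struct {
	Operation string
	Old, New  string
}

// Sink is any destination: DB table, file, syslog, ...
type Sink interface {
	Store(Entry) error
}

// RunAuditWorker drains the channel in a background goroutine and writes each
// entry to every sink; a failing sink is logged and does not stop the others.
// The returned WaitGroup completes once the channel is closed and drained.
func RunAuditWorker(entries <-chan Entry, sinks ...Sink) *sync.WaitGroup {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for e := range entries {
			for _, s := range sinks {
				if err := s.Store(e); err != nil {
					log.Printf("audit sink failed: %v", err)
				}
			}
		}
	}()
	return &wg
}

// memSink keeps entries in memory, e.g. for tests or a small UI buffer.
type memSink struct {
	mu      sync.Mutex
	entries []Entry
}

func (m *memSink) Store(e Entry) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.entries = append(m.entries, e)
	return nil
}
```

Decoupling producers from storage this way keeps request handlers from blocking on slow destinations, at the cost of losing buffered entries if the process dies before the channel is drained.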
@bkcsoft So should I wait (for this proposal to be finalized) before I implement this feature, or should I start now?
@geek1011 I'd say wait, since I want a few more comments/reactions on the proposed interfaces :)
Any news on this?
I really want to get started, but since I have limited time, I do not want to start implementing this in a way that will not be agreed on.
@bkcsoft For your idea, I think it might be good with a few modifications for searchability, extensibility and usability:
user | userid | operation | old | new |
---|---|---|---|---|
joubertredrat | 1 | repo.create | { } | { summary: 'myorg/MyProjectData', owner: 'myorg', name: 'MyProjectData' } |
joubertredrat | 1 | user.settings.password_change | { } | { } |
tboerger | 2 | repo.fork | { summary: 'myorg/MyProjectData', owner: 'myorg', name: 'MyProjectData' } | { summary: 'tboerger/MyProjectData', owner: 'tboerger', name: 'MyProjectData' } |
bkcsoft | 3 | repo.remove | { summary: 'myorg/MySource', owner: 'myorg', name: 'MySource' } | { } |
tboerger | 2 | admin.auth | { summary: 'ldap.ou=Users', ldap: { ou: 'Users' } } | { summary: 'ldap.ou=Developers', ldap: { ou: 'Developers' } } |
I think the question is how to store the log: database vs. file.
@lunny Doesn't matter. That can be "fixed" later... And `models.Notice` is already in the DB ;)
so @geek1011 please go ahead
@lunny @bkcsoft @go-gitea/maintainers I finally figured out an extensible, but simple and efficient format for the logs. It stores entries as lines of JSON, and each line can easily be turned into a human-readable entry.
See here: https://git.geek1011.net/geek1011/gitea-auditlog-poc
Please share your opinions, and then I will start adding it to gitea.
Hi, looks good. But I still think storing audit logs in the database is a better choice.
@lunny in a database there will be space issues, and with sqlite, locking issues.
I also think that auditing must be done in the database, as it is not simple logging; if you want auditing that can be used in an enterprise, it has to be done at least in the database.
+1 syslog, many reasons.
In my organization we are planning to use Gitea as our Git Server for all our projects, but Audit Logs are a must. Do you have any estimated date for when this is going to be implemented? I'd do it myself if I knew, at least, something about Go and Gitea code, but I think I'm not the right one to start implementing this, as I should first learn Gitea's architecture beforehand.
Related to https://github.com/go-gitea/gitea/issues/12902 - logging to a file is probably the easiest, most maintainable and portable option. Text log files can easily be aggregated into syslog by configuring simple rsyslog `imfile` watches.
If needed, audit logs can be implemented like the other loggers in https://github.com/go-gitea/gitea/blob/master/custom/conf/app.example.ini#L801 (`AUDIT = [console,file,conn,smtp,database]`), although you will always want `file` (or `smtp`...) for audit logs, since they need to be forwarded to another trusted machine.
Hi,
I think it is a good idea to have a user operation log so that admins can see what users are doing in Gogs, similar to the example below.
This feature could also be used on the user page as a user activity feed: mark each log row as public (`repo.*`) or private (`user.*`, `admin.*`) and display only public activity.
What do you think?
Chat summary from March 14, 2017
References: https://github.com/gogits/gogs/issues/3016