Closed remydb closed 9 years ago
Some additional information
Linux log01 3.2.0-25-generic #40-Ubuntu SMP Wed May 23 20:30:51 UTC 2012 x86_64
Distributor ID: Ubuntu
Description: Ubuntu 12.04.4 LTS
Release: 12.04
Codename: precise
sagan.conf
var FIFO /var/run/sagan/sagan.fifo
var RULE_PATH /logwatch/sagan-rules-1.0
var LOCKFILE /var/run/sagan/sagan.pid
var ALERTLOG /var/log/sagan/alert
var SAGANLOGPATH /var/log/sagan
var SSH_PORT 22
var HTTP_PORT 80
var HTTPS_PORT 443
var TELNET_PORT 23
var DNS_PORT 53
var SNMP_PORT 161
var POP3_PORT 110
var IMAP_PORT 143
var SMTP_PORT 25
var MYSQL_PORT 3306
var MSSQL_PORT 1433
var NTP_PORT 123
var OPENVPN_PORT 1194
var PPTP_PORT 1723
var FTP_PORT 21
var RSYNC_PORT 873
var SQUID_PORT 3128
sagan_host <removed>
sagan_port 514
max_processor_threads 800
ignore_list: /logwatch/ignore.txt
output external: /logwatch/feed-me.php parsable
include $RULE_PATH/classification.config
include $RULE_PATH/reference.config
include $RULE_PATH/gen-msg.map
include $RULE_PATH/protocol.map
normalize: cisco, $RULE_PATH/cisco-normalize.rulebase
normalize: openssh, $RULE_PATH/openssh-normalize.rulebase
normalize: smtp, $RULE_PATH/smtp-normalize.rulebase
normalize: dns, $RULE_PATH/dns-normalize.rulebase
normalize: imap, $RULE_PATH/imap-normalize.rulebase
normalize: su, $RULE_PATH/su-normalize.rulebase
normalize: vmware, $RULE_PATH/vmware-normalize.rulebase
normalize: linux-kernel, $RULE_PATH/linux-kernel-normalize.rulebase
normalize: windows, $RULE_PATH/windows-normalize.rulebase
normalize: proftpd, $RULE_PATH/proftpd-normalize.rulebase
include $RULE_PATH/adtran.rules
include $RULE_PATH/apache.rules
include $RULE_PATH/apc-emu.rules
include $RULE_PATH/arp.rules
include $RULE_PATH/asterisk.rules
include $RULE_PATH/attack.rules
include $RULE_PATH/bash.rules
include $RULE_PATH/bonding.rules
include $RULE_PATH/bro-ids.rules
include $RULE_PATH/cacti-thold.rules
include $RULE_PATH/citrix.rules
include $RULE_PATH/courier.rules
include $RULE_PATH/digitalpersona.rules
include $RULE_PATH/fortinet-malware.rules
include $RULE_PATH/fortinet.rules
include $RULE_PATH/ftpd.rules
include $RULE_PATH/grsec.rules
include $RULE_PATH/honeyd.rules
include $RULE_PATH/hordeimp.rules
include $RULE_PATH/hostapd.rules
include $RULE_PATH/huawei.rules
include $RULE_PATH/imapd.rules
include $RULE_PATH/ipop3d.rules
include $RULE_PATH/kismet.rules
include $RULE_PATH/knockd.rules
include $RULE_PATH/linux-kernel.rules
include $RULE_PATH/milter.rules
include $RULE_PATH/mongodb.rules
include $RULE_PATH/mysql.rules
include $RULE_PATH/nfcapd-malware.rules
include $RULE_PATH/nfcapd.rules
include $RULE_PATH/nginx.rules
include $RULE_PATH/ntp.rules
include $RULE_PATH/openssh.rules
include $RULE_PATH/openvpn.rules
include $RULE_PATH/oracle.rules
include $RULE_PATH/ossec-mi.rules
include $RULE_PATH/ossec.rules
include $RULE_PATH/php.rules
include $RULE_PATH/postfix.rules
include $RULE_PATH/postgresql.rules
include $RULE_PATH/pptp.rules
include $RULE_PATH/procurve.rules
include $RULE_PATH/proftpd.rules
include $RULE_PATH/proxy-malware.rules
include $RULE_PATH/pure-ftpd.rules
include $RULE_PATH/racoon.rules
include $RULE_PATH/roundcube.rules
include $RULE_PATH/rsync.rules
include $RULE_PATH/samba.rules
include $RULE_PATH/sendmail.rules
include $RULE_PATH/snort.rules
include $RULE_PATH/solaris.rules
include $RULE_PATH/sonicwall.rules
include $RULE_PATH/squid.rules
include $RULE_PATH/ssh-tectia-server.rules
include $RULE_PATH/su.rules
include $RULE_PATH/symantec-ems.rules
include $RULE_PATH/syslog.rules
include $RULE_PATH/tcp.rules
include $RULE_PATH/telnet.rules
include $RULE_PATH/tripwire.rules
include $RULE_PATH/vmpop3d.rules
include $RULE_PATH/vmware.rules
include $RULE_PATH/vpopmail.rules
include $RULE_PATH/vsftpd.rules
include $RULE_PATH/web-attack.rules
include $RULE_PATH/weblabrinth.rules
include $RULE_PATH/windows-auth.rules
include $RULE_PATH/windows-malware.rules
include $RULE_PATH/windows-misc.rules
include $RULE_PATH/windows-mssql.rules
include $RULE_PATH/windows.rules
include $RULE_PATH/wordpress.rules
include $RULE_PATH/xinetd.rules
include $RULE_PATH/zeus.rules
include $RULE_PATH/true-custom.rules
Thanks for the info. It looks like the output thread for the external plugin is getting overwritten with new data (i.e., an improper or missing mutex?). I'll take a look and see what I can figure out. I'll let you know the results ASAP.
We tried updating to the current git master and upgrading liblognorm to 1.0.1, but we keep seeing the problem.
Any tips on reproducing the problem? I tried locally but failed to reproduce it. I'll try again and get back to you ASAP.
I don't think this is a liblognorm issue. I think it's related to the sagan-output.c / output-plugins/sagan-external.c. I'll look this afternoon and see if I can't narrow it down.
We are running at 300 messages per second on average. Perhaps you have some ideas on what data we could provide to help test this?
I might have located the issue: I had misplaced the mutex initialization. I just pushed the fix up. Can you download what is now RC3 and see if it corrects the issue? Thank you!
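A misplaced initialization typically bites when a mutex is initialized after worker threads have already started using it, or is re-initialized on every call. The following is a minimal sketch, assuming POSIX threads, of guarding a shared output record with a mutex that is initialized exactly once via pthread_once(). The names (output_mutex, shared_record, emit_record) are hypothetical and are not taken from the Sagan source.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical names; not taken from the Sagan source. */
static pthread_mutex_t output_mutex;
static pthread_once_t output_once = PTHREAD_ONCE_INIT;
static char shared_record[256];   /* buffer shared by all worker threads */

/* Runs exactly once, no matter which thread reaches emit_record() first. */
static void output_mutex_init(void) {
    pthread_mutex_init(&output_mutex, NULL);
}

/* Format an alert into the shared buffer and copy it to the caller's buffer
 * while the lock is held, so no other thread can overwrite it in between.
 * Returns the length of the copied record, as reported by snprintf. */
int emit_record(const char *msg, const char *syslog_msg, char *out, size_t out_len) {
    assert(out != NULL && out_len > 0);
    pthread_once(&output_once, output_mutex_init);

    pthread_mutex_lock(&output_mutex);
    snprintf(shared_record, sizeof shared_record, "Message:%s Syslog:%s", msg, syslog_msg);
    int n = snprintf(out, out_len, "%s", shared_record);
    pthread_mutex_unlock(&output_mutex);
    return n;
}
```

For a file-scope mutex, the static PTHREAD_MUTEX_INITIALIZER achieves the same guarantee without pthread_once().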
Thanks! We have RC3 running right now and will let you know by the end of the day.
Quick update; sadly this did not solve it.
Ok. I got something else to try. Let me get to the office and I'll push it up. Stay tuned.
On April 16, 2014 8:18:37 AM EDT, Daniel Koopmans notifications@github.com wrote:
Reply to this email directly or view it on GitHub: https://github.com/beave/sagan/issues/33#issuecomment-40591893
Champ Clark III Quadrant Information Security
Ok. Just pushed up another update. Can you give it a try? It's not a permanent fix, but if it works, it will at least point me in the right direction.
Alright, we are now running the update and will keep you posted.
Sadly the changes had no effect, we are still seeing the issue.
You downloaded and recompiled? I've essentially made it a single-threaded operation, so now I'm really confused about the corruption. Can you send me more examples of it?
Thank you.
On April 17, 2014 4:37:26 AM EDT, Daniel Koopmans notifications@github.com wrote:
Reply to this email directly or view it on GitHub: https://github.com/beave/sagan/issues/33#issuecomment-40692816
Correct. We also made some adjustments this morning to wrap the external output in its own dedicated mutex instead of reusing the one already defined. We are waiting for the results on that one too.
Meanwhile, we will work on some adjustments so we can provide you with more data samples.
Thanks again for all the thought you have put into this so far.
I'll see if I can't replicate this on our servers. This is bugging the hell outta me. :)
We are currently trying, with our best (very poor) C, to track events within Sagan so we can catch them as they go out both through the external processor and to the alert log. Adding the ID value to both should make it a lot easier to find how or why things get out of sync, I think.
In addition, the mutex change has kept the issue at bay for now. Normally we see this happen at least once an hour, and it has been two hours now, but we'd rather wait until tomorrow before ruling it out completely.
/* External program support */
pthread_mutex_t SaganOutputNonThreadMutex2 = PTHREAD_MUTEX_INITIALIZER;

pthread_mutex_lock(&SaganOutputNonThreadMutex2);
if ( config->sagan_ext_flag ) sagan_ext_thread( Event );
pthread_mutex_unlock(&SaganOutputNonThreadMutex2);
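The event-tracking idea described above, stamping the same ID into both the external output and the alert log so that mismatched pairs can be lined up afterwards, could be sketched as follows. This is a hypothetical debugging aid assuming a C11 compiler with <stdatomic.h>; the counter, struct and formatting functions are illustrative names, not Sagan's API, and the output formats only loosely mimic the ones quoted in this thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical debugging aid, not part of Sagan: a process-wide counter that
 * stamps every event with a unique ID before it reaches any output plugin. */
static atomic_uint_fast64_t next_event_id = 1;

struct event_record {
    uint64_t id;          /* correlation ID shared by all outputs */
    const char *msg;      /* rule message, e.g. "[SENDMAIL] Relaying denied" */
    const char *syslog;   /* original syslog message */
};

/* Take the next ID atomically, so two threads can never receive the same one. */
struct event_record make_event(const char *msg, const char *syslog) {
    struct event_record ev;
    ev.id = atomic_fetch_add(&next_event_id, 1);
    ev.msg = msg;
    ev.syslog = syslog;
    return ev;
}

/* Both writers print the same ID, so a corrupted record can be matched to its
 * counterpart in the other output. Each returns the snprintf length. */
int format_external(const struct event_record *ev, char *out, size_t len) {
    assert(out != NULL && len > 0);
    return snprintf(out, len, "EVENT_ID:%llu|Message:%s|Syslog:%s",
                    (unsigned long long)ev->id, ev->msg, ev->syslog);
}

int format_alert(const struct event_record *ev, char *out, size_t len) {
    assert(out != NULL && len > 0);
    return snprintf(out, len, "[event %llu] %s Message: %s",
                    (unsigned long long)ev->id, ev->msg, ev->syslog);
}
```

Grepping both outputs for the same EVENT_ID then shows directly whether the message and the syslog text travelled together.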
I was able to replicate:
ID:1:5000116 Message:[SENDMAIL] Relaying denied Syslog message: I/O error occurred while writing; fd='9', error='No space left on device (28)'
[] [1:5000116] [SENDMAIL] Relaying denied [] [Classification: suspicious-traffic] [Priority: 2] 2014-04-17 10:07:36 10.X.X.X:514 -> 10.X.X.X:514 mail notice Message: I/O error occurred while writing; fd='9', error='No space left on device (28)'
I need to test more, but I found that "struct _Sagan_Processor_Info *processor_info_engine = NULL;" was being defined as a global in processors/sagan-engine.c. I moved it to a local variable and have been retesting. So far this seems to fix the issue, but it certainly needs more testing before I can say so with 100% certainty. Either way, it was definitely a bug. I just pushed up "external fix #3". Let me know how it works and I'll continue testing in house.
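The bug described here is a classic shared-state race: a single global struct pointer filled in by one worker thread can be overwritten by another before the first thread's output is written, which would produce exactly the mismatch in the original report (correct syslog text, wrong message and classification). The following minimal sketch, assuming POSIX threads, shows the shape of the fix: each call gets its own local struct. struct processor_info and its fields are hypothetical stand-ins for struct _Sagan_Processor_Info, whose real layout is not shown in this thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct _Sagan_Processor_Info; fields are assumed. */
struct processor_info {
    char msg[64];
    char classification[32];
};

/* BEFORE (racy), for comparison:
 *   struct processor_info *processor_info_engine = NULL;   (file-scope global)
 * Every worker thread writes through the same pointer, so thread B can replace
 * thread A's msg/classification before thread A formats its alert. */

/* AFTER: the struct is a local variable, so each call (and thread) owns its copy. */
void process_event(const char *rule_msg, const char *classification,
                   const char *syslog_msg, char *out, size_t out_len) {
    struct processor_info info;
    assert(out != NULL && out_len > 0);
    snprintf(info.msg, sizeof info.msg, "%s", rule_msg);
    snprintf(info.classification, sizeof info.classification, "%s", classification);
    snprintf(out, out_len, "%s [%s] %s", info.msg, info.classification, syslog_msg);
}

struct job {
    const char *msg, *cls, *syslog;
    char out[192];
};

static void *worker(void *arg) {
    struct job *j = arg;
    for (int i = 0; i < 10000; i++)
        process_event(j->msg, j->cls, j->syslog, j->out, sizeof j->out);
    return NULL;
}

/* Runs two workers concurrently; returns 1 if each output matches its own input. */
int run_two_workers(void) {
    /* Made-up sample inputs for illustration. */
    struct job jobs[2] = {
        { "[SENDMAIL] Relaying denied", "suspicious-traffic", "relay denied", "" },
        { "[FTPD] Brute force", "attempted-user", "530 Login incorrect", "" },
    };
    pthread_t t[2];
    int created = 0;
    while (created < 2 &&
           pthread_create(&t[created], NULL, worker, &jobs[created]) == 0)
        created++;
    for (int i = 0; i < created; i++)
        pthread_join(t[i], NULL);
    if (created != 2)
        return 0;

    for (int i = 0; i < 2; i++) {
        char expected[192];
        snprintf(expected, sizeof expected, "%s [%s] %s",
                 jobs[i].msg, jobs[i].cls, jobs[i].syslog);
        if (strcmp(expected, jobs[i].out) != 0)
            return 0;
    }
    return 1;
}
```

A passing run_two_workers() does not by itself prove the absence of a race; the guarantee comes from the struct no longer being shared between threads.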
If it affects the external output and the ASCII alert log, it likely affects unified2 as well, which makes this a much more serious problem.
We updated our setup with your new build and will keep you posted.
The latest version has been running for a good number of hours now and we haven't come across any bad alerts, so it seems to be working properly so far. We'll keep a close eye on it for the time being.
How is this looking? We've been running solid since yesterday. I'm interested in knowing your results! I'd love to close this ticket :)
I'm closing this out. From our testing, and what remydb has relayed, this sounds resolved. If it rears its ugly head again, we'll just open another ticket.
My team has confirmed that the issue has indeed been solved, which is fantastic. Thanks a lot!
Hello. On a side note, the fix for this also caused a memory leak. I've pushed the fix for the memory leak up to github. Just FYI.
Thank you. We did notice the changes you made over the last few days; we update as soon as new changes come out.
We are currently looking into an issue where Sagan leaves an excessive number of zombie processes behind when running external calls.
We will submit a new issue if we think this is something we cannot fix on our end.
sagan 16734 18.2 1.7 8686880 421140 pts/0 Sl+ 10:01 41:38 ./sagan -f /etc/sagan.conf
sagan 25531 0.0 0.0 0 0 pts/0 Z+ 10:21 0:00 [sagan-to-mq.php]
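For background: a zombie (state Z, as with [sagan-to-mq.php] above) is a child process that has exited but whose exit status has not yet been collected by its parent with wait()/waitpid(). The thread later traces Sagan's case to a misplaced mutex_lock(); independently of that, the general rule is that every fork() needs a matching waitpid(). The following is a minimal sketch, assuming a POSIX system and an external program that reads the alert on stdin; run_external() is a hypothetical helper, not Sagan's implementation.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, not Sagan's code: run an external program, feed it one
 * alert on stdin, then reap it. Returns the child's exit status, or -1 on error. */
int run_external(char *const argv[], const char *data) {
    assert(argv != NULL && argv[0] != NULL && data != NULL);
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                      /* child */
        dup2(fds[0], STDIN_FILENO);      /* alert arrives on stdin */
        close(fds[0]);
        close(fds[1]);
        execv(argv[0], argv);
        _exit(127);                      /* only reached if exec failed */
    }

    /* parent: write the alert, then close to signal EOF */
    close(fds[0]);
    size_t len = strlen(data), off = 0;
    while (off < len) {
        ssize_t n = write(fds[1], data + off, len - off);
        if (n == -1) {
            if (errno == EINTR)
                continue;
            break;  /* e.g. EPIPE, only if SIGPIPE is ignored by this process */
        }
        off += (size_t)n;
    }
    close(fds[1]);

    /* Without this waitpid() the exited child stays in the process table as a
     * zombie (state Z / <defunct> in ps) until the parent itself exits. */
    int status;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Reaping synchronously like this blocks the calling thread until the external program exits; a daemon that must not block could instead reap children from a SIGCHLD handler or a dedicated thread.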
I'll have to look deeper into this. I can replicate. I just need to determine why it's happening. Sorry for the delay.
This has been fixed as part of https://github.com/beave/sagan/issues/36. This was largely due to a misplaced mutex_lock.
--<snip from issue #36>--
This was happening due to a misplaced mutex_lock(). I've tested the fix for this by "overloading" Sagan with events (10k events at a time) and monitoring to see if any "zombie" processes came up. The fix appeared to "stick". Please test when you have a chance and sorry for the delay. I'm going to close this. I'll re-open if you still have the same issue. I'm pretty certain it is fixed.
Hello,
We have recently updated to Sagan 1.0.0 (we've tried both RC1 and RC2) and have encountered some anomalies in the external output. We call a PHP script using the 'parsable' output. This PHP script doesn't do much beyond parsing the output into an array and feeding it to our message queue. After updating to 1.0 we noticed that some alerts are sent to our PHP script with incorrect information. This is the information our PHP script receives, when parsed into an array (the event_id is an MD5 hash we generate and add to the array):
For debugging purposes, we started writing this array to a log file for every received alert. We then saw an FTP brute-force attack, which generated many instances of the following alert:
This is to be expected. However, one of these messages looked different. To illustrate:
As you can see, the second alert contains exactly the same syslog message, but its message, classification, facility and syslog priority are incorrect compared to the other two. Over the last few days we've seen this happen multiple times, so there seems to be a bug in the output sent to the external program.