MHMDhub / enterprise-log-search-and-archive

Automatically exported from code.google.com/p/enterprise-log-search-and-archive
0 stars 0 forks source link

DNS requests #96

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Odd problem:  With syslog-ng configured to read nginx log files (which I was 
rsyncing to the Elsa server) The files are correctly parsed (with program 
override URL) and eventually searchable. However, as soon as these logs are 
synced to the Elsa machine an 
Extreme number of outbound DNS requests are generated. Noticed these via alerts 
so I installed dnsmasq and pointed resolve.conf to local host to log the 
requests. It appears that the requests are attempting to resolve "."  
I thought this was a syslog-ng issue, but changing the destination to a file 
and then pulling over the logs seems to avoid the lookups which point to 
something happening after syslogng is done?  Use_DNS(no) is still set in the ng 
conf. Anyways I might have botched a config somewhere but I haven't been able 
to figure out where else in the chain of components might DNS queries originate 
so I thought id put it up in case there's something specific I should look at 
or something in Elsa proper doing lookups? 

Anyways great project and thanks for your time!

Ubuntu 12.04, ELSA install script version current as of 1 week ago. 

Original issue reported on code.google.com by neome...@gmail.com on 15 Feb 2013 at 9:55

GoogleCodeExporter commented 8 years ago
Hi,

If you've already got use_dns(no) set, then the easy stuff is out.  Is the 
timing of the DNS requests such that it's exactly when the sync happens that 
the DNS lookups occur, or can you pinpoint that it's when syslog-ng and/or ELSA 
begin parsing them that the DNS lookups occur?  One ugly way of narrowing it 
down would be to find the pid of syslog-ng, then do 

strace -p <pid> -e trace=network

as you do the rsync to see exactly what syslog-ng is doing.  Keep in mind this 
will print a ton of info to the screen, so if possible, use a small test file 
to have chance to see what it's doing.

Once you confirm it's indeed syslog-ng doing the lookups, can you tell me what 
version of syslog-ng you're on?  Also, if possible, can you attach the 
syslog-ng.conf file (with anything sensitive redacted, of course) to this 
ticket?

Original comment by mchol...@gmail.com on 15 Feb 2013 at 11:12

GoogleCodeExporter commented 8 years ago
The requests start coming in in almost immediately as the rsync finishes -  I 
did have syslog-ng have a pretty small follow_freq() set on those logs so its 
hard for me to say absolutely though.  It did appear that with syslog-ng 
disabled, or with d_elsa set to simply dump to a file rather than pass to Elsa 
that the requests didn't show up - Perhaps the use_dns() in the option block is 
somehow not applying further down the line. 

I'll attach more detailed logs/confs and anything I can find out with the 
strace asap, probably I will have to wait to Monday to get them

Original comment by neome...@gmail.com on 17 Feb 2013 at 12:31

GoogleCodeExporter commented 8 years ago
Attached syslog-ng config:   Possibly worth noting there are several more web 
logs, configured the same way other than the paths of course;  also I was 
changing follow freq. up and down at various points to see what effect it would 
have on the amount of time it would take to become queryable in ELSA which 
might have impacted the number of dns requests I suppose.

Going to see if I can turn up any further details via strace

Original comment by neome...@gmail.com on 18 Feb 2013 at 3:15

Attachments:

GoogleCodeExporter commented 8 years ago
Begin rsync, pretty much right away: 

DNS queries in syslog:
Feb 18 10:34:24 uselsa01 rsyslogd-2177: imuxsock lost 12277 messages from pid 
28593 due to rate-limiting
Feb 18 10:34:24 uselsa01 dnsmasq[28593]: query[A] . from 127.0.0.1
Feb 18 10:34:24  dnsmasq[28593]: last message repeated 199 times
Feb 18 10:34:24 usaelsa01 rsyslogd-2177: imuxsock begins to drop messages from 
pid 28593 due to rate-limiting
Feb 18 10:35:01 usaelsa01 CRON[30704]: (root) CMD (perl 
/usr/local/elsa/web/cron.pl -c /etc/elsa_web.conf > /dev/null 2>&1)

strace with the -e trace=network option for some reason doesn't give me 
anything, general strace seems to be pretty much entirely file I/O.   

Original comment by neome...@gmail.com on 18 Feb 2013 at 3:51

GoogleCodeExporter commented 8 years ago
Did a:
ps auxw | grep elsa.pl | awk '{print "-p " $2}' | xargs strace -e trace=network

Which shows up the requests:

[pid 29607] connect(58, {sa_family=AF_INET, sin_port=htons(53), 
sin_addr=inet_addr("127.0.0.1")}, 16) = 0
[pid 29607] sendto(58, "\r5\1\0\0\1\0\0\0\0\0\0\0\0\1\0\1", 17, MSG_NOSIGNAL, 
NULL, 0) = 17
[pid 29607] recvfrom(58, "\r5\201\205\0\1\0\0\0\0\0\0\0\0\1\0\1", 1024, 0, 
{sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, [16]) 
= 17
[pid 29607] socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 58

Only many many more times.  Not sure what to look for from here

Original comment by neome...@gmail.com on 18 Feb 2013 at 3:56

GoogleCodeExporter commented 8 years ago
Ah, that's very helpful!  Ok, so it's definitely elsa.pl.  I think I know where 
it's doing it.  If the hostname after parsing isn't an IP, then ELSA does a DNS 
lookup (this is a property of Socket's inet_aton() function).  So, can you run 
syslog-ng with the d_elsa destination swapped for a test destination where it 
just writes out to a file?  I need to know what the third tab-delimited field 
looks like, as my hypothesis is that it's just a single dot where the hostname 
is expected to be.  If that's the case, then it's an issue with the log 
formatting as they come in.  It might be a case where you need to add 
"flags(expect-hostname)" to the "s_web09_site1" source.  Also, you should 
declare those sources above the main ELSA lines in an external .conf file, then 
have them included by adding "LOCAL_SYSLOG_CONF="/path/to/syslog/conf/file"" in 
/etc/elsa_vars.sh, then run sh install.sh node update again, and it will 
automatically include your custom syslog-ng include file in the right place.  
Your custom file should look like this:

source s_network {
    # no-multi-line means parse message as-is, newlines included, to allow for multi-line messages
    tcp(flags(no-multi-line));
    udp(flags(no-multi-line));
    file("/home/it/web09/site1.com-access.log" flags(no-parse) program_override("url") follow_freq(30) ) ;
    file("/home/it/web09/site2.com-access.log" flags(no-parse) program_override("url") follow_freq(30) ) ;
    file("/home/it/web09/site3.com-access.log" flags(no-parse) program_override("url") follow_freq(30) ) ;
};

That way, we override the default ELSA network source to include your custom 
local sources.  This might not fix it, but then everything will be in the right 
spot.  Fixing the DNS  issue itself will be a matter of finding the right 
combination of flags on these file sources, so it might mean removing 
"no-parse" and adding "expect-hostname", or just adding "expect-hostname."

Original comment by mchol...@gmail.com on 18 Feb 2013 at 11:15

GoogleCodeExporter commented 8 years ago
Parsed line from output to file - 
'1361284967 uselsa01    url 7   11.111.11.111 - - [19/Feb/2013:06:44:49 -0500] "GET 
/uk/downloads.php HTTP/1.1" 200 12912 "http://www.google.com/" "Mozilla/5.0 
(Windows NT 6.1; WOW64; rv:14.0) Gecko/20100101 
Firefox/14.0.1" 92.105.24.254       200 12912           GET     /uk/downloads.php   http://www.goo
gle.com/    Mozilla/5.0 (Windows NT 6.1; WOW64; rv:14.0) Gecko/20100101 
Firefox/14.0.1  -'

Seems that its not a . but also not an expected IP.  

Tentatively - 
Adding expect hostname seems not to change things 
Adding expect hostname + removing no parse seems to have prevented the DNS 
lookups;  And parsing still looks good when running a search on the last 
imported class=URL's!

I haven't rearranged the configs yet, not sure if that would change the impact 
of the flags? Is the idea of structuring the config file that way just to 
utilize the existing s_network rather than custom sources?

Thanks for the help !

Original comment by neome...@gmail.com on 19 Feb 2013 at 3:26

GoogleCodeExporter commented 8 years ago
Unfortunate retraction, missed some overlap, the changed flags do clear the dns 
queries buy also strip out the nice parsing. Hmm, I'll try adjusting the config 
structure I guess and see what happens

Original comment by neome...@gmail.com on 19 Feb 2013 at 3:50

GoogleCodeExporter commented 8 years ago
Ok, good, so we've basically identified the source of the problem, now we're 
just narrowing the solution.  The config structure is because an ELSA update 
will overwrite the syslog-ng.conf, so in order to preserve your local 
customizations, we need to have those in an external file.  Can you show me 
what the logs look like now when you have expect-hostname and remove no-parse, 
but the patterns aren't parsing it correctly?

Original comment by mchol...@gmail.com on 19 Feb 2013 at 4:31

GoogleCodeExporter commented 8 years ago
Ahh understood.

Line with the flags switched:
'1361291836 uselsa01    url unknown - - [19/Feb/2013:07:23:53 -0500] "GET 
/uk/index.php HTTP/1.1" 200 11820 "-" "Mozilla/4.0 (compatible; MSIE 7.0; 
Windows NT 6.0; BTRS124342; SLCC1; .NET CLR 2.0.50727; InfoPath.2; .NET CLR 
3.5.30729; .NET CLR 3.0.30618; .NET4.0C)"  '

[Which in ELSA results with program=url class=NONE and host=ip of the ELSA 
server being the only separated fields  ]

Original comment by neome...@gmail.com on 19 Feb 2013 at 4:40

GoogleCodeExporter commented 8 years ago
Ok, the fields look out of order.  It should go timestamp, host, program, 
class, message.  What does it look like in the web console?  I'm thinking we 
may need the no-parse included, but have to modify how the files are coming in.

Original comment by mchol...@gmail.com on 19 Feb 2013 at 5:12

GoogleCodeExporter commented 8 years ago
In the ELSA web frontend those lines look like

'- - [19/Feb/2013:06:48:33 -0500] "GET /uk/index.php HTTP/1.1" 200 11820 "-" 
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
host=172.1.1.1 program=url class=NONE'

Original comment by neome...@gmail.com on 19 Feb 2013 at 5:19

GoogleCodeExporter commented 8 years ago
Ok, so there's no program given because no-parse is on (we can set one using 
program_override() if necessary).  What does the original log line look like?

Original comment by mchol...@gmail.com on 19 Feb 2013 at 5:22

GoogleCodeExporter commented 8 years ago
Sorry, getting a little convoluted between conf switches so just to clarify:

Those examples are of the lines with no-parse removed and expect-hostname set 
instead.  Also I have a program override(url) set as in the original conf.

So now within the custom s_network source which is included I have multiple 
lines looking like:

file("/home/it/web09/site1.com-access.log"  program_override("url") 
flags(expect-hostname) ) ;

Which looks like the example in post #10  if dumped to a file, or post #12 if 
looked at in the ELSA web console.  The original log-lines look like:

111.11.11.11 - - [19/Feb/2013:12:32:48 -0500] "GET /uk/index.php HTTP/1.1" 200 
31817 "http://google.com" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:18.0) 
Gecko/20100101 Firefox/18.0"

If I add the no-parse flag back in,  I get a properly parsed log in the ELSA 
web console, but the DNS lookups trigger again

Original comment by neome...@gmail.com on 19 Feb 2013 at 5:38

GoogleCodeExporter commented 8 years ago
Ok, I think what's happening is that the r_host syslog-ng rewriter isn't 
working on file-based sources, so host comes in as a null field.  Then 
Reader.pm line 395 does the inet_aton() on an empty value, which performs the 
DNS request.  I'd like you to try something for me, and if it works, I'll put 
it in the source for good.  Go to Reader.pm line 395 and replace it with this:

$line[FIELD_HOST] = $line[FIELD_HOST] ? inet_aton($line[FIELD_HOST])) : 
2130706433;

and restart syslog-ng.  If I'm right, then the DNS lookups should stop and the 
host should show up as 127.0.0.1.

Original comment by mchol...@gmail.com on 19 Feb 2013 at 6:54

GoogleCodeExporter commented 8 years ago
Ok,  I replaced that line and restarted syslog-ng, and ran with no-parse added 
again. 

The DNS lookups still came in, but now I have host=0.0.0.0  in the web console, 
instead of the IP of the ELSA machine

Original comment by neome...@gmail.com on 19 Feb 2013 at 7:28

GoogleCodeExporter commented 8 years ago
Ok, let's stop guessing at what the value is there and have it log, stick this 
in the line just above and let me know what the debug output is:

$self->log->debug("FIELD_HOST: " . $line[FIELD_HOST]);

If it turns out that $line[FIELD_HOST] really does evaluate to a literal "." 
then I will be confused, because I don't see where in the message it would be 
coming from.  The fact that you're getting 0.0.0.0 as the host means that a 
string, not an IP is being evaluated by inet_aton().

Original comment by mchol...@gmail.com on 19 Feb 2013 at 10:15

GoogleCodeExporter commented 8 years ago
It seems to be properly pulling the localhost IP at that point [I don't know 
enough about Perl to know if it would have quotes or something in the debut 
output were it a string,  but at least its not just a dot]:

* DEBUG [2013/02/19 17:27:50] /usr/local/elsa/node/Reader.pm (396) 
Reader::parse_line 10790 FIELD_HOST: 127.0.0.1
* DEBUG [2013/02/19 17:27:50] /usr/local/elsa/node/Reader.pm (396) 
Reader::parse_line 10790 FIELD_HOST: 127.0.0.1

ad infinitum

Very strange

Original comment by neome...@gmail.com on 19 Feb 2013 at 10:32

GoogleCodeExporter commented 8 years ago
[deleted comment]
GoogleCodeExporter commented 8 years ago
Some strategic commenting and uncommenting...I tried running and with both your 
new line and 395 enabled one at a time to the same effect.  I then tried 
commenting out the inet_aton conversion a few lines down, and it seemed that 
this killed the dns requests regardless of which line is used for the initial 
inet_aton() , so I'm guessing this inet_aton is the source.  I added a debug 
statement here but all the IP addresses I'm seeing logged look normal

if ($line[FIELD_CLASS_ID] ne 1){ #skip default since there aren't any fields
                # Convert any IP fields as necessary
                foreach my $field_order (keys %{ $self->class_info->{field_conversions}->{ $line[FIELD_CLASS_ID] }->{'IPv4'} }){
                        $self->log->debug("IPFIELD: " . $line[$field_order]);
                       $line[$field_order] = unpack('N', inet_aton($line[$field_order]));
                }

Maybe a change similar to your replacement line for the other inet_aton() but 
here instead?

[deleted and reposted for grammar/clarity edit]

Original comment by neome...@gmail.com on 20 Feb 2013 at 4:43

GoogleCodeExporter commented 8 years ago
Ah, yes, that could certainly be it.  Wherever inet_aton() gets called, there's 
the potential for a DNS lookup if it doesn't receive a valid IP to lookup.  I'm 
still confused as to why you're seeing normal IP's in the debug lines but 
seeing DNS lookups anyway.  I've done some testing to confirm the only way to 
do a lookup for "." is for an undefined value (an empty string) to be looked up 
in inet_aton().  So, let's try to put in the conditional lookup at the second 
lookup area, and that should take care of the lookup.

Original comment by mchol...@gmail.com on 20 Feb 2013 at 5:47

GoogleCodeExporter commented 8 years ago
With that loop changed to:

##########################################
 # Convert any IP fields as necessary
                foreach my $field_order (keys %{ $self->class_info->{field_conversions}->{ $line[FIELD_CLASS_ID] }->{'IPv4'} }){
                        $self->log->debug("IPFIELD: " . $line[$field_order]);
                     #  $line[$field_order] = unpack('N', inet_aton($line[$field_order]));
                     $line[$field_order] = $line[$field_order] ? inet_aton($line[$field_order]) : 2130706433;
                }
#############################

I seem to avoid the dns requests.  Every source in that batch seems to be 
0.0.0.0 , destination 127.0.0.1,  and host 127.0.0.1.

I wonder whether there could be whitespace that I'm not seeing in one of the IP 
fields causing the problem?  Or are these strings already stripped/trimmed 
earlier on?  

Original comment by neome...@gmail.com on 20 Feb 2013 at 7:41

GoogleCodeExporter commented 8 years ago
Sorry, this is my bad, the line:

$line[$field_order] = $line[$field_order] ? inet_aton($line[$field_order]) : 
2130706433;

Should be:

$line[$field_order] = $line[$field_order] ? unpack('N', 
inet_aton($line[$field_order])) : 2130706433;

Original comment by mchol...@gmail.com on 20 Feb 2013 at 10:35

GoogleCodeExporter commented 8 years ago
Ahhh Ok,  swapped it. 

So far this seems to have worked!

Looking at a fresh batch of logs from the web console, host and destination 
127.0.0.1 , correct srcip. I'll let it run and post back if anything crops back 
up.

Thanks again for the help resolving this! 

Original comment by neome...@gmail.com on 21 Feb 2013 at 3:01

GoogleCodeExporter commented 8 years ago
Comitted under rev 654.

Original comment by mchol...@gmail.com on 21 Feb 2013 at 8:28