amiracle / homemonitor

Splunk app for home | monitor >

Monitoring syslog file #4

Closed thecapacity closed 8 years ago

thecapacity commented 8 years ago

Thanks so much for building an awesome app - it's giving me a great chance to play with Splunk.

I've been trying to set it up to monitor a file vs. UDP and was wondering if you had any guidance.

It seems to "sorta" work - but I'm also trying to debug some rsyslog settings and wasn't sure how to continuously allow splunk to monitor the rsyslog file since I run it as a non-root user.

Any advice would be appreciated, and thanks again!

thecapacity commented 8 years ago

I think I've made it farther - I added splunk to the adm group (so it could read syslog files) and then added the file data source manually - but I can't figure out where to configure the source type (fios, comcast, etc). Unfortunately on the dashboard it just says "syslog" and I don't remember how to get to the manual configuration page for the "home monitor" app.

thecapacity commented 8 years ago

I think I found it - if you go into the data source you can manually set the source type - strangely I couldn't "pick from a list" of anything that equated to the home router type but I set it manually (based on your blog config guidance) - so now it's not asking for it as an option on the dashboard - but it still says "waiting for data".

(which might not be a surprise since I just configured it -- I'm wondering if there's some way I can import yesterday's data, which is in a separate file)

thecapacity commented 8 years ago

most (as in all I've found so far) of the dashboards (even if I set them to the last 30 seconds) say "Waiting for Data" - is there any way I can confirm the parsing is correct and maybe set a few manually?

amiracle commented 8 years ago

Make sure the time is synced between your router and your Splunk server. You can test that by running an all-time search from one of the dashboards: open the dashboard in search and set the time range to All time.

On Sat, Feb 20, 2016 at 8:30 AM -0800, "jay" notifications@github.com wrote:

most (as in all I've found so far) of the dashboards (even if I set them to the last 30 seconds) say "Waiting for Data" - is there any way I can confirm the parsing is correct and maybe set a few manually?


thecapacity commented 8 years ago

Thanks so much for the reply! I hadn't thought about time sync - let me work on that next.

It's just strange because I can see the data in the fios-specific dashboard and pull it up in search, e.g. index=homemonitor sourcetype=fios NOT state="Connection *" state="Remote administration" (which thankfully are blocked)

But in the Blocked Traffic dashboard (http://splunk:8000/en-US/app/homemonitor/blocked_traffic) I don't see anything (regardless of whether I try fios or splunk for the source type) - and that's the same even for something like outbound traffic (btw my syslog is >4M so I know there are plenty of events in there).

Will set up NTP and check back - thanks again!

thecapacity commented 8 years ago

NTP is set up - clocks were very close but now I'm hitting some splunk limitations - "The maximum number of real-time concurrent system-wide searches has been reached. current=9 maximum=8"

Maybe there are some initial indexing jobs that need to complete... I will give it a rest for a while.

thecapacity commented 8 years ago

I'm also grasping at a lot of google - but it looks like there is no transforms file, per: https://forum.pfsense.org/index.php?topic=94911.0

If $SPLUNK_HOME/etc/apps/homemonitor/local/transforms.conf does not exist, copy the file from $SPLUNK_HOME/etc/apps/homemonitor/default/transforms.conf to $SPLUNK_HOME/etc/apps/homemonitor/local/transforms.conf

Should I copy this in?

thecapacity commented 8 years ago

Now that splunk has caught up; when I run a search "index=homemonitor sourcetype=fios direction=out | timechart count by src_ip" over the last 30 minute window I get "0 of 8,907 events matched" -- so it feels like there is a mapping that's not occurring (i.e. I have events).

And sourcetype=stream:tcp OR sourcetype=stream:http OR sourcetype=stream:udp | stats sparkline sum(eval(bytes_in/1024/1024)) AS "Incoming Megabytes" by src | sort -"Incoming Megabytes" over "All Time" gives me "0 of 0 events" ...

thecapacity commented 8 years ago

I'm seeing events; "index=homemonitor sourcetype=fios" but things like my public IP in the "Home Network Overview" are clearly wrong.

So it definitely feels like something in the transforms?

thecapacity commented 8 years ago

I reconfigured the source type in the data file input to be fios instead of syslog and now the home network overview tab looks correct (IP matches a whats my ip search) and I see 129 events and a reasonable # of devices on my network -- but 0 events (inbound or outbound) - despite generating some traffic.

thecapacity commented 8 years ago

So I noticed that transforms.conf specifies

[portlookuptable]
filename = application_protocals.csv

But that doesn't seem to exist in the applications directory structure (nor here on GitHub)?

thecapacity commented 8 years ago

Sorry to keep working through this in real-time ... but I think I know what the problem might be now.

I believe (suspect/hope) that the extractions are not occurring - likely (I assume) because somewhere splunk is configured to only apply them to UDP but not to the file stream. When I attempt to define extractions manually - it shows a conflict w/ the sourcetype - but I can't figure out how to make sure the extractions are applied...

I will keep digging but I suspect you have a better idea than I do!

thecapacity commented 8 years ago

Two things seem to be helping - but the dashboards still aren't working:

1) I configured a local/inputs.conf with the following

[monitor:///var/log/fios.log]
sourcetype = syslog
index = homemonitor
host = fios
disabled = false

2) I made sure sourcetype=syslog in the file's data input -- though that seemed to just change the setting here.

So now I'm getting a ton of fields in the data search (see below for a sample) but the dashboards still aren't loading.

I'm wondering if that's because all my events are under sourcetype=fios and none with sourcetype=syslog

Selected Fields

a action 15
# date_hour 1
# date_mday 1
# date_minute 10
a date_month 1
# date_second 60
a date_wday 1
# date_year 1
a date_zone 1
a dest_ip 41
a direction 12
a host 1
a index 1
a interface 1
# linecount 1
a mode 1
a nat_ip 41
a protocol 7
a punct 29
a rdns_host 22
a source 1
a sourcetype 1
a splunk_server 1
a src_ip 1

thecapacity commented 8 years ago

OK - I admit I'm at a total loss... I gave up trying to get the direct file working and configured rsyslog to forward to a port splunk could listen on (1514) - for whatever reason I couldn't get splunk to listen on 514 even starting as root.

My problem now is that splunk sees the forwarded syslog messages as coming from 127.0.0.1 instead of from the fios router...

Here's my local/inputs.conf file

[monitor:///var/log/fios.log]
sourcetype = syslog
disabled = 1
index = homemonitor
host = fios
source = fios

[udp://514]
sourcetype = syslog
disabled = 1
connection_host = none
index = homemonitor
host = fios
source = fios

[udp://1514]
connection_host = none
host = fios
source = fios
sourcetype = syslog
index = homemonitor
disabled = 0

I think my last ditch chance is iptables NAT, such as:

Definitely feels like file parsing should be easier...

amiracle commented 8 years ago

First, the reason that you were probably unable to get your Splunk server to listen on port 514 is that you need to run as root in order to listen on any port below 1024.
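A quick way to check whether a given user can bind a port at all, independent of Splunk, is to try it directly. The following is a minimal sketch, not part of the app: the helper name `can_bind_udp` is hypothetical, and it only tests the loopback address. Run it as the same user that Splunk runs as.

```python
import socket


def can_bind_udp(port, host="127.0.0.1"):
    """Return True if the current user can bind a UDP socket to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Covers "permission denied" (privileged port without root) as
            # well as "address already in use" (another listener has it).
            return False
        return True


if __name__ == "__main__":
    # 514 is the standard syslog port; 1514 is the unprivileged alternative
    # used earlier in this thread.
    for port in (514, 1514):
        print(port, "bindable" if can_bind_udp(port) else "not bindable")
```

A "not bindable" result for 514 does not by itself distinguish a permissions problem from rsyslog already holding the port, so it is worth checking both.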

Let me walk you through the diagram listed below. I'm going to assume that the router is named "fios.homenetwork.com", firewall "pfsense.homenetwork.com" and switch "switch.homenetwork.com".

When the devices send their data into Splunk, it comes in over UDP port 514 as sourcetype "syslog". The transforms.conf file in the $SPLUNK_HOME/etc/apps/homemonitor/default directory looks at the data and tries to match the hostname to one of the source types. If you look, you'll see the lines REGEX = fios and REGEX = pfsense: transforms.conf rewrites the source type from syslog to fios or pfsense because the hostnames of those devices contain fios and pfsense respectively.

If your router is named something different, simply copy the transforms.conf stanza for your router's model and put your hostname on the line that starts with REGEX:

[fios]
REGEX = yourhostname
SOURCE_KEY = MetaData:Host
FORMAT = sourcetype::fios
DEST_KEY = MetaData:Sourcetype
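To make the hostname-to-sourcetype rewrite concrete, here is a rough Python emulation of the logic described above. It is only a sketch of the idea, not how Splunk implements it: the `HOST_RULES` table and `resolve_sourcetype` function are hypothetical stand-ins for the transforms.conf stanzas, and the hostnames are the example names from this comment.

```python
import re

# Hypothetical stand-ins for the transforms.conf stanzas:
# stanza name (target sourcetype) -> REGEX applied to the host.
HOST_RULES = {
    "fios": r"fios",
    "pfsense": r"pfsense",
}


def resolve_sourcetype(host, default="syslog", rules=HOST_RULES):
    """Return the sourcetype whose REGEX matches the host, else the default."""
    for sourcetype, pattern in rules.items():
        if re.search(pattern, host):
            return sourcetype
    return default


if __name__ == "__main__":
    for host in ("fios.homenetwork.com", "pfsense.homenetwork.com",
                 "switch.homenetwork.com"):
        print(host, "->", resolve_sourcetype(host))
```

A host that matches none of the patterns keeps the incoming "syslog" sourcetype, which is why the REGEX line has to contain something that appears in your router's hostname.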

[Image: homenetwork_explained]

What you've done is made it a little more complicated than it needed to be. Just set everything back to the app's default settings and make the change to the transforms.conf file in your local directory. Once that's done, restart Splunk and run index=homemonitor | stats count by sourcetype. This will tell you whether Splunk has recognized your fios router; the only source type should be fios.

amiracle commented 8 years ago

Also, when testing your search, make sure that you are running it inside the Home Monitor app. If you run index=homemonitor sourcetype=fios in the regular Search app, you will not get field extractions. Just click on the Search button in the Home Monitor app.

thecapacity commented 8 years ago

Thanks @amiracle for the detailed reply - I wasn't able to get splunk to listen on 514 even starting as root (note it self-installed w/ a specific splunk user so I assume it's configured to drop privileges regardless).

Also, I'm actually trying to get it to run the monitoring off the rsyslog log vs. directly listening on 514 so I think that's part of the challenge (I should also mention I renamed my router's hostname to "fios" and I'm using iptables and linux on the splunk server - i.e. there's really no FW other than the fios router).

I'm going to take a stab at working the file route again, otherwise I'll try the iptables NAT.

In terms of the picture for my network it's probably simpler - I have FIOS as the router, which is really acting as the network firewall / boundary device and my splunk server is connected to that switch (I have a 2nd network w/ a FW, etc) but right now I'm just trying to grab the fios data and will work on getting the FW data 2nd.

So:

[fios] ------------------------ [splunk server]
   |
   -------------------- [2nd network boundary not dealt with right now]

thecapacity commented 8 years ago

So here's what I have:

1) Confirmed no data in homemonitor
2) Enabled file watch of /var/log/fios.log -- which is built/maintained by rsyslog
3) /var/log/fios.log is defined in local/inputs as:

[monitor:///var/log/fios.log]
sourcetype = syslog
disabled = 0
index = homemonitor
host = fios

4) FIOS Router hostname set (via admin console) to "fios"
5) In the App's Search -> Data Summary, I see:
   * 1 host "fios"
   * 1 source "/var/log/fios.log"
   * 1 sourcetype=fios (I assume because the auto regexp / transform has already kicked in)
6) Confirmed syslog data - i.e. I am able to find in splunk a log identical to one in the rsyslog file.
7) In a sample search I see 25 extracted fields on the left hand side
   * Maybe interesting: this is more than the 16+2 (18) items in default/props.conf under the [fios] section
8) If I go to the "Home Overview" dashboard it asks me to select my sourcetype transform (which I do)
   * this seems to show a correct number of events, and likely devices on network
   * all other fields are 0

--- So if I had to guess it feels like the data extraction is working but maybe the IN/OUT direction extraction isn't?

thecapacity commented 8 years ago

Yes, it appears the "direction" regex is yielding mixed values (see below);

Top 10 Values Count %
connection 136 26.931%
Default 134 26.535%
Router 115 22.772%
Multicast 52 10.297%
Wireless 40 7.921%
173 5 0.99%
192 5 0.99%
packet 4 0.792%
63 2 0.396%
108 1 0.198%

Here's a sample of a syslog: Feb 21 13:55:46 2016 fios OUT: ACCEPT [57] Connection closed ( : UDP 192.168.1.13:5353 <-->x.y.z.p:5353 [224.0.0.251:5353] clink1 NAPT Outgoing FP-CAP )

So I think the next thing to try is working my way through the regexps

thecapacity commented 8 years ago

Eureka!! -- The Dashboards seem to be showing useful data now!! Here is my local/props.conf

[fios]
EXTRACT-action = (?P<action>BLOCK|ACCEPT|RATELIMIT)
EXTRACT-direction = (?P<direction>OUT|IN)
EXTRACT-interface = (?P<interface>local_dev|clink1|eth\d)
EXTRACT-procotol = (?P<protocol>UDP|TCP|IGMP|ICMP)
EXTRACT-state = ^[^\]\n]*\]\s+(?P<state>\w+\s+\w+)
EXTRACT-src_port = (?:\d\d*\d*(\.\d\d*\d*){3})(:|\s+)(?P<src_port>\d+)\s+
EXTRACT-src_ip = (?P<src_ip>(\d\d*\d*)(\.\d\d*\d*){3})
EXTRACT-dest_ip = (?:\d\d*\d*(\.\d\d*\d*){3}).+?(?P<dest_ip>\d\d*\d*(\.\d\d*\d*){3})
EXTRACT-dest_port = (?:\d\d*\d*(\.\d\d*\d*){3}).+?(?:\d\d*\d*(\.\d\d*\d*){3})(:|\s+)(?P<dest_port>\d+)\s+
EXTRACT-reason = NAPT.*(?P<reason>UNSECURED|INVALID FP-CAP)
EXTRACT-nat_ip = .*-->(?P<nat_ip>\d\d*\d*(\.\d\d*\d*){3}).*NAPT
EVAL-direction = if(match(direction,"OUT"), "out", "in")
LOOKUP-fios = action_lookup action OUTPUTNEW action2
LOOKUP-rdns = dnsLookup ip AS dest_ip OUTPUTNEW host as rdns_host

Note, I edited the default/props.conf and just renamed [fios] to [NOfios] so I didn't get conflicts but that doesn't seem like the right long term solution.
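For anyone wanting to check patterns like these outside Splunk before editing props.conf, here is a minimal Python sketch that runs a subset of the EXTRACT patterns from the [fios] stanza above against the sample log line posted earlier in this thread. The helper name `extract_fields` is hypothetical, Python's `re` module is only an approximation of Splunk's regex engine, and the `EVAL-direction` step is emulated in simplified form.

```python
import re

# Sample event copied from earlier in this thread (public IP already redacted).
SAMPLE = ("Feb 21 13:55:46 2016 fios OUT: ACCEPT [57] Connection closed "
          "( : UDP 192.168.1.13:5353 <-->x.y.z.p:5353 [224.0.0.251:5353] "
          "clink1 NAPT Outgoing FP-CAP )")

# A subset of the EXTRACT-* patterns from the [fios] stanza above.
EXTRACTIONS = {
    "action": r"(?P<action>BLOCK|ACCEPT|RATELIMIT)",
    "direction": r"(?P<direction>OUT|IN)",
    "interface": r"(?P<interface>local_dev|clink1|eth\d)",
    "protocol": r"(?P<protocol>UDP|TCP|IGMP|ICMP)",
    "state": r"^[^\]\n]*\]\s+(?P<state>\w+\s+\w+)",
    "src_ip": r"(?P<src_ip>(\d\d*\d*)(\.\d\d*\d*){3})",
    "src_port": r"(?:\d\d*\d*(\.\d\d*\d*){3})(:|\s+)(?P<src_port>\d+)\s+",
}


def extract_fields(event, extractions=EXTRACTIONS):
    """Apply each pattern with an unanchored search and collect named groups."""
    fields = {}
    for name, pattern in extractions.items():
        match = re.search(pattern, event)
        if match:
            fields[name] = match.group(name)
    # Simplified stand-in for EVAL-direction: normalise OUT/IN to out/in.
    # Unlike this sketch, the eval in props.conf also runs when no direction
    # was extracted at all.
    if "direction" in fields:
        fields["direction"] = "out" if "OUT" in fields["direction"] else "in"
    return fields


if __name__ == "__main__":
    for name, value in extract_fields(SAMPLE).items():
        print(f"{name} = {value}")
```

Running a handful of real log lines through a harness like this makes it easier to spot a pattern that matches the wrong token, which is what the mixed "direction" values earlier in the thread looked like.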

It doesn't seem perfect. For example, when I click on "Accept" (in the "Accept / Block Breakdown" bar chart) under "Network Overview In Bound Traffic", it takes me to the "Blocked Traffic" dashboard - but at this point I'm playing w/ data and graphs!!!

Thanks for sticking w/ me, and maybe we can talk about my regexp options. I found that transforms.conf could be used to make things much easier (see the ipv4 and octet examples here): http://docs.splunk.com/Documentation/Splunk/6.2.5/admin/Transformsconf

But I didn't get around to that (yet?)!

amiracle commented 8 years ago

Issue has been resolved.