SuperCowPowers / zat

Zeek Analysis Tools (ZAT): Processing and analysis of Zeek network data with Pandas, scikit-learn, Kafka and Spark
MIT License

can't read log file with bat #50

Closed bdsmith48 closed 6 years ago

bdsmith48 commented 6 years ago

My code just hangs when trying to use bat on a log file

from bat import bro_log_reader

reader = bro_log_reader.BroLogReader('ssh.log')
for row in reader.readrows():
    print(row)

This code never completes or prints anything; it seems to get stuck in the for loop.

swedishmike commented 6 years ago

Hi @bdsmith48,

I just tried this and it worked just fine here - only thing is that print doesn't make it as nice to read as pprint but that's just a minor thing. A couple of questions to see if I can help you further.

I've found that depending on the host (RAM, CPU etc) and size of logfile it can take some time to read in the file. If this is your own file - what size is the file/how many lines does it contain?

Have you tried any of the other files that are bundled with Bat in the data subdirectory?

Cheers, Mike

bdsmith48 commented 6 years ago

I've tried it on Windows with Python 2.7.6 and on a Mac, but I can't currently check which version I was using on the Mac.

swedishmike commented 6 years ago

I don't think I have 2.7.6 installed anywhere at the moment, so I can't replicate this right now.

If you just set it to read in the log, does it ever finish - if you run it in the interactive Python shell for example?

Have you tried other logs than the ssh.log?

brifordwylie commented 6 years ago

@bdsmith48 @swedishmike My guess here is that ssh.log is an empty file or something. @bdsmith48 please try this with the files included in this repository under data/

bdsmith48 commented 6 years ago

Is there a reason why this wouldn't work with other Bro log files? I know my files aren't empty and they aren't extremely long. It turns out that it did work for your log files, though.

swedishmike commented 6 years ago

I've used it with log files from quite a number of different installations, as well as some publicly available ones where I don't even know which version of Bro generated them, so it should work with yours too.

Three quick questions...

Cheers, Mike

P.S Just another, really silly, question - you are running Bro with the default, tab separated, logs and not Json logs?

brifordwylie commented 6 years ago

@swedishmike good point about the json logs. @bdsmith48 if you have json logs they are currently not supported but on our todo list. See https://github.com/Kitware/bat/issues/40
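One way to tell the two formats apart before handing a file to Bat is to peek at its first non-blank line. The following is a minimal sketch, assuming tab-separated logs begin with a `#separator` line and JSON logs hold one JSON object per line; `guess_log_format` is a hypothetical helper written for this thread, not part of Bat.

```python
import json


def guess_log_format(path):
    """Return 'tsv', 'json', or 'unknown' based on the first non-blank line.

    Hypothetical helper, not part of Bat. Assumes default Bro logs start
    with '#separator' and JSON logs contain one object per line.
    """
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            stripped = line.strip()
            if not stripped:
                continue  # skip leading blank lines
            if stripped.startswith("#separator"):
                return "tsv"
            if stripped.startswith("{"):
                try:
                    json.loads(stripped)
                    return "json"
                except ValueError:
                    return "unknown"
            # Anything else (for example a bare data row) is not recognised
            return "unknown"
    return "unknown"  # empty file
```

A result of `'unknown'` for a file that looks like plain tab-separated rows suggests the header block is missing.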

swedishmike commented 6 years ago

@bdsmith48 Are you still having issues with this or did you get it figured out?

seckindinc commented 6 years ago

I emailed the same error to brian.wylie@kitware.com, but the reply said there is no such address. I saved a 2k-row file as bro and am trying to load it into a data frame with the command below. Nothing happens for 1.5 hours and the CPU stays at 100%. What could cause this? Is there a mismatch between Bro versions? What version of Bro is required for Bat? I am using Python 3.6, by the way.

bro_df = LogToDataFrame('/home/seckindinc/Desktop/Projects/Bro/bro')

swedishmike commented 6 years ago

@seckindinc When you say that you 'saved it as bro' what do you mean?

Would it be possible for you to share this file so that I or @brifordwylie could give it a go in our environments?

I run Python 3.6 as well and have used logfiles from Bro 2.5.x without any problems.

seckindinc commented 6 years ago

I meant that I saved a small portion of the raw log file under the name "bro". I can't share the log with you because of confidentiality. Can you share your Bro version or parser?

swedishmike commented 6 years ago

Did you leave in the headers etc in the file?

Are your logs in tab separated format or JSON?

Which log file are you trying to parse at the moment? Conn, ssl, http etc?

Also, can you parse the example files that come with Bat?

I'm running Bro v 2.5.1 and 2.5.2 on various machines at the moment.

seckindinc commented 6 years ago

There is no header in the file. It is tab separated. I have multiple types of Bro logs. I am assuming that Bro automatically parses these? I haven't tried yet, but I will soon.

seckindinc commented 6 years ago

I have multiple types of Bro logs. I am assuming that Bat* automatically parses these?

seckindinc commented 6 years ago

I have tried it with your examples. I think I need to check my log files for format issues.

swedishmike commented 6 years ago

I think I might have replicated your issue. I removed the headers from one of the test files and now it doesn't load properly.

I'm in the middle of some Christmas celebrations here so can't fully verify in the code right now, but my guess is that the headers are used to verify what file it is that's being opened and what fields it contains. I'm sure @brifordwylie can confirm whether or not this is true once he sees these messages and has a minute to spare over the holidays.

If you want to test you can take the headers from your original log file and add them to your exported one.

Just to confirm - what I call headers are the following lines, this example is from the dns.log file:

#separator \x09
#set_separator  ,
#empty_field    (empty)
#unset_field    -
#path   dns
#open   2014-04-03-10-08-27
#fields ts  uid id.orig_h   id.orig_p   id.resp_h   id.resp_p   proto   trans_id    query   qclass  qclass_name qtype   qtype_name  rcode   rcode_name  AA  TC  RD  RA  Z   answers TTLs    rejected
#types  time    string  addr    port    addr    port    enum    count   string  count   string  count   string  count   string  bool    bool    bool    bool    count   vector[string]  vector[interval]    bool

Please let me know if that makes any difference.
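A quick check for whether a file still carries those header lines could look like the sketch below. It uses only the standard library; `has_bro_headers` is a hypothetical name, not a Bat function, and the set of required directives is an assumption based on the example above.

```python
def has_bro_headers(path, required=("#separator", "#fields", "#types")):
    """Return True if the leading '#' lines include every required directive.

    Hypothetical helper, not part of Bat. The default `required` tuple is an
    assumption about which header lines matter most.
    """
    found = set()
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if not line.startswith("#"):
                break  # first data row reached, header block is over
            tokens = line.split()
            if tokens:
                # The directive name is the first token, e.g. '#fields'
                found.add(tokens[0])
    return all(name in found for name in required)
```

Running this on an exported file before loading it should show quickly whether the header block was lost during the export.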

seckindinc commented 6 years ago

When I removed everything except the data rows, it didn't work for me either. I think Bat requires the column names and data types.

swedishmike commented 6 years ago

So if you leave in the lines starting with # at the top of the file, it works?

As I said in the previous comment - I think this is what's used to ascertain what file it is as well as get the field names and so on.

seckindinc commented 6 years ago

Thank you so much for your help. This package works great if we give detailed info about the log.

swedishmike commented 6 years ago

@seckindinc No worries at all - I'm glad I could help you.

@bdsmith48 - Just to check - could this be the solution to your issue too?

brifordwylie commented 6 years ago

@swedishmike @seckindinc Yes, the reader reads in the Bro headers. All Bro versions should be putting out headers on the files. If you cut/paste some of the rows into another file you'll need to include the headers as well. I'm going to close this ticket; if @bdsmith48 wants to reopen it, we will.
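For anyone who needs a small sample of a large log, a sketch like the following keeps the headers intact while copying only the first few data rows. It is standard-library Python; `copy_sample_with_headers` and its parameters are hypothetical names, and it assumes every line starting with `#` is header or footer metadata.

```python
def copy_sample_with_headers(src_path, dst_path, max_rows=2000):
    """Copy all '#' metadata lines plus the first `max_rows` data rows.

    Hypothetical helper, not part of Bat. Returns the number of data rows
    written. The whole source file is scanned so that any trailing '#'
    line is kept as well.
    """
    rows_written = 0
    with open(src_path, "r", encoding="utf-8", errors="replace") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if line.startswith("#"):
                dst.write(line)  # metadata line: always keep
            elif rows_written < max_rows:
                dst.write(line)  # data row within the sample size
                rows_written += 1
    return rows_written
```

The sample file written this way starts with the same header block as the original, so it should be readable in the same way as the full log.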