SuperCowPowers / zat

Zeek Analysis Tools (ZAT): Processing and analysis of Zeek network data with Pandas, scikit-learn, Kafka and Spark
MIT License

Can we make JSON a first class citizen? #142

Open hilt86 opened 2 years ago

hilt86 commented 2 years ago

Thanks again for making Zat - I'm surprised it isn't being used by more folks!

In reference to https://github.com/SuperCowPowers/zat/blob/7f0de8bb052e8c84ab9bd00f195514d957eac9ec/zat/json_log_to_dataframe.py which states:

"""JSONLogToDataFrame: Converts a Zeek JSON log to a Pandas DataFrame
    Notes:
        Unlike the regular Zeek logs, when you dump the data to JSON you lose
        all the type information. This means we have to guess/infer a lot
        of the types, we HIGHLY recommend that you use the standard Zeek output
        log format as it will result in both faster and better dataframes.
    Todo:
        1. Have a more formal column mapping
        2. Convert Categorial columns
"""

What needs to be done so that we can have JSON as a first class citizen in Zat? Heaps of other tools rely on Zeek logs being JSON (Elastic Agent integration, RITA, etc.), so it is a bummer that in order to use Zat we need to use ASCII logging.
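For illustration, here is a minimal sketch of what the "more formal column mapping" from the docstring's Todo could look like. It is not ZAT code: `CONN_TYPES`, `CONVERTERS`, and `read_zeek_json` are hypothetical names, the sample record is made up, and it assumes Zeek is writing one JSON object per line with its default epoch-seconds timestamps. The idea is to apply the known Zeek type per field instead of guessing it from the JSON values.

```python
import io
import json
from datetime import datetime, timedelta, timezone

# Hypothetical mapping of a few conn.log fields to their Zeek types.
# This is an illustration only; it is not part of ZAT.
CONN_TYPES = {
    "ts": "time",
    "duration": "interval",
    "id.orig_p": "port",
    "id.resp_p": "port",
    "orig_bytes": "count",
    "resp_bytes": "count",
}

# One converter per Zeek type, so no type has to be inferred from the data.
CONVERTERS = {
    "time": lambda v: datetime.fromtimestamp(float(v), tz=timezone.utc),
    "interval": lambda v: timedelta(seconds=float(v)),
    "port": int,
    "count": int,
}


def read_zeek_json(handle, type_map):
    """Yield one dict per JSON line, converting mapped fields to typed values."""
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        for field, zeek_type in type_map.items():
            # Zeek omits unset optional fields in JSON, so only convert what is present.
            if record.get(field) is not None:
                record[field] = CONVERTERS[zeek_type](record[field])
        yield record


# Made-up sample record in the shape of a JSON conn.log line.
sample = io.StringIO(
    '{"ts": 1591367999.305988, "id.orig_h": "192.168.4.76", "id.orig_p": 36844, '
    '"id.resp_h": "192.168.4.1", "id.resp_p": 53, "proto": "udp", "duration": 0.066852}\n'
)
rows = list(read_zeek_json(sample, CONN_TYPES))
```

The resulting list of dicts could then be handed to `pandas.DataFrame(rows)`. A real implementation would need a mapping for every log type, which is presumably why the docstring lists it as a Todo.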

brifordwylie commented 2 years ago

Thanks for your interest in improving the JSON reader. If the ZAT code is being used in a commercial setting and you'd like to get stuff done quickly and per specification, you can always touch base with the SCP folks at https://www.supercowpowers.com/. Otherwise, I agree with your suggestion that the JSON reader needs to be improved, and we'll put this on the open source queue. I think the S3 bucket read suggestion is also at the top of the list 👍

hilt86 commented 2 years ago

OK, I will cancel my sponsorship here and get in touch with you directly.