SuperCowPowers / zat

Zeek Analysis Tools (ZAT): Processing and analysis of Zeek network data with Pandas, scikit-learn, Kafka and Spark
MIT License

Think about an optimized Zeek log to parquet converter #104

Closed: brifordwylie closed this issue 4 years ago

brifordwylie commented 4 years ago

The current path from a Zeek log to Parquet is log -> Spark -> Parquet file. This works fine, but there may be some improvements or a shorter path worth investigating.
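For reference, the current log -> Spark -> Parquet path can be sketched with plain PySpark. This is a minimal sketch, not zat's own implementation. It assumes a local, uncompressed, tab-separated Zeek log with the standard `#separator`/`#fields` header lines and an existing `SparkSession`. The helper names `read_zeek_fields` and `zeek_log_to_parquet` are hypothetical, and every column is read as a string (a fuller version would map the `#types` line to Spark types).

```python
from typing import Iterable, List


def read_zeek_fields(lines: Iterable[str]) -> List[str]:
    """Return the column names listed on a Zeek log's '#fields' header line."""
    separator = "\t"  # Zeek's default separator
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#separator"):
            # The header stores the separator escaped, e.g. a literal '\x09'
            separator = line.split(" ", 1)[1].encode().decode("unicode_escape")
        elif line.startswith("#fields"):
            return line.split(separator)[1:]
        elif not line.startswith("#"):
            break  # reached data rows without seeing '#fields'
    raise ValueError("no '#fields' header line found")


def zeek_log_to_parquet(spark, log_path: str, parquet_path: str) -> None:
    """Hypothetical helper: log -> Spark -> Parquet for one local Zeek log.

    `spark` is assumed to be an existing pyspark.sql.SparkSession.
    """
    with open(log_path) as f:
        fields = read_zeek_fields(f)
    df = (
        spark.read
        .option("sep", "\t")
        .option("comment", "#")     # skip Zeek header and '#close' footer lines
        .option("nullValue", "-")   # Zeek writes unset fields as '-'
        .option("quote", "")        # Zeek does not quote field values
        .csv(log_path)
        .toDF(*fields)              # name columns from the '#fields' header
    )
    df.write.mode("overwrite").parquet(parquet_path)
```

Reading the header with plain Python first keeps the Spark side to the stock CSV reader; the cost is that the whole pipeline still needs a Spark session, which is what the options below try to avoid or hide.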

1) A hand-written 'block' converter
2) A simple convenience class that wraps up Spark internally
3) ???
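Option 1 might look roughly like the sketch below: read the log in fixed-size blocks and write each block as one Parquet row group, so memory stays bounded and no Spark session is needed. This is a hypothetical sketch, not code from zat. `ZEEK_TO_PY`, `iter_zeek_blocks`, and `write_blocks_to_parquet` are illustrative names; it assumes the default tab separator and a standard `#fields`/`#types` header, maps only a few Zeek types (everything else stays a string), and the Parquet step assumes `pyarrow` is installed.

```python
from typing import Callable, Dict, Iterable, Iterator, List

# Hypothetical mapping from a few Zeek types to Python converters;
# any type not listed here (string, addr, set[...], ...) stays a string.
ZEEK_TO_PY: Dict[str, Callable[[str], object]] = {
    "count": int,
    "int": int,
    "port": int,
    "double": float,
    "interval": float,
    "time": float,
    "bool": lambda v: v == "T",
}


def iter_zeek_blocks(lines: Iterable[str], block_size: int = 10000) -> Iterator[Dict[str, list]]:
    """Yield column-oriented blocks ({field: [values]}) from a tab-separated Zeek log."""
    fields: List[str] = []
    converters: List[Callable[[str], object]] = []
    block: Dict[str, list] = {}
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            fields = line.split("\t")[1:]
            converters = [str] * len(fields)  # fallback if '#types' is missing
            block = {name: [] for name in fields}
        elif line.startswith("#types"):
            converters = [ZEEK_TO_PY.get(t, str) for t in line.split("\t")[1:]]
        elif line.startswith("#") or not line:
            continue  # other header lines and the '#close' footer
        else:
            if not fields:
                raise ValueError("data row before '#fields' header")
            for name, conv, raw in zip(fields, converters, line.split("\t")):
                # '-' marks an unset field in Zeek logs
                block[name].append(None if raw == "-" else conv(raw))
            if len(block[fields[0]]) >= block_size:
                yield block
                block = {name: [] for name in fields}
    if fields and block[fields[0]]:
        yield block  # final, partially filled block


def write_blocks_to_parquet(blocks: Iterable[Dict[str, list]], parquet_path: str) -> None:
    """Write each block as one Parquet row group (requires pyarrow)."""
    import pyarrow as pa  # optional dependency, imported only when writing
    import pyarrow.parquet as pq

    writer = None
    schema = None
    try:
        for block in blocks:
            table = pa.Table.from_pydict(block)
            if writer is None:
                # Schema is inferred from the first block; a column that is all
                # '-' there would get a null type, so a fuller version should
                # build the schema from the '#types' line instead.
                schema = table.schema
                writer = pq.ParquetWriter(parquet_path, schema)
            writer.write_table(table.cast(schema))
    finally:
        if writer is not None:
            writer.close()
```

With a local `conn.log`, the two pieces would be combined as `write_blocks_to_parquet(iter_zeek_blocks(f), "conn.parquet")` inside a `with open("conn.log") as f:` block. Because each block becomes one row group, `block_size` trades peak memory against the number and size of row groups.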

log -> Spark -> Parquet notebook

brifordwylie commented 4 years ago

Created both a new example script and a notebook.