CoalfireFederal / NetFrenzy

Import a pcap file into Neo4j and view the network graph. Maintainer: @djent-
GNU Affero General Public License v3.0

Process all data in Python prior to uploading to Neo4j #43

Open Djent- opened 2 years ago

Djent- commented 2 years ago

This is an attempt to trade Python resource usage (RAM) for Neo4j MERGE query time by processing all data in Python before uploading it to Neo4j. I will take a first stab at this.

The goal is to keep exactly the same data that the current ingestion technique stores.
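A minimal sketch of the preprocess-then-upload idea follows. It assumes each packet has already been reduced to a (source IP, destination IP, protocol) tuple and that the graph stores Host nodes joined by CONNECTED relationships that carry a packet count. The `preprocess` function, the labels, and the property names are hypothetical, not NetFrenzy's actual schema. Under those assumptions, aggregating in Python lets a single UNWIND query replace one MERGE per packet while still keeping every distinct connection and its count.

```python
from collections import Counter

# Hypothetical Cypher: the Host label, CONNECTED relationship type and
# property names are illustrative, not NetFrenzy's actual schema.
BATCH_MERGE = """
UNWIND $rows AS row
MERGE (a:Host {ip: row.src})
MERGE (b:Host {ip: row.dst})
MERGE (a)-[r:CONNECTED {proto: row.proto}]->(b)
ON CREATE SET r.packets = row.packets
ON MATCH SET r.packets = r.packets + row.packets
"""


def preprocess(packets):
    """Aggregate (src, dst, proto) tuples into one row per unique edge.

    Every distinct edge is kept along with its packet count, so (under the
    assumed schema) the graph ends up with the same data a per-packet
    MERGE would produce, but in a single batched upload.
    """
    counts = Counter(packets)  # insertion-ordered, one entry per unique edge
    return [
        {"src": src, "dst": dst, "proto": proto, "packets": n}
        for (src, dst, proto), n in counts.items()
    ]


packets = [
    ("10.0.0.1", "10.0.0.2", "TCP"),
    ("10.0.0.1", "10.0.0.2", "TCP"),
    ("10.0.0.2", "10.0.0.3", "UDP"),
]
rows = preprocess(packets)
# With the neo4j Python driver the rows could then be sent once, e.g.
# session.run(BATCH_MERGE, rows=rows); this sketch does not touch a database.
```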

Djent- commented 2 years ago

Check out the djent--preprocess branch (git checkout djent--preprocess) to experiment with preprocessing. It does not help. Neo4j's MERGE time per packet stays roughly constant, but as Python accumulates more data, each search and insert takes longer. The branch processes packets more slowly from the start (~44 packets/sec) and drops to ~15 packets/sec after 4,000 packets have been processed.
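The slowdown is consistent with lookups that scan a Python list: each membership check walks the list element by element, so the cost per packet grows with the data already collected and the total work grows quadratically. The sketch below assumes the branch deduplicates with list membership checks (the later suggestion to switch to dictionaries implies lists are in use); `list_dedup` is a hypothetical helper that counts comparisons to make the growth visible.

```python
def list_dedup(items):
    """Keep the first occurrence of each item using a plain list.

    Each lookup scans `seen` element by element (what `item in seen` does
    for a list), so one lookup costs O(len(seen)) and a pass over mostly
    unique items costs O(n^2). Returns (unique_items, comparisons_made).
    """
    seen = []
    comparisons = 0
    for item in items:
        found = False
        for existing in seen:  # explicit linear scan
            comparisons += 1
            if existing == item:
                found = True
                break
        if not found:
            seen.append(item)
    return seen, comparisons


# Doubling the number of unique items roughly quadruples the work.
for n in (1_000, 2_000, 4_000):
    _, cost = list_dedup(range(n))
    print(n, cost)
```

For n unique items the count is n(n-1)/2, so the loop prints 499500, 1999000 and 7998000 for 1,000, 2,000 and 4,000 items.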

python3 NetFrenzy.py -c 5k.pcap --count 5000 --preprocess

Time for 5,000 packets (m:ss): 1:44 on main, 4:01 on djent--preprocess.
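Reading the timings as minutes:seconds (an assumption), the average throughput works out as follows; `to_seconds` is a small hypothetical helper.

```python
def to_seconds(mmss):
    """Convert an 'm:ss' duration string to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


PACKETS = 5_000
for branch, elapsed in (("main", "1:44"), ("djent--preprocess", "4:01")):
    secs = to_seconds(elapsed)
    print(f"{branch}: {secs} s, {PACKETS / secs:.1f} packets/sec on average")
```

This gives about 48.1 packets/sec on main and 20.7 packets/sec on djent--preprocess, roughly 2.3 times slower on average, which fits a rate that starts near 44 packets/sec and falls to about 15.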

A last-ditch effort might be to use dictionaries instead of lists, but I will leave that as an exercise for the reader. @tvldz @broosa
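A minimal sketch of the dictionary approach, assuming packets reduced to (source IP, destination IP) pairs and one record per host. `index_hosts` and the record fields are hypothetical, and whether this closes the gap with main in practice is untested here.

```python
def index_hosts(packets):
    """Build one record per host IP, keyed by IP in a dict.

    `hosts.setdefault(ip, ...)` is an average O(1) hash lookup, unlike a
    linear scan over a list of records, so the cost per packet stays flat
    as the number of hosts grows. Dicts keep insertion order (Python 3.7+),
    so records come out in first-seen order, as they would from a list.
    """
    hosts = {}
    for src, dst in packets:
        # dict.fromkeys drops a duplicate when src == dst, keeping order,
        # so a packet is counted at most once per host.
        for ip in dict.fromkeys((src, dst)):
            record = hosts.setdefault(ip, {"ip": ip, "packets": 0})
            record["packets"] += 1
    return list(hosts.values())


print(index_hosts([("10.0.0.1", "10.0.0.2"), ("10.0.0.2", "10.0.0.3")]))
```

Swapping the list for a dict changes only the lookup structure; the logic that builds each record stays the same.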

I recommend focusing on the other enhancement issues instead.