NYU-HSRN-Network-Data-Science-Group / AutoZeekWatch

An online, deployable machine learning network intrusion detection system for Zeek.
MIT License

Optimize Preprocess_json() #20

Closed: diego-lopez8 closed this issue 7 months ago

diego-lopez8 commented 7 months ago

There are many optimizations we can make to the preprocess_json() function. Because makedf_samecol() already selects the fields we need, there is less need to spend cycles dropping columns from the dataframe afterwards. We should remove as many of these redundant drops as possible.
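To illustrate the idea, here is a minimal sketch of selecting the wanted fields once, so that no later drop pass is needed. It is not the project's code: `select_fields`, `WANTED_FIELDS`, and the sample records are hypothetical names and data made up for this example, and the real preprocess_json() and makedf_samecol() may work differently.

```python
import pandas as pd

# Hypothetical field list for illustration; the project's actual list may differ.
WANTED_FIELDS = ["ts", "id.orig_h", "id.resp_h", "proto"]


def select_fields(records, fields=WANTED_FIELDS):
    """Build a DataFrame that holds only `fields`, in that order."""
    df = pd.DataFrame.from_records(records)
    # reindex keeps just the requested columns (filling absent ones with NaN),
    # so there is nothing left to drop() in a later step.
    return df.reindex(columns=fields)


# Made-up log records with extra keys that should not survive selection.
records = [
    {"ts": 1.0, "id.orig_h": "10.0.0.1", "id.resp_h": "10.0.0.2",
     "proto": "tcp", "uid": "C1"},
    {"ts": 2.0, "id.orig_h": "10.0.0.3", "id.resp_h": "10.0.0.4",
     "proto": "udp", "uid": "C2", "service": "dns"},
]

df = select_fields(records)
```

If selection already happens this way upstream, any remaining `df.drop(columns=...)` calls on the same columns are doing no useful work and are candidates for removal.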

The end goal should be to move everything to numpy. That is out of scope for this ticket; let's make pandas run as fast as we can before spending enormous effort on a numpy rewrite.

diego-lopez8 commented 7 months ago

Please also clean up the comments, as the TODOs are mostly completed.

zoe70416 commented 7 months ago

@diego-lopez8 Can I remove the code (from last semester) that we're not using?

diego-lopez8 commented 7 months ago

Yes!

zoe70416 commented 7 months ago

@diego-lopez8 Done. Please check the latest PR.

diego-lopez8 commented 7 months ago

completed