
Stream-AD / MIDAS

Anomaly Detection on Dynamic (time-evolving) Graphs in Real-time and Streaming manner. Detecting intrusions (DoS and DDoS attacks), frauds, fake rating anomalies.

Apache License 2.0


Production implementation #18

SyGen899 closed this issue 3 years ago

SyGen899 commented 3 years ago

Hi, first off, this is really cool. I'm a novice coder, and for research I would like to run this on NetFlow data in real time. The only thing is that I'm unsure how it can be integrated into a live environment rather than run on a local dataset. Maybe it's a dumb question, but how should or could this be implemented?

liurui39660 commented 3 years ago

Hi, thanks for your interest, but I'm not very clear about your requirements. Can you please give more details?

SyGen899 commented 3 years ago

Thanks for your response. If I'm understanding this correctly, it needs to run constantly to get better at detecting anomalies on a network: it streams data, gets a new edge, scores it, then classifies it. What I don't understand is what happens if the system it runs on goes down. Is there a way to store what the algorithm has learned as a backup? I read about the count-min sketch: is it only held in memory and lost if a failure happens, or does this not matter?

liurui39660 commented 3 years ago

I think you would want to periodically back up the algorithm states and CMS states to a local file. The current implementation is a minimal version, so everything is in memory except for the outputs. For a backup, I think most variables are worth saving, except for a small index array that carries hashing results back from the CMS.

SyGen899 commented 3 years ago

OK, I understand a lot better now. Do you know of a way I could do this with the current implementation? How should I go about storing these states?

liurui39660 commented 3 years ago

For example, you can save the states to a local file (in whatever format you prefer) every 10M edges. You don't need to modify the core, since those data structures only use public members. Just add a wrapper, like example/Demo.cpp, and do the saving there.
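To make the wrapper idea concrete, here is a minimal sketch of periodic checkpointing. Everything in it is an assumption for illustration: `ToySketch`, `SaveSketch`, `LoadSketch` and `CheckpointDue` are hypothetical names, and `ToySketch` is a stand-in struct, not the repository's CMS class. A real wrapper would copy out whatever public members the core actually exposes, plus any other algorithm state.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a count-min sketch; NOT the repository's class.
struct ToySketch {
    std::uint32_t rows = 0;
    std::uint32_t cols = 0;
    std::vector<float> data;  // rows * cols counters
};

// Write the dimensions followed by the raw counters.
// Returns false on any I/O failure.
bool SaveSketch(const ToySketch& s, const std::string& path) {
    assert(s.data.size() == static_cast<std::size_t>(s.rows) * s.cols);
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    out.write(reinterpret_cast<const char*>(&s.rows), sizeof s.rows);
    out.write(reinterpret_cast<const char*>(&s.cols), sizeof s.cols);
    out.write(reinterpret_cast<const char*>(s.data.data()),
              static_cast<std::streamsize>(s.data.size() * sizeof(float)));
    return static_cast<bool>(out);
}

// Read a checkpoint written by SaveSketch.
// `s` is only modified if the whole file was read successfully.
bool LoadSketch(ToySketch& s, const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    ToySketch tmp;
    in.read(reinterpret_cast<char*>(&tmp.rows), sizeof tmp.rows);
    in.read(reinterpret_cast<char*>(&tmp.cols), sizeof tmp.cols);
    if (!in) return false;
    tmp.data.resize(static_cast<std::size_t>(tmp.rows) * tmp.cols);
    in.read(reinterpret_cast<char*>(tmp.data.data()),
            static_cast<std::streamsize>(tmp.data.size() * sizeof(float)));
    if (!in) return false;
    s = std::move(tmp);
    return true;
}

// True when a checkpoint is due after `edgesSeen` edges have been processed.
bool CheckpointDue(std::uint64_t edgesSeen, std::uint64_t interval) {
    return interval != 0 && edgesSeen % interval == 0;
}
```

In the streaming loop, the wrapper would increment an edge counter after scoring each edge and call `SaveSketch` whenever `CheckpointDue(count, 10000000)` is true. Writing to a temporary file and renaming it over the old checkpoint would avoid leaving a half-written file if the process dies mid-save. The raw binary layout here is only readable on a machine with the same endianness and float format.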

SyGen899 commented 3 years ago

Thanks. Another question: how should a threshold be defined with this? Is there an implementation available?

liurui39660 commented 3 years ago

If you mean a threshold to decide whether an edge is anomalous, no, the algorithm only gives raw scores. But you can use a small sample of scores as the baseline.
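One way to turn a baseline sample of scores into a threshold is to take a high quantile of that sample and flag anything above it. The sketch below is an assumption, not something the project ships: `QuantileThreshold` and `IsAnomalous` are hypothetical helpers, the nearest-rank quantile is just one reasonable choice, and the quantile itself (for example 0.99) has to be tuned on your own traffic.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Score at quantile q (0 < q <= 1) of a baseline sample, nearest-rank method.
// Takes the sample by value because nth_element reorders it.
double QuantileThreshold(std::vector<double> baseline, double q) {
    assert(!baseline.empty() && q > 0.0 && q <= 1.0);
    const std::size_t n = baseline.size();
    std::size_t rank = static_cast<std::size_t>(std::ceil(q * static_cast<double>(n)));
    rank = std::max<std::size_t>(1, std::min(rank, n));  // keep rank in [1, n]
    std::nth_element(baseline.begin(), baseline.begin() + (rank - 1), baseline.end());
    return baseline[rank - 1];
}

// An edge is flagged only if its raw score is strictly above the threshold.
bool IsAnomalous(double score, double threshold) {
    return score > threshold;
}
```

The baseline should come from traffic you believe is mostly normal, and the threshold can be recomputed periodically as the score distribution drifts.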

SyGen899 commented 3 years ago

Sorry, another question: would this be effective on sampled NetFlow data, i.e. data aggregated at n intervals?

liurui39660 commented 3 years ago

Sorry, I can't give a clear answer. Maybe you can try it once and see if there's any problem.