Stream-AD / MIDAS

Anomaly Detection on Dynamic (time-evolving) Graphs in a Real-time and Streaming manner. Detecting intrusions (DoS and DDoS attacks), frauds, and fake rating anomalies.
Apache License 2.0

Ground truth labels for TwitterWorldCup2014 dataset #13

victordaniel closed this issue 4 years ago

victordaniel commented 4 years ago

I want to run MIDAS on the TwitterWorldCup2014 dataset, but the ground truth in the given dataset does not include a 0/1 label. Instead, it shows the following:

1 | Arena de Sao Paulo, Sao Paulo, Brazil | Brazil, Croatia | Marcelo | Own Goal | 6-12-2014 20:11:00 | High importance events.

Please suggest how to generate 0/1 labels, i.e., anomalous or not. Have you already prepared ground truth labels for this? If yes, could you please share them?

In this dataset, there are three types of events:

  1. Goal
  2. Penalty
  3. Injury

What could be the anomaly in these events?

Thanks.
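A minimal sketch of turning a ground-truth row like the one in the question into a usable event timestamp. The field names (`venue`, `teams`, `player`, `event_type`, `importance`) are hypothetical, inferred from the single example row rather than from any dataset documentation. The date is assumed to be month-day-year, since the Brazil vs Croatia opening match was played on 12 June 2014; the row does not state a timezone, so the sketch keeps a naive `datetime`.

```python
from datetime import datetime


def parse_ground_truth_row(line):
    """Split one pipe-separated ground-truth row into named fields.

    Field names are hypothetical, inferred from the example row only.
    """
    fields = [f.strip() for f in line.split("|")]
    event_id, venue, teams, player, event_type, when, importance = fields
    return {
        "id": int(event_id),
        "venue": venue,
        "teams": [t.strip() for t in teams.split(",")],
        "player": player,
        "event_type": event_type,
        # Assumed month-day-year order; timezone unknown, so left naive.
        "time": datetime.strptime(when, "%m-%d-%Y %H:%M:%S"),
        "importance": importance.rstrip("."),
    }


row = ("1 | Arena de Sao Paulo, Sao Paulo, Brazil | Brazil, Croatia | "
       "Marcelo | Own Goal | 6-12-2014 20:11:00 | High importance events.")
event = parse_ground_truth_row(row)
print(event["time"])  # 2014-06-12 20:11:00
```

Splitting on `|` rather than `,` matters here because the venue and team fields themselves contain commas.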

bhatiasiddharth commented 4 years ago

For Twitter datasets, we only have a few ground truth events, so we cannot directly assign a 0/1 label to all the tweets. However, when anomaly detection algorithms are run on such datasets, we can observe that peaks (tweets aggregated per hour/day/week) correspond well to ground truth events. If you still need to generate labels, you can try and label tweets containing goal/penalty/injury kind of events as anomalous (label = 1).
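One way to act on the "aggregate per hour, compare peaks to events" idea is a time-window proxy: give an edge label 1 when its hour bucket contains a ground-truth event. This is a minimal sketch under stated assumptions, not the maintainer's exact rule: edges are assumed to be `(src, dst, datetime)` tuples with hypothetical node names, one label is produced per edge in input order, and the hourly window is an arbitrary choice.

```python
from collections import Counter
from datetime import datetime


def hour_bucket(ts):
    """Truncate a datetime to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def label_edges_by_event_hour(edges, event_times):
    """Return one 0/1 label per edge, in input order.

    An edge is labeled 1 if its hour bucket contains at least one
    ground-truth event. This window rule is an assumed proxy only.
    """
    event_hours = {hour_bucket(t) for t in event_times}
    return [1 if hour_bucket(ts) in event_hours else 0 for _, _, ts in edges]


def edges_per_hour(edges):
    """Count edges per hour bucket, to compare peaks with event times."""
    return Counter(hour_bucket(ts) for _, _, ts in edges)


# Hypothetical edges; real ones would come from the tweet data.
edges = [
    ("userA", "userB", datetime(2014, 6, 12, 19, 40)),
    ("userC", "userD", datetime(2014, 6, 12, 20, 5)),
    ("userE", "userB", datetime(2014, 6, 12, 20, 30)),
    ("userF", "userG", datetime(2014, 6, 12, 22, 15)),
]
events = [datetime(2014, 6, 12, 20, 11)]  # e.g. the own goal row above

print(label_edges_by_event_hour(edges, events))  # [0, 1, 1, 0]
print(edges_per_hour(edges).most_common(1))
```

Reactions to an event late in an hour spill into the next bucket, so a wider window (a day, or the event hour plus the following hour) may match the observed peaks better; which window works is something to check against the data.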

victordaniel commented 4 years ago

Thanks!

" label tweets containing goal/penalty/injury kind of events as anomalous (label = 1)"

Does it mean that tweets that contain events should be considered anomalous, and tweets that do not contain events non-anomalous?

Can you suggest any other Twitter or Facebook datasets for edge-stream anomaly detection?

bhatiasiddharth commented 4 years ago

Yes, I meant that. Sorry, I am not aware of any other datasets from Twitter/Facebook.
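The content-based rule confirmed in this thread (tweets mentioning an event get label 1, all others 0) can be sketched with keyword matching. The keyword list is a hypothetical choice; the dataset does not ship one. Keyword matching is a crude proxy: it misses tweets that refer to an event without these words, including non-English ones, and each tweet's label would still need to be copied onto the edges derived from that tweet before feeding MIDAS.

```python
import re

# Hypothetical keyword list, including plurals; not part of the dataset.
EVENT_KEYWORDS = ("goal", "goals", "penalty", "penalties", "injury", "injuries")

# Whole-word, case-insensitive match, so "Goalkeeper" does not count.
_PATTERN = re.compile(
    r"\b(?:" + "|".join(map(re.escape, EVENT_KEYWORDS)) + r")\b",
    re.IGNORECASE,
)


def label_tweet(text):
    """Return 1 if the tweet mentions an event keyword, else 0."""
    return 1 if _PATTERN.search(text) else 0


tweets = [
    "What a GOAL by Neymar!",
    "Marcelo own goal, unbelievable",
    "Watching the match with friends",
    "Penalty given to Brazil?!",
    "Goalkeeper had a great game",
]
print([label_tweet(t) for t in tweets])  # [1, 1, 0, 1, 0]
```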