Sotera / watchman

Watchman: An open-source social-media event-detection system
GNU General Public License v2.0
20 stars, 7 forks

Linking related events #116

Closed drJAGartner closed 7 years ago

drJAGartner commented 7 years ago

We need to investigate & implement the best way to link events, both across and within time windows.

lukewendling commented 7 years ago

@drJAGartner can you explain 'within time windows', or should I assume we're focused only on 'across' for this issue?

lukewendling commented 7 years ago

Proposed process flow

1. get prior events
  How? Query events whose end date is 1 ms before the current window's start date
2. compare to priors
  - if (around midnight) or no matches
    create event
    send to QCR
  - else
    update existing event with new end date and append foreign keys

Match priors algorithm: (TODO) x% match on keywords
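
The flow above can be sketched roughly as follows. This is only an illustration, not the watchman implementation: the `Event` fields, the `link_or_create` helper, and the `force_new` flag (standing in for the "around midnight" check, which this thread does not define) are all hypothetical names. The keyword match is still a TODO here, so the sketch swaps in Jaccard similarity with an arbitrary 0.5 threshold as a placeholder for "x%".

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    # Hypothetical shape; the real event model may differ.
    start_ms: int
    end_ms: int
    keywords: set
    cluster_ids: list = field(default_factory=list)  # the "foreign keys"


def jaccard(a: set, b: set) -> float:
    """Shared keywords divided by all distinct keywords."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def link_or_create(candidate, events, threshold=0.5, force_new=False):
    """Return (event, created). The caller sends created events to QCR."""
    # Step 1: prior events end exactly 1 ms before the current window starts.
    priors = [e for e in events if e.end_ms == candidate.start_ms - 1]

    # Step 2: pick the prior with the highest keyword similarity, if any.
    best = max(
        priors,
        key=lambda e: jaccard(e.keywords, candidate.keywords),
        default=None,
    )
    matched = (
        best is not None
        and jaccard(best.keywords, candidate.keywords) >= threshold
    )

    if force_new or not matched:
        events.append(candidate)
        return candidate, True

    # Match found: extend the end date and append the foreign keys.
    best.end_ms = candidate.end_ms
    best.cluster_ids.extend(candidate.cluster_ids)
    return best, False
```

Returning a `created` flag keeps the "send to QCR" step outside the matching logic, so the transport can change without touching the linking code.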

lukewendling commented 7 years ago

reminder for discussion: should we create a daily runner (maybe cron-based) to aggregate the day's events and send them to Kafka?

after talking to @drJAGartner and @justinlueders, here's a new proposal:

  1. Match events as above. Only the end date is extended. We maintain the 'cleanliness' of the original event by not appending additional hashtags, URLs, etc., since, as we noticed with aggclusters, this waters down the event. Essentially, we're saying that the first 30 minutes of an event are the most accurate portrayal of it.
  2. Remove the 'send to kafka' code in dr-manhattan and instead run it once a day via a cron job. Its purpose is simple: send the events that were created that day.
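
A minimal sketch of the two points in this proposal, under stated assumptions: events are plain dicts with hypothetical `created`, `end`, `hashtags`, and `urls` fields, timestamps are timezone-aware UTC, and "that day" means a UTC calendar day. The actual Kafka send is left out because the producer and topic are not specified in this thread.

```python
from datetime import date, datetime, timedelta, timezone


def extend_event(event: dict, new_end: datetime) -> dict:
    """Return a copy with only the end date extended.

    Hashtags, URLs, and other descriptive fields are left as they were
    when the event was first created.
    """
    updated = dict(event)
    updated["end"] = max(event["end"], new_end)
    return updated


def events_created_on(events: list, day: date) -> list:
    """Select the events a daily (cron-driven) job would send for `day`."""
    start = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    return [e for e in events if start <= e["created"] < end]
```

A daily job could call `events_created_on(all_events, some_day)` and pass the result to whatever sender replaces the code removed from dr-manhattan.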