CatalystCode / project-fortis-spark

A repository for all spark jobs running on fortis
MIT License

Filter retweets #172

Closed: erikschlegel closed this issue 6 years ago

erikschlegel commented 7 years ago

We should remove any events marked as a retweet through the Twitter Analyzer

Currently, we're processing messages marked as retweets, which make up a large share of the Fortis events. This creates duplicate events and floods our Spark Streaming job with a high volume of content.

We should enhance the Twitter Streaming Factory to filter out retweets, similar to the existing profile filter that's in place.
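A rough sketch of what such a filter could look like, assuming each incoming tweet exposes a retweet flag. The `Tweet` case class and the `RetweetFilter` object below are hypothetical stand-ins for illustration, not the project's actual types; the real stream factory may be structured quite differently.

```scala
// Hypothetical stand-in for an incoming tweet. The real job presumably
// works with the Twitter client's own status type instead.
final case class Tweet(id: Long, text: String, isRetweet: Boolean)

// Hypothetical helper, not part of the Fortis code base.
object RetweetFilter {
  // Keep only original tweets; anything flagged as a retweet is dropped
  // before it reaches the rest of the pipeline.
  def dropRetweets(tweets: Seq[Tweet]): Seq[Tweet] =
    tweets.filterNot(_.isRetweet)
}
```

If the stream carries twitter4j `Status` objects, the same idea would probably reduce to a single `filter(status => !status.isRetweet)` on the stream, placed next to the existing profile filter.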

c-w commented 6 years ago

Looks like this is now happening in TwitterStreamFactory.