spam - Githubissues

edsu commented 9 years ago

Sometimes when there is a popular hashtag botnets will flood twitter with spam that uses the hashtag. Is there a way we can filter these out somehow?

It would be preferable to have a separate process that can run periodically, look for spam, and either flag it or remove it from redis.

edsu commented 9 years ago

These accounts are often shut down by Twitter. So perhaps the fact that an account no longer exists could be an indicator that its tweets should no longer be counted as votes for a URL? Also, @remagio has noticed that some browsers can crash when there are lots of Twitter avatar image URLs that 404 on an earls page.

edsu commented 9 years ago

Using the deleted accounts as a trigger may not be perfect because Twitter doesn't find all of them apparently:

[screenshot: 2015-03-11 at 2:23:41 pm]

edsu commented 9 years ago

But here's a view where spam accounts seemed to amplify or propagate news to legitimate accounts.

[screenshot: 2015-03-11 at 2:38:30 pm]

edsu commented 9 years ago

I'm starting to think an earls view that highlighted potential spam would be interesting, because what is being spammed could create a picture of who is doing the spamming.

remagio commented 9 years ago

I also think that simply adding a couple of numbers to the default views could be quite useful. For example, what I call the "captweets ratio" (http://captology.stanford.edu/about/what-is-captology.html). Let me explain what I mean:

Every article in earls shows: the total number of tweets sharing a URL, the title of the web page plus its URL, and the list of all related tweets with their avatars.
Staying within the browser's DOM, and using a smaller font and a light color, it could be useful to show in the footer of each article a simple "total number of tweets sharing the URL / total number of unique avatars" (simply counting the distinct avatars in the article's DIV class).

IMHO the captweets ratio, a single number, even without any explanation, could help viewers quickly tell whether a URL is being shared by many people (interesting/qualitative shares), by only a few (discussions/promoters), or by spam/bots or real people acting like spam/bots.

In the picture you published there is a mix of cases: hidden SMM promoters, bots, bots rented via CnC, and so on. Captweets phenomena cover a wide range of actions and purposes, regardless of their size and quantity. I'll be glad to share thoughts, as this will also be covered in depth at the next #IJF15.

edsu commented 9 years ago

Thanks for the link about captology, I will give it a read! There are also some interesting ideas in this post by @giladlotan -- and in the comments. Maybe the following/follower ratio could be a useful indicator, as well as the account creation date?

We could also use your captweets ratio:

number of tweets mentioning a url / number of unique users that tweeted the url
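As a rough illustration of that ratio, here is a minimal Python sketch. The `Tweet` record and the sample data are hypothetical and are not taken from earls' actual data model; the point is only that a ratio near 1 means many distinct users shared the URL, while a high ratio means a few accounts tweeted it repeatedly.

```python
from collections import namedtuple

# Hypothetical minimal tweet record; earls' real data model may differ.
Tweet = namedtuple("Tweet", ["tweet_id", "user", "url"])


def captweets_ratio(tweets, url):
    """Return (tweets mentioning url) / (unique users who tweeted it).

    Returns None when no tweet mentions the URL, to avoid dividing by zero.
    """
    users = [t.user for t in tweets if t.url == url]
    if not users:
        return None
    return len(users) / len(set(users))


# Made-up sample data for illustration only.
tweets = [
    Tweet(1, "alice", "http://example.com/a"),
    Tweet(2, "bob", "http://example.com/a"),
    Tweet(3, "bot1", "http://example.com/b"),
    Tweet(4, "bot1", "http://example.com/b"),
    Tweet(5, "bot1", "http://example.com/b"),
]

ratio_a = captweets_ratio(tweets, "http://example.com/a")  # two users, two tweets
ratio_b = captweets_ratio(tweets, "http://example.com/b")  # one user, three tweets
print(ratio_a, ratio_b)
```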

remagio commented 9 years ago

I would also clean up spam or links to bad websites from within earls, at the redis level, but without stopping or breaking earls while it is running. Obviously this requires deleting all the "http" entries, but what about "tweet:" and "tweet:http"? Here is an example from a small instance, though some instances hold more than 200k entries:

3037) "tweet:569810235491278848"
3038) "tweet:576345701413724160"
3039) "tweet:570538848373092352"
3040) "http://www.festivaldelgiornalismo.com/speaker/davide-vecchi"
3041) "tweet:575959359135559681"
3042) "https://www.swarmapp.com/andreaspinopico/checkin/5502bb49498e3acdf0e52ab9?s=95nGnWns8JQbuHWvd1zUaQFmuAQ&ref=tw"
3043) "tweet:567242174480121858"
3044) "tweet:572308163984293888"
3045) "tweet:577103131055665152"
3046) "tweet:572444319409299456"
3047) "tweet:577772143213199360"
3048) "tweet:576033940076650496"
3049) "tweet:572437678605529088"
3050) "tweets:http://www.datamediahub.it/2015/03/02/hackathon-su-media-e-giornalismo-in-italia/#axzz3TDLfxVI2"
3051) "tweet:576333419614978048"
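
To make the question concrete, here is a small Python sketch that sorts key names like the ones listed above into groups by prefix. It works on plain strings rather than a live redis connection, and the meaning of each prefix (in particular that "tweets:<url>" holds the tweets for a URL) is an assumption read off the listing, not something confirmed by the earls code.

```python
def group_keys(keys):
    """Group redis key names by the prefixes seen in the listing above."""
    groups = {"tweet": [], "tweets": [], "url": [], "other": []}
    for key in keys:
        # Check "tweets:" first so it is not confused with "tweet:".
        if key.startswith("tweets:"):
            groups["tweets"].append(key)
        elif key.startswith("tweet:"):
            groups["tweet"].append(key)
        elif key.startswith(("http://", "https://")):
            groups["url"].append(key)
        else:
            groups["other"].append(key)
    return groups


def keys_for_url(keys, url):
    """Keys named after `url` itself (assumed schema: "<url>" and "tweets:<url>")."""
    return [k for k in keys if k == url or k == "tweets:" + url]


# A few key names copied from the listing above.
sample = [
    "tweet:569810235491278848",
    "http://www.festivaldelgiornalismo.com/speaker/davide-vecchi",
    "tweets:http://www.datamediahub.it/2015/03/02/hackathon-su-media-e-giornalismo-in-italia/#axzz3TDLfxVI2",
]
groups = group_keys(sample)
print({name: len(found) for name, found in groups.items()})
```

Key names alone do not say which "tweet:" entries belong to a given URL, so removing those would need a recorded mapping from each tweet to its URLs.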

remagio commented 9 years ago

An interesting view by @giladlotan, but having worked on this for years, I find it reads more like a beginner's view of the field. As he says, it is his first test, even though he says he has worked on social media for years. At @Gilda35 we have been fooling rankings and social networks since 2008 with dadaist experiments. What is really interesting, and he mentions it too, is how you "fool one ring... to fool them all". In my opinion there is clearly a mistake, this being his first test: you don't need lots of followers at any level to gain high ranks or to fool the backend algorithms of social networks. Everyone notices it, and the algorithms do too. The techniques he explains require constant work to avoid penalties. An example is one of the funniest dadaist performances we did and analyzed: the #Twittemonument (https://translate.google.com/translate?hl=en&sl=it&tl=en&u=http%3A%2F%2Fgilda35.com%2F2012%2F04%2F11%2Fla-sezione-storie-di-twitter-esperimento-twittermonument%2F). It was done in a way that incurred no penalty at all and had long-term results. It is an example of computational linguistics applied to every detail, including image files: backends analyze referrals across the Internet, but also images, using OCR. "The First Twitter Monument installation at -fakelocation- by the Mayor of the City -fakename-, in Italy." It ranked in every Trending Topic and boosted the authorship and rank of all the accounts involved in searches on Twitter and Google, and on blogs and Foursquare too. It also made headline news in newspapers.

edsu commented 9 years ago

I propose we start with two features:

Add a command line utility that looks for tweets that have been deleted and removes them from the redis db. This will involve keeping track of which URLs were found in each tweet so the appropriate counts can be decremented.
Calculate the ratio of number of tweets to number of users in the UI and, if it is greater than x, don't display the resource. We could even add a control to the UI to adjust x.
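
The two features could be sketched roughly as follows in Python. This uses plain dictionaries in place of the redis db, and `tweet_exists` is a hypothetical stand-in for whatever Twitter lookup the real utility would use; none of these names come from the earls code.

```python
def prune_deleted(url_counts, tweet_urls, tweet_exists):
    """Drop tweets that no longer exist and decrement their URLs' counts.

    url_counts:   dict mapping url -> number of tweets mentioning it
    tweet_urls:   dict mapping tweet id -> list of URLs found in that tweet
    tweet_exists: callable(tweet_id) -> bool (hypothetical lookup)
    Both dicts are modified in place; returns the removed tweet ids.
    """
    removed = []
    for tweet_id in list(tweet_urls):  # copy the keys: we mutate while looping
        if tweet_exists(tweet_id):
            continue
        for url in tweet_urls.pop(tweet_id):
            remaining = url_counts.get(url, 0) - 1
            if remaining > 0:
                url_counts[url] = remaining
            else:
                url_counts.pop(url, None)  # no tweets left for this URL
        removed.append(tweet_id)
    return removed


def visible_urls(url_stats, x):
    """Return URLs whose tweets/users ratio is not greater than x.

    url_stats maps url -> (number of tweets, number of unique users).
    """
    return [
        url
        for url, (n_tweets, n_users) in url_stats.items()
        if n_users > 0 and n_tweets / n_users <= x
    ]


# Made-up sample data for illustration only.
url_counts = {"http://example.com/a": 2, "http://example.com/b": 1}
tweet_urls = {
    "1": ["http://example.com/a"],
    "2": ["http://example.com/a"],
    "3": ["http://example.com/b"],
}
deleted = {"2", "3"}
removed = prune_deleted(url_counts, tweet_urls, lambda t: t not in deleted)
shown = visible_urls({"a": (10, 9), "b": (30, 2)}, x=3)
print(removed, url_counts, shown)
```

Passing the existence check in as a function keeps the bookkeeping testable without network access, and leaves open how the real utility would batch its lookups.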

remagio commented 9 years ago

One more thing. Since yesterday this is starting to be a problem for IJS15 too, with both real users and Selenium tests. We have not really been spammed yet, and we are not yet getting 404s from Twitter, but the same thing we discussed and showed above is happening. At every loop of earls with "each" executions, and with growing numbers of ".append()" and ".eval()" calls, Android 4.4.x crashes. Its Chrome is still buggy when doing thousands of DOM manipulations. [attached image: 11079832_10153115095707534_1883647673_n]

edsu commented 9 years ago

The number of DOM updates will be greatly reduced once we have lazy loading / infinite scroll (#1). Also, removing tweets that have been deleted will mean those 404s for avatars no longer appear.

edsu / earls

spam #14