synctext opened this issue 7 years ago
Real world $2.44 million fraud with Amazon reviews/votes, thanks @pimveldhuisen http://www.zdnet.com/article/exclusive-inside-a-million-dollar-amazon-kindle-catfishing-scam/
Most of these were quite useful for gaining some understanding on this topic, thanks @synctext and @pimveldhuisen.
I am currently looking into Fighting peer-to-peer SPAM and decoys with object reputation, and to a lesser extent parts of P2P-Based Collaborative Spam Detection and Filtering.
It currently seems that many partial 'solutions' to adversarial search exist and have been researched, but they most often depend heavily on some form of centralisation or have another major drawback.
Also, regarding an often-used WoT based system: the gpg shortkey issue that came up recently is interesting, but I am not sure if focusing on WoTs is wise at this moment in time, seeing as these are implementation details when looking at the state of the art of adversarial search. WDYT, @synctext?
@wordempire A lot of abuse, fraud and spam examples can be found in social media and e-commerce. So that is nice stuff to write about.
but most often they heavily depend on some form of centralisation or have another major drawback.
That is a perfect storyline! Anything more for self-organising systems or P2P? Stuff like, http://www.ece.umd.edu/~goergen/docs/sec-nwatch.pdf ..
Web-of-trust mechanisms can be a minority part of your report, halve, or the majority. Whatever makes the most interesting story. A list of partial, flawed, and fantasy WoT solutions would be ideal.
Fraud with search results with direct financial gain.
lee2006understanding: develops a model that looks at the link between user behavior/awareness and pollution of a p2p network.
yoshida2009controlling: shows that index poisoning is an effective way of dealing with copyright violations when looking at the Winny network for small sets of files. This approach has the potential to disrupt the network as a whole, which might or might not be desirable for an adversary.
Determine layers of operation and archetypes for each layer. For each type, refer to drawbacks and assumptions made to make it all work.
Trust building/subversion
Choosing whether to trust an authority or group of peers can be based on a variety of existing decision-making processes. For example, this would be the perfect section to refer to the abuse happening w.r.t. Twitter, Amazon reviews and of course the GPG short key issue. Proxy measurements for trust could also be reviewed here (e.g., if a person shares a lot, they might be trustworthy, that kind of thing).
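A proxy measurement like "shares a lot, so probably trustworthy" can be made concrete with a toy score. This is a minimal sketch: the ratio-based formula and the cap are illustrative assumptions, not a published metric.

```python
# Toy proxy-trust score from sharing behaviour: peers that upload a
# lot relative to what they download get a higher score. The formula
# and cap are illustrative assumptions, not an actual trust metric.
def proxy_trust(uploaded: float, downloaded: float, cap: float = 5.0) -> float:
    ratio = uploaded / max(downloaded, 1.0)  # avoid division by zero
    return min(ratio, cap) / cap             # normalised to [0, 1]

print(proxy_trust(400.0, 100.0))           # heavy sharer -> 0.8
print(round(proxy_trust(10.0, 100.0), 3))  # mostly a leecher -> 0.02
```

Note that any such proxy is itself gameable (e.g. by uploading junk), which is exactly why it belongs in the trust building/subversion section.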
Index pollution/building
Indexes have to be generated for content in p2p networks, but they have been polluted in some older networks, such as Gnutella. This section can focus on whether existing approaches can handle a subset of users putting low-quality material online. Find out why it happens (user-centric), how it can be prevented and perhaps how it could be leveraged to practically block access to an undesirable/illegal resource.
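The impact of a small subset of polluters can be illustrated with a toy calculation, assuming each polluter injects decoy entries for a keyword at a fixed rate (all numbers below are illustrative, not measurements from a real network).

```python
# Toy model of index pollution: a few polluters injecting many decoy
# entries per keyword quickly drown out the legitimate results.
def clean_fraction(clean_entries: int, polluters: int, fakes_each: int) -> float:
    """Fraction of index entries for a keyword that are genuine."""
    total = clean_entries + polluters * fakes_each
    return clean_entries / total

# 10 real files for a keyword, 5 polluters injecting 20 decoys each:
print(round(clean_fraction(10, 5, 20), 3))  # -> 0.091
```

Even this crude model shows why per-entry voting or reputation is needed: without it, the expected quality of a search result drops roughly linearly in the polluters' injection rate.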
Content poisoning
Older p2p protocols used a very coarse checksum to verify entire files. It was quite easy to poison a download, therefore forcing the downloader to re-download the entire file. Depending on file size, this can be quite expensive. Look at the 'evolution' of systems, at some point referring to BitTorrent and its piecewise hashing, which partially alleviates this issue.
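The difference piecewise hashing makes can be shown in a few lines. This is a simplified sketch of the BitTorrent idea (real torrents store 20-byte SHA-1 digests of fixed-size pieces in the metainfo; the tiny piece size here is purely for demonstration): a poisoned byte range invalidates only the pieces it touches, not the whole file.

```python
# Sketch of BitTorrent-style piecewise hashing. Only the corrupted
# pieces fail verification, so only those need re-downloading.
import hashlib

PIECE_SIZE = 4  # tiny for demonstration; real clients use e.g. 256 KiB

def piece_hashes(data: bytes) -> list:
    """Hash each fixed-size piece separately."""
    return [hashlib.sha1(data[i:i + PIECE_SIZE]).hexdigest()
            for i in range(0, len(data), PIECE_SIZE)]

def corrupted_pieces(expected: list, received: bytes) -> list:
    """Return indices of pieces whose hash does not match."""
    got = piece_hashes(received)
    return [i for i, (e, g) in enumerate(zip(expected, got)) if e != g]

original = b"hello world, piecewise!"
expected = piece_hashes(original)

# An attacker poisons four bytes in the middle of the transfer.
poisoned = original[:8] + b"XXXX" + original[12:]
print(corrupted_pieces(expected, poisoned))  # -> [2]
```

With a single whole-file checksum, the same four poisoned bytes would force a full re-download; here the damage is confined to one piece.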
OK, + add 4th or 5th section.
Start the .tex in IEEE format: https://www.google.nl/search?q=ieeee+format
https://scholar.google.com/scholar?q=dht+poisoning
https://scholar.google.com/scholar?q=link+farm
https://scholar.google.com/scholar?q=kazaa+pollution
Reddit, HackerNews: upvote, shadow ban, etc. techniques
https://scholar.google.com/scholar?q=collaborative+spam+filtering
https://en.wikipedia.org/wiki/Stealth_banning
Honesty among drug dealers, 90% satisfaction level with drug deals: http://dl.acm.org/citation.cfm?id=2488408
https://scholar.google.com/scholar?q=explicit+feedback+spam+filtering
User feedback & moderation: http://www.sciencedirect.com/science/article/pii/S0308596108000955
The Tribler voting and spam prevention mechanism:
- control D, Dispersy, show votecast
- sqlitebrowser ~/.Tribler/sqlite/tribler.sdb
- browse the _ChannelVotes table
- create an interesting plot
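Inspecting the vote data can also be done programmatically. A minimal sketch, assuming a `_ChannelVotes` table with `(channel_id, voter_id, vote)` columns and `vote=2` for favourite / `vote=-1` for spam; the actual Tribler schema in `~/.Tribler/sqlite/tribler.sdb` may differ, so an in-memory database is used here to keep the example self-contained.

```python
# Sketch of aggregating VoteCast-style data for a plot, using an
# in-memory stand-in for tribler.sdb. Table and column names are
# assumptions based on the _ChannelVotes table mentioned above.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE _ChannelVotes (channel_id INT, voter_id INT, vote INT)")
conn.executemany(
    "INSERT INTO _ChannelVotes VALUES (?, ?, ?)",
    [(1, 10, 2), (1, 11, 2), (1, 12, -1),   # assumed: 2 = favourite, -1 = spam
     (2, 10, -1), (2, 13, -1)])

# Net votes per channel: a natural input for a spam/quality plot.
for channel, net, total in conn.execute(
        "SELECT channel_id, SUM(vote), COUNT(*) FROM _ChannelVotes "
        "GROUP BY channel_id ORDER BY channel_id"):
    print(channel, net, total)
```

Against the real database, the same `GROUP BY` query (with the actual column names) would yield the per-channel vote totals to plot.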
This was the user-study where the assumption that expert users can quickly assess whether something is spam is questioned: Lee, Uichin, et al. "Understanding Pollution Dynamics in P2P File Sharing." IPTPS. Vol. 6. 2006.
First warmup task: understand and plot key data from the AllChannel content discovery and voting mechanism.
Plot ideas:
I am currently still deciding on how to export all my thesis-related artifacts (no generated artifacts in repositories), but for now a preview of a plot from last week (in xkcd style, so I won't accidentally include it in a report as-is):
Also quick question: Is there any more recent work than Niels Zeilemaker's thesis from 2010 regarding the search strategy used in tribler nowadays? AFAIS, search is done by first looking in the local data, and then asking your TasteBuddies for more info, but I could of course be mistaken.
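The two-phase lookup described above (local data first, then TasteBuddies) can be sketched as follows. All names here are hypothetical illustrations, not Tribler's actual API, and the remote peers are modelled as plain dictionaries.

```python
# Hypothetical sketch of local-first search with a fallback to
# taste buddies, as described above. Not Tribler's real interface.
def search(query, local_index, taste_buddies):
    hits = local_index.get(query, [])
    if hits:
        return hits                 # phase 1: answered locally
    results = []
    for buddy in taste_buddies:     # phase 2: ask remote peers
        results.extend(buddy.get(query, []))
    return results

local = {"ubuntu": ["ubuntu-20.04.iso"]}
buddies = [{"debian": ["debian-11.iso"]}, {"debian": ["debian-net.iso"]}]
print(search("ubuntu", local, buddies))  # found locally
print(search("debian", local, buddies))  # merged remote results
```

The adversarial-search angle is that phase 2 blindly trusts the buddies' answers, which is exactly where spam or poisoned entries would enter.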
After spending some time thinking about the directions we could go with this project, I would like to expand on the concept of trust while taking into account the possibility of trustees being compromised. Trustees in this case could be something like "friends", people with similar voting behaviour or perhaps even something that can best be described as "moderators".
Some issues that I would have to research/address/decide on:
I would also like to propose a different issue title, as "adversarial search" is usually used in the context of game-related AI. How about "Spam-resilient search in decentralized systems"?
Problem can be split in two parts:
Survey paper possible elements:
ToDo:
Draft version: main.pdf
Draft feedback:
draft v2 main.pdf
Also @synctext , how would you like me to cite https://github.com/blockchain-lab/shared_vision_towards_programmable_economy/blob/master/tex/article.tex?
Given how many legitimate news organizations and people are routinely labelled 'troll' by their competitors (RT / Al Jazeera / CNN / FOX all are, even though each is biased on questions of Russian / Qatari / US-blue-team / US-red-team interest), the question of 'why did this work as effectively as it did' has an underlying truth component: what they were saying was just as much a constructed narrative of social facts as the competing consensus. That is not the whole reason they are successful, but if we are thinking about search and mass media, we should keep in mind that in addition to the perception shifting done by one player in the 'troll account' narrative, there is great (perhaps greater) mass-media perception management going on from the other players as well. Some success by the other players may simply balance out the bias of the network itself in favour of the incumbents.
To phrase this in the context of, say, section 3.3 of Kelong Cong's paper: the 'honest region' includes neither the blue nor the red team and everyone associated with them, both meatspace and bot, to the extent that the shared, necessary illusions involved in group membership are held.
@ichorid See this ticket of related work. Especially the 8000 fake Twitter accounts.
@synctext thanks, I'll take this stuff into account.
related #3615
Broader vision, beyond keyword search. An extensive technical analysis of the threat model in troubled regions. Aid workers are exposed to difficult challenges, see On Enforcing the Digital Immunity of a Large Humanitarian Organization.
Status update after a few years:
Keyword search within a self-organising system is a challenging unsolved problem.
Detecting and removing spam has proven to be extremely difficult. Creating a trustworthy search service out of unreliable and possibly fraudulent resources is a challenge. A starting point is creating a web-of-trust or other feedback mechanism.
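One classic shape for such a web-of-trust feedback mechanism is to iterate local trust ratings to a global fixed point, EigenTrust-style. The following is a toy sketch of that idea only, with PageRank-style damping added for convergence; it is not Tribler's actual algorithm, and the trust values are illustrative.

```python
# EigenTrust-style toy: each peer's global score is the trust-weighted
# average of its neighbours' opinions, iterated with damping. A sketch
# of the "web of trust as feedback mechanism" idea, nothing more.
def global_trust(local, n, iters=50, d=0.85):
    # local[(i, j)] = how much peer i trusts peer j (rows normalised below)
    scores = [1.0 / n] * n
    for _ in range(iters):
        nxt = [(1.0 - d) / n] * n          # damping term, PageRank-style
        for i in range(n):
            out = sum(local.get((i, j), 0.0) for j in range(n)) or 1.0
            for j in range(n):
                nxt[j] += d * scores[i] * local.get((i, j), 0.0) / out
        scores = nxt
    return scores

# Peers 0 and 1 endorse each other; the spammer, peer 2, endorses
# peer 0 but receives no trust from anyone.
local = {(0, 1): 1.0, (1, 0): 1.0, (2, 0): 1.0}
scores = global_trust(local, 3)
print([round(s, 2) for s in scores])  # peer 2 ends up with the lowest score
```

The subversion question from earlier in the thread maps directly onto this: a compromised trustee corresponds to a high-scoring row of `local` that starts endorsing spam.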
Existing work:
Web of trust for voting within Tribler: