synctext opened this issue 7 years ago
Real world $2.44 million fraud with Amazon reviews/votes, thanks @pimveldhuisen http://www.zdnet.com/article/exclusive-inside-a-million-dollar-amazon-kindle-catfishing-scam/
Most of these were quite useful for gaining some understanding on this topic, thanks @synctext and @pimveldhuisen.
I am currently looking into Fighting peer-to-peer SPAM and decoys with object reputation, and to a lesser extent parts of P2P-Based Collaborative Spam Detection and Filtering.
It currently seems that many partial 'solutions' to adversarial search exist and have been researched, but they most often depend heavily on some form of centralisation or have another major drawback.
Also, regarding an often-used WoT based system: the gpg shortkey issue that came up recently is interesting, but I am not sure if focusing on WoTs is wise at this moment in time, seeing as these are implementation details when looking at the state of the art of adversarial search. WDYT, @synctext?
@wordempire A lot of abuse, fraud and spam examples can be found in social media and e-commerce. So that is nice stuff to write about.
but most often they heavily depend on some form of centralisation or have another major drawback.
That is a perfect storyline! Anything more for self-organising systems or P2P? Stuff like, http://www.ece.umd.edu/~goergen/docs/sec-nwatch.pdf ..
Web-of-trust mechanisms can be a minority part of your report, halve, or the majority. Whatever makes the most interesting story. A list of partial, flawed, and fantasy WoT solutions would be ideal.
Fraud with search results with direct financial gain.
lee2006understanding: develops a model that looks at the link between user behavior/awareness and pollution of a p2p network.
yoshida2009controlling: shows that index poisoning is an effective way of dealing with copyright violations when looking at the Winny network for small sets of files. This approach has the potential to disrupt the network as a whole, which might or might not be desirable for an adversary.
Determine layers of operation and archetypes for each layer. For each type, refer to drawbacks and assumptions made to make it all work.
Trust building/subversion
Choosing whether to trust an authority or group of peers can be based on a variety of existing decision-making processes. For example, this would be the perfect section to refer to the abuse happening w.r.t. Twitter, Amazon reviews and of course the GPG short key issue. Proxy measurements for trust could also be reviewed here (e.g., if a person shares a lot, they might be trustworthy, that kind of thing).
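A proxy measurement like "shares a lot, so probably trustworthy" can be made concrete with a toy score. This is a minimal sketch: the ratio-based formula and the cap are illustrative assumptions, not a published metric.

```python
# Toy proxy-trust score from sharing behaviour: peers that upload a
# lot relative to what they download get a higher score. The formula
# and cap are illustrative assumptions, not an actual trust metric.
def proxy_trust(uploaded: float, downloaded: float, cap: float = 5.0) -> float:
    ratio = uploaded / max(downloaded, 1.0)  # avoid division by zero
    return min(ratio, cap) / cap             # normalised to [0, 1]

print(proxy_trust(400.0, 100.0))           # heavy sharer -> 0.8
print(round(proxy_trust(10.0, 100.0), 3))  # mostly a leecher -> 0.02
```

Note that any such proxy is itself gameable (e.g. by uploading junk), which is exactly why it belongs in the trust building/subversion section.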
Index pollution/building
Indexes have to be generated for content in p2p networks, but they have been polluted in some older networks, such as Gnutella. This section can focus on whether existing approaches can handle a subset of users putting low-quality material online. Find out why it happens (user-centric), how it can be prevented and perhaps how it could be leveraged to practically block access to an undesirable/illegal resource.
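The impact of a small subset of polluters can be illustrated with a toy calculation, assuming each polluter injects decoy entries for a keyword at a fixed rate (all numbers below are illustrative, not measurements from a real network).

```python
# Toy model of index pollution: a few polluters injecting many decoy
# entries per keyword quickly drown out the legitimate results.
def clean_fraction(clean_entries: int, polluters: int, fakes_each: int) -> float:
    """Fraction of index entries for a keyword that are genuine."""
    total = clean_entries + polluters * fakes_each
    return clean_entries / total

# 10 real files for a keyword, 5 polluters injecting 20 decoys each:
print(round(clean_fraction(10, 5, 20), 3))  # -> 0.091
```

Even this crude model shows why per-entry voting or reputation is needed: without it, the expected quality of a search result drops roughly linearly in the polluters' injection rate.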
Content poisoning
Older p2p protocols used a very coarse checksum to verify entire files. It was quite easy to poison a download, therefore forcing the downloader to re-download the entire file. Depending on file size, this can be quite expensive. Look at the 'evolution' of systems, at some point referring to BitTorrent and its piecewise hashing, which partially alleviates this issue.
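The difference piecewise hashing makes can be shown in a few lines. This is a simplified sketch of the BitTorrent idea (real torrents store 20-byte SHA-1 digests of fixed-size pieces in the metainfo; the tiny piece size here is purely for demonstration): a poisoned byte range invalidates only the pieces it touches, not the whole file.

```python
# Sketch of BitTorrent-style piecewise hashing. Only the corrupted
# pieces fail verification, so only those need re-downloading.
import hashlib

PIECE_SIZE = 4  # tiny for demonstration; real clients use e.g. 256 KiB

def piece_hashes(data: bytes) -> list:
    """Hash each fixed-size piece separately."""
    return [hashlib.sha1(data[i:i + PIECE_SIZE]).hexdigest()
            for i in range(0, len(data), PIECE_SIZE)]

def corrupted_pieces(expected: list, received: bytes) -> list:
    """Return indices of pieces whose hash does not match."""
    got = piece_hashes(received)
    return [i for i, (e, g) in enumerate(zip(expected, got)) if e != g]

original = b"hello world, piecewise!"
expected = piece_hashes(original)

# An attacker poisons four bytes in the middle of the transfer.
poisoned = original[:8] + b"XXXX" + original[12:]
print(corrupted_pieces(expected, poisoned))  # -> [2]
```

With a single whole-file checksum, the same four poisoned bytes would force a full re-download; here the damage is confined to one piece.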
OK, + add 4th or 5th section.
Start the .tex in IEEE format: https://www.google.nl/search?q=ieeee+format
https://scholar.google.com/scholar?q=dht+poisoning
https://scholar.google.com/scholar?q=link+farm
https://scholar.google.com/scholar?q=kazaa+pollution
Reddit, HackerNews: upvote, shadow ban, etc. techniques
https://scholar.google.com/scholar?q=collaborative+spam+filtering
https://en.wikipedia.org/wiki/Stealth_banning
Honesty among drug dealers, 90% satisfaction level with drug deals: http://dl.acm.org/citation.cfm?id=2488408
https://scholar.google.com/scholar?q=explicit+feedback+spam+filtering
User feedback & moderation: http://www.sciencedirect.com/science/article/pii/S0308596108000955
The Tribler voting and spam prevention mechanism:
- control D, Dispersy, show votecast
- sqlitebrowser ~/.Tribler/sqlite/tribler.sdb
- browse the _ChannelVotes table
- create an interesting plot
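Inspecting the vote data can also be done programmatically. A minimal sketch, assuming a `_ChannelVotes` table with `(channel_id, voter_id, vote)` columns and `vote=2` for favourite / `vote=-1` for spam; the actual Tribler schema in `~/.Tribler/sqlite/tribler.sdb` may differ, so an in-memory database is used here to keep the example self-contained.

```python
# Sketch of aggregating VoteCast-style data for a plot, using an
# in-memory stand-in for tribler.sdb. Table and column names are
# assumptions based on the _ChannelVotes table mentioned above.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE _ChannelVotes (channel_id INT, voter_id INT, vote INT)")
conn.executemany(
    "INSERT INTO _ChannelVotes VALUES (?, ?, ?)",
    [(1, 10, 2), (1, 11, 2), (1, 12, -1),   # assumed: 2 = favourite, -1 = spam
     (2, 10, -1), (2, 13, -1)])

# Net votes per channel: a natural input for a spam/quality plot.
for channel, net, total in conn.execute(
        "SELECT channel_id, SUM(vote), COUNT(*) FROM _ChannelVotes "
        "GROUP BY channel_id ORDER BY channel_id"):
    print(channel, net, total)
```

Against the real database, the same `GROUP BY` query (with the actual column names) would yield the per-channel vote totals to plot.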
This was the user-study where the assumption that expert users can quickly assess whether something is spam is questioned: Lee, Uichin, et al. "Understanding Pollution Dynamics in P2P File Sharing." IPTPS. Vol. 6. 2006.
First warmup task: understand and plot key data from the AllChannel content discovery and voting mechanism.
Plot ideas:
I am currently still deciding on how to export all my thesis-related artifacts (no generated artifacts in repositories), but for now a preview of a plot from last week (in xkcd style, so I won't accidentally include it in a report as-is):
Also quick question: Is there any more recent work than Niels Zeilemaker's thesis from 2010 regarding the search strategy used in tribler nowadays? AFAIS, search is done by first looking in the local data, and then asking your TasteBuddies for more info, but I could of course be mistaken.
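The two-phase lookup described above (local data first, then TasteBuddies) can be sketched as follows. All names here are hypothetical illustrations, not Tribler's actual API, and the remote peers are modelled as plain dictionaries.

```python
# Hypothetical sketch of local-first search with a fallback to
# taste buddies, as described above. Not Tribler's real interface.
def search(query, local_index, taste_buddies):
    hits = local_index.get(query, [])
    if hits:
        return hits                 # phase 1: answered locally
    results = []
    for buddy in taste_buddies:     # phase 2: ask remote peers
        results.extend(buddy.get(query, []))
    return results

local = {"ubuntu": ["ubuntu-20.04.iso"]}
buddies = [{"debian": ["debian-11.iso"]}, {"debian": ["debian-net.iso"]}]
print(search("ubuntu", local, buddies))  # found locally
print(search("debian", local, buddies))  # merged remote results
```

The adversarial-search angle is that phase 2 blindly trusts the buddies' answers, which is exactly where spam or poisoned entries would enter.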
After spending some time thinking about the directions we could go with this project, I would like to expand on the concept of trust while taking into account the possibility of trustees being compromised. Trustees in this case could be something like "friends", people with similar voting behaviour or perhaps even something that can best be described as "moderators".
Some issues that I would have to research/address/decide on:
I would also like to propose a different issue title, as "adversarial search" is usually used in the context of game-related AI. How about "Spam-resilient search in decentralized systems"?
Problem can be split in two parts:
Survey paper possible elements:
ToDo:
Draft version: main.pdf
Draft feedback:
draft v2 main.pdf
Also @synctext , how would you like me to cite https://github.com/blockchain-lab/shared_vision_towards_programmable_economy/blob/master/tex/article.tex?
Given how many legitimate news organizations and people are routinely labelled 'troll' by their competitors (RT / Al Jazeera / CNN / FOX all are, even though each is biased on questions of Russian / Qatari / US-blue-team / US-red-team interest), the question of 'why did this work as effectively as it did' has an underlying truth component: what they were saying was just as much a constructed narrative of social facts as the competing consensus. That is not the whole reason they are successful, but if we are thinking about search and mass media, we should keep in mind that in addition to the perception shifting done by one player in the 'troll account' narrative, there is great (perhaps greater) mass-media perception management going on from the other players as well. Some success by the other players may simply balance out the bias of the network itself in favour of the incumbents.
To phrase this in the context of, say, section 3.3 of Kelong Cong's paper: the 'honest region' includes neither the blue nor the red team and everyone associated with them, both meatspace and bot, to the extent that the shared, necessary illusions involved in group membership are held.
@ichorid See this ticket of related work. Especially the 8000 fake Twitter accounts.
@synctext thanks, I'll take this stuff into account.
related #3615
Broader vision, beyond keyword search. An extensive technical analysis of the threat model in troubled regions. Aid workers are exposed to difficult challenges, see On Enforcing the Digital Immunity of a Large Humanitarian Organization.
Status update after a few years:
Keyword search within a self-organising system is a challenging unsolved problem.
Detecting and removing spam has proven to be extremely difficult. Creating a trustworthy search service out of unreliable and possibly fraudulent resources is a challenge. A starting point is creating a web-of-trust or other feedback mechanism.
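One classic shape for such a web-of-trust feedback mechanism is to iterate local trust ratings to a global fixed point, EigenTrust-style. The following is a toy sketch of that idea only, with PageRank-style damping added for convergence; it is not Tribler's actual algorithm, and the trust values are illustrative.

```python
# EigenTrust-style toy: each peer's global score is the trust-weighted
# average of its neighbours' opinions, iterated with damping. A sketch
# of the "web of trust as feedback mechanism" idea, nothing more.
def global_trust(local, n, iters=50, d=0.85):
    # local[(i, j)] = how much peer i trusts peer j (rows normalised below)
    scores = [1.0 / n] * n
    for _ in range(iters):
        nxt = [(1.0 - d) / n] * n          # damping term, PageRank-style
        for i in range(n):
            out = sum(local.get((i, j), 0.0) for j in range(n)) or 1.0
            for j in range(n):
                nxt[j] += d * scores[i] * local.get((i, j), 0.0) / out
        scores = nxt
    return scores

# Peers 0 and 1 endorse each other; the spammer, peer 2, endorses
# peer 0 but receives no trust from anyone.
local = {(0, 1): 1.0, (1, 0): 1.0, (2, 0): 1.0}
scores = global_trust(local, 3)
print([round(s, 2) for s in scores])  # peer 2 ends up with the lowest score
```

The subversion question from earlier in the thread maps directly onto this: a compromised trustee corresponds to a high-scoring row of `local` that starts endorsing spam.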
Existing work:
Web of trust for voting within Tribler: