philippp opened 2 years ago
On behalf of Google Ad Traffic Quality
An Invalid Traffic taxonomy provides one framework for classifying IVT abuse vectors. Below is a sample of abuse vectors for IVT / Ad Fraud that Google’s Ad Traffic Quality team has identified as impacting ads monetization on the web. IVT detection for these abuse vectors is dynamic because the space is highly adversarial, with bad actors continuously developing new methods to bypass detection.
IVT detection is commonly segmented across three areas of interest: 1) Non-Human Traffic, 2) Incentivized Human Traffic, and 3) Misrepresented or Manipulated Human Traffic. By looking at these three areas we can determine if relevant ad events should be considered IVT, or if they are organic (genuine human interactions resulting from genuine interest).
The list below highlights a subset of common categories of IVT. It should be noted that the list is not exhaustive, and that the vectors listed in each category are not necessarily mutually exclusive.
Botnets: Typically based on malware that has infected a user’s device (computer, mobile device, or other system) without their consent, rendering the device a “bot” that, combined with other bots, comprises a “botnet.” Botnets drive automated traffic that tries to mimic human behavior, often by opening hidden browser windows on infected devices. Some malware emulates user clicks using random or predetermined click patterns.
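As one illustration of why click timing matters here, the sketch below (not Google’s actual detection logic) scores a click stream’s regularity: scripted clicks following predetermined patterns often show far less timing variance than human activity. The event shape and threshold are assumptions for illustration only.

```python
from statistics import mean, pstdev

def regularity_score(click_times: list[float]) -> float:
    """Coefficient of variation of inter-click intervals.

    Human click timing tends to be irregular; scripted clicks following
    predetermined patterns often show unusually low variance.
    """
    if len(click_times) < 3:
        return float("inf")  # too few clicks to judge
    intervals = [b - a for a, b in zip(click_times, click_times[1:])]
    mu = mean(intervals)
    return pstdev(intervals) / mu if mu > 0 else 0.0

def looks_scripted(click_times: list[float], threshold: float = 0.1) -> bool:
    # A very low coefficient of variation suggests machine-like regularity.
    # In practice this is one weak signal among many, never a verdict alone.
    return regularity_score(click_times) < threshold
```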
Emulated and virtual devices: Non-human invalid traffic often comes from virtual devices, potentially including emulated mobile devices, virtual machines running in data centers, etc. While not all ad interactions from emulated or simulated devices are non-human, it is Google’s practice, as well as an industry standard, to deem this type of ad traffic IVT when detected.
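A minimal sketch of how such traffic might be flagged, assuming a maintained list of datacenter IP ranges is available (the ranges below are documentation placeholders, not real datacenter networks):

```python
import ipaddress

# Placeholder ranges only; a production system would use a regularly
# refreshed dataset of datacenter/hosting-provider networks.
DATACENTER_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

def from_datacenter(ip: str) -> bool:
    """Return True if the event's source IP falls in a known datacenter range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in DATACENTER_NETWORKS)
```

Per the practice described above, impressions or clicks originating from such ranges would typically be marked IVT when detected.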
Other automated traffic: IVT attacks that are not distributed via botnets, but still run in an automated fashion.
Publisher self-interaction: Publishers may attempt to drive revenue by interacting with ads on their own websites or apps.
Incentivized traffic: Users who engage with ads without genuine intent or interest (e.g., via browser extensions that provide financial rewards for clicks). This covers ad interactions where users are offered a direct or indirect monetary incentive (in the form of currency or an equivalent) for interacting with ads, without disclosure to advertisers. It does not refer to rewarded traffic, where advertisers are aware that publishers offer users non-currency (or equivalent) rewards (redeemable only within the app/site/game) in exchange for interacting with ads.
Geo masking: Bad actors may try to swap the reported country of their users to fetch higher ad revenue (e.g., swapping the IP address or country of emerging-market users for developed markets with higher average ad prices). Users may also claim to be in a country that does not match their actual location, either to evade anti-abuse defenses or to be perceived as less risky by associating with a country with lower abuse rates.
Inventory misrepresentation: Bad actors may misrepresent the ad inventory they are monetizing in an attempt to fetch higher revenue in ad auctions (e.g., a low-quality site or app may claim to be a high-value, known name brand).
Clickjacking: The use of deceptive elements (e.g., buttons or short-link redirections) or interfaces on a web page or app to trick users into clicking on an ad they did not expect to click on.
Hidden ads: Ads that are impossible to see under any normal circumstances: ads tucked under iframes, hidden behind content, hidden behind other ads (aka “ad stacking”), placed inside invisible HTML containers, or displayed but too small to be seen (aka “pixel stuffing”).
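As a sketch of how these hidden-ad conditions might be checked from measurement data, assuming a hypothetical per-impression geometry report (field names are invented for illustration):

```python
from dataclasses import dataclass

@dataclass
class ImpressionGeometry:
    # Hypothetical fields reported alongside an impression.
    width_px: int
    height_px: int
    visible_fraction: float          # unoccluded fraction of the creative in the viewport
    occluded_by_other_element: bool  # e.g., another ad stacked on top

MIN_DIMENSION_PX = 10  # illustrative cutoff; below this the ad is effectively invisible

def is_hidden_ad(g: ImpressionGeometry) -> bool:
    if g.width_px < MIN_DIMENSION_PX or g.height_px < MIN_DIMENSION_PX:
        return True  # "pixel stuffing"
    if g.occluded_by_other_element:
        return True  # "ad stacking" or hidden behind content
    return g.visible_fraction == 0.0  # off-screen or in an invisible container
```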
Accidental clicks: When users click on an ad they did not intend to click. Publishers are not permitted to create interfaces that may lead users to accidentally click on ads. This includes implementing ads in a way that they might be mistaken for other site content, such as a menu, navigation, or download links.
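One (noisy) signal for accidental clicks is near-zero dwell on the landing page. A minimal sketch with an invented cutoff and event shape; a real system would aggregate this across many events rather than judging single clicks:

```python
ACCIDENTAL_DWELL_SECONDS = 2.0  # illustrative cutoff, not a published standard

def likely_accidental(click_ts: float, bounce_ts: float | None) -> bool:
    """Flag clicks where the user bounced back almost immediately."""
    if bounce_ts is None:
        return False  # the user stayed on the landing page
    return (bounce_ts - click_ts) < ACCIDENTAL_DWELL_SECONDS
```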
Appearing as multiple users from the same device/browser: Users may try to disguise their activity by removing cookies or using other tactics to hide their high ad activity, in order to appear as multiple users interacting with the same ads. IVT defenses should be able to determine when an actor is attempting to appear as multiple users from the same device or browser instance.
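A simple sketch of one way such churn might surface, assuming events carry some coarse, privacy-appropriate device key alongside the cookie ID (both fields are hypothetical):

```python
from collections import defaultdict

def high_identity_churn(events, max_distinct_ids: int = 20) -> set[str]:
    """Find device keys behind an unusually large number of cookie IDs.

    `events` is an iterable of (device_key, cookie_id) pairs. An actor who
    repeatedly clears cookies to pose as many users will accumulate many
    distinct cookie IDs behind a single key.
    """
    ids_per_key: dict[str, set[str]] = defaultdict(set)
    for device_key, cookie_id in events:
        ids_per_key[device_key].add(cookie_id)
    return {key for key, ids in ids_per_key.items() if len(ids) > max_distinct_ids}
```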
On behalf of Google Ad Traffic Quality
Invalid Traffic (IVT) and Ad Fraud detection requires additional capabilities to protect ad systems against bad actors. As seen in the common use cases observed by Google’s Ad Traffic Quality team, the set of abuse vectors is diverse, and many vectors continuously emerge and evolve to evade detection. Below is a non-exhaustive list of capabilities that assist in IVT and Ad Fraud detection and enable defenses against these vectors. It is important to note that no single mechanism or tool is a comprehensive defense against IVT; each mechanism is part of a layered defense strategy, as bad actors keep developing new tactics and techniques to generate IVT and commit Ad Fraud.
Invalid traffic defenses should be able to determine the realness and human qualities of ad interactions.
Invalid traffic detection requires the ability to separate “normal” or “organic” ad interactions from invalid or non-organic interactions. Today this is accomplished in part by evaluating anomalies in interaction signals, as well as conversion metrics.
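As a toy example of the kind of anomaly evaluation described, here is a sketch that flags a publisher whose daily click-through rate jumps far above its own baseline (the threshold and schema are assumptions, and a real system would combine many such signals):

```python
from statistics import mean, pstdev

def ctr_is_anomalous(history: list[float], today: float, z_cutoff: float = 3.0) -> bool:
    """Flag a day whose CTR sits far above the publisher's historical baseline."""
    if len(history) < 7:
        return False  # too little history to establish a baseline
    mu, sigma = mean(history), pstdev(history)
    if sigma == 0:
        return today != mu  # flat history: any change is notable
    return (today - mu) / sigma > z_cutoff
```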
IVT defenses must also detect threats generated by multiple actors working in a unified, synchronized, and coordinated manner. Such attacks include coordinated clicking from a ring of publishers who all agree to click on each other's ads (e.g., co-clicking, bad-actor rings).
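Co-clicking rings leave a structural footprint: publishers in the ring share nearly the same set of clicking users. A minimal sketch of that idea, assuming a hypothetical mapping from publisher ID to the user IDs that clicked its ads (real detection would use far richer graph clustering):

```python
from itertools import combinations

def coclick_candidates(clicks_by_publisher: dict[str, set[str]],
                       min_jaccard: float = 0.5) -> list[tuple[str, str, float]]:
    """Return publisher pairs whose clicking audiences overlap suspiciously."""
    flagged = []
    for (pub_a, users_a), (pub_b, users_b) in combinations(
            clicks_by_publisher.items(), 2):
        union = users_a | users_b
        if not union:
            continue
        jaccard = len(users_a & users_b) / len(union)
        if jaccard >= min_jaccard:
            flagged.append((pub_a, pub_b, jaccard))
    return flagged
```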
Defenses should determine instances where a user's actual location does not align with their stated location, which can distort publisher metrics related to CPC and CPM; users may disguise their location as a more trusted country to bypass anti-abuse defenses. Defenses should likewise determine instances where the page being visited claims to be a different page, so as to manipulate ad prices.
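At its simplest, the location check compares a stated country against one derived independently (e.g., from a geo-IP database, not shown here). A hedged sketch; a mismatch is a signal to weigh alongside others, not a verdict, since VPNs, travel, and stale geo data all produce benign disagreements:

```python
def geo_mismatch(declared_country: str, ip_derived_country: str | None) -> bool:
    """Compare a declared ISO country code against an independent lookup."""
    if ip_derived_country is None:
        return False  # no lookup available; abstain rather than guess
    return declared_country.upper() != ip_derived_country.upper()
```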
IVT defenses and attestation signals (including a browser or platform’s Privacy Preserving APIs for ads targeting and measurement) should not be easily reverse engineered or manipulated. This need has historically been addressed through obfuscation and non-public disclosure of IVT signals and defenses. Additionally, these signals must support regular updates as defense needs change, and their effectiveness should be measurable over time. Of note, bad actors are highly motivated to evade detection and circumvent device, environment, and event attestation solutions.
Concern: if we try to enumerate all the varieties of devious behavior, I worry the list will be unbounded. It's much simpler to explain the normal behavior that counterabuse needs to be able to recognize clearly.
With this basic information, subsequent behavioral analysis of system usage can recognize incentivization, coordination, and related behavioral abuse. Counterabuse will also use account-level intelligence, but account infrastructure seems like something we take for granted, so it is perhaps out of scope here.
I share the intuition that there may be a small number of canonical assertions (e.g. user is human, platform security model is unbroken, geo-location is correct) that will cover a wide array of use cases. The ambition of enumerating the use cases and the requirements/capabilities for each is to connect the real-world motivations (e.g. preventing account takeover, social media manipulation, DoS) with the capabilities/assertions they require.
I agree with your conclusion, but we have to align on the priority of these requirements across all stakeholders, including those who are newer to the trust & safety / anti-fraud / security domains. I hope that once we have capabilities from a variety of use case owners, we can consolidate and up-level them as it makes sense (potentially arriving at a short list similar to yours).
However, once we have the mapping of use cases to capabilities, it is easier to go back and say "if we are unable to detect an attacker who is 'Appearing as Multiple Users from the Same Device / Browser,' we open the door to social media manipulation, denial-of-service attacks, ad fraud, etc." and include the relevant stakeholders/use-case owners when discussing the validity and urgency of this requirement.
I'm open to more efficient ways of establishing the criticality and completeness of these requirements, if you have suggestions.
Per Philipp's request, pasting the functional and non-functional requirements detailed for the IAB Tech Lab Authenticated Devices standard here:
To help Google ensure that we have a complete inventory of capabilities and understand the anti-fraud ecosystem's priorities, we have launched a capabilities gathering survey. This survey can be accessed at https://google.qualtrics.com/jfe/form/SV_5p3y9l2N1LYQYpU. We invite you all to participate in this survey as well.
A few FAQs on the survey:
Who can take it? Any organization with anti-fraud needs, in any region, of any size, and in any industry vertical (including payments, eCommerce, social media, etc.).
Until when can I take the survey? The survey will be active until January 31, 2023.
Who will see my information? Consolidated results (with no organization names) will be published in this GitHub after the survey closes. Respondents have the option to include their organization name if they would like their individual response published in GitHub. Respondents also have the option to include their name and email address if they would be willing to discuss their response in detail with Google 1:1.
We will also ask for this survey to be added to the agenda at one of the next CG meetings.
The results of the capabilities gathering survey have been posted below, along with individual responses from four organizations that wanted their results to be published as well:
Capabilities Gathering Survey Results (Publish).pdf
dstillery (Publish in AFCG).pdf
F5 (Publish in AFCG).pdf
IDWall (Publish in AFCG).pdf
Socure (Publish in AFCG).pdf
For the sake of efficiency, it might be prudent to vote on the use cases document before we file individual "capability" issues / requests against it. That said, it would behoove us - especially folks working on defensive teams that have relevant use cases - to start mapping out what capabilities we require in order to address our use cases.
To that effect, I invite folks to use this issue as a scratch pad, describing the capabilities their use cases require. We can then factor out the distinct capabilities (I assume there will be overlap) and file them as discrete issues for targeted discussion.