antifraudcg / proposals

Proposals for the Anti-Fraud Community Group.

Attestation of device lifetime #15

Open pamellaprevedel-hpm opened 1 year ago

pamellaprevedel-hpm commented 1 year ago

We are Allowme, a business unit of Tempest Security Intelligence, a Brazilian cybersecurity company with more than 22 years in operation. Allowme's mission is to help companies protect the digital identities of their legitimate customers through a complete fraud prevention platform.

Context and threat

Automation is one of the main requirements for large-scale, high-profit attacks, so it has become a priority from a malicious actor's point of view.

When running a massive attack, fraudsters usually use navigation automation tools without a graphical interface, i.e. headless browsers (https://en.wikipedia.org/wiki/Headless_browser), often driven by Chrome WebDriver (https://en.wikipedia.org/wiki/Selenium_(software)#Selenium_WebDriver).

However, a common characteristic of attacks of this nature is that the attacker essentially needs to create many browser instances to execute the attack, and those instances generally have very distinctive characteristics: they look like installations performed at that very moment.

Proposal

Being able to attest the lifetime of a User Agent instance, from its initialization to the present moment, accurately and safe from improper manipulation, could be extremely important and valuable for detecting automated threats, both on the web and on mobile devices.

On web browsers

A combination of different signals could be used to estimate the lifetime of a running instance, for example: lifetime of cookies, time of plugin installation, time since last update, lifetime of the profile associated with the browser, etc.
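As a rough illustration of combining such signals (the signal names below are hypothetical, not a proposed API): since any freshly created persistent state suggests a freshly created instance, a conservative estimate is the age of the youngest signal.

```python
from datetime import timedelta

def estimate_ua_lifetime(signal_ages: dict[str, timedelta]) -> timedelta:
    """Conservative lifetime estimate: the instance can be no older than
    its freshest persistent signal (a just-created cookie jar or profile
    suggests a just-created browser instance)."""
    if not signal_ages:
        return timedelta(0)
    return min(signal_ages.values())

# Hypothetical signals for a freshly spun-up headless instance:
fresh = {
    "oldest_cookie": timedelta(minutes=2),
    "profile_age": timedelta(minutes=2),
    "time_since_last_update": timedelta(days=30),
}
```

Here the long time-since-last-update does not help the attacker: the two-minute-old profile dominates the estimate.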

On mobile devices

For mobile devices, knowing the OS lifetime can be even more accurate, since the hardware can indicate this, in addition to the connection between the device and the manufacturer's application store (Google Play or the Apple App Store).

On Android, for example, we could use some relevant information, such as:

  • the date of acquisition of an App on Google Play
  • the date of installation/re-installation of the App on Google Play

However, an important decision to be made is how to treat Apps that are recompiled after the first installation, as this could compromise the lifetime signal for a given App.

Privacy implications and safeguards

There is no PII being used to calculate the lifetime of a particular device, so there is very little threat to user privacy.

However, this data could be used as an additional signal to re-identify users if combined with browsing history and other behavioral information.

Safeguard #1

The API could return only whether the lifetime exceeds a specific time period, for example:

  • longer than 1 day
  • longer than 1 week
  • longer than 1 month
  • longer than 3 months
  • longer than 1 year

Thus, it would be difficult to use this data to identify a person, even when combined with other user behavior data.
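A sketch of what such a coarse-bucketed surface might look like (names and bucket values are illustrative, taken from the list above; this is not a proposed API shape):

```python
from datetime import timedelta

# Coarse buckets from Safeguard #1: the API answers only
# "is the lifetime longer than X?", never the exact age.
BUCKETS = [
    timedelta(days=1),
    timedelta(weeks=1),
    timedelta(days=30),
    timedelta(days=90),
    timedelta(days=365),
]

def lifetime_exceeds(actual: timedelta, threshold: timedelta) -> bool:
    """Return a single boolean; reject thresholds outside the allowed
    buckets so callers cannot binary-search for a fine-grained age."""
    if threshold not in BUCKETS:
        raise ValueError("threshold must be one of the coarse buckets")
    return actual > threshold
```

Restricting thresholds to a fixed bucket list caps the signal at a few bits per device, which is what makes the safeguard meaningful.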

SpaceGnome commented 1 year ago

Definitely like the idea - would be interesting if we could merge this and https://github.com/antifraudcg/proposals/issues/9 ?

philippp commented 1 year ago

Attesting the lifetime of the UA instance, OS, or any other part of the system requires attesting that the UA instance / OS / etc. is indeed what it presents itself as being. https://github.com/antifraudcg/proposals/issues/9 proposes an attestation of the time-since-cookie-reset, which becomes meaningful only on systems that can attest everything up to the integrity of the cookie jar. (The intention of that issue is to not leave an attack opportunity via UI in that end state.)

Is this in line with how you are thinking about it?

bmayd commented 1 year ago

I like the general concept of having attestations for UA ages, as well as the suggestion that the API be limited to returning true or false when asked whether the lifetime is greater than a specified interval.

It seems that attesting to the age of a thing is generally useful and that it would be worthwhile developing an abstract pattern that could be broadly applied.

I think we might also want to attest to the fact that something is used, how recently and how often, though the latter may be more difficult to provide. An app instance installed a month ago and used for the first time an hour ago is meaningfully different from an app instance installed a month ago and used daily. Alternatively or additionally, we might provide cumulative utilization information e.g., used for longer than an hour since install or used on average more than 10 minutes/week.
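A sketch of what a cumulative-utilization check like the ones described could look like (all names and thresholds here are hypothetical, chosen to mirror the "used for longer than an hour since install or more than 10 minutes/week" examples):

```python
from datetime import timedelta

def meets_usage_threshold(total_use: timedelta, weeks_installed: float,
                          min_total: timedelta, min_weekly: timedelta) -> bool:
    """Boolean attestation over cumulative utilization, mirroring the
    bucketed-lifetime idea: the caller learns only whether coarse
    thresholds are met, not the raw usage telemetry."""
    avg_weekly = total_use / max(weeks_installed, 1.0)
    return total_use >= min_total and avg_weekly >= min_weekly

# An app installed a month ago but barely opened fails the weekly-average
# check; an app used daily for the same month passes both checks.
```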

I think we want to be careful about the question of privacy:

Privacy implications and safeguards: There is no PII data being used to calculate the lifetime of a particular device, so there is very little threat to user privacy.

Although no PII is used, there is meaningful potential for this API to provide information that could contribute to the development of user or subgroup profiles. For example, a list of app installs by date could be combined with responses from this API to identify the subset of users a device might belong to and repeating the process with different apps would progressively narrow the list of potential devices. Given that, we'd probably want to budget access to this sort of API.

dvorak42 commented 1 year ago

We'll be briefly (3-5 minutes) going through open proposals at the Anti-Fraud CG meeting this week. If you have a short 2-4 sentence summary/slide you'd like the chairs to use when representing the proposal, please attach it to this issue otherwise the chairs will give a brief overview based on the initial post.

AramZS commented 1 year ago

I want to note my agreement with @bmayd here:

There is no PII data being used to calculate the lifetime of a particular device, so there is very little threat to user privacy.

Lifetime data as described here would instantly become a major vector for fingerprinting.

Also, I'm very unclear to what extent this could prevent the signal itself from being fraudulently set by artificial browsers.

dvorak42 commented 1 year ago

From the CG meeting, the question came up of how a device attests that it isn't lying about its lifetime, though there was also a question of whether certain use cases are okay relying on the weaker lifetime guarantees. One next step might be to pick out specific use cases that work under the weaker lifetime model and see if those are worth pursuing, or whether there is an attestable technique to manage device lifetime in a privacy-preserving way.

michaelficarra commented 1 year ago

If we're concerned about the sensitivity of the lifetime, a possible solution would be to use a zero-knowledge proof like Yao's Millionaires' problem. Of course, this would still depend on device integrity to ensure that the client isn't lying.

SamuelSchlesinger commented 1 month ago

The zero-knowledge proof idea is nice @michaelficarra. Here's a sketch of an approach that I think works: when the server first sees a client, the client requests a signed copy of the current time. When the server later wants to verify that the client is older than a certain date, the client sends a zero-knowledge proof that there exists a time which was 1. signed by the server, and 2. older than the given cutoff.

This approach is not adversarially robust, as a bad actor can simply take the signed time and send it around to all of their different bots. So here's an attempt to mitigate that, using the anonymous rate-limiting scheme from the zkCreds paper: instead of a signed time, we get a blind signature over a random nonce X with the current time t as public metadata. When the server wants to verify that the client age is older than a specific cutoff, the client now sends a proof as well as an output Y from a cryptographically secure PRF. The proof says that there exists a signed X with public metadata t such that PRF(X || epoch || counter) = Y, counter < RATE_LIMIT, and t < cutoff, where epoch is derived from the current time interval (the given week or month or whatever) and is well known to both server and client. This way there are only RATE_LIMIT valid Y outputs per epoch, so the signed time in the browser can't be used by an unlimited number of bots in any given time window.
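A minimal sketch of the nullifier bookkeeping described above, with the blind signature and the zero-knowledge proof abstracted away and HMAC-SHA256 standing in for the PRF (all names here are illustrative, not from the zkCreds paper):

```python
import hashlib
import hmac

RATE_LIMIT = 5

def prf(key: bytes, epoch: str, counter: int) -> str:
    # HMAC stands in for the cryptographically secure PRF; in the real
    # scheme the key X is blind-signed by the server and never revealed,
    # and a ZK proof ties Y to a valid signature without disclosing X.
    return hmac.new(key, f"{epoch}|{counter}".encode(), hashlib.sha256).hexdigest()

class Verifier:
    """Accepts at most RATE_LIMIT distinct nullifiers Y per credential
    per epoch: a reused (epoch, counter) pair yields a repeated Y, and
    counters at or above RATE_LIMIT are rejected outright."""
    def __init__(self) -> None:
        self.seen: set[str] = set()

    def verify(self, y: str, counter: int) -> bool:
        # In the full scheme the counter bound is shown in zero knowledge;
        # here we just check the range and reject replayed nullifiers.
        if counter >= RATE_LIMIT or y in self.seen:
            return False
        self.seen.add(y)
        return True
```

Because Y is deterministic in (X, epoch, counter), a bot farm sharing one credential can produce at most RATE_LIMIT fresh nullifiers per epoch, which is the property the mitigation relies on.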

SamuelSchlesinger commented 3 weeks ago

@AramZS, I think my comment addresses your concern about fingerprinting by only allowing a single bit based on whether the date is before a certain threshold. The one major difference between this issue's request and what I'd endorse is that device age seems far too sticky to be consistent with the privacy goals of the web platform. Instead, I'd like to propose a user-wipeable "profile age" which would be cross-site but can be wiped by the user along with other cross-site state.

@dvorak42 I think it would make sense to discuss this in the next week's AFCG meeting -- do we have an open slot on the agenda?

akakou commented 1 week ago

@SamuelSchlesinger

Thank you for your very interesting presentation. I would like to share some ideas that may help this proposal.

My idea is to reconstruct your idea based on Scrappy, discussed in #21. Although Scrappy is very similar to zkCreds, it has some beneficial points:

  1. Scrappy is based on standardized cryptographic protocols (i.e., DAA, EPID).
  2. The latency of Scrappy is likely shorter than zk-cred because it does not use zk-SNARKs.*
  3. The key for Scrappy can be stored in a TPM (i.e., secure hardware chip) since Scrappy is compatible with TPM.

*If Scrappy is running on the computer directly and not on TPM.

akakou commented 1 week ago

By the way, the related work section in Scrappy's paper might help answer the question in your slide (i.e., Alternatives to rate limiting).

[image: related-work comparison table from the Scrappy paper]

Paper: https://www.ndss-symposium.org/ndss-paper/scrappy-secure-rate-assuring-protocol-with-privacy/

SamuelSchlesinger commented 1 week ago

@akakou if I understand Scrappy correctly, it allows one to rate limit to a single request within a given time window, which is significantly less flexible than allowing a rate limit per time window with no constraint on how fast that limit can be consumed. For instance, if I open 10 tabs within a minute while scrolling through a news feed, it would be a shame if only one of them could be anonymous.

akakou commented 1 week ago

@SamuelSchlesinger As you mentioned, Scrappy as described in the paper has certain inconveniences. However, I think this can be overcome with your idea.

Scrappy currently uses a time window in the signing parameters (called the basename) of DAA. By extending Scrappy with your idea, we can use a concatenation of a time window and a counter in the basename instead of the time window alone.

Concretely...

Original Scrappy: $\sigma = DAA\_Sign(msg, T, sk)$

Extended Scrappy with your idea: $\sigma = DAA\_Sign(msg, T||C, sk)$

This works similarly to your rate-limiting system because DAA functions as a PRF and zero-knowledge proof.

SamuelSchlesinger commented 1 week ago

This works similarly to your rate-limiting system because DAA functions as a PRF and zero-knowledge proof.

This makes good sense, and it's better than my cruder idea of adding RATE_LIMIT basenames per service and selecting a random one. Do you know how to efficiently add a proof that C < RATE_LIMIT? I think this approach would likely be more performant than a SNARK-based approach, so I am very open to it.

akakou commented 1 week ago

@SamuelSchlesinger

You mean how Scrappy proves $C < \mathtt{RATE\_LIMIT}$ under the specific epoch, right?

akakou commented 1 week ago

@SamuelSchlesinger Honestly, I do not fully understand zkCreds yet, so the logic may well be close to your idea.

We can accomplish this via the deterministic computation in $DAA\_Sign()$.

Part of the signature (called the pseudonym) is computed deterministically from the basename (i.e., $C$ *1). (In detail, $\mathtt{pseudonym} = H(\mathtt{basename})^{\mathtt{sk}} = H(C)^{\mathtt{sk}}$.)

In this case, the signer cannot get a signature with $C \ge \mathtt{RATE\_LIMIT}$ past the verifier, for the following reasons:

  1. If the signer chooses $C \ge \mathtt{RATE\_LIMIT}$ , the verifier can notice this from the $C$ sent by the signer.
  2. If the signer reuses $C$ which has already been sent in the past, the verifier can recognize this by checking the $\mathtt{pseudonym}$ .
    • This is because the verifier received the same $\mathtt{pseudonym}$ during the first verification.
  3. If the signer lies about $C$ , the verifier's verification will fail.
    • This is due to the DAA signatures having the (zero-knowledge) proof of $SPK\{(\mathtt{sk}): \mathtt{pseudonym} = H(C)^{\mathtt{sk}}\}$.

*1 For ease of explanation, we omit epoch $T$ but the same logic applies even if $T$ is added to the basename.
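To make the pseudonym mechanism above concrete, here is a toy sketch of the deterministic computation $\mathtt{pseudonym} = H(\mathtt{basename})^{\mathtt{sk}}$ over a small multiplicative group (illustration only; real DAA works over pairing-friendly groups and proves knowledge of $\mathtt{sk}$ in zero knowledge):

```python
import hashlib

# Toy group: exponentiation mod the Mersenne prime 2^127 - 1.
# Illustration only -- not a recommended group for real cryptography.
P = 2**127 - 1

def h(basename: str) -> int:
    """Hash the basename into the group."""
    return int.from_bytes(hashlib.sha256(basename.encode()).digest(), "big") % P

def pseudonym(sk: int, epoch: str, counter: int) -> int:
    """pseudonym = H(basename)^sk with basename = T || C, as in the
    extended scheme: deterministic, so a reused (epoch, counter) pair
    produces the same pseudonym and the verifier can detect the replay."""
    return pow(h(f"{epoch}|{counter}"), sk, P)
```

Determinism in the basename is what lets the verifier spot reuse of the same counter within an epoch, while the secret exponent keeps pseudonyms for different basenames unlinkable to the signer's identity.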

akakou commented 2 days ago

@SamuelSchlesinger

Hi! Is it all clear? (If my explanation is not enough, I am happy to supplement it.)

SamuelSchlesinger commented 2 days ago

The thing that I can't really live with is sending C to the verifier, as it creates an information leak. We need to prove that 0 <= C < RATE_LIMIT in zero knowledge, not just reveal C.

akakou commented 1 day ago

@SamuelSchlesinger I see your point: sending $C$ in the clear would cause significant privacy concerns.

How about choosing $C$ at random?* That would mitigate the privacy concern.

*choosing $C$ under the condition that $C < \mathtt{RATE\_LIMIT}$ and $C$ has not been used yet