Open bemasc opened 1 year ago
I think this has been discussed already, but I can't remember where. In any case, there are a couple of questions that come to mind:
I think this has been discussed already, but I can't remember where. In any case, there are a couple of questions that come to mind:
- How does the proxy actually batch and release reports? If it batches by looking at report tags, then we've just moved the dictionary attack problem from the aggregation server to the proxy. If it batches by looking at encrypted blobs (which would be the case with unreliable OHTTP or whatever), then we're compromising the ability to verify STAR reports in a streaming fashion, making STAR closer to batch-based protocols such as Poplar.
I mean the latter: each OPRF epoch is one big batch.
- The fundamental assumption about OHTTP is that the gateway (the aggregation server here) cannot observe the relay's ingress link, so I'm struggling to see why this assumption would not carry over to its use for STAR.
The OHTTP threat model is much weaker than, say, the DAP non-collusion assumption. This issue is about strengthening STAR against certain attackers.
Another variation on this problem is related to aggregators who also have timing information about events that cause clients to generate reports.
- Since the dictionary attack only requires one query per target, it's not clear to me that batching helps improve detection, but maybe I'm misunderstanding?
Currently, the attacker can attempt to run a dictionary attack against a specific report. The attacker can stop their attack once it succeeds. (This attack is especially relevant if the dictionary attacker can also overcome OHTTP to link the report to a specific user.)
If the reports are batched by epoch, the attacker can't target individual reports, and doesn't know whether the attack worked at all until the end of the epoch. Attackers might compensate by trying more values, but this is more suspicious to the Randomness Server and more likely to be detected.
The OHTTP threat model is much weaker than, say, the DAP non-collusion assumption. This issue is about strengthening STAR against certain attackers.
I don't agree with this. Both OHTTP and DAP require non-collusion for privacy.
Another variation on this problem is related to aggregators who also have timing information about events that cause clients to generate reports.
That's true without batching, but I'm not seeing why this timing information is helpful. Can you elaborate?
Currently, the attacker can attempt to run a dictionary attack against a specific report. The attacker can stop their attack once it succeeds. (This attack is especially relevant if the dictionary attacker can also overcome OHTTP to link the report to a specific user.)
If the reports are batched by epoch, the attacker can't target individual reports, and doesn't know whether the attack worked at all until the end of the epoch. Attackers might compensate by trying more values, but this is more suspicious to the Randomness Server and more likely to be detected.
I'm really not following this. The way a dictionary attack works is as follows:
Whether or not the attacker does (2) as reports or streamed in or at the end of an epoch, say, doesn't really seem like a useful distinction. Can you say why you think this distinction matters?
The OHTTP threat model is much weaker than, say, the DAP non-collusion assumption. This issue is about strengthening STAR against certain attackers.
I don't agree with this. Both OHTTP and DAP require non-collusion for privacy.
OHTTP additionally requires that the target/gateway can't learn the network traffic near the relay. Otherwise it can likely deanonymize requests by timing correlation.
("Unreliable OHTTP" would enable a mitigation of this problem.)
Another variation on this problem is related to aggregators who also have timing information about events that cause clients to generate reports.
That's true without batching, but I'm not seeing why this timing information is helpful. Can you elaborate?
If the aggregator knows when client A sends its reports (e.g. it can trigger the client to send a report), then it can tell which reports are (likely) from client A by timing. This is the same problem as the OHTTP timing attack, but using some application-level event instead of a network tap near the OHTTP ingress.
I'm really not following this. The way a dictionary attack works is as follows:
- Attacker queries randomness server using some input and computes the corresponding tag.
- Attacker checks tag against reports it sees.
Whether or not the attacker does (2) as reports or streamed in or at the end of an epoch, say, doesn't really seem like a useful distinction. Can you say why you think this distinction matters?
This is basically an argument about attack economics related to risk of discovery.
Consider the following attack:
In a batch model, this dictionary attack requires N attempts every time to guarantee success. The increased number of attempts makes the attack easier to detect.
Thanks for clarifying. Taking a step back, I think it might be helpful if we write down the threat model you have in mind here, as well as what you view the ideal security properties to be with respect to that threat model. Is that something you could take a stab at?
In short, I am saying that STAR currently does not achieve its privacy goals in the presence of a malicious aggregator who is a Dolev-Yao attacker, or even a Pervasive Passive Attacker. I believe it could tolerate this (i.e. adding these attacks to the threat model) with the change I described.
EDIT: I'm not saying this change is necessarily worthwhile: it increases the cost of operating the proxy, adds more bespoke components, and delays access to the output. I'm just saying it should be considered.
In short, I am saying that STAR currently does not achieve its privacy goals in the presence of a malicious aggregator who is a Dolev-Yao attacker
This is making a lot of assumptions about traffic analysis and the like, right?
or even a Pervasive Passive Attacker.
What is a pervasive passive attacker?
In short, I am saying that STAR currently does not achieve its privacy goals in the presence of a malicious aggregator who is a Dolev-Yao attacker
This is making a lot of assumptions about traffic analysis and the like, right?
Not really. A Dolev-Yao attacker (active network attacker) has enormous flexibility to do things like drop or delay non-target traffic to ensure that the target's traffic is easy to spot by timing behavior.
or even a Pervasive Passive Attacker.
What is a pervasive passive attacker?
I mean what RFC 7258 calls "Pervasive Monitoring": an attacker who can observe (but not modify) network traffic at many locations.
Not really. A Dolev-Yao attacker (active network attacker) has enormous flexibility to do things like drop or delay non-target traffic to ensure that the target's traffic is easy to spot by timing behavior.
I see your point now. As an example, you're saying that because the DY attacker can modify with timing, then it can let the aggregation server link encrypted reports to distinct clients and, when decryption succeeds, the aggregation server learns which client sent which value. Thanks for clarifying 👍 I would agree with that assessment.
Reusing OHTTP for STAR is convenient, but it creates two unnecessary weaknesses:
If the encrypted reports for each epoch were batched (and shuffled) at the proxy, then network traffic analysis would not reveal anything, and dictionary attacks would have to proceed without any access to the reports, presumably making them less efficient and therefore easier to detect.