Open thegreatfatzby opened 7 months ago
- TEEs to talk to each other, still within limits.
- All TEEs (including buy/sell services) to be backed by the same KV data storage as the KV TEE.
- Ideally, very limited calls could be made out of the TEEs as well.
I've had server composition support in mind for KV for a while, which I hinted at in privacysandbox/protected-auction-key-value-service/issues/10.
But even for (1) there's the privacy risk of traffic analysis. The KV server sharding functionality is a more confined case of the vision here and preventing traffic analysis there is still a pain. So we have not invested more in server composition.
My hope was to see how the sharding usage pattern pans out in the real world and whether that gives us more data points on supporting an even more advanced topology. But if you think this is a promising direction for real uses, maybe we can prioritize researching it sooner.
Hey @peiwenhu I personally see it as an incredibly useful direction, but of course I'd really love to hear from folks like @rdgordon-index @jonasz @fhoering @lbdvt @davideanastasia and others.
That said, to elaborate on my thinking, let's say we could snap our fingers and say that ad techs could focus on two things:
And that we'd rely on the BA/ASAPI framework to coordinate the inputs into bidding and auction functions, enforce output privacy, and, if it makes sense, input privacy as well. Then I really think we'd be in a different world than today, when we have to completely change topology, domain models, logic, cost structures, etc. I think we'd still have a lot of problems to solve (debugging in enclaves, performance overhead), but we'd be solving them within the kinds of architectural and modeling constraints we've spent man-centuries developing expertise in.
/cc @yarongmu-google
Tentatively, we plan to look into this in H2 this year. We expect there will be constraints for privacy reasons, and we can only focus on the lower-hanging fruit for now. For example, there could be a fixed size requirement on TEE-TEE requests/responses, there could be a fixed number of endpoints that each TEE server knows and that cannot be changed at runtime, requests could need to be sent to all of those endpoints, etc. So it would still require some careful design from the server operator side, albeit closer to the more classic architectural model.
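To make those example constraints concrete, here is a minimal Python sketch of what an operator-side client could look like under them. Everything in it is an assumption for illustration only: the 256-byte frame size, the endpoint names, the length-prefix framing, and the `send` callback are hypothetical and are not part of any KV or B&A API. It also assumes the channel itself is encrypted, so an observer sees only frame sizes and destinations.

```python
from typing import Callable, Dict, Tuple

FIXED_SIZE = 256   # hypothetical fixed frame size in bytes
LEN_PREFIX = 2     # bytes used to record the real payload length

# Hypothetical endpoint list, fixed at startup and never changed at runtime.
ENDPOINTS: Tuple[str, ...] = ("tee-a", "tee-b", "tee-c")


def pad(payload: bytes, size: int = FIXED_SIZE) -> bytes:
    """Wrap payload in a frame of exactly `size` bytes."""
    if len(payload) > size - LEN_PREFIX:
        raise ValueError("payload too large for fixed-size frame")
    filler = b"\x00" * (size - LEN_PREFIX - len(payload))
    return len(payload).to_bytes(LEN_PREFIX, "big") + payload + filler


def unpad(frame: bytes, size: int = FIXED_SIZE) -> bytes:
    """Recover the payload from a fixed-size frame."""
    if len(frame) != size:
        raise ValueError("frame has unexpected size")
    n = int.from_bytes(frame[:LEN_PREFIX], "big")
    if n > size - LEN_PREFIX:
        raise ValueError("corrupt length prefix")
    return frame[LEN_PREFIX:LEN_PREFIX + n]


def fan_out(target: str, payload: bytes,
            send: Callable[[str, bytes], bytes]) -> bytes:
    """Send one fixed-size frame to every known endpoint.

    Only `target` receives the real payload; the others receive an
    empty (dummy) payload, so the set of destinations and the frame
    sizes are the same for every call.
    """
    if target not in ENDPOINTS:
        raise ValueError("unknown endpoint")
    responses: Dict[str, bytes] = {}
    for endpoint in ENDPOINTS:
        body = payload if endpoint == target else b""
        responses[endpoint] = send(endpoint, pad(body))
    return unpad(responses[target])
```

A caller would pass its own transport as `send`, for example `fan_out("tee-b", b"lookup:key1", my_transport)`. The cost is visible right away: every logical request becomes one full-size request per endpoint, which is why this still needs careful design on the operator side.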
Overview
Thinking towards the genuinely private future, both on device and BA, I've been pondering how we're constraining architecture and domain models, which ultimately will constrain operations, cost structures, and utility. One of the constraints I think we could loosen without sacrificing privacy would be how data can a) flow into different TEEs and b) travel once it hits the Trusted Execution Environment.
Right now, if you tilt your head and squint real good, the KV Server almost looks like a real server that you can deploy code and data to: you can install WASM'ized C/C++/etc code, you can have data synced in batch or incrementally, you can shard data and do some load balancing...debugging is still a problem, but that's kind of a separate issue.
The "almost" is because the inability to make network calls prevents typical service composition, where you'd have different bits of logic owned by different teams that can interact with each other, scale independently, and have separate SLAs, QoS guarantees, and cost structures. Additionally, not all TEEs are currently backed by the KV server and its data storage.
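So what I'd like to propose is that we allow:
1. TEEs to talk to each other, still within limits.
2. All TEEs (including buy/sell services) to be backed by the same KV data storage as the KV TEE.
3. Ideally, very limited calls could be made out of the TEEs as well.
I understand (3) has privacy risks, so I'll ignore that here since I think (1) and (2) are more clear wins.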
Example
To illustrate via example, I'd love to be able to propose the following dream world to our DSP folks, where we try to make "private replicas" of existing systems and topologies.
Specifics
Inter TEE Communication
I'd propose the following:
All TEEs can be KVs
Currently we've made it so the Buyer Logic will get all its real-time data from the KV server but isn't co-located with the data...I'm not sure I see a privacy reason for this if the server is trusted...so just allow the BFE/SFE/etc. to be backed by a KV.
Conclusion
Allowing this would get us much closer to an ad tech being able to replicate their current operations, team-system ownership structures, logic, cost structure, etc., in the private environment, rather than having to re-implement both logic and topology and merge different teams' systems and operations, which will inevitably lead to issues.