Divide-By-0 / blog

A constantly updating Hugo blog for Aayush's thoughts.
https://blog.aayushg.com

Add joint ZK data guilds post with Lily #8

Open Divide-By-0 opened 1 year ago

Divide-By-0 commented 1 year ago

Draft for any signed in user: https://hackmd.io/CMnvMcmzR7CRXOxCLAi2LQ

An alternative to traditional ad platforms.

Discussion:

FAQ (by me and lily):

  • clarify more how the alternative to platform attestations would work?

yeah so you could bootstrap off of either 1) the ads that were clicked in the past, where the server sends you a signature and only pays you once you prove on chain, via a zk proof, that your vector has transitioned, or 2) proof-carrying data (PCD) of any kind, where PCDs include any provable data, such as the kinds parsable by https://pcd.team/

  • is a given person limited to one group, or can they be in many?

either! if in many, then when ads are served to them it would just randomly choose which group they prove membership in, maybe based on a probabilistic breakdown. you could also be in a "group" where everyone spends similar amounts to you so that you don't feel like you're carrying freeloaders. it might make sense for it just to be whichever advertiser bids the most to serve ads to some group the user is in. but yeah the math seems to work out fine regardless!

  • is this more for existing platforms or as a model for a new platform? or agnostic?

seems like a thing that e.g. bluesky or friend tech might be more into, but i doubt legacy platforms will adopt it

  • did you have in mind a system where the platform pays out part of the ad money to the user, or is that an orthogonal concern to privacy?

this is how we encourage users to use this system! you could bootstrap a network effect for a new platform in this way. (if so, I wonder why no one has successfully done that before, since the ZK part isn't necessary for the ad revenue sharing? maybe because they need users to get advertisers and advertisers to pay users, so it's a chicken-and-egg problem?)

  • where does the ad vector come from? is it based on lookalike audiences?

it could be any vector of features that gives an n-dimensional representation of a user, via facets that people care about! regular targeted ads either provide an explicit list of features they want, or just generate some high-dimensional vector similar to the vectors of their existing customers.

if it's the latter, it looks like they would just cross-reference emails to see which of their customers are on the platform and then aggregate those users' vectors, and then target similar vectors. but if you were setting up a system like this with a new platform that hadn't previously done data collection/selling, how would you bootstrap the set of lookalike users? maybe just pay them for use of their email data, I guess?

the former would be easier, but then you'd either have to have the users explicitly provide demographic data (with more financial incentive to distort it) or infer it, in which case I guess you'd have to run the inference locally too and prove you did it honestly. I guess you could send a proof of aggregate data about ads you'd clicked in the past and then the advertiser looks for people who clicked on ads similar to their ad, so working off ad similarity rather than user similarity
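To make the lookalike idea concrete, here's a rough sketch of the arithmetic only, assuming vectors are plain lists of floats. The names `centroid` and `cosine_similarity` and all the numbers are made up for illustration; in the private version this comparison would have to run locally or inside a proof rather than on a server.

```python
import math

def centroid(vectors):
    """Average a list of equal-length feature vectors, dimension by dimension."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical existing-customer vectors (illustrative numbers only).
customers = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
target = centroid(customers)

candidate_close = [1.0, 0.0, 1.0]  # points in the same direction as the target
candidate_far = [0.0, 5.0, 0.0]    # orthogonal to the target

print(cosine_similarity(target, candidate_close))
print(cosine_similarity(target, candidate_far))
```

The same functions work for "ad similarity rather than user similarity": swap the customer vectors for vectors describing previously clicked ads.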

  • how do you deal with bots?

Idea 1, which I like less, is that you could combine it with proof of unique humanity. it's tough though because of the privacy/nullifier tradeoff. Maybe proof of personhood, e.g. a zk proof of gitcoin passport score, can increase your dividends?

Idea 2 (the most promising) is that spending $ is an effective way (maybe the only simple way?) to detect bots. If clickthrough-to-purchase rate, rather than vector similarity, were the metric for the payment ratio, then you could effectively make bot farms useless. Maybe people who are part of a guild where people actually click through to purchase get an attestation that they did, and get a significantly larger chunk of the ad $ pool. A guild with a 0% purchase rate (i.e. all bots) will get 0% of the revenue. so it's advantageous to be part of a guild where folks are buying at a similar rate as you are [and maybe you're even automatically made part of one], so simultaneously people can't freeload and bots are penalized?
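A minimal sketch of that payout rule, assuming the pool is split in direct proportion to each guild's clickthrough-to-purchase rate. The proportional weighting, the function name `split_ad_pool`, and the numbers are assumptions for illustration, not a settled design.

```python
def split_ad_pool(pool, guild_stats):
    """Split `pool` across guilds in proportion to each guild's purchase rate.

    guild_stats maps guild name -> (clicks, purchases).
    """
    rates = {}
    for name, (clicks, purchases) in guild_stats.items():
        rates[name] = purchases / clicks if clicks > 0 else 0.0
    total = sum(rates.values())
    if total == 0:
        # Nobody purchased anything, so nobody gets paid.
        return {name: 0.0 for name in rates}
    return {name: pool * rate / total for name, rate in rates.items()}

# Hypothetical guilds (illustrative numbers only).
stats = {
    "buyers": (100, 10),      # 10% purchase rate
    "casual": (100, 5),       # 5% purchase rate
    "bot_farm": (10_000, 0),  # lots of clicks, never buys
}
payouts = split_ad_pool(150.0, stats)
print(payouts)
```

Note that the bot farm's click volume doesn't help it at all here: only the ratio of purchases to clicks matters, so a guild that never buys gets nothing.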

New users can also be much cheaper to serve, so then the market evens it out!

Idea 3 is that users actually want to see more relevant ads as long as their data is theirs, and so they'll update honestly. Their zk proof is of the form: I know the preimage of the commitment to this ID hash and vector commitment on chain. This ad vector was added to or multiplied into it, and has a valid signature from adco X. Now this new vector commitment hashed with the same ID commitment is y, but on chain you only see the hash.
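Here's a sketch of the relation that proof would attest to, checked in the clear rather than in zero knowledge. Everything here is a simplifying assumption: the commitment is a single SHA-256 hash over the ID secret and the vector (a real circuit would likely use a SNARK-friendly hash and separate ID and vector commitments), HMAC stands in for the ad company's public-key signature, and only the additive update is shown. All names are hypothetical.

```python
import hashlib
import hmac

def encode(vector):
    """Fixed-width big-endian encoding of an integer vector."""
    return b"".join(x.to_bytes(8, "big", signed=True) for x in vector)

def commit(id_secret, vector):
    """Hash commitment to (id_secret, vector); SHA-256 is a stand-in hash."""
    return hashlib.sha256(id_secret + encode(vector)).hexdigest()

def sign_ad_vector(adco_key, ad_vector):
    """HMAC as a placeholder for the ad company's signature on the ad vector."""
    return hmac.new(adco_key, encode(ad_vector), hashlib.sha256).digest()

def check_transition(old_c, new_c, id_secret, old_vector, ad_vector, sig, adco_key):
    """The statement the zk proof would cover, verified here with all secrets visible."""
    new_vector = [a + b for a, b in zip(old_vector, ad_vector)]
    return (
        commit(id_secret, old_vector) == old_c  # I know the preimage of the old commitment
        and hmac.compare_digest(sig, sign_ad_vector(adco_key, ad_vector))  # ad vector is signed
        and commit(id_secret, new_vector) == new_c  # same ID, updated vector
    )

# Hypothetical values for illustration.
id_secret = b"user-secret"
adco_key = b"adco-x-key"
old_vec = [1, 0, 3]
ad_vec = [0, 2, -1]
sig = sign_ad_vector(adco_key, ad_vec)
old_c = commit(id_secret, old_vec)
new_c = commit(id_secret, [1, 2, 2])

print(check_transition(old_c, new_c, id_secret, old_vec, ad_vec, sig, adco_key))
```

In the actual scheme only `old_c` and `new_c` would be public; the ID secret and the vectors would stay private inputs to the proof.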

  • how do you preserve user privacy while serving user-level relevant ads?

Neither the platform nor ad server knows the vector of the user. The advertiser can either get a single average aggregate vector, or a smoothed convex hull of the vectors, or mean and variance in each dimension, then based on that can give the most relevant set of ads or decline to. Those ads can be all the same or heterogeneous and distributed internally in the group to the most relevant party!
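For the "mean and variance in each dimension" option, a small sketch of what the advertiser would receive, assuming vectors are lists of floats and using population variance. The name `guild_summary` and the sample numbers are illustrative, and this computes the summary in the clear, not privately.

```python
from statistics import fmean, pvariance

def guild_summary(vectors):
    """Per-dimension mean and population variance over a guild's vectors."""
    columns = list(zip(*vectors))
    means = [fmean(col) for col in columns]
    variances = [pvariance(col) for col in columns]
    return means, variances

# Hypothetical guild of three users with 2-dimensional vectors.
guild = [[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]]
means, variances = guild_summary(guild)
print(means, variances)
```

The advertiser sees only `means` and `variances`, never the individual rows, though with very small guilds the summary itself can still leak a lot about each member.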

lilyjordan commented 1 year ago

Remaining questions:

1. Inputs and outputs to model

In the following sequence:

   a. User keeps some info from platform server (logs, attestations, model output, etc)
   b. User contributes proof to recursive SNARK
   c. Guild submits completed SNARK to advertiser for payment
   d. Advertiser sends guild set of ads
   e. Guild tells platform which ads should go to which users

for each step, can you write out really specifically what type of data gets passed? especially the inputs and outputs to the model that the advertiser receives at step c?

particular points of confusion:

2. How to preserve privacy when computing the SNARK

If you (a user) pass your vector into a recursive computation step, the rest of the guild (or at least the next user) can determine what your vector was, right? So how would we preserve anonymity here?

3. How the ad-serving step works

If the group is homogeneous enough to all be worth targeting with the same ad, then that defeats the point of privacy. So presumably they're heterogeneous and get served different ads. But in that case, how does the guild distribute ads among users without compromising their privacy? E.g., if you bid on an ad relevant to you, you're kind of doxxing yourself as being close to the target audience for that ad. Or if the guild assigns ads to users, how does it do that without knowing individual users' vectors?

RiverRuby commented 1 year ago

> How to preserve privacy when computing the SNARK: If you (a user) pass your vector into a recursive computation step, the rest of the guild (or at least the next user) can determine what your vector was, right? So how would we preserve anonymity here?

If not everyone in the guild can see every update, then intuitively I don't think this is true. Here's a potential scheme:

Initialize with some random noise, and then people add their data one by one, or the vector makes multiple random rounds through the group where people add different subsets of their recent data. Then everyone will only see the aggregate vector so far, and won't know what it looked like before or even who came before them in line?
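A minimal sketch of the single-pass version of that scheme, assuming integer vectors, arithmetic modulo a large number so the mask hides partial sums, and an initiator who picks the noise and strips it at the end. The function names and the choice of who removes the mask are assumptions, not part of the proposal above.

```python
import secrets

MODULUS = 2**61 - 1  # all sums are taken mod this value

def start_round(dim):
    """Initiator picks a random mask and uses it as the initial running total."""
    mask = [secrets.randbelow(MODULUS) for _ in range(dim)]
    return mask, list(mask)

def add_contribution(running, vector):
    """Each user adds their vector to whatever running total they were handed."""
    return [(r + v) % MODULUS for r, v in zip(running, vector)]

def finish_round(running, mask):
    """Initiator subtracts the mask to recover the true aggregate."""
    return [(r - m) % MODULUS for r, m in zip(running, mask)]

# Hypothetical guild of three users with small non-negative vectors.
users = [[1, 2], [3, 4], [5, 6]]
mask, running = start_round(2)
for vec in users:
    running = add_contribution(running, vec)  # each user sees only a masked total
total = finish_round(running, mask)
print(total)
```

One caveat with this sketch: if the people immediately before and after you in line compare notes, they can subtract their two running totals and recover your vector, which is part of why hiding the ordering or doing multiple rounds with partial data would matter.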

RiverRuby commented 1 year ago

Moving to https://www.notion.so/provenant/ZK-Data-Guilds-a08791e7bace42bbbb052c2e091463fd going forward