gda-score / code

Tools for generating General Data Anonymity Scores (www.gda-score.org)
MIT License

Implement Difference attack #34

Open yoid2000 opened 5 years ago

yoid2000 commented 5 years ago

For this issue, implement the difference attack described in section 5.2.2 of the Extended Diffix paper (https://aircloak.com/wp-content/uploads/Complete-Diffix.pdf). The criterion is singling out. You can see an example of a singling-out attack at https://github.com/gda-score/code/blob/master/attacks/dumbList_SingOut.py.

This attack has two parts.

  1. The attacker must find a user that can be isolated.
  2. The attacker must make the set of queries that isolates the user.

To isolate a user, the attacker must find two queries where the counts of distinct users differ by exactly 1. The easiest way to do that is with a not-equals condition (<>). What we do is find a column that is likely to have many user-unique values. This could be any column where the number of distinct values is, say, 50% or more of the number of distinct users. The uid column is always such a column, and so is lastname.

You can use the function getTableCharacteristics() to determine which columns apply.

https://gda-score.github.io/gdaScore.m.html#gdaScore.gdaAttack.getTableCharacteristics
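As a rough sketch of the column-selection step, the snippet below filters columns by the ratio of distinct values to distinct users. It does not call getTableCharacteristics(); the input dictionary and its keys (`distinct_vals`, `distinct_uids`) are hypothetical stand-ins for whatever that function returns, the numbers are made up, and the 50% rule is read as "distinct values are at least half the distinct users", which is an assumption about the intended meaning.

```python
def candidate_isolating_columns(stats, min_ratio=0.5):
    """Return columns likely to hold many user-unique values.

    stats maps a column name to a dict with the hypothetical keys
    'distinct_vals' and 'distinct_uids'. A column qualifies when
    distinct_vals / distinct_uids >= min_ratio.
    """
    picked = []
    for col, s in stats.items():
        users = s["distinct_uids"]
        if users == 0:
            continue  # empty column, nothing to isolate
        ratio = s["distinct_vals"] / users
        if ratio >= min_ratio:
            picked.append((ratio, col))
    # Highest ratio first; ties broken by column name for determinism.
    picked.sort(key=lambda rc: (-rc[0], rc[1]))
    return [col for _, col in picked]


# Illustrative numbers only, not taken from a real table.
stats = {
    "uid": {"distinct_vals": 5369, "distinct_uids": 5369},
    "lastname": {"distinct_vals": 3100, "distinct_uids": 5369},
    "gender": {"distinct_vals": 2, "distinct_uids": 5369},
}
columns = candidate_isolating_columns(stats)
```

With these toy numbers, uid and lastname pass the threshold and gender does not.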

After selecting one such column, do an askKnowledge() query to get the contents of that column. Then select the values in the column for which there is only one user. Call the column col_iso and each such value val_iso.
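Picking the val_iso candidates could look roughly like this. The sketch assumes the knowledge query has already been turned into (value, uid) pairs; that row shape is an assumption, not something the library guarantees, and the sample rows are invented.

```python
from collections import defaultdict


def single_user_values(rows):
    """Return the values that appear with exactly one distinct uid.

    rows is an iterable of (value, uid) pairs (assumed shape).
    NULL values are skipped because they cannot be used with <>.
    """
    users_by_value = defaultdict(set)
    for value, uid in rows:
        if value is None:
            continue
        users_by_value[value].add(uid)
    return sorted(v for v, uids in users_by_value.items() if len(uids) == 1)


# Invented sample: "Smith" belongs to two users, the others to one each.
rows = [("Smith", 1), ("Smith", 2), ("Nguyen", 3), ("Okafor", 4), ("Okafor", 4)]
iso_values = single_user_values(rows)
```

Repeated rows for the same user (as with "Okafor" here) do not disqualify a value, since only distinct uids are counted.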

For every column other than col_iso, we make two queries using ask_attack(). One query looks like this:

select col_other, count(distinct uid)
from table
where col_iso <> val_iso
group by 1

And the other query like this:

select col_other, count(distinct uid)
from table
group by 1
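A small helper for generating the two query strings might look like the following. The function names are hypothetical, and table and column names are inserted as-is, so this sketch assumes they come from trusted schema metadata rather than user input.

```python
def sql_literal(value):
    """Render a Python value as a SQL literal: numbers bare, strings quoted."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return str(value)
    # Double any single quotes so the string stays a valid SQL literal.
    return "'" + str(value).replace("'", "''") + "'"


def difference_queries(table, col_other, col_iso, val_iso):
    """Return (query excluding the victim, query over everyone)."""
    base = f"select {col_other}, count(distinct uid) from {table}"
    excluded = f"{base} where {col_iso} <> {sql_literal(val_iso)} group by 1"
    full = f"{base} group by 1"
    return excluded, full


excluded_sql, full_sql = difference_queries("accounts", "ssn", "uid", 2848)
```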

We are looking for the col_other value where the user (the victim) is not in the first query but is in the second. We'll assume that the value where the difference between the second count and the first count is largest will be that value.
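The comparison step can be sketched as below. It assumes each answer is a list of [value, count] rows and treats a value missing from the first answer as a count of 0; both are assumptions, and the counts shown are invented to illustrate noisy answers.

```python
def likely_victim_value(excluded_rows, full_rows):
    """Return (value, difference) with the largest full-minus-excluded count.

    excluded_rows: answer to the query with `col_iso <> val_iso`.
    full_rows: answer to the query without that condition.
    Returns None when full_rows is empty.
    """
    excluded = {value: count for value, count in excluded_rows}
    best = None
    for value, count in full_rows:
        diff = count - excluded.get(value, 0)  # missing bucket counted as 0
        if best is None or diff > best[1]:
            best = (value, diff)
    return best


# Invented counts: noise can make a difference negative or zero.
full_rows = [["A", 120], ["B", 87], ["C", 45]]
excluded_rows = [["A", 119], ["B", 89], ["C", 45]]
guess = likely_victim_value(excluded_rows, full_rows)
```

Here the differences are 1, -2 and 0, so "A" is the value the attacker would claim for the victim.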

Then we make a claim using ask_claim() based on this value.

For each col_other make 20 attack pairs (i.e. use 20 different val_iso values) and 20 corresponding claims.
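One way to lay out the work is to enumerate the (col_other, val_iso) pairs up front. This is only a planning sketch: the function name is hypothetical, it issues no queries or claims, and it simply takes the first 20 isolating values rather than choosing them in any particular way.

```python
from itertools import islice


def attack_plan(columns, col_iso, iso_values, pairs_per_column=20):
    """Yield (col_other, val_iso) for every column except col_iso.

    At most pairs_per_column isolating values are used, and the same
    values are reused for each col_other.
    """
    chosen = list(islice(iso_values, pairs_per_column))
    for col_other in columns:
        if col_other == col_iso:
            continue
        for val_iso in chosen:
            yield col_other, val_iso


# Two attackable columns times 20 isolating values gives 40 attack pairs.
plan = list(attack_plan(["uid", "ssn", "lastname"], "uid", range(1000, 1100)))
```

Each entry in the plan corresponds to one pair of attack queries and one claim.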

yoid2000 commented 5 years ago

I've written a short article explaining how to write an attack. It is here: https://www.gda-score.org/quick-guide-to-writing-attacks/

resha1417 commented 5 years ago

Hello sir,

I got some results during the attack, and I want to check that my understanding of them is correct. When I attack ssn, I get results like this:

For the 1st query: SELECT ssn, count(DISTINCT uid) FROM accounts GROUP BY ssn
Result: [['*', 5369]]

For the 2nd query: SELECT ssn, count(DISTINCT uid) FROM accounts WHERE uid<>2848 GROUP BY ssn
Result: [['*', 5364]]

As we discussed, I will make a claim on a column when the number of pairs is the same for query 1 and query 2. Here I get 1 pair for both queries, but because the column is fully anonymized, I cannot tell which ssn uid=2848 belongs to. The results give me a maximum difference of 5, but instead of an ssn value there is '*'. Should I also make claims on these kinds of (fully anonymized) columns?

Regards, Resha

yoid2000 commented 5 years ago

Hi Resha,

You cannot make a claim on an answer that has '*'.

PF
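In code, that rule amounts to dropping '*' rows before choosing a value to claim. A minimal sketch, assuming answers arrive as lists of [value, count] rows (the helper name is hypothetical):

```python
SUPPRESSED = "*"  # placeholder returned in place of an anonymized value


def claimable_rows(rows):
    """Keep only rows whose grouped value is a real value, not '*'."""
    return [row for row in rows if row[0] != SUPPRESSED]


# The fully anonymized ssn answer from above leaves nothing to claim.
usable = claimable_rows([["*", 5369]])
# A mixed answer keeps its real values (invented example row).
mixed = claimable_rows([["*", 12], ["A", 3]])
```

If the filtered list is empty, the attack should skip the claim for that column and value.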

resha1417 commented 5 years ago

Ok. Thank you very much
