gda-score / code

Tools for generating General Data Anonymity Scores (www.gda-score.org)
MIT License

Write attack for Uber differential privacy anonymization #29

Open yoid2000 opened 6 years ago

yoid2000 commented 6 years ago

We're going to use this to attack the Uber anonymization system. I'm not sure what queries that system allows, but @rbh-93 is working on it, so he can answer questions about that or give you access to an implementation.

In our attack, we want to make a query that has exactly one user in the answer with some reasonable probability. In the attack, we find out if that is the case or not. If it is the case, then we make a singling-out claim for that user. If not, then we don't make a claim.

The first step is to find sets of column values or value ranges that have a good chance of identifying a single user. If you know the number of distinct users associated with any given column value, and you know the number of users in the table, then prob_user1 = col_val_users1/total_users is the probability that any given user has that column value. Then you want to find cases where:

total_users * prob_user1 * prob_user2 * ... = 1 (roughly)

In other words, the expected number of users with column/value 1 and column/value 2 and ... is one.
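As a rough illustration of this calculation, the Python sketch below multiplies the per-condition probabilities by the total number of users. The function name and the counts in the usage line are hypothetical, and the calculation assumes the columns are statistically independent, which real tables often are not.

```python
from math import prod

def expected_users(total_users, col_val_users):
    """Expected number of users matching every column/value condition.

    col_val_users holds, for each condition, the number of distinct users
    that have that column value. Assumes the columns are independent.
    """
    probs = [n / total_users for n in col_val_users]
    return total_users * prod(probs)

# Hypothetical counts: 5000 users and three conditions.
estimate = expected_users(5000, [500, 250, 200])
print(estimate)  # close to 1, so this combination is a candidate
```

A combination whose estimate is near 1 is worth trying; the estimate is only a guide, since correlated columns can push the real count well away from it.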

You can learn the total users with:

select count(distinct uid)
from table

To learn these probabilities for any given column, you can query the raw database with this query:

select column, count(distinct uid)
from table
group by 1
order by 2 desc
limit 200

Use the askExplore() call on the raw database (rawDb) to do these.

Once you have a set of columns and values where this is the case, you can make a query like this:

select count(distinct uid)
from table
where col1 = val1 and col2 = val2 and ...
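One way to assemble such a query from a list of (column, value) pairs is sketched below. The helper names are hypothetical and not part of gdaScore; the table and column names in the usage come from the banking example discussed further down in this thread. Identifiers are inserted as-is, so this is only suitable for trusted table and column names.

```python
def sql_literal(value):
    """Render a Python value as a SQL literal (strings are single-quoted)."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)

def build_count_query(table, uid_col, conditions):
    """Build the counting query from a non-empty list of (column, value) pairs."""
    if not conditions:
        raise ValueError("at least one condition is required")
    columns = [col for col, _ in conditions]
    if len(set(columns)) != len(columns):
        # Two different values for the same column can never both match.
        raise ValueError("each condition must use a different column")
    where = " and ".join(f"{col} = {sql_literal(val)}" for col, val in conditions)
    return f"select count(distinct {uid_col}) from {table} where {where}"

sql = build_count_query(
    "accounts", "account_id",
    [("frequency", "POPLATEK TYDNE"), ("acct_district_id", 70)],
)
print(sql)
```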

For the Uber system, each time you repeat the query, you get a new noise value with mean zero. So if you repeat the query X times and average the answers, the average gets closer to the true answer as X grows.

After X queries, we predict that the true answer is 1 if the averaged answer is between 0.5 and 1.5.

For these repeated queries, use the askAttack() call, so that the system records them as attack queries. Once you have a guess, use the askClaim() call to record it. You can see examples of how these are used for other attacks in code/attacks.
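The averaging step can be tried out without access to any anonymized database. The sketch below simulates it: `noisy_count` is a hypothetical stand-in for one anonymized answer, and both the Laplace noise and its scale are assumptions made for illustration, not confirmed properties of the Uber system. In the real attack each answer would come from an askAttack() query instead.

```python
import random

def noisy_count(true_count, scale, rng):
    """Stand-in for one anonymized answer: true count plus zero-mean noise."""
    # The difference of two i.i.d. exponential draws is Laplace-distributed.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return true_count + noise

def averaged_answer(ask, repeats):
    """Call ask() `repeats` times and return the mean of the answers."""
    return sum(ask() for _ in range(repeats)) / repeats

rng = random.Random(0)  # fixed seed so the run is repeatable
avg = averaged_answer(lambda: noisy_count(1, scale=2.0, rng=rng), repeats=2000)
print(avg)  # lands close to the true count of 1
```

With only a handful of repeats the average can still be far from the true count, so the number of repeats has to be chosen with the noise level in mind.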

AnirbanGhosh1512 commented 6 years ago

Started Working on it.

AnirbanGhosh1512 commented 6 years ago

Hello Prof. Paul,

It took me a while to understand the exact requirements. Please tell me whether my understanding is right:

  1. I need to use the REST API built by @rbh-93 to learn the probabilities.
  2. Once I have the set of columns, I can use askAttack() and askClaim() from the attack script to predict the true answer.

Regards, Anirban Ghosh

yoid2000 commented 5 years ago

We will incorporate Rohan's REST interface into gdaScore, so you won't use his interface directly. Rather, you'll use askExplore() to make the preliminary queries, askAttack() to make the attack queries (to establish an average value), and askClaim() to make a claim about your guessed answer.

Until we have incorporated Rohan's REST interface, you can test your code against rawDb. I'm out of town right now, but will be back on Friday if you want to chat about it.

AnirbanGhosh1512 commented 5 years ago

Hello Prof. Paul,

I was at your office on Friday, but nobody was there. Perhaps you were around; I saw people having a get-together downstairs. I will be available on Monday for the chat.

Regards, Anirban Ghosh

yoid2000 commented 5 years ago

Indeed I was downstairs chatting. But you could have interrupted me ... it would have been fine.

Anyway, see you Monday.

PF

yoid2000 commented 5 years ago

@AnirbanGhosh1512

As a step in this attack, you make a query like

select count(distinct uid)
from table
where col1 = val1 and col2 = val2 and ...

I have written a class method called getPublicColValues(), which is meant to return a set of column values that may reasonably be publicly known. You can read about this interface at https://gda-score.github.io/gdaScore.m.html

When you write the part that looks for appropriate values, please limit yourself to values discovered by getPublicColValues().

Let me know if you have questions.

AnirbanGhosh1512 commented 5 years ago

Hello Prof. Paul,

My current .json configuration is below.

    {
      "localBankingRaw": {
        "host": "db001.gda-score.org",
        "port": 5432,
        "dbname": "banking",
        "user": "anirbanghosh1512@gmail.com",
        "password": "Aic0phuLoo0i",
        "type": "postgres"
      },
      "cloakBankingAnon": {
        "host": "attack.aircloak.com",
        "port": 8432,
        "dbname": "banking",
        "user": "george@gda-score.org",
        "password": "secret",
        "type": "aircloak"
      }
    }

The first entry, localBankingRaw, works fine for me as a config string, but the second one, cloakBankingAnon, seems to contain credentials that are not authorized to access the db. When I tried the settings of my colleague Ali Reza, it worked fine. Perhaps I need access on attack.aircloak.com.

Regards, Anirban

AnirbanGhosh1512 commented 5 years ago

Hello Prof. Paul,

Thanks, it is now working with my newly created login.

Regards, Anirban

yoid2000 commented 5 years ago

Hi Anirban,

You need to change the "user" and "password" to match those of the account I just gave you. And set "host" to demo.aircloak.com.

PF

AnirbanGhosh1512 commented 5 years ago

Hello Prof. Paul,

In the issue description, you asked me to learn these probabilities for any given column by querying the raw database with this query:

select column, count(distinct uid)
from table
order by 2 desc
limit 200

using the askExplore() call on the raw database (rawDb).

As far as I can tell, askExplore() is nothing but a queue that holds queries, whereas getPublicColValues() already builds the query dynamically. So I just need to pass it the column names in a loop, and then, based on the results, I can calculate the probabilities and generate the attack query.

Am I right? Please let me know if I have misunderstood.

Regards, Anirban Ghosh

yoid2000 commented 5 years ago

Yes, your understanding is correct. You can loop through the column names and learn a set of values.

By the way, there is also a method in class gdaAttack() called getTableCharacteristics that returns various statistics about each of the columns, including the number of distinct UIDs, the number of distinct values, the average number of UIDs per value, and things like that. You can read more about it at:

https://gda-score.github.io/gdaScore.m.html#gdaScore.gdaAttack.getTableCharacteristics

AnirbanGhosh1512 commented 5 years ago

Hello Prof. Paul,

The method getPublicColValues() rejects values whose count is less than 100, as far as I can see in the code. Is it OK to use this method, or should I write something new to fetch all the records, even those with a count below 100?

Regards, Anirban

yoid2000 commented 5 years ago

Hi Anirban,

You should use getPublicColValues(), because as an attacker we are assuming that you know these (they are public knowledge), but I don't want to assume that you know all values.

PF

AnirbanGhosh1512 commented 5 years ago

Hello Prof. Paul,

Taking the frequency column as an example, this query:

select frequency, count(distinct account_id)
from accounts
group by frequency
order by 2 desc
limit 200

gives the following output:

frequency             count
POPLATEK MESICNE      4167
POPLATEK TYDNE        240
POPLATEK PO OBRATU    93

So would the next query described in the issue,

select count(distinct uid)
from table
where col1 = val1 and col2 = val2 and ...

look like this?

select count(distinct account_id)
from accounts
where frequency = 'POPLATEK MESICNE' and frequency = 'POPLATEK TYDNE' and frequency = 'POPLATEK PO OBRATU'

Please let me know whether my understanding is correct.

Regards, Anirban

yoid2000 commented 5 years ago

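No, each condition in the query needs to be for a different column. A single row cannot have several different values in the same column, so the query you wrote would always return zero.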

PF

AnirbanGhosh1512 commented 5 years ago

Hello Prof. Paul,

Calling getPublicColValues() gives me the output below:

    {
      'acct_district_id': [(1, 554), (70, 152), (74, 135), (54, 128)],
      'cli_district_id': [(1, 547), (70, 146), (74, 144), (54, 133)],
      'disp_type': [('OWNER', 4500), ('DISPONENT', 869)],
      'frequency': [('POPLATEK MESICNE', 4167), ('POPLATEK TYDNE', 240)]
    }

Before writing the query select count(distinct uid) from table where col1 = val1 and col2 = val2 and ..., I need some clarification, which would be easiest in a chat at your office.

Can I stop by your office in the next few days to clarify my understanding before I proceed?

Regards, Anirban Ghosh

yoid2000 commented 5 years ago

I wonder if there is a bug with getPublicColValues. It should be returning more than that. Can you meet me tomorrow afternoon?

PF

AnirbanGhosh1512 commented 5 years ago

Hello Prof. Paul,

The full output is below:

    {
      'account_id': [],
      'acct_date': [],
      'acct_district_id': [(1, 554), (70, 152), (74, 135), (54, 128)],
      'birth_number': [],
      'cli_district_id': [(1, 547), (70, 146), (74, 144), (54, 133)],
      'client_id': [],
      'disp_type': [('OWNER', 4500), ('DISPONENT', 869)],
      'frequency': [('POPLATEK MESICNE', 4167), ('POPLATEK TYDNE', 240)],
      'lastname': []
    }

I added a check so that a column whose returned value is [] is not considered. I am available after 3 pm tomorrow, so I can come to your office.

Regards, Anirban
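The check for empty columns mentioned in this comment can be written as a one-line filter. The sketch below assumes the result is a plain dict shaped like the output shown above (a trimmed copy is used as sample data); the variable names are illustrative.

```python
# Trimmed copy of the output shown above, used as sample data.
public_values = {
    "account_id": [],
    "acct_district_id": [(1, 554), (70, 152), (74, 135), (54, 128)],
    "birth_number": [],
    "disp_type": [("OWNER", 4500), ("DISPONENT", 869)],
    "frequency": [("POPLATEK MESICNE", 4167), ("POPLATEK TYDNE", 240)],
    "lastname": [],
}

# Keep only the columns that have at least one public value.
usable = {col: vals for col, vals in public_values.items() if vals}
print(sorted(usable))  # ['acct_district_id', 'disp_type', 'frequency']
```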

yoid2000 commented 5 years ago

OK, see you then. In the meantime I'll look into what is wrong with that routine.

PF

yoid2000 commented 5 years ago

I changed the parameters of getPublicColValues() so that it returns somewhat more. Please pull the latest code repo and try running your code again. I'll see you this afternoon.

AnirbanGhosh1512 commented 5 years ago

Hello Prof. Paul,

I pulled the latest code base, but I am still getting the same output. I checked the Git GUI and it shows no recent changes to the gda-score script. I wonder whether it has been updated or whether I am missing something.

Regards, Anirban

AnirbanGhosh1512 commented 5 years ago

Hello Prof. Paul,

A gentle reminder.

Regards, Anirban

yoid2000 commented 5 years ago

My bad. I pushed the changes just now. Please pull and try again.

PF

AnirbanGhosh1512 commented 5 years ago

Hello Prof. Paul,

Sorry for the late response. I got new output after calling getPublicColValues() in the gdaScore script. Now my question is: are the columns that come back with values (for example, acct_district_id) always fixed when I call the routine, or will they change later if the database changes? Currently, the columns that come back with values are: acct_district_id, cli_district_id, disp_type, frequency, lastname.

If I write the logic to build the query select count(distinct uid) from table where col1 = val1 and col2 = val2 and ..., I need to use combinatorics over 5 columns. But if there are 6 columns in the future, the script would not be dynamic; it would be static and work only for those 5 columns.

Please let me know if that is OK for you, so that I can start writing the logic for building the query.

Regards, Anirban

yoid2000 commented 5 years ago

Hi Anirban,

Your code should be dynamic. The input should just be the table name. From that the code should dynamically learn the column names, then learn the public column values, then form the attack queries etc. Your code should be able to work with any of the db001 tables (all the banking tables, taxi, census, etc.) without requiring any changes.

PF
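To make the combinatorics independent of the number of columns, the standard library's itertools can enumerate every subset of columns and every choice of one value per column. The sketch below is one possible shape for this, not the project's implementation: the function name, the [0.5, 1.5) window on the expected count, and the sample data are all assumptions, and the input is assumed to be a dict of (value, count) lists like the one getPublicColValues() printed earlier in the thread.

```python
from itertools import combinations, product

def candidate_conditions(public_values, total_users, low=0.5, high=1.5):
    """Yield (conditions, expected_users) for value picks over distinct columns.

    public_values maps column name -> list of (value, user_count) pairs.
    Columns are treated as independent when estimating the expected count.
    """
    columns = [col for col, vals in public_values.items() if vals]
    for size in range(1, len(columns) + 1):
        for cols in combinations(columns, size):
            # One (value, count) pick per chosen column.
            for picks in product(*(public_values[c] for c in cols)):
                expected = total_users
                for _value, count in picks:
                    expected *= count / total_users
                if low <= expected < high:
                    conditions = [(c, v) for c, (v, _n) in zip(cols, picks)]
                    yield conditions, expected

# Hypothetical table with 1000 users; column "c" has no public values.
sample = {"a": [("x", 100), ("y", 80)], "b": [(1, 10), (2, 200)], "c": []}
for conditions, expected in candidate_conditions(sample, 1000):
    print(conditions, round(expected, 2))
```

Because nothing here names a specific column, the same code runs unchanged on a table with 5, 6, or any other number of columns. The number of combinations grows quickly, so a real attack would likely cap the subset size or stop after enough candidates.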

AnirbanGhosh1512 commented 5 years ago

Hello Prof. Paul,

I need a little clarification about our last discussion. Should I make a claim only if the average of the query results is greater than 1.0, or can I make a claim whatever the mean value is?

Regards, Anirban Ghosh

yoid2000 commented 5 years ago

If the rounded average of the query results is 1, then you ask for a claim (claim=True). Otherwise you don't ask for a claim (claim=False).

A rounded average will be 1 if the average is between 0.5 and 1.5.

The point is, if the rounded average is 1, then you guess that there is exactly one user with the given attributes, and so you want to make a claim that you have singled out this user.

PF
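The claim rule above fits in a few lines of Python. This is a minimal sketch with a hypothetical function name; treating the interval as half-open, [0.5, 1.5), is an assumption, since the thread does not say how the exact boundaries should be handled.

```python
def should_claim(answers):
    """Return True when the mean of the answers falls in [0.5, 1.5)."""
    if not answers:
        return False
    mean = sum(answers) / len(answers)
    # Explicit bounds are used instead of round(): Python rounds halves to
    # the nearest even integer, so round(0.5) == 0 and round(1.5) == 2.
    return 0.5 <= mean < 1.5

print(should_claim([1.3, 0.9, 0.6]))  # True, the mean is about 0.93
print(should_claim([2.4, 1.9]))       # False, the mean is 2.15
```

The returned boolean corresponds to the claim=True / claim=False decision described above.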

AnirbanGhosh1512 commented 5 years ago

Hello Prof. Paul,

I have been looking for you in the office since last week, but with no luck. I just need one clarification; I thought I could stop by and ask, but time is flying, so I am asking in the issue tracker. Your last message here clearly stated the condition for the claim. Now, let's say I have X queries, and I repeat each query n times. The average would then be n * result / n, which is always just the result value. So why should I do this step? Instead, I could check whether the result value is between 0.5 and 1.5, and if so, go directly for the claim.

Pardon me if my understanding is wrong. I look forward to your reply.

Regards, Anirban

yoid2000 commented 5 years ago

When you query against the Uber DP interface, you'll get back a different answer every time because the answers have zero-mean noise. By taking an average you can effectively reduce the noise and increase confidence.

PF
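The effect of averaging can be stated more precisely under an assumption the thread does not spell out: that the n noise samples are independent and share a common variance \sigma^2. Writing c for the true count and \eta_i for the noise added to the i-th answer:

```latex
\bar{a}_n = \frac{1}{n}\sum_{i=1}^{n} (c + \eta_i), \qquad
\mathbb{E}[\bar{a}_n] = c, \qquad
\operatorname{Var}(\bar{a}_n) = \frac{\sigma^2}{n}
```

The standard deviation of the average therefore shrinks as \sigma/\sqrt{n}, so four times as many queries halve the remaining noise. When there is no noise at all, every answer equals c and the average is simply n * c / n = c, which is the case described in the question above.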

AnirbanGhosh1512 commented 5 years ago

Hello Prof. Paul,

Thanks for the reply. I will update the change accordingly.

Regards, Anirban

AnirbanGhosh1512 commented 5 years ago

Hello Prof. Paul,

I have made the necessary changes. Should I push them to git?

Regards, Anirban

yoid2000 commented 5 years ago

Before you push, can you show me the generated GDA Score for the case where you run the attack on Diffix? I want to see it working at least that much. Later when Uber is running we'll test it there.

PF


AnirbanGhosh1512 commented 5 years ago

Hello Prof. Paul,

The Database configuration is below:

{ "localBankingRaw": { "host": "db001.gda-score.org", "port": 5432, "dbname": "banking", "user": "anirbanghosh1512@gmail.com", "password": "Aic0phuLoo0i", "type": "postgres" }, "cloakBankingAnon": { "host": "demo.aircloak.com", "port": 8432, "dbname": "gda_banking", "user": "anirbanghosh1512@gmail.com", "password": "anirban@123", "type": "aircloak" } }

The generated output of the attack script is below and it is working with raw db:

"Test all correct (multiple guessed column): susc 0, nextSusc 0.0, lastSusc 1e-06"

I have attached the current attack script I have written. Please have a look and let me know if further changes are needed.

Regards, Anirban Ghosh


import sys
import pprint
import six
sys.path.append('../../common')
from gdaScore import gdaAttack, gdaScores
from myUtilities import checkMatch

# This script makes attack queries, and then requests the
# resulting GDA score.

pp = pprint.PrettyPrinter(indent=4)

params = dict(name='exampleAttack1',
              rawDb='localBankingRaw',
              anonDb='cloakBankingAnon',
              criteria='singlingOut',
              table='accounts',  # change the table name to run individual table.
              flushCache=False,
              verbose=False)
x = gdaAttack(params)

def getTotalUser():
    """Queues a query for the number of distinct users in the table."""
    query = dict(uid='account_id')
    sql = str(f"""select count(distinct account_id)
         from {params['table']}""")
    query['sql'] = sql
    x.askAttack(query)

def getResultFromQuery(queryParser):
    """Returns the values of the table being used in the attack."""
    colnames = x.getColNames()
    for i in colnames:
        values = x.getPublicColValues(i)
        if values != []:
            queryParser[i] = values
    return queryParser

def makeNoiseQuery(getKeyColumn, getCombinations):
    """Queues `branch` copies of one count query per value combination."""
    # TODO: uid should be dynamically allocated
    colnames = x.getColNames()
    primaryKeyColumn = dict(uid=colnames[0])
    # Each query is repeated 20 times so that the noise can be averaged out
    branch = 20
    query = dict(myTag='query1')
    # Fixed part of the query; the where conditions are generated dynamically
    raw_sql = str(f"""select count(distinct {primaryKeyColumn['uid']})
          from {params['table']}
          where """)
    for val in getCombinations:
        conditions = []
        for col, colVal in zip(getKeyColumn, val):
            # String values need quotes, numeric values do not
            if isinstance(colVal, six.string_types):
                conditions.append(f"{col} = '{colVal}'")
            else:
                conditions.append(f"{col} = {colVal}")
        query['sql'] = raw_sql + ' and '.join(conditions)
        for q in range(branch):
            x.askAttack(query)

def getDiffrentColumnValues(col, queryParser):
    """Returns {col: [values of col]}, or {} if col is unknown."""
    colvalDict = {}
    if col in queryParser:
        # Build a fresh list per column so values of different columns never mix
        colvalDict = {col: [allval[0] for allval in queryParser[col]]}
    return colvalDict

getTotalUser()
result = x.getAttack()
queryParser = {}
getResultFromQuery(queryParser)

getKeyColumn = []
getResult = []

def getNumberofKeyColumn(queryParser):
    for key in queryParser:
        getKeyColumn.append(key)
    return getKeyColumn

def getResultForComb(getKeyColumn):
    for col in getKeyColumn:
        retDic = getDiffrentColumnValues(col, queryParser)
        getResult.append(retDic[col])
    return getResult

def getCombinatorics(getResult):
    """Returns the Cartesian product of the per-column value lists."""
    r = [[]]
    for colValues in getResult:
        t = []
        for y in colValues:
            for i in r:
                t.append(i + [y])
        r = t
    return r

# Get the columns that have public values
getKeyColumn = getNumberofKeyColumn(queryParser)
# Get the list of values for each of those columns
getResult = getResultForComb(getKeyColumn)
# Build every combination of column values
getCombinations = getCombinatorics(getResult)
# Create all possible queries.
makeNoiseQuery(getKeyColumn, getCombinations)

def Average(lst):
    """Returns the mean of the answers of one query branch."""
    return sum(lst) / len(lst)

verbose = 0
v = verbose
doCache = True
branchReturn = 20
# check number of combinations
outputComb = len(getCombinations)

# And gather up the answers:
for i in range(outputComb):
    # Collect the answers of the 20 copies of this query, starting from an empty list
    returnResults = []
    for item in range(branchReturn):
        reply = x.getAttack()
        if 'error' in reply:
            print(reply['error'])
        else:
            returnResults.append(reply['answer'][0][0])
        if reply['stillToCome'] == 0:
            break
    if not returnResults:
        # Every copy failed, so there is nothing to average
        continue
    average = Average(returnResults)
    # A rounded average of 1 means we guess that exactly one user matches
    if 0.5 <= average <= 1.5:
        claim = True
        colnames = x.getColNames()
        primaryKeyColumn = dict(uid=colnames[0])
        # 'uid' is the name of the uid column; known is optional, and always empty here
        spec = {'uid': primaryKeyColumn['uid'], 'known': [], 'guess': []}
        val = getCombinations[i]
        for item in range(len(getKeyColumn)):
            spec['guess'].append({'col': getKeyColumn[item], 'val': val[item]})
        x.askClaim(spec, claim=claim, cache=doCache)
        #while True:
            #replyClaim = x.getClaim()
            #if v: print("Claim Result:")
            #if v: pp.pprint(replyClaim)
            #if replyClaim['stillToCome'] == 0:
                #break
        print("\nTest all correct (multiple guessed column):")
        attackResult = x.getResults()
        sc = gdaScores(attackResult)
        score = sc.getScores()
        # pp.pprint(score['col']['frequency'])
        if v: pp.pprint(score)
    else:
        claim = False

score = x.getResults()
pp.pprint(score)
x.cleanUp()
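The averaging step discussed in this thread can be tried out without any database. The following is only a minimal sketch: `noisy_count`, `should_claim` and `attack_one_combination` are hypothetical helpers, and the Laplace noise with scale 2.0 is an assumption made for illustration, not the actual noise of the Uber system or of Diffix.

```python
import random
import statistics

def noisy_count(true_count, scale=2.0, rng=random):
    """Simulate one answer: the true count plus zero-mean Laplace noise.

    The difference of two exponential samples with the same mean is
    Laplace-distributed; the scale is an assumed value for illustration.
    """
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise

def should_claim(answers):
    """Claim only if the averaged answer lies between 0.5 and 1.5."""
    if not answers:
        return False
    return 0.5 <= statistics.mean(answers) <= 1.5

def attack_one_combination(true_count, repeats=20, scale=2.0, rng=random):
    """Repeat the same query `repeats` times and decide on the average."""
    answers = [noisy_count(true_count, scale, rng) for _ in range(repeats)]
    return should_claim(answers)

if __name__ == "__main__":
    rng = random.Random(0)
    # A single noisy answer is unreliable; the average of many is not.
    print(attack_one_combination(true_count=1, repeats=20000, rng=rng))
    print(attack_one_combination(true_count=5, repeats=20000, rng=rng))
```

Against the raw database every repeated answer is identical, so averaging changes nothing there; it only matters when each answer carries fresh noise.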

yoid2000 commented 5 years ago

Hi Anirban,

I'm interested in the final json output, which you can produce using finishGdaAttack() (see below). Could you produce these json outputs for me using both the cloak and the raw database as the anonymous data? Then produce the score diagrams from the json outputs using makeGraphs.py in code/graphs. Post the json files on gist.github.com, and email me the score diagrams (.png files). If it isn't clear how to do this, let me know so that I can update the readme files accordingly.

sc = gdaScores(attackResult)
score = sc.getScores()
if v: pp.pprint(score)
attack.cleanUp()
final = finishGdaAttack(params,score)

Thanks,

PF


AnirbanGhosh1512 commented 5 years ago

Hello Prof. Paul,

For your last request, I have produced the .json files and graphs for the raw database. But for the cloak, some columns contain the value * even if the column type is date or integer. So after building the combinations, the conditions come out as date = * or acct_id = *. Will this work for generating the score? It definitely does not work if I run the query in a database editor. Please give me some insight on this.

Regards, Anirban
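One possible way to avoid conditions such as acct_id = * is to drop the placeholder before the combinations are built. The sketch below assumes that `*` marks a value the cloak has suppressed, which is not confirmed in this thread; the helper names and the column names in the usage example are hypothetical.

```python
from itertools import product

# Assumption: the cloak reports suppressed values as a literal '*'.
SUPPRESSED = '*'

def usable_values(values):
    """Keep only real column values, dropping the suppression placeholder."""
    return [v for v in values if v != SUPPRESSED]

def where_clause(columns, combo):
    """Build 'col1 = val1 and col2 = val2 ...', quoting string values."""
    parts = []
    for col, val in zip(columns, combo):
        if isinstance(val, str):
            escaped = val.replace("'", "''")  # escape embedded single quotes
            parts.append(f"{col} = '{escaped}'")
        else:
            parts.append(f"{col} = {val}")
    return ' and '.join(parts)

def build_conditions(column_values):
    """Return one where clause per combination of usable column values."""
    columns = list(column_values)
    cleaned = [usable_values(column_values[col]) for col in columns]
    return [where_clause(columns, combo) for combo in product(*cleaned)]

if __name__ == "__main__":
    # Hypothetical column names and values, for illustration only.
    example = {'district_id': [1, '*'], 'frequency': ['monthly', '*']}
    for condition in build_conditions(example):
        print(condition)
```

Whether skipping these values is acceptable for the score is a separate question; the sketch only shows how to keep them out of the generated SQL.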

On Thu, Jan 31, 2019 at 7:23 AM Paul Francis notifications@github.com wrote:

Hi Anirban,

I'm interested in the final json output, which you can produce using finishGdaAttack() see below. Actually, could you produce these json outputs for me using both the cloak and the raw database as the anonymous data. Then produce the score diagrams from the json outputs using makeGraphs.py in code/graphs. Post the json files on gist.github.com, and email me the score diagrams (.png files). If it isn't clear how to do this, let me know so that I can update the readme files accordingly.

sc = gdaScores(attackResult) score = sc.getScores() if v: pp.pprint(score) attack.cleanUp() final = finishGdaAttack(params,score)

Thanks,

PF

On Wed, Jan 30, 2019 at 4:36 PM AnirbanGhosh1512 <notifications@github.com

wrote:

Hello Prof. Paul,

The Database configuration is below:

{ "localBankingRaw": { "host": "db001.gda-score.org", "port": 5432, "dbname": "banking", "user": "anirbanghosh1512@gmail.com", "password": "Aic0phuLoo0i", "type": "postgres" }, "cloakBankingAnon": { "host": "demo.aircloak.com", "port": 8432, "dbname": "gda_banking", "user": "anirbanghosh1512@gmail.com", "password": "anirban@123", "type": "aircloak" } }

The generated output of the attack script is below and it is working with raw db:

"Test all correct (multiple guessed column): susc 0, nextSusc 0.0, lastSusc 1e-06"

I have attached the current attack script I have written, Please have a look and let me know if further changes are needed.

Regards, Anirban Ghosh

On Wed, Jan 30, 2019 at 2:02 PM Paul Francis notifications@github.com wrote:

Before you push, can you show me the generated GDA Score for the case where you run the attack on Diffix? I want to see it working at least that much. Later when Uber is running we'll test it there.

PF

On Tue, Jan 29, 2019 at 5:44 PM AnirbanGhosh1512 < notifications@github.com

wrote:

Hello Prof. Paul,

I have done the necessary changes. Should I push it into git?

Regards, Anirban

On Tue, Jan 29, 2019 at 4:33 PM Anirban Ghosh < anirbanghosh1512@gmail.com> wrote:

Hello Prof. Paul,

Thanks for the reply. I will update the change accordingly.

Regards, Anirban

On Tue, Jan 29, 2019 at 4:32 PM Paul Francis < notifications@github.com

wrote:

When you query against the Uber DP interface, you'll get back a different answer every time because the answers have zero- mean noise. By taking an average you can effectively reduce the noise and increase confidence.

PF

On Tue, Jan 29, 2019, 14:11 AnirbanGhosh1512 < notifications@github.com wrote:

Hello Prof. Paul,

I have been searching for you from last week in office but no luck. I just need one clarification, I thought I can stop by and ask but now time is flying, so I am asking in the issue tracker. The last email I got here is clearly mentioned the condition for the claim. Now currently let's say I have X query, and each query I am making a clone of n times and fire the same query. so the result, if I rounded of, would be n * result / n so it becomes the result value always. So why should I do this step? Instead, I can check the result value in between 0.5 to 1.5, and if it is yes then I can directly go for the claim.

Pardon me if my understanding is wrong. Waiting for your reply.

Regards, Anirban

On Wed, Jan 23, 2019 at 11:08 AM Paul Francis < notifications@github.com

wrote:

If the query results rounded average is 1, then you ask for a claim (claim=True). Otherwise you don't ask for a claim (claim=False).

A rounded average will be 1 if the average is between 0.5 and 1.5.

The point is, if the rounded average is 1, then you guess that there is exactly one user with the given attributes, and so you want to make a claim that you have singled out this user.

PF

On Tue, Jan 22, 2019 at 6:45 PM AnirbanGhosh1512 <notifications@github.com> wrote:

Hello Prof. Paul,

I need a little clarification on the last discussion. Can I ask for a claim only if the average of the query results is greater than 1.0, or can I go for a claim whatever the mean value is?

Regards, Anirban Ghosh


    import sys
    import pprint
    import six
    sys.path.append('../../common')
    from gdaScore import gdaAttack, gdaScores
    from myUtilities import checkMatch

    # This script makes attack queries, and then requests the
    # resulting GDA score.

    pp = pprint.PrettyPrinter(indent=4)

    params = dict(name='exampleAttack1',
                  rawDb='localBankingRaw',
                  anonDb='cloakBankingAnon',
                  criteria='singlingOut',
                  table='accounts',  # change the table name to run individual table.
                  flushCache=False,
                  verbose=False)
    x = gdaAttack(params)

    def getTotalUser():
        """Returns the number of users of the table."""
        # Launch queries
        query = dict(uid='account_id')
        # Note error in this sql
        sql = str(f"""select count(distinct account_id) from {params['table']}""")
        query['sql'] = sql
        x.askAttack(query)

    def getResultFromQuery(queryParser):
        """Returns the values of the table being used in the attack."""
        colnames = x.getColNames()
        for i in colnames:
            values = x.getPublicColValues(i)
            if values != []:
                queryParser[i] = values
        return queryParser

    def makeNoiseQuery(getKeycolumn, getCombinations):
        """Returns the noise of the table being used in the attack."""
        # Launch queries
        # TODO: uid should be dynamically allocated
        colnames = x.getColNames()
        primaryKeyColumn = dict(uid=colnames[0])
        # Note this sql query is generated dynamically
        outputCol = getKeyColumn
        outputComb = getCombinations
        comLength = len(outputComb)
        colLength = len(outputCol)
        # 20 is acclaimed as a branch of queries
        branch = 20
        # Launch queries
        query = dict(myTag='query1')
        # Raw query
        raw_sql = str(f"""select count(distinct {primaryKeyColumn['uid']}) from {params['table']} where """)

        while comLength > 0:
            val = getCombinations[len(outputComb) - comLength]
            sql = raw_sql
            while colLength > 0:
                if isinstance(val[len(outputCol) - colLength], six.string_types):
                    dynamic_add = str(f"""{outputCol[len(outputCol) - colLength]} = '{val[len(outputCol) - colLength]}' """) + ' and '
                else:
                    dynamic_add = str(f"""{outputCol[len(outputCol) - colLength]} = {val[len(outputCol) - colLength]} """) + ' and '
                if colLength == 1:
                    if isinstance(val[len(outputCol) - colLength], six.string_types):
                        dynamic_add = str(f"""{outputCol[len(outputCol) - colLength]} = '{val[len(outputCol) - colLength]}'""")
                    else:
                        dynamic_add = str(f"""{outputCol[len(outputCol) - colLength]} = {val[len(outputCol) - colLength]}""")
                colLength = colLength - 1
                sql = sql + dynamic_add
            query['sql'] = sql
            # query = dict(db="raw", sql=sql)
            # make 20 clone of each queries, write now 20 is acclaimed as a branch of queries
            for q in range(branch):
                x.askAttack(query)
            colLength = len(outputCol)
            comLength = comLength - 1

    def getDiffrentColumnValues(col, values, queryParser):
        colvalDict = {}
        for key, value in queryParser.items():
            if key == col:
                for allval in value:
                    values.append(allval[0])
                colvalDict = {col: values}
                values = []
        return colvalDict

    getTotalUser()
    result = x.getAttack()
    queryParser = {}
    getResultFromQuery(queryParser)

    getKeyColumn = []
    getResult = []
    values = []

    def getNumberofKeyColumn(queryParser):
        for key in queryParser:
            getKeyColumn.append(key)
        return getKeyColumn

    def getResultForComb(getKeyColumn):
        for col in getKeyColumn:
            retDic = getDiffrentColumnValues(col, values, queryParser)
            getResult.append(retDic[col])
        return getResult

    def getCombinatorics(getResult):
        r = [[]]
        for x in getResult:
            t = []
            for y in x:
                for i in r:
                    t.append(i + [y])
            r = t

        return r

    # Get number of return column
    getKeyColumn = getNumberofKeyColumn(queryParser)
    # Get total result
    getResult = getResultForComb(getKeyColumn)
    # Use of recursion for combinatorics, with dynamically accessable values
    getCombinations = getCombinatorics(getResult)
    # Create all possible queries.
    makeNoiseQuery(getKeyColumn, getCombinations)

    # get Average of the query branch
    def Average(lst):
        return sum(lst) / len(lst)

    # gather all the result of branch queries in a list, do the mean after that
    returnResults = []

    verbose = 0
    v = verbose
    doCache = True

    branchReturn = 20
    # check number of combinations
    outputComb = len(getCombinations)
    # And gather up the answers:
    for i in range(outputComb):
        # make 20 clone of each queries, get result of 20 similar queries
        for item in range(branchReturn):
            reply = x.getAttack()
            if 'error' in reply:
                print(reply['error'])
            else:
                returnResults.append(reply['answer'][0][0])
            if reply['stillToCome'] == 0:
                break
        average = Average(returnResults)
        if 0.5 <= average <= 1.5:
            average = 1.0
        if average == 1.0:
            claim = True
            colnames = x.getColNames()
            primaryKeyColumn = dict(uid=colnames[0])
            spec = {}
            spec = {'uid': primaryKeyColumn, 'known': []}  # known is optional, and always null here
            outputCol = getKeyColumn
            val = getCombinations[i]
            key = 'guess'
            spec.setdefault(key, [])
            for item in range(len(outputCol)):
                spec[key].append({'col': outputCol[item], 'val': val[item]})
            x.askClaim(spec, claim=claim, cache=doCache)
            # claim = True
            # while True:
            #     replyClaim = x.getClaim()
            #     if v: print("Claim Result:")
            #     if v: pp.pprint(replyClaim)
            #     if replyClaim['stillToCome'] == 0:
            #         break
            print("\nTest all correct (multiple guessed column):")
            attackResult = x.getResults()
            sc = gdaScores(attackResult)
            score = sc.getScores()
            # pp.pprint(score['col']['frequency'])
            if v: pp.pprint(score)
            returnResults = []
        else:
            claim = False

    # score = x.getResults()
    # pp.pprint(score)
    x.cleanUp()
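As a possible simplification, the combination step in `getCombinatorics()` can be sketched with the standard library's `itertools.product`. The helper name `value_combinations` is hypothetical. It produces the same set of combinations, but in a different order: here the last column varies fastest, whereas the loop in the script varies the first column fastest.

```python
from itertools import product


def value_combinations(values_per_column):
    """Return every combination that picks one value from each column.

    `values_per_column` is a list of lists, one inner list per column,
    in the same shape that getResultForComb() builds in the script.
    """
    return [list(combo) for combo in product(*values_per_column)]


if __name__ == "__main__":
    for combo in value_combinations([[1, 2], ['OWNER', 'USER']]):
        print(combo)
```

The number of combinations is the product of the per-column value counts, so with many columns it grows quickly. Each combination is also queried 20 times in the script, which is worth keeping in mind when choosing which columns to include.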


yoid2000 commented 5 years ago

The cloak returns '*' when there are values that it has suppressed. In your attack, you should ignore '*' values.
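One way to do this before building combinations is sketched below. It assumes the suppression marker is the literal string '*' and that column values have already been collected into a dict of lists. The names `SUPPRESSED` and `drop_suppressed` are hypothetical and not part of gdaAttack().

```python
SUPPRESSED = '*'  # assumption: the marker the cloak uses for suppressed values


def drop_suppressed(col_values):
    """Remove suppressed and missing values from a {column: [values]} dict.

    Columns left with no usable values are dropped entirely, so they
    never contribute a condition to the attack query.
    """
    cleaned = {}
    for col, values in col_values.items():
        kept = [v for v in values if v is not None and v != SUPPRESSED]
        if kept:
            cleaned[col] = kept
    return cleaned


if __name__ == "__main__":
    sample = {'zip': ['*'], 'gender': ['Male', '*', 'Female'], 'acct_date': [None]}
    print(drop_suppressed(sample))
```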

Have you posted your attack? Please do so if you could ... I want to see what your attack does and think about the best way to fix this (probably better if it happens automatically in the gdaAttack() class).

On Tue, Feb 5, 2019 at 2:34 PM AnirbanGhosh1512 notifications@github.com wrote:

Hello Prof. Paul,

For your last request, I have produced the .json files and graphs for the raw database. But for the cloak, some columns contain the value * even if the column type is date or integer. So after building the combinations, the query contains date = * or acct_id = *. Will this work for generating the score? It definitely does not work if I run the query in a database editor. Please give me some insight into this.

Regards, Anirban

On Thu, Jan 31, 2019 at 7:23 AM Paul Francis notifications@github.com wrote:

Hi Anirban,

I'm interested in the final json output, which you can produce using finishGdaAttack() (see below). Actually, could you produce these json outputs for me using both the cloak and the raw database as the anonymous data? Then produce the score diagrams from the json outputs using makeGraphs.py in code/graphs. Post the json files on gist.github.com, and email me the score diagrams (.png files). If it isn't clear how to do this, let me know so that I can update the readme files accordingly.

    sc = gdaScores(attackResult)
    score = sc.getScores()
    if v: pp.pprint(score)
    attack.cleanUp()
    final = finishGdaAttack(params,score)

Thanks,

PF

On Wed, Jan 30, 2019 at 4:36 PM AnirbanGhosh1512 <notifications@github.com> wrote:

Hello Prof. Paul,

The Database configuration is below:

{ "localBankingRaw": { "host": "db001.gda-score.org", "port": 5432, "dbname": "banking", "user": "anirbanghosh1512@gmail.com", "password": "Aic0phuLoo0i", "type": "postgres" }, "cloakBankingAnon": { "host": "demo.aircloak.com", "port": 8432, "dbname": "gda_banking", "user": "anirbanghosh1512@gmail.com", "password": "anirban@123", "type": "aircloak" } }

The generated output of the attack script is below and it is working with raw db:

"Test all correct (multiple guessed column): susc 0, nextSusc 0.0, lastSusc 1e-06"

I have attached the attack script as I have currently written it. Please have a look and let me know if further changes are needed.

Regards, Anirban Ghosh

On Wed, Jan 30, 2019 at 2:02 PM Paul Francis <notifications@github.com> wrote:

Before you push, can you show me the generated GDA Score for the case where you run the attack on Diffix? I want to see it working at least that much. Later when Uber is running we'll test it there.

PF


AnirbanGhosh1512 commented 5 years ago

Hello Prof. Paul,

A sample attack query, generated by the same routines for the cloak database, looks like this:

    select count(distinct uid)
    from accounts
    where uid = None and account_id = None and acct_district_id = 1
      and frequency = 'POPLATEK MESICNE' and acct_date = None
      and disp_type = 'OWNER' and birth_number = '*' and cli_district_id = 1
      and lastname = '*' and firstname = '*' and birthdate = None
      and gender = 'Male' and ssn = '*' and email = '*' and street = '*'
      and zip = '*'

Should I post it to generate the score?
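A query like this one mixes usable conditions with None and '*' placeholders, which a database will not accept as written. A minimal sketch of a condition builder that skips those placeholders is shown here. The function name `build_where` is hypothetical, and treating None and '*' as values to skip is an assumption based on the advice earlier in this thread to ignore suppressed values.

```python
def build_where(conditions, suppressed='*'):
    """Build the text after WHERE from a {column: value} dict.

    Values that are None or equal to the assumed suppression marker
    are left out. Strings are single-quoted with embedded quotes
    doubled; other values are inserted as they are.
    """
    parts = []
    for col, val in conditions.items():
        if val is None or val == suppressed:
            continue  # no usable value for this column
        if isinstance(val, str):
            escaped = val.replace("'", "''")
            parts.append(f"{col} = '{escaped}'")
        else:
            parts.append(f"{col} = {val}")
    return ' and '.join(parts)


if __name__ == "__main__":
    conds = {'uid': None, 'acct_district_id': 1,
             'frequency': 'POPLATEK MESICNE', 'zip': '*'}
    print("select count(distinct uid) from accounts where " + build_where(conds))
```

Joining the parts with ' and ' also removes the need for the special case that handles the last column in the nested loops of the script.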

Regards, Anirban

On Tue, Feb 5, 2019 at 2:46 PM Paul Francis notifications@github.com wrote:

The cloak returns '' when there are values that it has suppressed. In your attack, you should ignore '' values.

Have you posted your attack? Please do so if you could ... I want to see what your attack does and think about the best way to fix this (probably better if it happens automatically in the gdaAttack() class).

On Tue, Feb 5, 2019 at 2:34 PM AnirbanGhosh1512 notifications@github.com wrote:

Hello Prof. Paul,

For your last requirements, I have produced .json and graphs for the raw database. But for clock, some columns consist the value even if the column type is date or integer. So after doing the combination, it comes out date= or acct_id =*. Will, it works for generating score because it definitely not works if I use the query in database editor. Please let me give some insight about this.

Regards, Anirban

On Thu, Jan 31, 2019 at 7:23 AM Paul Francis notifications@github.com wrote:

Hi Anirban,

I'm interested in the final json output, which you can produce using finishGdaAttack() see below. Actually, could you produce these json outputs for me using both the cloak and the raw database as the anonymous data. Then produce the score diagrams from the json outputs using makeGraphs.py in code/graphs. Post the json files on gist.github.com , and email me the score diagrams (.png files). If it isn't clear how to do this, let me know so that I can update the readme files accordingly.

sc = gdaScores(attackResult) score = sc.getScores() if v: pp.pprint(score) attack.cleanUp() final = finishGdaAttack(params,score)

Thanks,

PF

On Wed, Jan 30, 2019 at 4:36 PM AnirbanGhosh1512 < notifications@github.com

wrote:

Hello Prof. Paul,

The Database configuration is below:

{ "localBankingRaw": { "host": "db001.gda-score.org", "port": 5432, "dbname": "banking", "user": "anirbanghosh1512@gmail.com", "password": "Aic0phuLoo0i", "type": "postgres" }, "cloakBankingAnon": { "host": "demo.aircloak.com", "port": 8432, "dbname": "gda_banking", "user": "anirbanghosh1512@gmail.com", "password": "anirban@123", "type": "aircloak" } }

The generated output of the attack script is below and it is working with raw db:

"Test all correct (multiple guessed column): susc 0, nextSusc 0.0, lastSusc 1e-06"

I have attached the current attack script I have written, Please have a look and let me know if further changes are needed.

Regards, Anirban Ghosh

On Wed, Jan 30, 2019 at 2:02 PM Paul Francis < notifications@github.com

wrote:

Before you push, can you show me the generated GDA Score for the case where you run the attack on Diffix? I want to see it working at least that much. Later when Uber is running we'll test it there.

PF

On Tue, Jan 29, 2019 at 5:44 PM AnirbanGhosh1512 < notifications@github.com

wrote:

Hello Prof. Paul,

I have done the necessary changes. Should I push it into git?

Regards, Anirban

On Tue, Jan 29, 2019 at 4:33 PM Anirban Ghosh < anirbanghosh1512@gmail.com> wrote:

Hello Prof. Paul,

Thanks for the reply. I will update the change accordingly.

Regards, Anirban

On Tue, Jan 29, 2019 at 4:32 PM Paul Francis < notifications@github.com

wrote:

When you query against the Uber DP interface, you'll get back a different answer every time because the answers have zero- mean noise. By taking an average you can effectively reduce the noise and increase confidence.

PF

On Tue, Jan 29, 2019, 14:11 AnirbanGhosh1512 < notifications@github.com wrote:

Hello Prof. Paul,

I have been searching for you from last week in office but no luck. I just need one clarification, I thought I can stop by and ask but now time is flying, so I am asking in the issue tracker. The last email I got here is clearly mentioned the condition for the claim. Now currently let's say I have X query, and each query I am making a clone of n times and fire the same query. so the result, if I rounded of, would be n * result / n so it becomes the result value always. So why should I do this step? Instead, I can check the result value in between 0.5 to 1.5, and if it is yes then I can directly go for the claim.

Pardon me if my understanding is wrong. Waiting for your reply.

Regards, Anirban

On Wed, Jan 23, 2019 at 11:08 AM Paul Francis < notifications@github.com

wrote:

If the query results rounded average is 1, then you ask for a claim (claim=True). Otherwise you don't ask for a claim (claim=False).

A rounded average will be 1 if the average is between 0.5 and 1.5.

The point is, if the rounded average is 1, then you guess that there is exactly one user with the given attributes, and so you want to make a claim that you have singled out this user.

PF

On Tue, Jan 22, 2019 at 6:45 PM AnirbanGhosh1512 < notifications@github.com

wrote:

Hello Prof. Paul,

I need a little clarification for the last the discussion. If the query results average is greater than 1.0, then I can ask for a claim or whatever the mean value is I can go for a claim?

Regards, Anirban Ghosh

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <

https://github.com/gda-score/code/issues/29#issuecomment-456493819

,

or mute

the thread <

https://github.com/notifications/unsubscribe-auth/ACD-qRcyZTnUpH2ERpgkfVfIWtGqsj1Kks5vF04ogaJpZM4Yqg1B

.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub < https://github.com/gda-score/code/issues/29#issuecomment-456743593 , or mute the thread <

https://github.com/notifications/unsubscribe-auth/Afke4wDNAsvaJFAzLSc0ccxzLqmOd2Ubks5vGDSdgaJpZM4Yqg1B

.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub < https://github.com/gda-score/code/issues/29#issuecomment-458534064 , or mute the thread <

https://github.com/notifications/unsubscribe-auth/ACD-qc-cvyjKb02ZJY7J0wLIXWDtscmVks5vIEh8gaJpZM4Yqg1B

.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub < https://github.com/gda-score/code/issues/29#issuecomment-458584292 , or mute the thread <

https://github.com/notifications/unsubscribe-auth/Afke4-RFsQnLu0vGXU6dEU5dTtdjEKStks5vIGl3gaJpZM4Yqg1B

.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub < https://github.com/gda-score/code/issues/29#issuecomment-458613750 , or mute the thread <

https://github.com/notifications/unsubscribe-auth/ACD-qQXjVIlGbRBrkG8Ank35ZJzmDsRiks5vIHpEgaJpZM4Yqg1B

.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub < https://github.com/gda-score/code/issues/29#issuecomment-458935621 , or mute the thread <

https://github.com/notifications/unsubscribe-auth/Afke4wn3Ky9yfntV3TvpoiVMVmwvR4Dpks5vIZfxgaJpZM4Yqg1B

.

import sys import pprint import six sys.path.append('../../common') from gdaScore import gdaAttack, gdaScores from myUtilities import checkMatch

This script makes attack queries, and then requests the

resulting GDA score.

pp = pprint.PrettyPrinter(indent=4)

params = dict(name='exampleAttack1', rawDb='localBankingRaw', anonDb='cloakBankingAnon', criteria='singlingOut', table='accounts', # change the table name to run individual table. flushCache=False, verbose=False) x = gdaAttack(params)

def getTotalUser(): """Returns the number of users of the table."""

Launch queries

query = dict(uid='account_id')

Note error in this sql

sql = str(f"""select count(distinct account_id) from {params['table']}""") query['sql'] = sql x.askAttack(query)

def getResultFromQuery(queryParser): """Returns the values of the table being used in the attack.""" colnames = x.getColNames() for i in colnames: values = x.getPublicColValues(i) if values != []: queryParser[i] = values return queryParser

def makeNoiseQuery(getKeycolumn, getCombinations): """Returns the noise of the table being used in the attack."""

Launch queries

TODO: uid should be dynamically allocated

colnames = x.getColNames() primaryKeyColumn = dict(uid=colnames[0])

Note this sql query is generated dynamically

outputCol = getKeyColumn outputComb = getCombinations comLength = len(outputComb) colLength = len(outputCol)

20 is acclaimed as a branch of queries

branch = 20

Launch queries

query = dict(myTag='query1')

Raw query

raw_sql = str(f"""select count(distinct {primaryKeyColumn['uid']}) from {params['table']} where """)

        while comLength > 0:
            val = getCombinations[len(outputComb) - comLength]
            sql = raw_sql
            while colLength > 0:
                if isinstance(val[len(outputCol) - colLength], six.string_types):
                    dynamic_add = str(f"""{outputCol[len(outputCol) - colLength]} = '{val[len(outputCol) - colLength]}' """) + ' and '
                else:
                    dynamic_add = str(f"""{outputCol[len(outputCol) - colLength]} = {val[len(outputCol) - colLength]} """) + ' and '
                if colLength == 1:
                    if isinstance(val[len(outputCol) - colLength], six.string_types):
                        dynamic_add = str(f"""{outputCol[len(outputCol) - colLength]} = '{val[len(outputCol) - colLength]}'""")
                    else:
                        dynamic_add = str(f"""{outputCol[len(outputCol) - colLength]} = {val[len(outputCol) - colLength]}""")
                colLength = colLength - 1
                sql = sql + dynamic_add
            query['sql'] = sql
            # query = dict(db="raw", sql=sql)
            # make 20 clones of each query; right now 20 is used as the
            # size of a branch of queries
            for q in range(branch):
                x.askAttack(query)
            colLength = len(outputCol)
            comLength = comLength - 1

    def getDiffrentColumnValues(col, values , queryParser):
        colvalDict = {}
        for key, value in queryParser.items():
            if key == col:
                for allval in value:
                    values.append(allval[0])
                colvalDict = {col: values}
                values = []
        return colvalDict

    getTotalUser()
    result = x.getAttack()
    queryParser = {}
    getResultFromQuery(queryParser)

    getKeyColumn = []
    getResult = []
    values = []

    def getNumberofKeyColumn(queryParser):
        for key in queryParser:
            getKeyColumn.append(key)
        return getKeyColumn

    def getResultForComb(getKeyColumn):
        for col in getKeyColumn:
            retDic = getDiffrentColumnValues(col, values, queryParser)
            getResult.append(retDic[col])
        return getResult

    def getCombinatorics(getResult):
        r = [[]]
        for x in getResult:
            t = []
            for y in x:
                for i in r:
                    t.append(i + [y])
            r = t

        return r

    # Get number of returned columns
    getKeyColumn = getNumberofKeyColumn(queryParser)

    # Get total result
    getResult = getResultForComb(getKeyColumn)

    # Combinatorics over the dynamically accessible values
    getCombinations = getCombinatorics(getResult)

    # Create all possible queries.
    makeNoiseQuery(getKeyColumn, getCombinations)

    # get the average of the query branch
    def Average(lst):
        return sum(lst) / len(lst)

    # gather all the results of the branch queries in a list, then take
    # the mean
    returnResults = []

    verbose = 0
    v = verbose
    doCache = True

    branchReturn = 20

    # check number of combinations
    outputComb = len(getCombinations)

    # And gather up the answers:
    for i in range(outputComb):
        # 20 clones were made of each query; get the results of the 20 identical queries
        for item in range(branchReturn):
            reply = x.getAttack()
            if 'error' in reply:
                print(reply['error'])
            else:
                returnResults.append(reply['answer'][0][0])
            if reply['stillToCome'] == 0:
                break
        average = Average(returnResults)
        if 0.5 <= average <= 1.5:
            average = 1.0
        if average == 1.0:
            claim = True
            colnames = x.getColNames()
            primaryKeyColumn = dict(uid=colnames[0])
            spec = {}
            spec = {'uid': primaryKeyColumn, 'known': []}  # known is optional, and always null here
            outputCol = getKeyColumn
            val = getCombinations[i]
            key = 'guess'
            spec.setdefault(key,[])
            for item in range(len(outputCol)):
                spec[key].append({'col': outputCol[item], 'val': val[item]})
            x.askClaim(spec, claim=claim, cache=doCache)
            # claim = True
            # while True:
            #     replyClaim = x.getClaim()
            #     if v: print("Claim Result:")
            #     if v: pp.pprint(replyClaim)
            #     if replyClaim['stillToCome'] == 0:
            #         break
            print("\nTest all correct (multiple guessed column):")
            attackResult = x.getResults()
            sc = gdaScores(attackResult)
            score = sc.getScores()
            # pp.pprint(score['col']['frequency'])
            if v: pp.pprint(score)
            returnResults = []
        else:
            claim = False

    # score = x.getResults()
    # pp.pprint(score)
    x.cleanUp()


yoid2000 commented 5 years ago

Hi Anirban,

I'm confused how you got to this query in the first place. I thought you were using the output of getPublicColValues() to then come up with conditions that have a reasonable chance of matching exactly one user, and then making an attack query from that. But getPublicColValues() queries the raw database, not the cloak, so you should not be getting * values. Also you should be ignoring NULL values, but that is a different matter.
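A minimal sketch of the filtering mentioned here, dropping NULL and suppressed values before any conditions are built. It assumes, as the attack script in this thread does when it reads `allval[0]`, that each returned row carries the column value in its first position. The helper name `drop_unusable_values` and the sample rows are made up for illustration and are not part of the gdaAttack class.

```python
def drop_unusable_values(rows, suppressed_marker="*"):
    """Keep only rows whose first field is a usable column value.

    Assumption: each row looks like (value, count, ...). NULLs (None)
    and the suppression marker are skipped, everything else is kept.
    """
    usable = []
    for row in rows:
        value = row[0]
        if value is None or value == suppressed_marker:
            continue
        usable.append(row)
    return usable


# Illustrative rows only; real values come from the database.
rows = [("OWNER", 4500), (None, 12), ("*", 30), ("DISPONENT", 869)]
filtered = drop_unusable_values(rows)
```

Filtering at this stage keeps conditions such as `zip = '*'` or `birthdate = None` from ever reaching the generated SQL.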

On Tue, Feb 5, 2019 at 2:56 PM AnirbanGhosh1512 notifications@github.com wrote:

Hello Prof. Paul,

A sample attack query produced by the same routines against the cloak database looks like this: select count(distinct uid) from accounts where uid = None and account_id = None and acct_district_id = 1 and frequency = 'POPLATEK MESICNE' and acct_date = None and disp_type = 'OWNER' and birth_number = '*' and cli_district_id = 1 and lastname = '*' and firstname = '*' and birthdate = None and gender = 'Male' and ssn = '*' and email = '*' and street = '*' and zip = '*'.

Should I post it to generate a score?

Regards, Anirban

On Tue, Feb 5, 2019 at 2:46 PM Paul Francis notifications@github.com wrote:

The cloak returns '*' when there are values that it has suppressed. In your attack, you should ignore '*' values.

Have you posted your attack? Please do so if you could ... I want to see what your attack does and think about the best way to fix this (probably better if it happens automatically in the gdaAttack() class).

On Tue, Feb 5, 2019 at 2:34 PM AnirbanGhosh1512 notifications@github.com wrote:

Hello Prof. Paul,

As you last requested, I have produced the .json and graphs for the raw database. But for the cloak, some columns contain the value * even if the column type is date or integer. So after building the combinations, the conditions come out as date = * or acct_id = *. Will this work for generating the score? It definitely does not work if I run the query in a database editor. Please give me some insight on this.

Regards, Anirban

On Thu, Jan 31, 2019 at 7:23 AM Paul Francis notifications@github.com wrote:

Hi Anirban,

I'm interested in the final json output, which you can produce using finishGdaAttack() (see below). Actually, could you produce these json outputs for me using both the cloak and the raw database as the anonymous data? Then produce the score diagrams from the json outputs using makeGraphs.py in code/graphs. Post the json files on gist.github.com, and email me the score diagrams (.png files). If it isn't clear how to do this, let me know so that I can update the readme files accordingly.

    sc = gdaScores(attackResult)
    score = sc.getScores()
    if v: pp.pprint(score)
    attack.cleanUp()
    final = finishGdaAttack(params,score)

Thanks,

PF

On Wed, Jan 30, 2019 at 4:36 PM AnirbanGhosh1512 notifications@github.com wrote:

Hello Prof. Paul,

The Database configuration is below:

{ "localBankingRaw": { "host": "db001.gda-score.org", "port": 5432, "dbname": "banking", "user": "anirbanghosh1512@gmail.com", "password": "Aic0phuLoo0i", "type": "postgres" }, "cloakBankingAnon": { "host": "demo.aircloak.com", "port": 8432, "dbname": "gda_banking", "user": "anirbanghosh1512@gmail.com", "password": "anirban@123", "type": "aircloak" } }

The generated output of the attack script is below and it is working with raw db:

"Test all correct (multiple guessed column): susc 0, nextSusc 0.0, lastSusc 1e-06"

I have attached the current attack script I have written. Please have a look and let me know if further changes are needed.

Regards, Anirban Ghosh

On Wed, Jan 30, 2019 at 2:02 PM Paul Francis notifications@github.com wrote:

Before you push, can you show me the generated GDA Score for the case where you run the attack on Diffix? I want to see it working at least that much. Later when Uber is running we'll test it there.

PF

On Tue, Jan 29, 2019 at 5:44 PM AnirbanGhosh1512 notifications@github.com wrote:

Hello Prof. Paul,

I have done the necessary changes. Should I push it into git?

Regards, Anirban

On Tue, Jan 29, 2019 at 4:33 PM Anirban Ghosh anirbanghosh1512@gmail.com wrote:

Hello Prof. Paul,

Thanks for the reply. I will update the change accordingly.

Regards, Anirban

On Tue, Jan 29, 2019 at 4:32 PM Paul Francis notifications@github.com wrote:

When you query against the Uber DP interface, you'll get back a different answer every time because the answers have zero-mean noise. By taking an average you can effectively reduce the noise and increase confidence.

PF
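To make the effect of averaging concrete, here is a small simulation sketch. It does not talk to the Uber DP interface; it assumes Laplace noise with an arbitrary scale of 2.0 purely for illustration, and the function names `noisy_count` and `averaged_answer` are hypothetical.

```python
import random
import statistics


def noisy_count(true_count, scale, rng):
    """Return true_count plus zero-mean Laplace noise (assumed noise model)."""
    # The difference of two i.i.d. exponentials is Laplace-distributed
    # with mean zero.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


def averaged_answer(true_count, scale, repeats, rng):
    """Average `repeats` independent noisy answers to the same query."""
    return statistics.mean(
        noisy_count(true_count, scale, rng) for _ in range(repeats)
    )


rng = random.Random(0)
# Spread of single noisy answers versus spread of 20-query averages.
single = [noisy_count(1, 2.0, rng) for _ in range(2000)]
averaged = [averaged_answer(1, 2.0, 20, rng) for _ in range(2000)]
spread_single = statistics.pstdev(single)
spread_averaged = statistics.pstdev(averaged)
```

With independent noise, the standard deviation of the average shrinks roughly with the square root of the number of repeats. That is why a single answer tells you little, while the mean of 20 answers is much more likely to land near the true count.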

On Tue, Jan 29, 2019, 14:11 AnirbanGhosh1512 notifications@github.com wrote:

Hello Prof. Paul,

I have been looking for you in the office since last week, but with no luck. I just need one clarification; I thought I could stop by and ask, but time is flying, so I am asking in the issue tracker. The last email I got here clearly states the condition for the claim. Let's say I have X queries, and I make n clones of each query and fire the same query n times. The rounded average would then be n * result / n, which is always just the result value. So why should I do this step? Instead, I could check whether a single result is between 0.5 and 1.5 and, if so, go directly for the claim.

Pardon me if my understanding is wrong. Waiting for your reply.

Regards, Anirban

On Wed, Jan 23, 2019 at 11:08 AM Paul Francis notifications@github.com wrote:

If the query results rounded average is 1, then you ask for a claim (claim=True). Otherwise you don't ask for a claim (claim=False).

A rounded average will be 1 if the average is between 0.5 and 1.5.

The point is, if the rounded average is 1, then you guess that there is exactly one user with the given attributes, and so you want to make a claim that you have singled out this user.

PF
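The rule above can be sketched as a small helper. The name `should_claim` is hypothetical, and the list passed in is assumed to hold the noisy answers for one query only, so answers from different conditions are never mixed in the same average.

```python
def should_claim(answers):
    """Return True when the mean of the noisy answers rounds to 1.

    The range check is written out explicitly because Python's built-in
    round() rounds halves to the nearest even integer, so it would not
    treat 0.5 and 1.5 the way the rule describes.
    """
    if not answers:
        return False
    average = sum(answers) / len(answers)
    return 0.5 <= average <= 1.5
```

A caller would then pass `claim=should_claim(answers)` to `askClaim()`, so that a claim is only made when the estimate suggests exactly one user.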

On Tue, Jan 22, 2019 at 6:45 PM AnirbanGhosh1512 notifications@github.com wrote:

Hello Prof. Paul,

I need a little clarification on the last discussion. If the average of the query results is greater than 1.0, can I ask for a claim, or can I go for a claim whatever the mean value is?

Regards, Anirban Ghosh


    import sys
    import pprint
    import six
    sys.path.append('../../common')
    from gdaScore import gdaAttack, gdaScores
    from myUtilities import checkMatch

    # This script makes attack queries, and then requests the
    # resulting GDA score.

    pp = pprint.PrettyPrinter(indent=4)

    params = dict(name='exampleAttack1',
                  rawDb='localBankingRaw',
                  anonDb='cloakBankingAnon',
                  criteria='singlingOut',
                  table='accounts',  # change the table name to run individual table.
                  flushCache=False,
                  verbose=False)
    x = gdaAttack(params)

    def getTotalUser():
        """Returns the number of users of the table."""
        # Launch queries
        query = dict(uid='account_id')
        # Note error in this sql
        sql = str(f"""select count(distinct account_id) from {params['table']}""")
        query['sql'] = sql
        x.askAttack(query)

    def getResultFromQuery(queryParser):
        """Returns the values of the table being used in the attack."""
        colnames = x.getColNames()
        for i in colnames:
            values = x.getPublicColValues(i)
            if values != []:
                queryParser[i] = values
        return queryParser

    def makeNoiseQuery(getKeycolumn, getCombinations):
        """Returns the noise of the table being used in the attack."""
        # Launch queries
        # TODO: uid should be dynamically allocated
        colnames = x.getColNames()
        primaryKeyColumn = dict(uid=colnames[0])
        # Note this sql query is generated dynamically
        outputCol = getKeyColumn
        outputComb = getCombinations
        comLength = len(outputComb)
        colLength = len(outputCol)
        # 20 is used as the size of a branch of queries
        branch = 20
        # Launch queries
        query = dict(myTag='query1')
        # Raw query
        raw_sql = str(f"""select count(distinct {primaryKeyColumn['uid']}) from {params['table']} where """)

        while comLength > 0:
            val = getCombinations[len(outputComb) - comLength]
            sql = raw_sql
            while colLength > 0:
                if isinstance(val[len(outputCol) - colLength], six.string_types):
                    dynamic_add = str(f"""{outputCol[len(outputCol) - colLength]} = '{val[len(outputCol) - colLength]}' """) + ' and '
                else:
                    dynamic_add = str(f"""{outputCol[len(outputCol) - colLength]} = {val[len(outputCol) - colLength]} """) + ' and '
                if colLength == 1:
                    if isinstance(val[len(outputCol) - colLength], six.string_types):
                        dynamic_add = str(f"""{outputCol[len(outputCol) - colLength]} = '{val[len(outputCol) - colLength]}'""")
                    else:
                        dynamic_add = str(f"""{outputCol[len(outputCol) - colLength]} = {val[len(outputCol) - colLength]}""")
                colLength = colLength - 1
                sql = sql + dynamic_add
            query['sql'] = sql
            # query = dict(db="raw", sql=sql)
            # make 20 clones of each query; right now 20 is used as the
            # size of a branch of queries
            for q in range(branch):
                x.askAttack(query)
            colLength = len(outputCol)
            comLength = comLength - 1

    def getDiffrentColumnValues(col, values , queryParser):
        colvalDict = {}
        for key, value in queryParser.items():
            if key == col:
                for allval in value:
                    values.append(allval[0])
                colvalDict = {col: values}
                values = []
        return colvalDict

    getTotalUser()
    result = x.getAttack()
    queryParser = {}
    getResultFromQuery(queryParser)

    getKeyColumn = []
    getResult = []
    values = []

    def getNumberofKeyColumn(queryParser):
        for key in queryParser:
            getKeyColumn.append(key)
        return getKeyColumn

    def getResultForComb(getKeyColumn):
        for col in getKeyColumn:
            retDic = getDiffrentColumnValues(col, values, queryParser)
            getResult.append(retDic[col])
        return getResult

    def getCombinatorics(getResult):
        r = [[]]
        for x in getResult:
            t = []
            for y in x:
                for i in r:
                    t.append(i + [y])
            r = t

        return r

    # Get number of returned columns
    getKeyColumn = getNumberofKeyColumn(queryParser)

    # Get total result
    getResult = getResultForComb(getKeyColumn)

    # Combinatorics over the dynamically accessible values
    getCombinations = getCombinatorics(getResult)

    # Create all possible queries.
    makeNoiseQuery(getKeyColumn, getCombinations)

    # get the average of the query branch
    def Average(lst):
        return sum(lst) / len(lst)

    # gather all the results of the branch queries in a list, then take
    # the mean
    returnResults = []

    verbose = 0
    v = verbose
    doCache = True

    branchReturn = 20

    # check number of combinations
    outputComb = len(getCombinations)

    # And gather up the answers:
    for i in range(outputComb):
        # 20 clones were made of each query; get the results of the 20 identical queries
        for item in range(branchReturn):
            reply = x.getAttack()
            if 'error' in reply:
                print(reply['error'])
            else:
                returnResults.append(reply['answer'][0][0])
            if reply['stillToCome'] == 0:
                break
        average = Average(returnResults)
        if 0.5 <= average <= 1.5:
            average = 1.0
        if average == 1.0:
            claim = True
            colnames = x.getColNames()
            primaryKeyColumn = dict(uid=colnames[0])
            spec = {}
            spec = {'uid': primaryKeyColumn, 'known': []}  # known is optional, and always null here
            outputCol = getKeyColumn
            val = getCombinations[i]
            key = 'guess'
            spec.setdefault(key,[])
            for item in range(len(outputCol)):
                spec[key].append({'col': outputCol[item], 'val': val[item]})
            x.askClaim(spec, claim=claim, cache=doCache)
            # claim = True
            # while True:
            #     replyClaim = x.getClaim()
            #     if v: print("Claim Result:")
            #     if v: pp.pprint(replyClaim)
            #     if replyClaim['stillToCome'] == 0:
            #         break
            print("\nTest all correct (multiple guessed column):")
            attackResult = x.getResults()
            sc = gdaScores(attackResult)
            score = sc.getScores()
            # pp.pprint(score['col']['frequency'])
            if v: pp.pprint(score)
            returnResults = []
        else:
            claim = False

    # score = x.getResults()
    # pp.pprint(score)
    x.cleanUp()
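As a side note on the combination and query-building steps in the script, the same idea can be sketched more compactly with `itertools.product` and a join over conditions. This is an illustration under assumptions rather than a drop-in replacement: `all_combinations` and `build_where` are hypothetical names, the sample values are made up, and `product` enumerates combinations in a different order than `getCombinatorics` does (the set of combinations is the same).

```python
from itertools import product


def all_combinations(values_per_column):
    """Cartesian product of the per-column value lists, one list per combo."""
    return [list(combo) for combo in product(*values_per_column)]


def build_where(columns, values):
    """Join col = val conditions with ' and ', quoting string values."""
    conditions = []
    for col, val in zip(columns, values):
        if isinstance(val, str):
            # Double any single quote so the SQL literal stays well-formed.
            escaped = val.replace("'", "''")
            conditions.append(f"{col} = '{escaped}'")
        else:
            conditions.append(f"{col} = {val}")
    return " and ".join(conditions)


# Illustrative columns and values only.
columns = ["frequency", "cli_district_id"]
combos = all_combinations([["POPLATEK MESICNE", "POPLATEK TYDNE"], [1, 2]])
where = build_where(columns, combos[0])
```

Joining a list of conditions avoids the special case for the last column. Keep in mind that the number of combinations grows multiplicatively with each column, so restricting the columns and values up front matters more than how the loop is written.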


AnirbanGhosh1512 commented 5 years ago

Hello Prof. Paul,

You are right. getPublicColValues for the raw database gives me proper output, and I used combinatorics to generate the attack queries and posted them. But if I use the same routine for the cloak database, it returns * and null values. Do I need to use a different routine for the cloak database?

Regards, Anirban


yoid2000 commented 5 years ago

But getPublicColValues is only supposed to be used with the raw database.

What are you configuring as 'rawDb'?

PF

On Tue, Feb 5, 2019 at 4:04 PM AnirbanGhosh1512 notifications@github.com wrote:

Hello Prof. Paul,

You are right. getPublicColValues gives me proper output for the raw database, and I used combinatorics to generate the attack queries and posted them. But if I use the same routine for the cloak database, it returns * and NULL values. Do I need to use a different routine for the cloak database?

Regards, Anirban

On Tue, Feb 5, 2019 at 3:50 PM Paul Francis notifications@github.com wrote:

Hi Anirban,

I'm confused how you got to this query in the first place. I thought you were using the output of getPublicColValues() to then come up with conditions that have a reasonable chance of matching exactly one user, and then making an attack query from that. But getPublicColValues() queries the raw database, not the cloak, so you should not be getting * values. Also you should be ignoring NULL values, but that is a different matter.
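The advice to ignore suppressed and NULL values can be sketched as a small filter applied before any combinations are built. This is a minimal sketch, not part of the gdaScore library: the helper name `usable_values` is hypothetical, and it assumes each row is a sequence whose first element is the column value (which is how the attack script indexes the rows, via `allval[0]`).

```python
def usable_values(rows, suppressed=("*",)):
    """Return the column value of each row, skipping NULLs and suppression markers.

    Assumption: each row is a sequence whose first element is the column value.
    """
    kept = []
    for row in rows:
        value = row[0]
        # None stands for SQL NULL; '*' is the marker the cloak uses for suppressed values.
        if value is None or value in suppressed:
            continue
        kept.append(value)
    return kept


# Made-up rows in (value, count) form, for illustration only.
rows = [("OWNER", 4500), ("*", 12), (None, 3), ("DISPONENT", 869)]
print(usable_values(rows))
```

Filtering at this point means no condition such as `zip = '*'` or `birthdate = None` can end up in a generated attack query.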

On Tue, Feb 5, 2019 at 2:56 PM AnirbanGhosh1512 <notifications@github.com> wrote:

Hello Prof. Paul,

A sample attack query produced by calling the same routines for the cloak database looks like this: select count(distinct uid) from accounts where uid = None and account_id = None and acct_district_id = 1 and frequency = 'POPLATEK MESICNE' and acct_date = None and disp_type = 'OWNER' and birth_number = '*' and cli_district_id = 1 and lastname = '*' and firstname = '*' and birthdate = None and gender = 'Male' and ssn = '*' and email = '*' and street = '*' and zip = '*'.

Should I post it to generate the score?

Regards, Anirban

On Tue, Feb 5, 2019 at 2:46 PM Paul Francis notifications@github.com wrote:

The cloak returns '*' when there are values that it has suppressed. In your attack, you should ignore '*' values.

Have you posted your attack? Please do so if you could ... I want to see what your attack does and think about the best way to fix this (probably better if it happens automatically in the gdaAttack() class).

On Tue, Feb 5, 2019 at 2:34 PM AnirbanGhosh1512 < notifications@github.com> wrote:

Hello Prof. Paul,

For your last request, I have produced the .json files and graphs for the raw database. But for the cloak, some columns contain the value * even if the column type is date or integer. So after building the combinations, I get conditions such as date = * or acct_id = *. Will this work for generating the score? It definitely does not work if I run the query in a database editor. Please give me some insight about this.

Regards, Anirban

On Thu, Jan 31, 2019 at 7:23 AM Paul Francis <notifications@github.com> wrote:

Hi Anirban,

I'm interested in the final json output, which you can produce using finishGdaAttack() (see below). Actually, could you produce these json outputs for me using both the cloak and the raw database as the anonymous data. Then produce the score diagrams from the json outputs using makeGraphs.py in code/graphs. Post the json files on gist.github.com, and email me the score diagrams (.png files). If it isn't clear how to do this, let me know so that I can update the readme files accordingly.

sc = gdaScores(attackResult)
score = sc.getScores()
if v: pp.pprint(score)
attack.cleanUp()
final = finishGdaAttack(params,score)

Thanks,

PF

On Wed, Jan 30, 2019 at 4:36 PM AnirbanGhosh1512 <notifications@github.com> wrote:

Hello Prof. Paul,

The database configuration is below:

{
  "localBankingRaw": {
    "host": "db001.gda-score.org",
    "port": 5432,
    "dbname": "banking",
    "user": "anirbanghosh1512@gmail.com",
    "password": "<redacted>",
    "type": "postgres"
  },
  "cloakBankingAnon": {
    "host": "demo.aircloak.com",
    "port": 8432,
    "dbname": "gda_banking",
    "user": "anirbanghosh1512@gmail.com",
    "password": "<redacted>",
    "type": "aircloak"
  }
}

The generated output of the attack script is below, and it is working with the raw db:

"Test all correct (multiple guessed column): susc 0, nextSusc 0.0, lastSusc 1e-06"

I have attached the current attack script I have written. Please have a look and let me know if further changes are needed.

Regards, Anirban Ghosh

On Wed, Jan 30, 2019 at 2:02 PM Paul Francis <notifications@github.com> wrote:

Before you push, can you show me the generated GDA Score for the case where you run the attack on Diffix? I want to see it working at least that much. Later when Uber is running we'll test it there.

PF

On Tue, Jan 29, 2019 at 5:44 PM AnirbanGhosh1512 <notifications@github.com> wrote:

Hello Prof. Paul,

I have done the necessary changes. Should I push it into git?

Regards, Anirban

On Tue, Jan 29, 2019 at 4:33 PM Anirban Ghosh < anirbanghosh1512@gmail.com> wrote:

Hello Prof. Paul,

Thanks for the reply. I will update the change accordingly.

Regards, Anirban

On Tue, Jan 29, 2019 at 4:32 PM Paul Francis <notifications@github.com> wrote:

When you query against the Uber DP interface, you'll get back a different answer every time because the answers have zero-mean noise. By taking an average you can effectively reduce the noise and increase confidence.

PF
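The effect of averaging can be illustrated with a small simulation. This is only a sketch under stated assumptions: the thread says the noise is zero-mean but not how it is distributed, so Laplace noise is assumed here, and `noisy_count` and `averaged_answer` are hypothetical stand-ins for a real query against the anonymized database. For independent noise with finite variance, the standard deviation of the average of X answers shrinks roughly as 1/sqrt(X).

```python
import random
import statistics


def noisy_count(true_count, scale, rng):
    """Return true_count plus zero-mean Laplace noise (assumed noise model)."""
    # The difference of two i.i.d. exponentials is Laplace-distributed with mean 0.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


def averaged_answer(true_count, scale, repeats, rng):
    """Average `repeats` independent noisy answers to the same query."""
    answers = [noisy_count(true_count, scale, rng) for _ in range(repeats)]
    return statistics.mean(answers)


rng = random.Random(0)
singles = [noisy_count(1, 2.0, rng) for _ in range(500)]
averages = [averaged_answer(1, 2.0, 20, rng) for _ in range(500)]
print("spread of single answers:   ", round(statistics.pstdev(singles), 2))
print("spread of 20-query averages:", round(statistics.pstdev(averages), 2))
```

Because every repeated query draws fresh noise, the average of n answers is not n * result / n: the individual answers differ, and only their average settles near the true count.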

On Tue, Jan 29, 2019, 14:11 AnirbanGhosh1512 <notifications@github.com> wrote:

Hello Prof. Paul,

I have been looking for you in the office since last week, but with no luck. I just need one clarification; I thought I could stop by and ask, but time is flying, so I am asking in the issue tracker. The last email I got here clearly states the condition for the claim. Say I have X queries, and I clone each query n times and fire the same query. Then the averaged result would be n * result / n, so it always equals the result value. So why should I do this step? Instead, I can check whether the result value is between 0.5 and 1.5, and if so, go directly for the claim.

Pardon me if my understanding is wrong. Waiting for your reply.

Regards, Anirban

On Wed, Jan 23, 2019 at 11:08 AM Paul Francis <notifications@github.com> wrote:

If the rounded average of the query results is 1, then you ask for a claim (claim=True). Otherwise you don't ask for a claim (claim=False).

A rounded average will be 1 if the average is between 0.5 and 1.5.

The point is, if the rounded average is 1, then you guess that there is exactly one user with the given attributes, and so you want to make a claim that you have singled out this user.

PF
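The claim rule can be written as a single predicate. This is a minimal sketch; the function name `should_claim` is hypothetical and not part of the gdaScore library. It checks the range 0.5 to 1.5 explicitly instead of calling `round()`, because Python 3 rounds halves to the nearest even integer (`round(0.5)` is 0 and `round(1.5)` is 2), so the two approaches disagree at the endpoints.

```python
def should_claim(answers):
    """Return True if the averaged noisy count suggests exactly one user."""
    if not answers:
        # Without any answers there is nothing to base a claim on.
        return False
    average = sum(answers) / len(answers)
    return 0.5 <= average <= 1.5


print(should_claim([0.2, 1.9, 1.1]))  # average is about 1.07
print(should_claim([2.4, 1.9]))       # average is 2.15
```

The result would be passed on as the `claim` argument of `askClaim()`, so that a claim is only made when the averaged answer points to a single user.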

On Tue, Jan 22, 2019 at 6:45 PM AnirbanGhosh1512 <notifications@github.com> wrote:

Hello Prof. Paul,

I need a little clarification on the last discussion. If the average of the query results is greater than 1.0, can I ask for a claim, or can I go for a claim whatever the mean value is?

Regards, Anirban Ghosh


import sys
import pprint
import six
sys.path.append('../../common')
from gdaScore import gdaAttack, gdaScores
from myUtilities import checkMatch

# This script makes attack queries, and then requests the
# resulting GDA score.

pp = pprint.PrettyPrinter(indent=4)

params = dict(name='exampleAttack1',
              rawDb='localBankingRaw',
              anonDb='cloakBankingAnon',
              criteria='singlingOut',
              table='accounts',  # change the table name to run individual table.
              flushCache=False,
              verbose=False)
x = gdaAttack(params)

def getTotalUser():
    """Returns the number of users of the table."""
    # Launch queries
    query = dict(uid='account_id')
    # Note error in this sql
    sql = str(f"""select count(distinct account_id) from {params['table']}""")
    query['sql'] = sql
    x.askAttack(query)

def getResultFromQuery(queryParser):
    """Returns the values of the table being used in the attack."""
    colnames = x.getColNames()
    for i in colnames:
        values = x.getPublicColValues(i)
        if values != []:
            queryParser[i] = values
    return queryParser

def makeNoiseQuery(getKeycolumn, getCombinations):
    """Returns the noise of the table being used in the attack."""
    # Launch queries
    # TODO: uid should be dynamically allocated
    colnames = x.getColNames()
    primaryKeyColumn = dict(uid=colnames[0])
    # Note this sql query is generated dynamically
    outputCol = getKeyColumn
    outputComb = getCombinations
    comLength = len(outputComb)
    colLength = len(outputCol)
    # 20 is acclaimed as a branch of queries
    branch = 20
    # Launch queries
    query = dict(myTag='query1')
    # Raw query
    raw_sql = str(f"""select count(distinct {primaryKeyColumn['uid']}) from {params['table']} where """)

    while comLength > 0:
        val = getCombinations[len(outputComb) - comLength]
        sql = raw_sql
        while colLength > 0:
            if isinstance(val[len(outputCol) - colLength], six.string_types):
                dynamic_add = str(f"""{outputCol[len(outputCol) - colLength]} = '{val[len(outputCol) - colLength]}' """) + ' and '
            else:
                dynamic_add = str(f"""{outputCol[len(outputCol) - colLength]} = {val[len(outputCol) - colLength]} """) + ' and '
            if colLength == 1:
                if isinstance(val[len(outputCol) - colLength], six.string_types):
                    dynamic_add = str(f"""{outputCol[len(outputCol) - colLength]} = '{val[len(outputCol) - colLength]}'""")
                else:
                    dynamic_add = str(f"""{outputCol[len(outputCol) - colLength]} = {val[len(outputCol) - colLength]}""")
            colLength = colLength - 1
            sql = sql + dynamic_add
        query['sql'] = sql
        # query = dict(db="raw", sql=sql)
        # make 20 clone of each queries, write now 20 is acclaimed as a
        # branch of queries
        for q in range(branch):
            x.askAttack(query)
        colLength = len(outputCol)
        comLength = comLength - 1

def getDiffrentColumnValues(col, values , queryParser):
    colvalDict = {}
    for key, value in queryParser.items():
        if key == col:
            for allval in value:
                values.append(allval[0])
            colvalDict = {col: values}
            values = []
    return colvalDict

getTotalUser()
result = x.getAttack()
queryParser = {}
getResultFromQuery(queryParser)

getKeyColumn = []
getResult = []
values = []

def getNumberofKeyColumn(queryParser):
    for key in queryParser:
        getKeyColumn.append(key)
    return getKeyColumn

def getResultForComb(getKeyColumn):
    for col in getKeyColumn:
        retDic = getDiffrentColumnValues(col, values, queryParser)
        getResult.append(retDic[col])
    return getResult

def getCombinatorics(getResult):
    r = [[]]
    for x in getResult:
        t = []
        for y in x:
            for i in r:
                t.append(i + [y])
        r = t
    return r

# Get number of return column
getKeyColumn = getNumberofKeyColumn(queryParser)

# Get total result
getResult = getResultForComb(getKeyColumn)

# Use of recursion for combinatorics, with dynamically
# accessable values
getCombinations = getCombinatorics(getResult)

# Create all possible queries.
makeNoiseQuery(getKeyColumn, getCombinations)

# get Average of the query branch
def Average(lst):
    return sum(lst) / len(lst)

# gather all the result of branch queries in a list, do the
# mean after that
returnResults = []

verbose = 0
v = verbose
doCache = True

branchReturn = 20

# check number of combinations
outputComb = len(getCombinations)

# And gather up the answers:
for i in range(outputComb):
    # make 20 clone of each queries, get result of 20 similar
    # queries
    for item in range(branchReturn):
        reply = x.getAttack()
        if 'error' in reply:
            print(reply['error'])
        else:
            returnResults.append(reply['answer'][0][0])
        if reply['stillToCome'] == 0:
            break
    average = Average(returnResults)
    if 0.5 <= average <= 1.5:
        average = 1.0
    if average == 1.0:
        claim = True
        colnames = x.getColNames()
        primaryKeyColumn = dict(uid=colnames[0])
        spec = {}
        spec = {'uid': primaryKeyColumn, 'known': []}  # known is optional, and always null here
        outputCol = getKeyColumn
        val = getCombinations[i]
        key = 'guess'
        spec.setdefault(key,[])
        for item in range(len(outputCol)):
            spec[key].append({'col': outputCol[item], 'val': val[item]})
        x.askClaim(spec, claim=claim, cache=doCache)
        # claim = True
        # while True:
        # replyClaim = x.getClaim()
        # if v: print("Claim Result:")
        # if v: pp.pprint(replyClaim)
        # if replyClaim['stillToCome'] == 0:
        # break
        print("\nTest all correct (multiple guessed column):")
        attackResult = x.getResults()
        sc = gdaScores(attackResult)
        score = sc.getScores()
        # pp.pprint(score['col']['frequency'])
        if v: pp.pprint(score)
        returnResults = []
    else:
        claim = False

# score = x.getResults()
# pp.pprint(score)
x.cleanUp()
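The nested `while` loops that assemble the `where` clause can also be expressed with a list of conditions and a single `join`. The following is a hedged sketch and not the script's actual code: `build_count_query` is a hypothetical helper, and it assumes string values need single quotes (with embedded quotes doubled) while other values are inserted as-is.

```python
def build_count_query(table, uid_col, conditions):
    """Build a count(distinct uid) query from (column, value) pairs."""
    clauses = []
    for col, val in conditions:
        if isinstance(val, str):
            # Quote strings and double any embedded single quotes.
            escaped = val.replace("'", "''")
            clauses.append(f"{col} = '{escaped}'")
        else:
            clauses.append(f"{col} = {val}")
    where = " and ".join(clauses)
    return f"select count(distinct {uid_col}) from {table} where {where}"


sql = build_count_query(
    "accounts",
    "uid",
    [("acct_district_id", 1), ("frequency", "POPLATEK MESICNE")],
)
print(sql)
```

Joining the clauses avoids the special case for the last column, since `" and "` is only placed between conditions.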


AnirbanGhosh1512 commented 5 years ago

Hello Prof. Paul,

It is configured for the raw database only. But your request was: "Actually, could you produce these json outputs for me using both the cloak and the raw database as the anonymous data." For the raw database this is already done. For the cloak database, what routine should I use instead of getPublicColValues?

Regards, Anirban


yoid2000 commented 5 years ago

When attacking the cloak, in your .json config file, you should set 'rawDb' to the raw database, and 'anonDb' to the cloak. In the configuration, 'rawDb' should always be set to the raw database, and 'anonDb' is set to whatever anonymization system you are attacking.

Then, when you use getPublicColValues, it will naturally query the raw database, and you will get the correct answers (in fact, you get exactly the same answer as before).

In other words, your attack queries will be the same no matter what system you are attacking.
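As a minimal sketch of that rule, the two runs (the cloak as the anonymized target, and the raw database standing in for it) could share everything except 'anonDb'. The helper name make_params is hypothetical; the keys and database names are the ones used in the attack script posted in this thread.

```python
def make_params(anon_db, raw_db="localBankingRaw"):
    """Build gdaAttack-style parameters; only anonDb varies between runs.

    make_params is a hypothetical helper; the keys mirror the params dict
    of the attack script in this thread.
    """
    return dict(name="exampleAttack1",
                rawDb=raw_db,      # always the raw database
                anonDb=anon_db,    # the system under attack
                criteria="singlingOut",
                table="accounts",
                flushCache=False,
                verbose=False)

# Attack the cloak, then run the same attack with the raw database as the target.
cloak_params = make_params("cloakBankingAnon")
raw_params = make_params("localBankingRaw")
```

Because 'rawDb' never changes, calls such as getPublicColValues return the same values in both runs, and only the attack queries are answered by a different system.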

AnirbanGhosh1512 commented 5 years ago

Hello Prof. Paul,

It seems like an easy change, but I am a little confused about where to make it. Can I stop by your office tomorrow to clear up my doubts?

Regards, Anirban


yoid2000 commented 5 years ago

Yes, I'll be in the office tomorrow afternoon. Talk to you then.

By the way, if you haven't read https://www.gda-score.org/what-is-a-gda-score/, please do so. It may help you understand what to do.

PF


AnirbanGhosh1512 commented 5 years ago

{
  "localBankingRaw": {
    "host": "db001.gda-score.org",
    "port": 5432,
    "dbname": "banking",
    "user": "anirbanghosh1512@gmail.com",
    "password": "<redacted>",
    "type": "postgres"
  },
  "cloakBankingAnon": {
    "host": "demo.aircloak.com",
    "port": 8432,
    "dbname": "gda_banking",
    "user": "anirbanghosh1512@gmail.com",
    "password": "<redacted>",
    "type": "aircloak"
  }
}

AnirbanGhosh1512 commented 5 years ago

Hello Prof. Paul,

Please see the attached .png files for the attack, and please check http://gist.github.com/ for the resulting .json files. Please let me know if any changes are required.

Regards, Anirban

On Thu, Jan 31, 2019 at 7:23 AM Paul Francis notifications@github.com wrote:

Hi Anirban,

I'm interested in the final json output, which you can produce using finishGdaAttack() (see below). Could you produce these json outputs for me using both the cloak and the raw database as the anonymous data? Then produce the score diagrams from the json outputs using makeGraphs.py in code/graphs. Post the json files on gist.github.com, and email me the score diagrams (.png files). If it isn't clear how to do this, let me know so that I can update the readme files accordingly.

sc = gdaScores(attackResult)
score = sc.getScores()
if v: pp.pprint(score)
attack.cleanUp()
final = finishGdaAttack(params,score)

Thanks,

PF

On Wed, Jan 30, 2019 at 4:36 PM AnirbanGhosh1512 notifications@github.com wrote:

Hello Prof. Paul,

The Database configuration is below:

{
  "localBankingRaw": {
    "host": "db001.gda-score.org",
    "port": 5432,
    "dbname": "banking",
    "user": "anirbanghosh1512@gmail.com",
    "password": "<redacted>",
    "type": "postgres"
  },
  "cloakBankingAnon": {
    "host": "demo.aircloak.com",
    "port": 8432,
    "dbname": "gda_banking",
    "user": "anirbanghosh1512@gmail.com",
    "password": "<redacted>",
    "type": "aircloak"
  }
}

The generated output of the attack script is below; it works with the raw database:

"Test all correct (multiple guessed column): susc 0, nextSusc 0.0, lastSusc 1e-06"

I have attached the attack script I have written so far. Please have a look and let me know if further changes are needed.

Regards, Anirban Ghosh

On Wed, Jan 30, 2019 at 2:02 PM Paul Francis notifications@github.com wrote:

Before you push, can you show me the generated GDA Score for the case where you run the attack on Diffix? I want to see it working at least that much. Later when Uber is running we'll test it there.

PF

On Tue, Jan 29, 2019 at 5:44 PM AnirbanGhosh1512 notifications@github.com wrote:

Hello Prof. Paul,

I have made the necessary changes. Should I push them to git?

Regards, Anirban

On Tue, Jan 29, 2019 at 4:33 PM Anirban Ghosh anirbanghosh1512@gmail.com wrote:

Hello Prof. Paul,

Thanks for the reply. I will update the change accordingly.

Regards, Anirban

On Tue, Jan 29, 2019 at 4:32 PM Paul Francis notifications@github.com wrote:

When you query against the Uber DP interface, you'll get back a different answer every time because the answers have zero-mean noise. By taking an average you can effectively reduce the noise and increase confidence.

PF
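The effect of averaging can be illustrated with a small simulation. This is only a sketch: the Laplace noise model and the scale of 2.0 are assumptions made for illustration, not known parameters of the Uber system, and noisy_count and averaged_answer are hypothetical helper names.

```python
import random
import statistics

def noisy_count(true_count, scale, rng):
    """Return true_count plus zero-mean Laplace noise (assumed noise model)."""
    # The difference of two exponentials with mean `scale` is Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise

def averaged_answer(true_count, scale, repeats, rng):
    """Average `repeats` independently noised answers to the same query."""
    return statistics.fmean(noisy_count(true_count, scale, rng)
                            for _ in range(repeats))

rng = random.Random(0)  # fixed seed so the sketch is repeatable
single = noisy_count(1, 2.0, rng)
avg_20 = averaged_answer(1, 2.0, 20, rng)
avg_2000 = averaged_answer(1, 2.0, 2000, rng)
print(single, avg_20, avg_2000)
```

Each repeated query returns a different value, so the average is not simply n * result / n. The standard error of the average shrinks roughly as 1/sqrt(n), so under these assumed parameters 20 repeats may still be too few to place the average reliably between 0.5 and 1.5.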

On Tue, Jan 29, 2019, 14:11 AnirbanGhosh1512 notifications@github.com wrote:

Hello Prof. Paul,

I have been looking for you in the office since last week, but with no luck. I just need one clarification. I thought I could stop by and ask, but time is flying, so I am asking in the issue tracker. The last email I got here clearly states the condition for the claim. Say I currently have X queries, and I make n clones of each query and fire the same query n times. If I average and round the results, I get n * result / n, which is always just the result value. So why should I do this step? Instead, I could check whether the result value is between 0.5 and 1.5, and if it is, go directly for the claim.

Pardon me if my understanding is wrong. Waiting for your reply.

Regards, Anirban

On Wed, Jan 23, 2019 at 11:08 AM Paul Francis notifications@github.com wrote:

If the rounded average of the query results is 1, then you ask for a claim (claim=True). Otherwise you don't ask for a claim (claim=False).

A rounded average will be 1 if the average is between 0.5 and 1.5.

The point is, if the rounded average is 1, then you guess that there is exactly one user with the given attributes, and so you want to make a claim that you have singled out this user.

PF
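The claim rule described above can be sketched as a small function. The name should_claim is hypothetical, and the inclusive bounds follow the check used in the attack script in this thread.

```python
def should_claim(answers):
    """Return True if the averaged answers suggest exactly one user.

    `answers` holds the noisy counts returned by repeating one query.
    """
    if not answers:
        return False  # nothing to average, so make no claim
    average = sum(answers) / len(answers)
    # Explicit bounds rather than round(): Python rounds halves to the
    # nearest even integer, so round(0.5) == 0 and round(1.5) == 2.
    return 0.5 <= average <= 1.5

print(should_claim([0.2, 1.9, 1.1]))
print(should_claim([3.0, 2.5]))
```

The result would then be passed as the claim argument, as in x.askClaim(spec, claim=should_claim(answers), cache=doCache).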

On Tue, Jan 22, 2019 at 6:45 PM AnirbanGhosh1512 notifications@github.com wrote:

Hello Prof. Paul,

I need a little clarification on the last discussion. If the average of the query results is greater than 1.0, can I then ask for a claim, or can I go for a claim whatever the mean value is?

Regards, Anirban Ghosh


import sys
import pprint
import six
sys.path.append('../../common')
from gdaScore import gdaAttack, gdaScores
from myUtilities import checkMatch

# This script makes attack queries, and then requests the
# resulting GDA score.

pp = pprint.PrettyPrinter(indent=4)

params = dict(name='exampleAttack1',
              rawDb='localBankingRaw',
              anonDb='cloakBankingAnon',
              criteria='singlingOut',
              table='accounts',  # change the table name to run individual table.
              flushCache=False,
              verbose=False)
x = gdaAttack(params)

def getTotalUser():
    """Returns the number of users of the table."""
    # Launch queries
    query = dict(uid='account_id')
    # Note error in this sql
    sql = str(f"""select count(distinct account_id) from {params['table']}""")
    query['sql'] = sql
    x.askAttack(query)

def getResultFromQuery(queryParser):
    """Returns the values of the table being used in the attack."""
    colnames = x.getColNames()
    for i in colnames:
        values = x.getPublicColValues(i)
        if values != []:
            queryParser[i] = values
    return queryParser

def makeNoiseQuery(getKeyColumn, getCombinations):
    """Launches `branch` copies of a count query for each combination of column values."""
    # Launch queries
    # TODO: uid should be dynamically allocated
    colnames = x.getColNames()
    primaryKeyColumn = dict(uid=colnames[0])

    # Note this sql query is generated dynamically
    outputCol = getKeyColumn
    outputComb = getCombinations
    comLength = len(outputComb)
    colLength = len(outputCol)

    # 20 is used as the size of one branch of repeated queries
    branch = 20

    # Launch queries
    query = dict(myTag='query1')

    # Raw query
    raw_sql = str(f"""select count(distinct {primaryKeyColumn['uid']}) from {params['table']} where """)

    while comLength > 0:
        val = getCombinations[len(outputComb) - comLength]
        sql = raw_sql
        while colLength > 0:
            if isinstance(val[len(outputCol) - colLength], six.string_types):
                dynamic_add = str(f"""{outputCol[len(outputCol) - colLength]} = '{val[len(outputCol) - colLength]}' """) + ' and '
            else:
                dynamic_add = str(f"""{outputCol[len(outputCol) - colLength]} = {val[len(outputCol) - colLength]} """) + ' and '
            if colLength == 1:
                # last condition: no trailing ' and '
                if isinstance(val[len(outputCol) - colLength], six.string_types):
                    dynamic_add = str(f"""{outputCol[len(outputCol) - colLength]} = '{val[len(outputCol) - colLength]}'""")
                else:
                    dynamic_add = str(f"""{outputCol[len(outputCol) - colLength]} = {val[len(outputCol) - colLength]}""")
            colLength = colLength - 1
            sql = sql + dynamic_add
        query['sql'] = sql

query = dict(db="raw", sql=sql)

        # make 20 clones of each query; right now 20 is used as the size of
        # one branch of queries
        for q in range(branch):
            x.askAttack(query)
        colLength = len(outputCol)
        comLength = comLength - 1

def getDiffrentColumnValues(col, values, queryParser):
    # Work on a copy so that different columns do not end up sharing (and
    # appending to) the one list object passed in by the caller.
    values = list(values)
    colvalDict = {}
    for key, value in queryParser.items():
        if key == col:
            for allval in value:
                values.append(allval[0])
            colvalDict = {col: values}
            values = []
    return colvalDict

getTotalUser()
result = x.getAttack()
queryParser = {}
getResultFromQuery(queryParser)

getKeyColumn = []
getResult = []
values = []

def getNumberofKeyColumn(queryParser):
    for key in queryParser:
        getKeyColumn.append(key)
    return getKeyColumn

def getResultForComb(getKeyColumn):
    for col in getKeyColumn:
        retDic = getDiffrentColumnValues(col, values, queryParser)
        getResult.append(retDic[col])
    return getResult

def getCombinatorics(getResult):
    r = [[]]
    for x in getResult:
        t = []
        for y in x:
            for i in r:
                t.append(i + [y])
        r = t

    return r

# Get the returned columns
getKeyColumn = getNumberofKeyColumn(queryParser)

# Get the values for each column
getResult = getResultForComb(getKeyColumn)

# Build all combinations of the column values (done iteratively)
getCombinations = getCombinatorics(getResult)

# Create all possible queries.
makeNoiseQuery(getKeyColumn, getCombinations)

# Get the average of a query branch
def Average(lst):
    return sum(lst) / len(lst)

# Gather all the results of a branch of queries in a list, then take the
# mean after that
returnResults = []

verbose = 0
v = verbose
doCache = True

branchReturn = 20

# Check the number of combinations
outputComb = len(getCombinations)

# And gather up the answers:
for i in range(outputComb):
    # make 20 clones of each query, get the results of 20 similar queries
    for item in range(branchReturn):
        reply = x.getAttack()
        if 'error' in reply:
            print(reply['error'])
        else:
            returnResults.append(reply['answer'][0][0])
        if reply['stillToCome'] == 0:
            break
    average = Average(returnResults)
    if 0.5 <= average <= 1.5:
        average = 1.0
    if average == 1.0:
        claim = True
        colnames = x.getColNames()
        primaryKeyColumn = dict(uid=colnames[0])
        spec = {}
        spec = {'uid': primaryKeyColumn, 'known': []}  # known is optional, and always null here
        outputCol = getKeyColumn
        val = getCombinations[i]
        key = 'guess'
        spec.setdefault(key, [])
        for item in range(len(outputCol)):
            spec[key].append({'col': outputCol[item], 'val': val[item]})
        x.askClaim(spec, claim=claim, cache=doCache)

claim = True

while True:

replyClaim = x.getClaim()

if v: print("Claim Result:")

if v: pp.pprint(replyClaim)

if replyClaim['stillToCome'] == 0:

break

print("\nTest all correct (multiple guessed column):") attackResult = x.getResults() sc = gdaScores(attackResult) score = sc.getScores()

pp.pprint(score['col']['frequency'])

if v: pp.pprint(score) returnResults = [] else: claim = False

score = x.getResults()

pp.pprint(score)

x.cleanUp()
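The query-building part of the script above could be sketched more compactly as follows. This is an illustration under stated assumptions, not the script's actual code: sql_literal, build_count_query and all_queries are hypothetical names, the column names and values in the usage lines are made up, and itertools.product yields the same set of combinations as getCombinatorics but in a different order.

```python
from itertools import product

def sql_literal(value):
    """Quote strings (doubling embedded quotes); leave other values as-is."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)

def build_count_query(table, uid, columns, values):
    """Build one count query with a col = val condition per column."""
    conditions = " and ".join(
        f"{col} = {sql_literal(val)}" for col, val in zip(columns, values)
    )
    return f"select count(distinct {uid}) from {table} where {conditions}"

def all_queries(table, uid, column_values):
    """Yield one query per combination of the given column values."""
    columns = list(column_values)
    for combo in product(*(column_values[c] for c in columns)):
        yield build_count_query(table, uid, columns, combo)

# Hypothetical columns and values, for illustration only.
example = {"frequency": ["A", "B"], "district_id": [1, 2, 3]}
queries = list(all_queries("accounts", "account_id", example))
print(queries[0])
```

Joining the conditions with " and " avoids the special case for the last column, and each generated query can then be submitted the desired number of times with x.askAttack.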


yoid2000 commented 5 years ago

Did you forget to leave the attachment?

AnirbanGhosh1512 commented 5 years ago

Hello Prof. Paul,

I did. It's in a zip file called Graphs.zip.

Regards, Anirban


yoid2000 commented 5 years ago

Since your emails are in fact transmitted through GitHub, the attachment may have been stripped. Please just send it to me directly.