Open yoid2000 opened 5 years ago
Started working on it.
Hello Prof. Paul,
It is taking me some time to understand the exact requirements. Please tell me whether what I have understood is right.
Regards, Anirban Ghosh
We will incorporate Rohan's REST interface into gdaScore, so you won't use his interface directly. Rather, you'll use askExplore() to make the preliminary queries, askAttack() to make the attack queries (to establish an average value), and askClaim() to make a claim about your guessed answer.
Until we have incorporated Rohan's REST interface, you can test your code against rawDb. I'm out of town right now, but will be back on Friday if you want to chat about it.
Hello Prof. Paul,
I came to your office on Friday, but nobody was there. Perhaps you were downstairs, where I saw people at a get-together. I will be available on Monday for the chat.
Regards, Anirban Ghosh
Indeed I was downstairs chatting. But you could have interrupted me ... it would have been fine.
Anyway, see you Monday.
PF
@AnirbanGhosh1512
As a step in this attack, you make a query like
select count(distinct uid)
from table
where col1 = val1 and col2 = val2 and ...
I have written a class method called getPublicColValues(), which is meant to return a set of column values that may reasonably be publicly known. You can read about this interface at https://gda-score.github.io/gdaScore.m.html
When you write the part that looks for appropriate values, please limit yourself to values discovered by getPublicColValues().
Let me know if you have questions
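A minimal sketch of how such a query string could be assembled, assuming the conditions are available as (column, value) pairs. The helper name build_count_query and the table and column names in the usage lines are illustrative assumptions, not part of gdaScore.

```python
def build_count_query(table, uid_col, conditions):
    """Build 'select count(distinct uid) ... where col1 = val1 and ...'.

    conditions is a list of (column, value) pairs. This is a hypothetical
    helper for illustration only; it is not a gdaScore function.
    """
    if not conditions:
        raise ValueError("need at least one (column, value) condition")
    clauses = []
    for col, val in conditions:
        if isinstance(val, str):
            # Quote string values and double any embedded single quotes.
            escaped = val.replace("'", "''")
            clauses.append(f"{col} = '{escaped}'")
        else:
            clauses.append(f"{col} = {val}")
    return (f"select count(distinct {uid_col}) from {table} where "
            + " and ".join(clauses))


# Illustrative table, column, and value names.
query = build_count_query(
    "accounts", "account_id",
    [("frequency", "POPLATEK MESICNE"), ("acct_district_id", 1)],
)
print(query)
```

The values here are interpolated directly into the SQL text only because they are assumed to come from the attack code itself. Where the query interface supports parameters, those would be the safer choice.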
Hello Prof. Paul,
Below is my current .json configuration. { "localBankingRaw": { "host": "db001.gda-score.org", "port": 5432, "dbname": "banking", "user": "anirbanghosh1512@gmail.com", "password": "Aic0phuLoo0i", "type": "postgres" }, "cloakBankingAnon": { "host": "attack.aircloak.com", "port": 8432, "dbname": "banking", "user": "george@gda-score.org", "password": "secret", "type": "aircloak" } }
The first entry, localBankingRaw, works fine for me as a config string, but the second one, cloakBankingAnon, seems to contain credentials that are not authorized to access the db. When I tried it with the settings of my colleague Ali Reza, it worked fine. Perhaps I need an account on attack.aircloak.com.
Regards, Anirban
Hello Prof. Paul,
Thanks, it is now working with my newly created login.
Regards, Anirban
Hi Anirban,
You need to change the "user" and "password" to match that of the account I just gave you. And set "host" to demo.aircloak.com.
PF
Hello Prof. Paul,
As per the stated issue, you asked me to use the following: "To learn these probabilities for any given column, you can query the raw database with this query:
select column, count(distinct uid) from table order by 2 desc limit 200
Use the askExplore() call on the raw database (rawDb) to do these."
As far as I can tell, askExplore() is nothing but a queue that holds queries, whereas getPublicColValues() already builds the query dynamically. I just need to pass it the column names in a loop. Then, based on the results, I can calculate the probabilities and generate the attack query.
Am I right? Please let me know if I have misunderstood.
Regards, Anirban Ghosh
Yes, your understanding is correct. You can loop through the column names and learn a set of values.
By the way, there is also a method in class gdaAttack() called getTableCharacteristics that returns various statistics about each of the columns, including the number of distinct UIDs, the number of distinct values, the average number of UIDs per value, and things like that. You can read more about it at:
https://gda-score.github.io/gdaScore.m.html#gdaScore.gdaAttack.getTableCharacteristics
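One way the learned values might be turned into probabilities, sketched under assumptions: the per-column result is taken to be a list of (value, count) pairs, where count is the number of distinct UIDs with that value, and the total number of distinct UIDs is known separately. The function name value_probabilities, the total_uids parameter, and the sample data are all hypothetical.

```python
def value_probabilities(public_values, total_uids):
    """Map each column to {value: fraction of distinct UIDs with that value}.

    public_values: dict of column name -> list of (value, uid_count) pairs
                   (assumed shape, for illustration only).
    total_uids:    total number of distinct UIDs in the table (assumed known).
    """
    if total_uids <= 0:
        raise ValueError("total_uids must be positive")
    return {
        col: {val: count / total_uids for val, count in pairs}
        for col, pairs in public_values.items()
    }


# Hypothetical sample data: 100 users, two columns.
sample = {"col1": [("a", 50), ("b", 25)], "col2": [(7, 10)]}
probs = value_probabilities(sample, total_uids=100)
print(probs["col1"]["a"])  # fraction of users with col1 = 'a'
```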
Hello Prof. Paul,
The method getPublicColValues(), as currently written, rejects values whose count is less than 100. So is it OK to use this method, or should I write something new to fetch all the records, even those with a count below 100?
Regards, Anirban
Hi Anirban,
You should use getPublicColValues(), because as an attacker we are assuming that you know these (they are public knowledge), but I don't want to assume that you know all values.
PF
Hello Prof. Paul,
If I take the frequency column as an example, this query:
select frequency, count(distinct account_id) from accounts group by frequency order by 2 desc limit 200
gives this output:
frequency            count
POPLATEK MESICNE     4167
POPLATEK TYDNE       240
POPLATEK PO OBRATU   93
So for the next query, as stated in the issue:
select count(distinct uid) from table where col1 = val1 and col2 = val2 and ...
would it be like this?
select count(distinct account_id) from accounts where frequency = 'POPLATEK MESICNE' and frequency = 'POPLATEK TYDNE' and frequency = 'POPLATEK PO OBRATU'
Please let me know whether my understanding is correct.
Regards, Anirban
No, each condition in the query needs to be for a different column.
PF
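A row holds only one value per column, so several equality conditions on the same column joined with "and" can never all be true, and the count would be zero. A rough sketch of picking exactly one value per column instead, using itertools.product. The frequency values come from the output above; the second column, disp_type, and its values are assumed here purely for illustration.

```python
from itertools import product

# One list of candidate values per column (second column is an assumption).
public_values = {
    "frequency": ["POPLATEK MESICNE", "POPLATEK TYDNE"],
    "disp_type": ["OWNER", "DISPONENT"],
}

columns = sorted(public_values)

# Each assignment picks exactly one value for every column, so no column
# ever appears twice in the same where clause.
assignments = [
    dict(zip(columns, combo))
    for combo in product(*(public_values[c] for c in columns))
]

for a in assignments:
    print(" and ".join(f"{col} = '{val}'" for col, val in a.items()))
```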
Hello Prof. Paul,
Calling the routine getPublicColValues() gives me the output below:
{ 'acct_district_id': [(1, 554), (70, 152), (74, 135), (54, 128)], 'cli_district_id': [(1, 547), (70, 146), (74, 144), (54, 133)], 'disp_type': [('OWNER', 4500), ('DISPONENT', 869)], 'frequency': [('POPLATEK MESICNE', 4167), ('POPLATEK TYDNE', 240)]}
Before writing the query select count(distinct uid) from table where col1 = val1 and col2 = val2 and ..., I need some clarification, which I think would be easier in a chat in your office.
Can I stop by in your office in the next few days to clarify my understanding before I proceed?
Regards, Anirban Ghosh
I wonder if there is a bug with getPublicColValues. It should be returning more than that. Can you meet me tomorrow afternoon?
PF
Hello Prof. Paul, The actual output is below:
{ 'account_id': [], 'acct_date': [], 'acct_district_id': [(1, 554), (70, 152), (74, 135), (54, 128)], 'birth_number': [], 'cli_district_id': [(1, 547), (70, 146), (74, 144), (54, 133)], 'client_id': [], 'disp_type': [('OWNER', 4500), ('DISPONENT', 869)], 'frequency': [('POPLATEK MESICNE', 4167), ('POPLATEK TYDNE', 240)], 'lastname': []}
I added a check: if the value returned for a column is [], I skip that column. I am available after 3 pm tomorrow, so I can come to your office then.
Regards, Anirban
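The empty-list check described above could look roughly like this. The dictionary is copied from the output in this comment; the variable names raw_output and usable are illustrative.

```python
# Output as posted above: column name -> list of (value, count) pairs.
raw_output = {
    'account_id': [],
    'acct_date': [],
    'acct_district_id': [(1, 554), (70, 152), (74, 135), (54, 128)],
    'birth_number': [],
    'cli_district_id': [(1, 547), (70, 146), (74, 144), (54, 133)],
    'client_id': [],
    'disp_type': [('OWNER', 4500), ('DISPONENT', 869)],
    'frequency': [('POPLATEK MESICNE', 4167), ('POPLATEK TYDNE', 240)],
    'lastname': [],
}

# Keep only the columns for which at least one public value was returned.
usable = {col: vals for col, vals in raw_output.items() if vals}

print(sorted(usable))
# ['acct_district_id', 'cli_district_id', 'disp_type', 'frequency']
```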
Ok see you then. In the meantime I'll look into what is wrong with that routine
PF
I changed the parameters of getPublicColValues() so that it returns somewhat more. Please pull the latest code repo and try running your code again. I'll see you this afternoon.
Hello Prof. Paul,
I pulled the latest code base, but I am still getting the same output. I checked the Git GUI, and it shows no recent changes to the gda-score script. I wonder whether it has been updated or whether I am missing something.
Regards, Anirban
Hello Prof. Paul,
A gentle reminder.
Regards, Anirban
My bad. I pushed the changes just now. Please pull and try again.
PF
Hello Prof. Paul,
Sorry for the late response. I got new output after calling the routine getPublicColValues() in the gdaScore script. Now my question is: are the columns that come back with values (for example, 'acct_district_id') always fixed when I call the routine, or will they be affected later by changes to the database? To simplify, the columns that currently come back with values are: acct_district_id, cli_district_id, disp_type, frequency, lastname.
Now, if I write the logic to build this query, select count(distinct uid) from table where col1 = val1 and col2 = val2 and ..., I need to use combinatorics over 5 columns. But if there are 6 columns in the future, the script will not count as dynamic; it would be static and work only for those 5 columns.
Please let me know if that is OK with you, so that I can start writing the logic for building the query.
Regards, Anirban
Hi Anirban,
Your code should be dynamic. The input should just be the table name. From that the code should dynamically learn the column names, then learn the public column values, then form the attack queries etc. Your code should be able to work with any of the db001 tables (all the banking tables, taxi, census, etc.) without requiring any changes.
PF
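A sketch of what "dynamic" could mean for the query-building step: nothing about the number or names of columns is hard-coded, so the same generator works whether the table yields 5 usable columns or 6. The function name attack_conditions and its min_cols and max_cols parameters are hypothetical, and the input is assumed to be a mapping from column name to a list of public values.

```python
from itertools import combinations, product


def attack_conditions(public_values, min_cols=2, max_cols=None):
    """Yield lists of (column, value) pairs, one value per chosen column.

    public_values: dict of column name -> list of public values
                   (assumed shape). Columns with no values are ignored.
    """
    cols = sorted(c for c, vals in public_values.items() if vals)
    if max_cols is None:
        max_cols = len(cols)
    for k in range(min_cols, max_cols + 1):
        # Every subset of k distinct columns ...
        for subset in combinations(cols, k):
            # ... combined with every choice of one value per column.
            for combo in product(*(public_values[c] for c in subset)):
                yield list(zip(subset, combo))


# Hypothetical input; adding another column needs no code change.
example = {"a": [1, 2], "b": ["x"], "c": []}
conditions = list(attack_conditions(example, min_cols=1))
for cond in conditions:
    print(cond)
```

The number of combinations grows quickly with the number of columns and values, so a cap such as max_cols is one plausible way to keep the number of attack queries manageable.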
Hello Prof. Paul,
I need a little clarification on our last discussion. Should I make a claim only if the average of the query results is greater than 1.0, or can I make a claim whatever the mean value is?
Regards, Anirban Ghosh
If the query results rounded average is 1, then you ask for a claim (claim=True). Otherwise you don't ask for a claim (claim=False).
A rounded average will be 1 if the average is between 0.5 and 1.5.
The point is, if the rounded average is 1, then you guess that there is exactly one user with the given attributes, and so you want to make a claim that you have singled out this user.
PF
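The claim rule above, written out as a small sketch. The function name should_claim is hypothetical. How the exact boundaries 0.5 and 1.5 are treated is an assumption (round half up is used here), since Python's built-in round() rounds halves to the nearest even number and would behave differently at those two points.

```python
def should_claim(answers):
    """Return True if the rounded average of the answers is 1.

    answers: list of numeric results from repeating the same attack query.
    Boundary handling (0.5 counts, 1.5 does not) is an assumption.
    """
    if not answers:
        raise ValueError("need at least one answer")
    average = sum(answers) / len(answers)
    return 0.5 <= average < 1.5


print(should_claim([1, 1, 1]))   # average 1.0
print(should_claim([0, 1, 2]))   # average 1.0
print(should_claim([2, 2, 1]))   # average about 1.67
print(should_claim([0, 0, 1]))   # average about 0.33
```

The returned boolean would then be passed as the claim flag when making the claim.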
Hello Prof. Paul,
I have been looking for you at the office since last week, but with no luck. I just need one clarification. I thought I could stop by and ask, but time is flying, so I am asking in the issue tracker. The last email I got here clearly states the condition for the claim. Now, let's say I have X queries, and I clone each query n times and fire the same query repeatedly. The average would then be n * result / n, so it always equals the result value. So why should I do this step? Instead, I could check whether the result value is between 0.5 and 1.5, and if so, go directly for the claim.
Pardon me if my understanding is wrong. Waiting for your reply.
Regards, Anirban
When you query against the Uber DP interface, you'll get back a different answer every time because the answers have zero-mean noise. By taking an average you can effectively reduce the noise and increase confidence.
PF
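A small simulation of why averaging helps, under stated assumptions: the noise is modelled as independent zero-mean Gaussian noise with standard deviation sigma, which is a stand-in chosen for illustration (the actual noise distribution of the DP interface is not given in this thread). For n independent noisy answers, the standard deviation of their average is sigma / sqrt(n). The function names noisy_count and averaged_count are hypothetical.

```python
import random
import statistics


def noisy_count(true_count, sigma, rng):
    """One noisy answer: the true count plus zero-mean Gaussian noise."""
    return true_count + rng.gauss(0.0, sigma)


def averaged_count(true_count, sigma, n, rng):
    """Average of n independent noisy answers to the same query."""
    return statistics.fmean(
        noisy_count(true_count, sigma, rng) for _ in range(n)
    )


rng = random.Random(0)          # fixed seed so runs are repeatable
true_count, sigma = 1, 2.0      # illustrative values

single = noisy_count(true_count, sigma, rng)
averaged = averaged_count(true_count, sigma, 10_000, rng)

print(f"single answer:  {single:.3f}")
print(f"averaged (10k): {averaged:.3f}")
```

With sigma = 2.0 and n = 10,000 the average has a standard deviation of about 0.02, so it lands close to the true count, while a single answer can easily be off by more than 1. Against a database without noise, repeating the query returns the same value each time and the average equals that value.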
Hello Prof. Paul,
Thanks for the reply. I will update the code accordingly.
Regards, Anirban
Hello Prof. Paul,
I have made the necessary changes. Should I push them to git?
Regards, Anirban
Before you push, can you show me the generated GDA Score for the case where you run the attack on Diffix? I want to see it working at least that much. Later when Uber is running we'll test it there.
PF
Hello Prof. Paul,
The Database configuration is below:
{ "localBankingRaw": { "host": "db001.gda-score.org", "port": 5432, "dbname": "banking", "user": "anirbanghosh1512@gmail.com", "password": "Aic0phuLoo0i", "type": "postgres" }, "cloakBankingAnon": { "host": "demo.aircloak.com", "port": 8432, "dbname": "gda_banking", "user": "anirbanghosh1512@gmail.com", "password": "anirban@123", "type": "aircloak" } }
The generated output of the attack script is below and it is working with raw db:
"Test all correct (multiple guessed column): susc 0, nextSusc 0.0, lastSusc 1e-06"
I have attached the current attack script I have written. Please have a look and let me know if further changes are needed.
Regards, Anirban Ghosh
On Wed, Jan 30, 2019 at 2:02 PM Paul Francis notifications@github.com wrote:
Before you push, can you show me the generated GDA Score for the case where you run the attack on Diffix? I want to see it working at least that much. Later when Uber is running we'll test it there.
PF
On Tue, Jan 29, 2019 at 5:44 PM AnirbanGhosh1512 <notifications@github.com> wrote:
Hello Prof. Paul,
I have done the necessary changes. Should I push it into git?
Regards, Anirban
On Tue, Jan 29, 2019 at 4:33 PM Anirban Ghosh < anirbanghosh1512@gmail.com> wrote:
Hello Prof. Paul,
Thanks for the reply. I will update the change accordingly.
Regards, Anirban
import sys
import pprint
import six
sys.path.append('../../common')
from gdaScore import gdaAttack, gdaScores
from myUtilities import checkMatch

# This script makes attack queries, and then requests the
# resulting GDA score.

pp = pprint.PrettyPrinter(indent=4)

params = dict(name='exampleAttack1',
              rawDb='localBankingRaw',
              anonDb='cloakBankingAnon',
              criteria='singlingOut',
              table='accounts',  # change the table name to run individual table.
              flushCache=False,
              verbose=False)
x = gdaAttack(params)


def getTotalUser():
    """Launches a query that counts the users of the table."""
    query = dict(uid='account_id')
    # Note error in this sql
    sql = f"""select count(distinct account_id)
              from {params['table']}"""
    query['sql'] = sql
    x.askAttack(query)


def getResultFromQuery(queryParser):
    """Returns the values of the table being used in the attack."""
    colnames = x.getColNames()
    for i in colnames:
        values = x.getPublicColValues(i)
        if values != []:
            queryParser[i] = values
    return queryParser


def makeNoiseQuery(getKeyColumn, getCombinations):
    """Launches the repeated count queries for every combination of values."""
    # TODO: uid should be dynamically allocated
    colnames = x.getColNames()
    primaryKeyColumn = dict(uid=colnames[0])
    # Note this sql query is generated dynamically
    outputCol = getKeyColumn
    outputComb = getCombinations
    comLength = len(outputComb)
    colLength = len(outputCol)
    # Each query is repeated `branch` times (20 for now)
    branch = 20
    # Launch queries
    query = dict(myTag='query1')
    # Raw query
    raw_sql = f"""select count(distinct {primaryKeyColumn['uid']})
                  from {params['table']}
                  where """
    while comLength > 0:
        val = getCombinations[len(outputComb) - comLength]
        sql = raw_sql
        while colLength > 0:
            idx = len(outputCol) - colLength
            # Quote string values; leave all other values unquoted
            if isinstance(val[idx], six.string_types):
                dynamic_add = f"{outputCol[idx]} = '{val[idx]}'"
            else:
                dynamic_add = f"{outputCol[idx]} = {val[idx]}"
            # Every condition except the last one is followed by ' and '
            if colLength > 1:
                dynamic_add = dynamic_add + ' and '
            colLength = colLength - 1
            sql = sql + dynamic_add
        query['sql'] = sql
        # query = dict(db="raw", sql=sql)
        # Fire `branch` copies of the same query
        for q in range(branch):
            x.askAttack(query)
        colLength = len(outputCol)
        comLength = comLength - 1


def getDiffrentColumnValues(col, values, queryParser):
    colvalDict = {}
    for key, value in queryParser.items():
        if key == col:
            # Build a fresh list per column. Appending to the shared `values`
            # list would mix the values of different columns.
            colvalDict = {col: [allval[0] for allval in value]}
    return colvalDict


getTotalUser()
result = x.getAttack()
queryParser = {}
getResultFromQuery(queryParser)

getKeyColumn = []
getResult = []
values = []


def getNumberofKeyColumn(queryParser):
    for key in queryParser:
        getKeyColumn.append(key)
    return getKeyColumn


def getResultForComb(getKeyColumn):
    for col in getKeyColumn:
        retDic = getDiffrentColumnValues(col, values, queryParser)
        getResult.append(retDic[col])
    return getResult


def getCombinatorics(getResult):
    # Every combination that takes one value from each column
    r = [[]]
    for colValues in getResult:
        t = []
        for y in colValues:
            for i in r:
                t.append(i + [y])
        r = t
    return r


# Get number of return column
getKeyColumn = getNumberofKeyColumn(queryParser)
# Get total result
getResult = getResultForComb(getKeyColumn)
# Combinatorics, with dynamically accessible values
getCombinations = getCombinatorics(getResult)
# Create all possible queries.
makeNoiseQuery(getKeyColumn, getCombinations)


# Get average of the query branch
def Average(lst):
    return sum(lst) / len(lst)


verbose = 0
v = verbose
doCache = True
branchReturn = 20
# Check number of combinations
outputComb = len(getCombinations)
# And gather up the answers:
for i in range(outputComb):
    # Start with an empty list for every combination, so that answers from
    # an earlier combination do not leak into this average
    returnResults = []
    # Get the results of the 20 identical queries
    for item in range(branchReturn):
        reply = x.getAttack()
        if 'error' in reply:
            print(reply['error'])
        else:
            returnResults.append(reply['answer'][0][0])
        if reply['stillToCome'] == 0:
            break
    if not returnResults:
        # Every query for this combination returned an error
        continue
    average = Average(returnResults)
    # The rounded average is 1 when the average is between 0.5 and 1.5
    if 0.5 <= average <= 1.5:
        claim = True
        colnames = x.getColNames()
        primaryKeyColumn = dict(uid=colnames[0])
        spec = {'uid': primaryKeyColumn, 'known': []}  # known is optional, and always null here
        outputCol = getKeyColumn
        val = getCombinations[i]
        key = 'guess'
        spec.setdefault(key, [])
        for j in range(len(outputCol)):
            spec[key].append({'col': outputCol[j], 'val': val[j]})
        x.askClaim(spec, claim=claim, cache=doCache)
        # while True:
        #     replyClaim = x.getClaim()
        #     if v: print("Claim Result:")
        #     if v: pp.pprint(replyClaim)
        #     if replyClaim['stillToCome'] == 0:
        #         break
        print("\nTest all correct (multiple guessed column):")
        attackResult = x.getResults()
        sc = gdaScores(attackResult)
        score = sc.getScores()
        # pp.pprint(score['col']['frequency'])
        if v: pp.pprint(score)
    else:
        claim = False
# score = x.getResults()
# pp.pprint(score)
x.cleanUp()
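For comparison, the combination and WHERE-clause steps of the script above can be sketched with the standard library. This is only an illustration under stated assumptions: `build_where` is a hypothetical helper, and the column names and values are made up (only `accounts` and `account_id` come from the script). `itertools.product` yields the same set of combinations as `getCombinatorics()`, though in a different order. Values are interpolated directly into the SQL, so a value containing a single quote would break the query.

```python
from itertools import product


def build_where(columns, values):
    """Join `column = value` conditions with ' and ', quoting string values."""
    conditions = []
    for col, val in zip(columns, values):
        if isinstance(val, str):
            conditions.append(f"{col} = '{val}'")
        else:
            conditions.append(f"{col} = {val}")
    return " and ".join(conditions)


columns = ["frequency", "district_id"]           # hypothetical column names
column_values = [["monthly", "weekly"], [1, 2]]  # hypothetical public values

# One count query per combination of one value from each column
queries = [
    "select count(distinct account_id) from accounts where "
    + build_where(columns, combo)
    for combo in product(*column_values)
]

for q in queries:
    print(q)
```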
Hi Anirban,
I'm interested in the final json output, which you can produce using finishGdaAttack() (see below). Actually, could you produce these json outputs for me using both the cloak and the raw database as the anonymous data? Then produce the score diagrams from the json outputs using makeGraphs.py in code/graphs. Post the json files on gist.github.com, and email me the score diagrams (.png files). If it isn't clear how to do this, let me know so that I can update the readme files accordingly.
sc = gdaScores(attackResult)
score = sc.getScores()
if v: pp.pprint(score)
attack.cleanUp()
final = finishGdaAttack(params,score)
Thanks,
PF
Hello Prof. Paul,
For your last request, I have produced the .json and graphs for the raw database. But for the cloak, some columns contain the value * even if the column type is date or integer. So after building the combinations, the conditions come out as date=* or acct_id=*. Will this work for generating the score? It definitely does not work if I run the query in a database editor. Please give me some insight about this.
Regards, Anirban
The cloak returns '*' when there are values that it has suppressed. In your attack, you should ignore '*' values.
Have you posted your attack? Please do so if you could ... I want to see what your attack does and think about the best way to fix this (probably better if it happens automatically in the gdaAttack() class).
On Tue, Feb 5, 2019 at 2:34 PM AnirbanGhosh1512 notifications@github.com wrote:
Hello Prof. Paul,
For your last requirement, I have produced the .json files and graphs for the raw database. But for the cloak, some columns contain the value * even if the column type is date or integer. So after building the combinations, the conditions come out as date = * or acct_id = *. Will this work for generating the score? It definitely does not work if I run the query in a database editor. Please give me some insight about this.
Regards, Anirban
On Thu, Jan 31, 2019 at 7:23 AM Paul Francis notifications@github.com wrote:
Hi Anirban,
I'm interested in the final json output, which you can produce using finishGdaAttack() (see below). Actually, could you produce these json outputs for me using both the cloak and the raw database as the anonymous data? Then produce the score diagrams from the json outputs using makeGraphs.py in code/graphs. Post the json files on gist.github.com, and email me the score diagrams (.png files). If it isn't clear how to do this, let me know so that I can update the readme files accordingly.
sc = gdaScores(attackResult)
score = sc.getScores()
if v: pp.pprint(score)
attack.cleanUp()
final = finishGdaAttack(params,score)
Thanks,
PF
On Wed, Jan 30, 2019 at 4:36 PM AnirbanGhosh1512 notifications@github.com wrote:
Hello Prof. Paul,
The Database configuration is below:
{
    "localBankingRaw": {
        "host": "db001.gda-score.org",
        "port": 5432,
        "dbname": "banking",
        "user": "anirbanghosh1512@gmail.com",
        "password": "Aic0phuLoo0i",
        "type": "postgres"
    },
    "cloakBankingAnon": {
        "host": "demo.aircloak.com",
        "port": 8432,
        "dbname": "gda_banking",
        "user": "anirbanghosh1512@gmail.com",
        "password": "anirban@123",
        "type": "aircloak"
    }
}
The generated output of the attack script is below and it is working with raw db:
"Test all correct (multiple guessed column): susc 0, nextSusc 0.0, lastSusc 1e-06"
I have attached the current attack script I have written. Please have a look and let me know if further changes are needed.
Regards, Anirban Ghosh
On Wed, Jan 30, 2019 at 2:02 PM Paul Francis notifications@github.com wrote:
Before you push, can you show me the generated GDA Score for the case where you run the attack on Diffix? I want to see it working at least that much. Later when Uber is running we'll test it there.
PF
On Tue, Jan 29, 2019 at 5:44 PM AnirbanGhosh1512 notifications@github.com wrote:
Hello Prof. Paul,
I have done the necessary changes. Should I push it into git?
Regards, Anirban
On Tue, Jan 29, 2019 at 4:33 PM Anirban Ghosh anirbanghosh1512@gmail.com wrote:
Hello Prof. Paul,
Thanks for the reply. I will update the change accordingly.
Regards, Anirban
On Tue, Jan 29, 2019 at 4:32 PM Paul Francis notifications@github.com wrote:
When you query against the Uber DP interface, you'll get back a different answer every time because the answers have zero-mean noise. By taking an average you can effectively reduce the noise and increase confidence.
PF
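To make the averaging argument concrete, here is a minimal simulation. It is a sketch only: the Gaussian noise model, the standard deviation `sigma`, and the helper name `averaged_answer` are assumptions chosen for illustration and say nothing about how the Uber DP interface actually generates its noise.

```python
import random
import statistics

def averaged_answer(true_count, sigma, n, rng):
    """Mean of n noisy answers; each answer is the true count plus zero-mean noise."""
    answers = [true_count + rng.gauss(0.0, sigma) for _ in range(n)]
    return statistics.mean(answers)

rng = random.Random(0)   # fixed seed so the run is repeatable
true_count = 1           # exactly one matching user
sigma = 2.0              # assumed noise standard deviation (hypothetical)
spreads = {}
for n in (1, 20, 400):
    # Repeat the n-query experiment many times and measure how far the averages scatter.
    averages = [averaged_answer(true_count, sigma, n, rng) for _ in range(1000)]
    spreads[n] = statistics.pstdev(averages)
    print(f"n={n:3d}  spread={spreads[n]:.3f}  expected={sigma / n ** 0.5:.3f}")
```

The scatter of the average shrinks roughly as sigma divided by the square root of n. Against the raw database, where every repeat returns the same count, the average simply equals that count, so the repetition only pays off against a noisy interface.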
On Tue, Jan 29, 2019, 14:11 AnirbanGhosh1512 notifications@github.com wrote:
Hello Prof. Paul,
I have been looking for you in the office since last week, but no luck. I just need one clarification. I thought I could stop by and ask, but time is flying, so I am asking in the issue tracker. The last email I got here clearly states the condition for the claim. Say I have X queries, and I clone each query n times and fire the same query each time. The result, once rounded, would be n * result / n, so it always equals the result value. So why should I do this step? Instead, I could check whether the result value is between 0.5 and 1.5, and if so go directly for the claim.
Pardon me if my understanding is wrong. Waiting for your reply.
Regards, Anirban
On Wed, Jan 23, 2019 at 11:08 AM Paul Francis notifications@github.com wrote:
If the rounded average of the query results is 1, then you ask for a claim (claim=True). Otherwise you don't ask for a claim (claim=False). A rounded average will be 1 if the average is between 0.5 and 1.5.
The point is, if the rounded average is 1, then you guess that there is exactly one user with the given attributes, and so you want to make a claim that you have singled out this user.
PF
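The claim rule described in this message can be written as a small predicate. This is a sketch: the function name `should_claim` is hypothetical and not part of gdaScore, and treating both 0.5 and 1.5 as inside the interval follows the posted script rather than anything the thread pins down.

```python
def should_claim(answers):
    """Return True when the average of the answers rounds to exactly 1.

    answers: numeric results of repeating the same count query.
    """
    if not answers:
        # No usable answers means there is nothing to base a claim on.
        return False
    average = sum(answers) / len(answers)
    # Both boundaries are included, which is an assumption taken from the script.
    return 0.5 <= average <= 1.5

print(should_claim([1, 1, 2, 0]))   # average 1.0
print(should_claim([3, 2, 4]))      # average 3.0
```

The returned value is what would be passed as the `claim` argument of `askClaim()`.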
On Tue, Jan 22, 2019 at 6:45 PM AnirbanGhosh1512 notifications@github.com wrote:
Hello Prof. Paul,
I need a little clarification on the last discussion. Should I ask for a claim only if the average of the query results is greater than 1.0, or can I go for a claim whatever the mean value is?
Regards, Anirban Ghosh
import sys
import pprint
import six
sys.path.append('../../common')
from gdaScore import gdaAttack, gdaScores
from myUtilities import checkMatch

# This script makes attack queries, and then requests the
# resulting GDA score.

pp = pprint.PrettyPrinter(indent=4)

params = dict(name='exampleAttack1',
              rawDb='localBankingRaw',
              anonDb='cloakBankingAnon',
              criteria='singlingOut',
              table='accounts',  # change the table name to run individual table.
              flushCache=False,
              verbose=False)
x = gdaAttack(params)


def getTotalUser():
    """Returns the number of users of the table."""
    # Launch queries
    query = dict(uid='account_id')
    # Note error in this sql
    sql = str(f"""select count(distinct account_id) from {params['table']}""")
    query['sql'] = sql
    x.askAttack(query)


def getResultFromQuery(queryParser):
    """Returns the values of the table being used in the attack."""
    colnames = x.getColNames()
    for i in colnames:
        values = x.getPublicColValues(i)
        if values != []:
            queryParser[i] = values
    return queryParser


def makeNoiseQuery(getKeycolumn, getCombinations):
    """Returns the noise of the table being used in the attack."""
    # Launch queries
    # TODO: uid should be dynamically allocated
    colnames = x.getColNames()
    primaryKeyColumn = dict(uid=colnames[0])
    # Note this sql query is generated dynamically
    outputCol = getKeyColumn
    outputComb = getCombinations
    comLength = len(outputComb)
    colLength = len(outputCol)
    # 20 is assumed as the branch of queries
    branch = 20
    # Launch queries
    query = dict(myTag='query1')
    # Raw query
    raw_sql = str(f"""select count(distinct {primaryKeyColumn['uid']}) from {params['table']} where """)

    while comLength > 0:
        val = getCombinations[len(outputComb) - comLength]
        sql = raw_sql
        while colLength > 0:
            if isinstance(val[len(outputCol) - colLength], six.string_types):
                dynamic_add = str(f"""{outputCol[len(outputCol) - colLength]} = '{val[len(outputCol) - colLength]}' """) + ' and '
            else:
                dynamic_add = str(f"""{outputCol[len(outputCol) - colLength]} = {val[len(outputCol) - colLength]} """) + ' and '
            if colLength == 1:
                if isinstance(val[len(outputCol) - colLength], six.string_types):
                    dynamic_add = str(f"""{outputCol[len(outputCol) - colLength]} = '{val[len(outputCol) - colLength]}'""")
                else:
                    dynamic_add = str(f"""{outputCol[len(outputCol) - colLength]} = {val[len(outputCol) - colLength]}""")
            colLength = colLength - 1
            sql = sql + dynamic_add
        query['sql'] = sql
        # query = dict(db="raw", sql=sql)
        # make 20 clones of each query; right now 20 is assumed as the branch of queries
        for q in range(branch):
            x.askAttack(query)
        colLength = len(outputCol)
        comLength = comLength - 1


def getDiffrentColumnValues(col, values, queryParser):
    colvalDict = {}
    for key, value in queryParser.items():
        if key == col:
            for allval in value:
                values.append(allval[0])
            colvalDict = {col: values}
            values = []
    return colvalDict


getTotalUser()
result = x.getAttack()
queryParser = {}
getResultFromQuery(queryParser)

getKeyColumn = []
getResult = []
values = []


def getNumberofKeyColumn(queryParser):
    for key in queryParser:
        getKeyColumn.append(key)
    return getKeyColumn


def getResultForComb(getKeyColumn):
    for col in getKeyColumn:
        retDic = getDiffrentColumnValues(col, values, queryParser)
        getResult.append(retDic[col])
    return getResult


def getCombinatorics(getResult):
    r = [[]]
    for x in getResult:
        t = []
        for y in x:
            for i in r:
                t.append(i + [y])
        r = t

    return r


# Get number of return column
getKeyColumn = getNumberofKeyColumn(queryParser)
# Get total result
getResult = getResultForComb(getKeyColumn)
# Use of recursion for combinatorics, with dynamically accessible values
getCombinations = getCombinatorics(getResult)
# Create all possible queries.
makeNoiseQuery(getKeyColumn, getCombinations)


# get Average of the query branch
def Average(lst):
    return sum(lst) / len(lst)


# gather all the results of the branch queries in a list, then take the mean
returnResults = []

verbose = 0
v = verbose
doCache = True

branchReturn = 20

# check number of combinations
outputComb = len(getCombinations)

# And gather up the answers:
for i in range(outputComb):
    # make 20 clones of each query, get the results of the 20 identical queries
    for item in range(branchReturn):
        reply = x.getAttack()
        if 'error' in reply:
            print(reply['error'])
        else:
            returnResults.append(reply['answer'][0][0])
        if reply['stillToCome'] == 0:
            break
    average = Average(returnResults)
    if 0.5 <= average <= 1.5:
        average = 1.0
    if average == 1.0:
        claim = True
        colnames = x.getColNames()
        primaryKeyColumn = dict(uid=colnames[0])
        spec = {}
        spec = {'uid': primaryKeyColumn, 'known': []}  # known is optional, and always null here
        outputCol = getKeyColumn
        val = getCombinations[i]
        key = 'guess'
        spec.setdefault(key, [])
        for item in range(len(outputCol)):
            spec[key].append({'col': outputCol[item], 'val': val[item]})
        x.askClaim(spec, claim=claim, cache=doCache)
        # claim = True
        # while True:
        #     replyClaim = x.getClaim()
        #     if v: print("Claim Result:")
        #     if v: pp.pprint(replyClaim)
        #     if replyClaim['stillToCome'] == 0:
        #         break
        print("\nTest all correct (multiple guessed column):")
        attackResult = x.getResults()
        sc = gdaScores(attackResult)
        score = sc.getScores()
        # pp.pprint(score['col']['frequency'])
        if v: pp.pprint(score)
        returnResults = []
    else:
        claim = False
# score = x.getResults()
# pp.pprint(score)
x.cleanUp()
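The getCombinatorics() helper in the script above builds the Cartesian product of the per-column value lists. As a hedged aside, the same set of combinations can be obtained from the standard library's `itertools.product`. In the sketch below the function names and sample values are illustrative only; `get_combinatorics` is a transcription of the script's loop so the two can be compared.

```python
from itertools import product

def get_combinatorics(columns):
    """Loop-based version, transcribed from getCombinatorics() in the script."""
    r = [[]]
    for x in columns:
        t = []
        for y in x:
            for i in r:
                t.append(i + [y])
        r = t
    return r

def combinations_of(columns):
    """Same combinations via itertools.product (only the ordering differs)."""
    return [list(combo) for combo in product(*columns)]

# Made-up sample values, one list per column.
columns = [[1, 2], ['OWNER', 'DISPONENT'], ['Male', 'Female']]
loop_based = get_combinatorics(columns)
product_based = combinations_of(columns)
print(len(loop_based), len(product_based))
print(sorted(map(tuple, loop_based)) == sorted(map(tuple, product_based)))
```

The number of combinations is the product of the column cardinalities, and the script issues 20 attack queries per combination, so the query count grows quickly as columns are added.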
Hello Prof. Paul,
A sample attack query produced by the same routines for the cloak database looks like this: select count(distinct uid) from accounts where uid = None and account_id = None and acct_district_id = 1 and frequency = 'POPLATEK MESICNE' and acct_date = None and disp_type = 'OWNER' and birth_number = '*' and cli_district_id = 1 and lastname = '*' and firstname = '*' and birthdate = None and gender = 'Male' and ssn = '*' and email = '*' and street = '*' and zip = '*'.
Should I post it to generate the score?
Regards, Anirban
Hi Anirban,
I'm confused how you got to this query in the first place. I thought you were using the output of getPublicColValues() to then come up with conditions that have a reasonable chance of matching exactly one user, and then making an attack query from that. But getPublicColValues() queries the raw database, not the cloak, so you should not be getting * values.
Also you should be ignoring NULL values, but that is a different matter.
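One way to follow this advice is to filter the column values before any conditions are built. The sketch below is an assumption-laden illustration: the helper names `usable_values` and `where_clause` are hypothetical and not part of gdaScore, and the row shape (one value per row, read as `row[0]`) is inferred from how the posted script indexes the output of getPublicColValues().

```python
def usable_values(rows):
    """Drop NULLs and the suppression marker '*' from single-column result rows."""
    return [row[0] for row in rows if row[0] is not None and row[0] != '*']

def where_clause(conditions):
    """Join (column, value) pairs with 'and', quoting string values."""
    parts = []
    for col, val in conditions:
        if isinstance(val, str):
            escaped = val.replace("'", "''")  # double any embedded single quotes
            parts.append(f"{col} = '{escaped}'")
        else:
            parts.append(f"{col} = {val}")
    return " and ".join(parts)

# Made-up rows mixing real values with NULL and the suppression marker.
rows = [(1,), (None,), ('*',), ('OWNER',)]
print(usable_values(rows))
print(where_clause([('acct_district_id', 1), ('frequency', 'POPLATEK MESICNE')]))
```

Columns whose filtered list is empty would then be left out of the attack query entirely, instead of producing conditions such as `acct_date = None` or `zip = '*'`.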
    import sys
    import pprint
    import six
    sys.path.append('../../common')
    from gdaScore import gdaAttack, gdaScores
    from myUtilities import checkMatch

    # This script makes attack queries, and then requests the
    # resulting GDA score.

    pp = pprint.PrettyPrinter(indent=4)

    params = dict(name='exampleAttack1',
                  rawDb='localBankingRaw',
                  anonDb='cloakBankingAnon',
                  criteria='singlingOut',
                  table='accounts',  # change the table name to run individual table.
                  flushCache=False,
                  verbose=False)
    x = gdaAttack(params)

    def getTotalUser():
        """Returns the number of users of the table."""
        # Launch queries
        query = dict(uid='account_id')
        # Note error in this sql
        sql = str(f"""select count(distinct account_id) from {params['table']}""")
        query['sql'] = sql
        x.askAttack(query)

    def getResultFromQuery(queryParser):
        """Returns the values of the table being used in the attack."""
        colnames = x.getColNames()
        for i in colnames:
            values = x.getPublicColValues(i)
            if values != []:
                queryParser[i] = values
        return queryParser

    def makeNoiseQuery(getKeyColumn, getCombinations):
        """Returns the noise of the table being used in the attack."""
        # Launch queries
        # TODO: uid should be dynamically allocated
        colnames = x.getColNames()
        primaryKeyColumn = dict(uid=colnames[0])
        # Note this sql query is generated dynamically
        outputCol = getKeyColumn
        outputComb = getCombinations
        comLength = len(outputComb)
        colLength = len(outputCol)
        # 20 is used as the branch (number of copies) of each query
        branch = 20
        # Launch queries
        query = dict(myTag='query1')
        # Raw query
        raw_sql = str(f"""select count(distinct {primaryKeyColumn['uid']}) from {params['table']} where """)
        while comLength > 0:
            val = getCombinations[len(outputComb) - comLength]
            sql = raw_sql
            while colLength > 0:
                if isinstance(val[len(outputCol) - colLength], six.string_types):
                    dynamic_add = str(f"""{outputCol[len(outputCol) - colLength]} = '{val[len(outputCol) - colLength]}' """) + ' and '
                else:
                    dynamic_add = str(f"""{outputCol[len(outputCol) - colLength]} = {val[len(outputCol) - colLength]} """) + ' and '
                if colLength == 1:
                    # Last condition: no trailing ' and '
                    if isinstance(val[len(outputCol) - colLength], six.string_types):
                        dynamic_add = str(f"""{outputCol[len(outputCol) - colLength]} = '{val[len(outputCol) - colLength]}'""")
                    else:
                        dynamic_add = str(f"""{outputCol[len(outputCol) - colLength]} = {val[len(outputCol) - colLength]}""")
                colLength = colLength - 1
                sql = sql + dynamic_add
            query['sql'] = sql
            # query = dict(db="raw", sql=sql)
            # make 20 clones of each query; right now 20 is used as the
            # branch of queries
            for q in range(branch):
                x.askAttack(query)
            colLength = len(outputCol)
            comLength = comLength - 1

    def getDiffrentColumnValues(col, values, queryParser):
        colvalDict = {}
        for key, value in queryParser.items():
            if key == col:
                for allval in value:
                    values.append(allval[0])
                colvalDict = {col: values}
                values = []
        return colvalDict

    getTotalUser()
    result = x.getAttack()
    queryParser = {}
    getResultFromQuery(queryParser)

    getKeyColumn = []
    getResult = []
    values = []

    def getNumberofKeyColumn(queryParser):
        for key in queryParser:
            getKeyColumn.append(key)
        return getKeyColumn

    def getResultForComb(getKeyColumn):
        for col in getKeyColumn:
            retDic = getDiffrentColumnValues(col, values, queryParser)
            getResult.append(retDic[col])
        return getResult

    def getCombinatorics(getResult):
        r = [[]]
        for x in getResult:
            t = []
            for y in x:
                for i in r:
                    t.append(i + [y])
            r = t

        return r

    # Get number of return column
    getKeyColumn = getNumberofKeyColumn(queryParser)
    # Get total result
    getResult = getResultForComb(getKeyColumn)
    # Build all combinations of the dynamically accessible values
    getCombinations = getCombinatorics(getResult)
    # Create all possible queries.
    makeNoiseQuery(getKeyColumn, getCombinations)

    # get Average of the query branch
    def Average(lst):
        return sum(lst) / len(lst)

    # gather all the results of the branch queries in a list, and take
    # the mean after that
    returnResults = []

    verbose = 0
    v = verbose
    doCache = True

    branchReturn = 20
    # check number of combinations
    outputComb = len(getCombinations)
    # And gather up the answers:
    for i in range(outputComb):
        # Start a fresh list for each combination so that the average
        # is not mixed with answers from earlier combinations.
        returnResults = []
        # make 20 clones of each query, get the result of 20 similar queries
        for item in range(branchReturn):
            reply = x.getAttack()
            if 'error' in reply:
                print(reply['error'])
            else:
                returnResults.append(reply['answer'][0][0])
            if reply['stillToCome'] == 0:
                break
        average = Average(returnResults)
        if 0.5 <= average <= 1.5:
            average = 1.0
        if average == 1.0:
            claim = True
            colnames = x.getColNames()
            primaryKeyColumn = dict(uid=colnames[0])
            spec = {}
            spec = {'uid': primaryKeyColumn, 'known': []}  # known is optional, and always null here
            outputCol = getKeyColumn
            val = getCombinations[i]
            key = 'guess'
            spec.setdefault(key, [])
            for item in range(len(outputCol)):
                spec[key].append({'col': outputCol[item], 'val': val[item]})
            x.askClaim(spec, claim=claim, cache=doCache)
            # claim = True
            # while True:
            #     replyClaim = x.getClaim()
            #     if v: print("Claim Result:")
            #     if v: pp.pprint(replyClaim)
            #     if replyClaim['stillToCome'] == 0:
            #         break
            print("\nTest all correct (multiple guessed column):")
            attackResult = x.getResults()
            sc = gdaScores(attackResult)
            score = sc.getScores()
            # pp.pprint(score['col']['frequency'])
            if v: pp.pprint(score)
        else:
            claim = False
    # score = x.getResults()
    # pp.pprint(score)
    x.cleanUp()
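The combination and query-building steps of the script can be written more compactly with the standard library. The sketch below is an illustration only, not part of gdaScore: build_count_query, sql_literal, the SUPPRESSED marker and the sample column values are hypothetical names chosen for this example, and values are inlined into the SQL text in the same way the script does (no parameter binding).

```python
from itertools import product

SUPPRESSED = '*'  # assumed marker for values suppressed by the cloak


def sql_literal(value):
    """Quote strings for SQL; leave other values (e.g. integers) bare."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def build_count_query(table, uid, columns, values):
    """Build a count(distinct uid) query with one condition per column.

    Conditions whose value is None or the suppression marker are skipped.
    """
    conditions = [
        f"{col} = {sql_literal(val)}"
        for col, val in zip(columns, values)
        if val is not None and val != SUPPRESSED
    ]
    sql = f"select count(distinct {uid}) from {table}"
    if conditions:
        sql += " where " + " and ".join(conditions)
    return sql


# Hypothetical sample values, one list per column.
columns = ['acct_district_id', 'frequency']
column_values = [[1, 2], ['POPLATEK MESICNE', 'POPLATEK TYDNE']]

# itertools.product yields every combination of one value per column.
for combo in product(*column_values):
    print(build_count_query('accounts', 'account_id', columns, combo))
```

Joining the conditions with " and " removes the need for the special case on the last column, and product() replaces the hand-written combination loop.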
Hello Prof. Paul,
You are right. getPublicColValues for the raw database gives me proper output, and I used combinatorics to generate the attack queries and post them. But if I use the same routine for the cloak database, it returns * and null values. Do I need to use another routine for the cloak database?
Regards, Anirban
On Tue, Feb 5, 2019 at 3:50 PM Paul Francis notifications@github.com wrote:
Hi Anirban,
I'm confused how you got to this query in the first place. I thought you were using the output of getPublicColValues() to then come up with conditions that have a reasonable chance of matching exactly one user, and then making an attack query from that. But getPublicColValues() queries the raw database, not the cloak, so you should not be getting * values. Also you should be ignoring NULL values, but that is a different matter.
On Tue, Feb 5, 2019 at 2:56 PM AnirbanGhosh1512 notifications@github.com wrote:
Hello Prof. Paul,
A sample attack query calling the same routines for the cloak database looks like this:
select count(distinct uid) from accounts where uid = None and account_id = None and acct_district_id = 1 and frequency = 'POPLATEK MESICNE' and acct_date = None and disp_type = 'OWNER' and birth_number = '*' and cli_district_id = 1 and lastname = '*' and firstname = '*' and birthdate = None and gender = 'Male' and ssn = '*' and email = '*' and street = '*' and zip = '*'
Should I post it to generate the score?
Regards, Anirban
On Tue, Feb 5, 2019 at 2:46 PM Paul Francis notifications@github.com wrote:
The cloak returns '*' when there are values that it has suppressed. In your attack, you should ignore '*' values.
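One way to apply this advice is to filter the column values before building attack conditions. The sketch below assumes the values arrive as a list of rows whose first element is the column value, which is how the attack script in this thread treats the output of getPublicColValues(). The helper name drop_suppressed and the sample rows are hypothetical.

```python
def drop_suppressed(rows, suppressed='*'):
    """Keep only rows whose value is neither NULL nor the suppression marker."""
    return [
        row for row in rows
        if row[0] is not None and row[0] != suppressed
    ]


# Hypothetical (value, count) rows for one column.
rows = [('OWNER', 4500), ('*', 12), (None, 3), ('DISPONENT', 869)]
print(drop_suppressed(rows))
```

Filtering at this stage keeps conditions such as zip = '*' or birthdate = None out of the generated queries.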
Have you posted your attack? Please do so if you could ... I want to see what your attack does and think about the best way to fix this (probably better if it happens automatically in the gdaAttack() class).
On Tue, Feb 5, 2019 at 2:34 PM AnirbanGhosh1512 notifications@github.com wrote:
Hello Prof. Paul,
For your last requirements, I have produced the .json files and graphs for the raw database. But for the cloak, some columns contain the * value even if the column type is date or integer. So after doing the combination, the query comes out with date=* or acct_id=*. Will it work for generating the score? It definitely does not work if I use the query in a database editor. Please give me some insight about this.
Regards, Anirban
On Thu, Jan 31, 2019 at 7:23 AM Paul Francis notifications@github.com wrote:
Hi Anirban,
I'm interested in the final json output, which you can produce using finishGdaAttack() (see below). Actually, could you produce these json outputs for me using both the cloak and the raw database as the anonymous data? Then produce the score diagrams from the json outputs using makeGraphs.py in code/graphs. Post the json files on gist.github.com, and email me the score diagrams (.png files). If it isn't clear how to do this, let me know so that I can update the readme files accordingly.
    sc = gdaScores(attackResult)
    score = sc.getScores()
    if v: pp.pprint(score)
    attack.cleanUp()
    final = finishGdaAttack(params,score)
Thanks,
PF
On Wed, Jan 30, 2019 at 4:36 PM AnirbanGhosh1512 notifications@github.com wrote:
Hello Prof. Paul,
The database configuration is below:
    {
        "localBankingRaw": {
            "host": "db001.gda-score.org",
            "port": 5432,
            "dbname": "banking",
            "user": "anirbanghosh1512@gmail.com",
            "password": "<redacted>",
            "type": "postgres"
        },
        "cloakBankingAnon": {
            "host": "demo.aircloak.com",
            "port": 8432,
            "dbname": "gda_banking",
            "user": "anirbanghosh1512@gmail.com",
            "password": "<redacted>",
            "type": "aircloak"
        }
    }
The generated output of the attack script is below and it is working with raw db:
"Test all correct (multiple guessed column): susc 0, nextSusc 0.0, lastSusc 1e-06"
I have attached the current attack script I have written. Please have a look and let me know if further changes are needed.
Regards, Anirban Ghosh
On Wed, Jan 30, 2019 at 2:02 PM Paul Francis notifications@github.com wrote:
Before you push, can you show me the generated GDA Score for the case where you run the attack on Diffix? I want to see it working at least that much. Later when Uber is running we'll test it there.
PF
On Tue, Jan 29, 2019 at 5:44 PM AnirbanGhosh1512 notifications@github.com wrote:
Hello Prof. Paul,
I have done the necessary changes. Should I push it into git?
Regards, Anirban
On Tue, Jan 29, 2019 at 4:33 PM Anirban Ghosh < anirbanghosh1512@gmail.com> wrote:
Hello Prof. Paul,
Thanks for the reply. I will update the change accordingly.
Regards, Anirban
On Tue, Jan 29, 2019 at 4:32 PM Paul Francis notifications@github.com wrote:
When you query against the Uber DP interface, you'll get back a different answer every time because the answers have zero-mean noise. By taking an average you can effectively reduce the noise and increase confidence.
PF
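The effect of averaging can be illustrated with a small simulation. This is only a sketch under stated assumptions: the noise is modelled as Gaussian with a standard deviation of 1 around a true count of 1, which is not taken from the Uber DP interface, and noisy_answer is a hypothetical stand-in for a real query.

```python
import random
import statistics


def noisy_answer(true_count, sigma, rng):
    """Simulate one query answer: the true count plus zero-mean noise."""
    return true_count + rng.gauss(0.0, sigma)


def averaged_answer(true_count, sigma, n, rng):
    """Average the answers of n repeated copies of the same query."""
    return statistics.mean(
        noisy_answer(true_count, sigma, rng) for _ in range(n)
    )


rng = random.Random(0)  # fixed seed so the run is repeatable

# Spread of single answers versus averages of 20 answers.
single = [noisy_answer(1, 1.0, rng) for _ in range(1000)]
averaged = [averaged_answer(1, 1.0, 20, rng) for _ in range(1000)]

print("stdev of single answers:  ", statistics.stdev(single))
print("stdev of 20-query average:", statistics.stdev(averaged))
```

With independent zero-mean noise, the spread of the average shrinks roughly as 1/sqrt(n), so 20 repetitions cut it by a factor of about 4.5. Against a noise-free database every copy returns the same value, which is why the averaging step looks redundant there.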
On Tue, Jan 29, 2019, 14:11 AnirbanGhosh1512 notifications@github.com wrote:
Hello Prof. Paul,
I have been looking for you in the office since last week but with no luck. I just need one clarification. I thought I could stop by and ask, but time is flying, so I am asking in the issue tracker. The last email I got here clearly states the condition for the claim. Currently, let's say I have X queries, and I make n clones of each query and fire the same query n times. The result, if I round it off, would be n * result / n, so it always equals the result value. So why should I do this step? Instead, I can check whether the result value is between 0.5 and 1.5, and if so I can directly go for the claim.
Pardon me if my understanding is wrong. Waiting for your reply.
Regards, Anirban
But getPublicColValues is only supposed to be used with the raw database. What are you configuring as 'rawDb'?
PF
import sys import pprint import six sys.path.append('../../common') from gdaScore import gdaAttack, gdaScores from myUtilities import checkMatch
This script makes attack queries, and then requests the
resulting GDA score.
pp = pprint.PrettyPrinter(indent=4)
params = dict(name='exampleAttack1', rawDb='localBankingRaw', anonDb='cloakBankingAnon', criteria='singlingOut', table='accounts', # change the table name to run individual table. flushCache=False, verbose=False) x = gdaAttack(params)
def getTotalUser(): """Returns the number of users of the table."""
Launch queries
query = dict(uid='account_id')
Note error in this sql
sql = str(f"""select count(distinct account_id) from {params['table']}""") query['sql'] = sql x.askAttack(query)
def getResultFromQuery(queryParser): """Returns the values of the table being used in the attack.""" colnames = x.getColNames() for i in colnames: values = x.getPublicColValues(i) if values != []: queryParser[i] = values return queryParser
def makeNoiseQuery(getKeycolumn, getCombinations): """Returns the noise of the table being used in the attack."""
Launch queries
TODO: uid should be dynamically allocated
colnames = x.getColNames() primaryKeyColumn = dict(uid=colnames[0])
Note this sql query is generated dynamically
outputCol = getKeyColumn outputComb = getCombinations comLength = len(outputComb) colLength = len(outputCol)
20 is acclaimed as a branch of queries
branch = 20
Launch queries
query = dict(myTag='query1')
Raw query
raw_sql = str(f"""select count(distinct {primaryKeyColumn['uid']}) from {params['table']} where """)
    while comLength > 0:
        val = getCombinations[len(outputComb) - comLength]
        sql = raw_sql
        while colLength > 0:
            if isinstance(val[len(outputCol) - colLength], six.string_types):
                dynamic_add = str(f"""{outputCol[len(outputCol) - colLength]} = '{val[len(outputCol) - colLength]}' """) + ' and '
            else:
                dynamic_add = str(f"""{outputCol[len(outputCol) - colLength]} = {val[len(outputCol) - colLength]} """) + ' and '
            # The last condition must not be followed by ' and '
            if colLength == 1:
                if isinstance(val[len(outputCol) - colLength], six.string_types):
                    dynamic_add = str(f"""{outputCol[len(outputCol) - colLength]} = '{val[len(outputCol) - colLength]}'""")
                else:
                    dynamic_add = str(f"""{outputCol[len(outputCol) - colLength]} = {val[len(outputCol) - colLength]}""")
            colLength = colLength - 1
            sql = sql + dynamic_add
        query['sql'] = sql
        query = dict(db="raw", sql=sql)
        # Make 20 clones of each query; right now 20 is used as the
        # size of a branch of queries
        for q in range(branch):
            x.askAttack(query)
        colLength = len(outputCol)
        comLength = comLength - 1

def getDiffrentColumnValues(col, values , queryParser):
    colvalDict = {}
    for key, value in queryParser.items():
        if key == col:
            for allval in value:
                values.append(allval[0])
            colvalDict = {col: values}
            values = []
    return colvalDict

getTotalUser()
result = x.getAttack()
queryParser = {}
getResultFromQuery(queryParser)

getKeyColumn = []
getResult = []
values = []

def getNumberofKeyColumn(queryParser):
    for key in queryParser:
        getKeyColumn.append(key)
    return getKeyColumn

def getResultForComb(getKeyColumn):
    for col in getKeyColumn:
        retDic = getDiffrentColumnValues(col, values, queryParser)
        getResult.append(retDic[col])
    return getResult

def getCombinatorics(getResult):
    r = [[]]
    for x in getResult:
        t = []
        for y in x:
            for i in r:
                t.append(i + [y])
        r = t

    return r

# Get the columns that returned values
getKeyColumn = getNumberofKeyColumn(queryParser)
# Get the values of every such column
getResult = getResultForComb(getKeyColumn)
# Build every combination of the column values
getCombinations = getCombinatorics(getResult)
# Create all possible queries.
makeNoiseQuery(getKeyColumn, getCombinations)

# Get the average of a branch of queries
def Average(lst):
    return sum(lst) / len(lst)

# Gather all the results of a branch of queries in a list, then take
# the mean
returnResults = []

verbose = 0
v = verbose
doCache = True

branchReturn = 20
# Check the number of combinations
outputComb = len(getCombinations)
# And gather up the answers:
for i in range(outputComb):
    # Start each combination with an empty list, so that answers from a
    # previous combination do not leak into this average
    returnResults = []
    # Each query was cloned 20 times; collect the 20 answers
    for item in range(branchReturn):
        reply = x.getAttack()
        if 'error' in reply:
            print(reply['error'])
        else:
            returnResults.append(reply['answer'][0][0])
        if reply['stillToCome'] == 0:
            break
    average = Average(returnResults)
    if 0.5 <= average <= 1.5:
        average = 1.0
    if average == 1.0:
        claim = True
        colnames = x.getColNames()
        primaryKeyColumn = dict(uid=colnames[0])
        spec = {}
        spec = {'uid': primaryKeyColumn, 'known': []}  # known is optional, and always null here
        outputCol = getKeyColumn
        val = getCombinations[i]
        key = 'guess'
        spec.setdefault(key,[])
        for item in range(len(outputCol)):
            spec[key].append({'col': outputCol[item], 'val': val[item]})
        x.askClaim(spec, claim=claim, cache=doCache)

        claim = True
        while True:
            replyClaim = x.getClaim()
            if v: print("Claim Result:")
            if v: pp.pprint(replyClaim)
            if replyClaim['stillToCome'] == 0:
                break
        print("\nTest all correct (multiple guessed column):")
        attackResult = x.getResults()
        sc = gdaScores(attackResult)
        score = sc.getScores()
        pp.pprint(score['col']['frequency'])
        if v: pp.pprint(score)
    else:
        claim = False

score = x.getResults()
pp.pprint(score)
x.cleanUp()
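The getCombinatorics routine above builds a Cartesian product by hand. As a minimal sketch, assuming only the standard library, itertools.product yields the same set of combinations (possibly in a different order). The name column_combinations and the sample values are hypothetical and not part of the attack script:

```python
from itertools import product

def column_combinations(values_per_column):
    """Return every combination that takes one value from each column's list."""
    return [list(combo) for combo in product(*values_per_column)]

# Hypothetical sample data: two values for one column, three for another
combos = column_combinations([["OWNER", "DISPONENT"], [1, 2, 3]])
for combo in combos:
    print(combo)
```

The number of combinations grows as the product of the list lengths, so with many columns it may be worth limiting the columns before expanding them.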
Hello Prof. Paul,
It is configured for the raw database only. But your request was: "Actually, could you produce these json outputs for me using both the cloak and the raw database as the anonymous data." For the raw database it is already done. For the cloak database, what routine should I use instead of getPublicColValues?
Regards, Anirban
On Tue, Feb 5, 2019 at 4:11 PM Paul Francis notifications@github.com wrote:
But getPublicColValues is only supposed to be used with the raw database.
What are you configuring as 'rawDb'?
PF
On Tue, Feb 5, 2019 at 4:04 PM AnirbanGhosh1512 notifications@github.com wrote:
Hello Prof. Paul,
You are right. getPublicColValues gives me proper output for the raw database, and I used combinatorics to generate the attack queries and posted them. But if I use the same routine for the cloak database, it returns * and null values. Do I need to use another routine for the cloak database?
Regards, Anirban
On Tue, Feb 5, 2019 at 3:50 PM Paul Francis notifications@github.com wrote:
Hi Anirban,
I'm confused how you got to this query in the first place. I thought you were using the output of getPublicColValues() to then come up with conditions that have a reasonable chance of matching exactly one user, and then making an attack query from that. But getPublicColValues() queries the raw database, not the cloak, so you should not be getting * values. Also you should be ignoring NULL values, but that is a different matter.
On Tue, Feb 5, 2019 at 2:56 PM AnirbanGhosh1512 notifications@github.com wrote:
Hello Prof. Paul,
A sample attack query produced by calling the same routines against the cloak database looks like this:
select count(distinct uid) from accounts where uid = None and account_id = None and acct_district_id = 1 and frequency = 'POPLATEK MESICNE' and acct_date = None and disp_type = 'OWNER' and birth_number = '*' and cli_district_id = 1 and lastname = '*' and firstname = '*' and birthdate = None and gender = 'Male' and ssn = '*' and email = '*' and street = '*' and zip = '*'
Should I post it to generate the score?
Regards, Anirban
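The query above carries None and '*' placeholders into its WHERE clause, where they cannot match anything useful. A minimal sketch of building the conditions while skipping such values follows. The helper build_where and its suppressed parameter are hypothetical, not part of gdaScore, and treating '*' as the suppression marker is taken from this thread:

```python
def build_where(columns, values, suppressed=("*",)):
    """Build a WHERE clause, skipping NULL (None) and suppressed values."""
    conditions = []
    for col, val in zip(columns, values):
        if val is None or val in suppressed:
            continue  # nothing useful to match on
        if isinstance(val, str):
            escaped = val.replace("'", "''")  # double single quotes for SQL
            conditions.append(f"{col} = '{escaped}'")
        else:
            conditions.append(f"{col} = {val}")
    return " and ".join(conditions)

where = build_where(
    ["acct_district_id", "frequency", "zip", "birthdate"],
    [1, "POPLATEK MESICNE", "*", None],
)
print("select count(distinct uid) from accounts where " + where)
```

Joining the kept conditions with " and " also removes the need for special handling of the last column.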
On Tue, Feb 5, 2019 at 2:46 PM Paul Francis notifications@github.com wrote:
The cloak returns '*' when there are values that it has suppressed. In your attack, you should ignore '*' values.
Have you posted your attack? Please do so if you could ... I want to see what your attack does and think about the best way to fix this (probably better if it happens automatically in the gdaAttack() class).
On Tue, Feb 5, 2019 at 2:34 PM AnirbanGhosh1512 notifications@github.com wrote:
Hello Prof. Paul,
For your last request, I have produced the .json files and graphs for the raw database. But for the cloak, some columns contain the * value even if the column type is date or integer. So after building the combinations, the conditions come out as date = * or acct_id = *. Will that work for generating the score? It definitely does not work if I run the query in a database editor. Please give me some insight about this.
Regards, Anirban
On Thu, Jan 31, 2019 at 7:23 AM Paul Francis notifications@github.com wrote:
Hi Anirban,
I'm interested in the final json output, which you can produce using finishGdaAttack() (see below). Actually, could you produce these json outputs for me using both the cloak and the raw database as the anonymous data. Then produce the score diagrams from the json outputs using makeGraphs.py in code/graphs. Post the json files on gist.github.com, and email me the score diagrams (.png files). If it isn't clear how to do this, let me know so that I can update the readme files accordingly.
sc = gdaScores(attackResult)
score = sc.getScores()
if v: pp.pprint(score)
attack.cleanUp()
final = finishGdaAttack(params,score)
Thanks,
PF
On Wed, Jan 30, 2019 at 4:36 PM AnirbanGhosh1512 notifications@github.com wrote:
Hello Prof. Paul,
The Database configuration is below:
{ "localBankingRaw": { "host": "db001.gda-score.org", "port": 5432, "dbname": "banking", "user": "anirbanghosh1512@gmail.com", "password": "Aic0phuLoo0i", "type": "postgres" }, "cloakBankingAnon": { "host": "demo.aircloak.com", "port": 8432, "dbname": "gda_banking", "user": "anirbanghosh1512@gmail.com", "password": "anirban@123", "type": "aircloak" } }
The generated output of the attack script is below and it is working with raw db:
"Test all correct (multiple guessed column): susc 0, nextSusc 0.0, lastSusc 1e-06"
I have attached the current attack script I have written. Please have a look and let me know if further changes are needed.
Regards, Anirban Ghosh
On Wed, Jan 30, 2019 at 2:02 PM Paul Francis notifications@github.com wrote:
Before you push, can you show me the generated GDA Score for the case where you run the attack on Diffix? I want to see it working at least that much. Later when Uber is running we'll test it there.
PF
On Tue, Jan 29, 2019 at 5:44 PM AnirbanGhosh1512 notifications@github.com wrote:
Hello Prof. Paul,
I have done the necessary changes. Should I push it into git?
Regards, Anirban
On Tue, Jan 29, 2019 at 4:33 PM Anirban Ghosh < anirbanghosh1512@gmail.com> wrote:
Hello Prof. Paul,
Thanks for the reply. I will update the change accordingly.
Regards, Anirban
On Tue, Jan 29, 2019 at 4:32 PM Paul Francis notifications@github.com wrote:
When you query against the Uber DP interface, you'll get back a different answer every time because the answers have zero-mean noise. By taking an average you can effectively reduce the noise and increase confidence.
PF
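The effect of averaging can be illustrated with a small simulation. This is only a sketch: noisy_count is a hypothetical stand-in for a differentially private answer, and the Gaussian noise with sigma=2.0 is an assumption for illustration, not a description of the Uber system:

```python
import random
import statistics

def noisy_count(true_count, rng, sigma=2.0):
    """Hypothetical noisy answer: the true count plus zero-mean Gaussian noise."""
    return true_count + rng.gauss(0.0, sigma)

rng = random.Random(0)  # fixed seed so the run is repeatable

single = noisy_count(1, rng)
answers = [noisy_count(1, rng) for _ in range(1000)]
averaged = statistics.mean(answers)

print(f"single answer:    {single:.3f}")
print(f"average of 1000:  {averaged:.3f}")
```

For independent zero-mean noise, the standard deviation of the average shrinks roughly as sigma divided by the square root of the number of repeated queries, which is why the averaged value sits much closer to the true count than a single answer does.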
On Tue, Jan 29, 2019, 14:11 AnirbanGhosh1512 notifications@github.com wrote:
Hello Prof. Paul,
I have been looking for you in the office since last week but had no luck. I just need one clarification. I thought I could stop by and ask, but time is flying, so I am asking in the issue tracker. The last email I got here clearly states the condition for the claim. Currently, let's say I have X queries, and I make n clones of each query and fire the same query n times. The rounded result would then be n * result / n, so it always equals the result value. So why should I do this step? Instead, I can check whether the result value is between 0.5 and 1.5, and if so, go directly for the claim.
Pardon me if my understanding is wrong. Waiting for your reply.
Regards, Anirban
On Wed, Jan 23, 2019 at 11:08 AM Paul Francis notifications@github.com wrote:
If the query results rounded average is 1, then you ask for a claim (claim=True). Otherwise you don't ask for a claim (claim=False).
A rounded average will be 1 if the average is between 0.5 and 1.5.
The point is, if the rounded average is 1, then you guess that there is exactly one user with the given attributes, and so you want to make a claim that you have singled out this user.
PF
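The claim rule described above can be sketched as a small helper. The name should_claim is hypothetical, and treating both 0.5 and 1.5 as inside the range is an assumption that matches the attack script's check. An explicit range test is used because Python's built-in round() rounds halves to the nearest even number:

```python
def should_claim(answers):
    """Return True when the average of the answers lies between 0.5 and 1.5."""
    if not answers:
        return False  # no answers, so nothing to base a claim on
    average = sum(answers) / len(answers)
    return 0.5 <= average <= 1.5

print(should_claim([1, 1, 2, 0]))  # average 1.0
print(should_claim([3, 2, 4]))     # average 3.0
```

The returned value would then be passed as the claim argument when the claim is submitted.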
On Tue, Jan 22, 2019 at 6:45 PM AnirbanGhosh1512 notifications@github.com wrote:
Hello Prof. Paul,
I need a little clarification about the last discussion. If the average of the query results is greater than 1.0, can I ask for a claim, or can I go for a claim whatever the mean value is?
Regards, Anirban Ghosh
When attacking the cloak, in your .json config file, you should set 'rawDb' to the raw database, and 'anonDb' to the cloak. In the configuration, 'rawDb' should always be set to the raw database, and 'anonDb' is set to whatever anonymization system you are attacking.
Then, when you use getPublicColValues, it will naturally query the raw database, and you will get the correct answers (in fact, you get exactly the same answer as before).
In other words, your attack queries will be the same no matter what system you are attacking.
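As a sketch of what this means for the attack script's params, only anonDb changes between runs while rawDb stays fixed. The helper attack_params is hypothetical. The dictionary keys and the configuration names localBankingRaw and cloakBankingAnon are the ones already used in this thread:

```python
def attack_params(anon_db, raw_db="localBankingRaw"):
    """Build attack parameters; rawDb always names the raw database."""
    return dict(name='exampleAttack1',
                rawDb=raw_db,      # always the raw database
                anonDb=anon_db,    # the system being attacked
                criteria='singlingOut',
                table='accounts',
                flushCache=False,
                verbose=False)

raw_run = attack_params("localBankingRaw")     # raw data as the "anonymous" data
cloak_run = attack_params("cloakBankingAnon")  # attacking the cloak

print(raw_run['rawDb'], raw_run['anonDb'])
print(cloak_run['rawDb'], cloak_run['anonDb'])
```

Each dictionary would be handed to gdaAttack in a separate run, with the rest of the script unchanged.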
Hello Prof. Paul,
It seems like an easy change, but I am a little confused about where to make it. Can I stop by your office tomorrow to clear up my doubts?
Regards, Anirban
Yes, I'll be in the office tomorrow afternoon. Talk to you then.
By the way, if you haven't read https://www.gda-score.org/what-is-a-gda-score/, please do so. It may help you understand what to do.
PF
Hello Prof. Paul,
Please see the attached .png files for the attack, and please check http://gist.github.com/ for the resulting .json files. Please let me know if any changes are required.
Regards, Anirban
import sys import pprint import six sys.path.append('../../common') from gdaScore import gdaAttack, gdaScores from myUtilities import checkMatch
This script makes attack queries, and then requests the
resulting GDA score.
pp = pprint.PrettyPrinter(indent=4)
params = dict(name='exampleAttack1', rawDb='localBankingRaw', anonDb='cloakBankingAnon', criteria='singlingOut', table='accounts', # change the table name to run individual table. flushCache=False, verbose=False) x = gdaAttack(params)
def getTotalUser(): """Returns the number of users of the table."""
Launch queries
query = dict(uid='account_id')
Note error in this sql
sql = str(f"""select count(distinct account_id) from {params['table']}""") query['sql'] = sql x.askAttack(query)
def getResultFromQuery(queryParser): """Returns the values of the table being used in the attack.""" colnames = x.getColNames() for i in colnames: values = x.getPublicColValues(i) if values != []: queryParser[i] = values return queryParser
def makeNoiseQuery(getKeycolumn, getCombinations): """Returns the noise of the table being used in the attack."""
Launch queries
TODO: uid should be dynamically allocated
colnames = x.getColNames() primaryKeyColumn = dict(uid=colnames[0])
Note this sql query is generated dynamically
outputCol = getKeyColumn outputComb = getCombinations comLength = len(outputComb) colLength = len(outputCol)
20 is acclaimed as a branch of queries
branch = 20
Launch queries
query = dict(myTag='query1')
Raw query
raw_sql = str(f"""select count(distinct {primaryKeyColumn['uid']}) from {params['table']} where """)
while comLength > 0: val = getCombinations[len(outputComb) - comLength] sql = raw_sql while colLength > 0: if isinstance(val[len(outputCol) - colLength], six.string_types): dynamic_add = str(f"""{outputCol[len(outputCol) - colLength]} = '{val[len(outputCol) - colLength]}' """) + ' and ' else: dynamic_add = str(f"""{outputCol[len(outputCol) - colLength]} = {val[len(outputCol) - colLength]} """) + ' and ' if colLength == 1: if isinstance(val[len(outputCol) - colLength], six.string_types): dynamic_add = str(f"""{outputCol[len(outputCol) - colLength]} = '{val[len(outputCol) - colLength]}'""") else: dynamic_add = str(f"""{outputCol[len(outputCol) - colLength]} = {val[len(outputCol) - colLength]}""") colLength = colLength - 1 sql = sql + dynamic_add query['sql'] = sql
query = dict(db="raw", sql=sql)
make 20 clone of each queries, write now 20 is acclaimed as a branch of
queries for q in range(branch): x.askAttack(query) colLength = len(outputCol) comLength = comLength - 1
def getDiffrentColumnValues(col, values , queryParser): colvalDict = {} for key, value in queryParser.items(): if key == col: for allval in value: values.append(allval[0]) colvalDict = {col: values} values = [] return colvalDict
getTotalUser() result = x.getAttack() queryParser = {} getResultFromQuery(queryParser)
getKeyColumn = [] getResult = [] values = []
def getNumberofKeyColumn(queryParser): for key in queryParser: getKeyColumn.append(key) return getKeyColumn
def getResultForComb(getKeyColumn): for col in getKeyColumn: retDic = getDiffrentColumnValues(col, values, queryParser) getResult.append(retDic[col]) return getResult
def getCombinatorics(getResult): r = [[]] for x in getResult: t = [] for y in x: for i in r: t.append(i + [y]) r = t
return r
Get number of return column
getKeyColumn = getNumberofKeyColumn(queryParser)
Get total result
getResult = getResultForComb(getKeyColumn)
Use of recursion for combinatorics, with dynamically accessable values
getCombinations = getCombinatorics(getResult)
Create all possible queries.
makeNoiseQuery(getKeyColumn, getCombinations)
get Average of the query branch
def Average(lst): return sum(lst) / len(lst)
# gather all the result of branch queries in a list, do the mean after that
returnResults = []

verbose = 0
v = verbose
doCache = True
branchReturn = 20

# check number of combinations
outputComb = len(getCombinations)

# And gather up the answers:
for i in range(outputComb):
    # make 20 clone of each queries, get result of 20 similar queries
    for item in range(branchReturn):
        reply = x.getAttack()
        if 'error' in reply:
            print(reply['error'])
        else:
            returnResults.append(reply['answer'][0][0])
        if reply['stillToCome'] == 0:
            break
    average = Average(returnResults)
    if 0.5 <= average <= 1.5:
        average = 1.0
    if average == 1.0:
        claim = True
        colnames = x.getColNames()
        primaryKeyColumn = dict(uid=colnames[0])
        spec = {}
        spec = {'uid': primaryKeyColumn, 'known': []}  # known is optional, and always null here
        outputCol = getKeyColumn
        val = getCombinations[i]
        key = 'guess'
        spec.setdefault(key, [])
        for item in range(len(outputCol)):
            spec[key].append({'col': outputCol[item], 'val': val[item]})
        x.askClaim(spec, claim=claim, cache=doCache)
        claim = True
        while True:
            replyClaim = x.getClaim()
            if v: print("Claim Result:")
            if v: pp.pprint(replyClaim)
            if replyClaim['stillToCome'] == 0:
                break
        print("\nTest all correct (multiple guessed column):")
        attackResult = x.getResults()
        sc = gdaScores(attackResult)
        score = sc.getScores()
        pp.pprint(score['col']['frequency'])
        if v: pp.pprint(score)
        returnResults = []
    else:
        claim = False

score = x.getResults()
pp.pprint(score)
x.cleanUp()
Did you forget to leave the attachment?
Hello Prof. Paul,
I did. It's in a zip file called Graphs.zip.
Regards, Anirban
On Thu, Feb 7, 2019 at 4:28 PM Paul Francis notifications@github.com wrote:
Did you forget to leave the attachment?
Since in fact your emails are transmitted through github, it could be that the attachment was stripped. Please just send it to me directly.
We're going to use this to attack the Uber anonymization system. I'm not sure what queries that system allows, but @rbh-93 is working on it, so he can answer questions about that or give you access to an implementation.
In our attack, we want to make a query that has exactly one user in the answer with some reasonable probability. In the attack, we find out if that is the case or not. If it is the case, then we make a singling-out claim for that user. If not, then we don't make a claim.
The first step is to find sets of column values or value ranges that have a good chance of identifying a single user. If you know the number of distinct users associated with any given column value, and you know the number of users in the table, then
prob_user1 = col_val_users1/total_users
is the probability that any given user has that column value. Then you want to find cases where:
total_users * prob_user1 * prob_user2 * ... = 1
(roughly). In other words, the expected number of users with column/value 1 and column/value 2 and ... is one.
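As a rough illustration of the selection step described above, the sketch below computes the expected number of matching users for combinations of column/value pairs and keeps those close to one. Everything in it is an assumption for illustration: the function names, the hard-coded counts (which would really come from exploratory queries), the 0.5 to 1.5 acceptance window, and the independence of columns that the product formula implies.

```python
from itertools import combinations
from math import prod


def expected_users(total_users, col_val_users):
    """Expected number of users matching every column/value pair,
    assuming the columns are statistically independent."""
    return total_users * prod(n / total_users for n in col_val_users)


def candidate_sets(total_users, counts, size=2, low=0.5, high=1.5):
    """Return (combination, expected count) for combinations of `size`
    column/value pairs whose expected user count lies in [low, high].

    `counts` maps (column, value) -> number of distinct users (hypothetical).
    """
    found = []
    for combo in combinations(counts, size):
        columns = {col for col, _ in combo}
        if len(columns) < size:
            continue  # two values of the same column cannot both hold
        expected = expected_users(total_users, [counts[k] for k in combo])
        if low <= expected <= high:
            found.append((combo, expected))
    return found


# Hypothetical counts, for illustration only.
example_counts = {
    ("city", "Berlin"): 100,
    ("birth_year", "1980"): 100,
    ("gender", "F"): 5000,
}
example_candidates = candidate_sets(10000, example_counts)
```

With these made-up numbers, only the city and birth_year pair has an expected count near one; either of them combined with gender gives roughly 50 expected users and is rejected.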
You can learn the total users with:
To learn these probabilities for any given column, you can query the raw database with this query:
Use the askExplore() call on the raw database (rawDb) to do these.
Once you have a set of columns and values where this is the case, you can make a query like this:
For the Uber system, each time you repeat the query, you get a new noise value with mean zero. So if you take X answers and take the average, you'll get the true answer with some probability.
After X queries, we predict that the true answer is 1 if the averaged answer is between 0.5 and 1.5.
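The averaging and decision rule can be sketched in isolation, without touching the gdaScore calls. This is only an illustration under stated assumptions: the function names are made up, and the simulated noise is Gaussian purely as a stand-in, since the thread does not say which zero-mean distribution the Uber system uses.

```python
import random
from statistics import mean


def predict_single_user(noisy_answers, low=0.5, high=1.5):
    """Predict that the true count is 1 if the averaged answer is in [low, high]."""
    return low <= mean(noisy_answers) <= high


def simulate_answers(true_count, num_queries, noise_sd, rng):
    """Simulate repeated queries, each returning the true count plus fresh
    zero-mean noise (Gaussian here is an assumption, not the real mechanism)."""
    return [true_count + rng.gauss(0.0, noise_sd) for _ in range(num_queries)]


rng = random.Random(0)
answers = simulate_answers(true_count=1, num_queries=1000, noise_sd=2.0, rng=rng)
make_claim = predict_single_user(answers)
```

The more answers are averaged, the smaller the spread of the average around the true count, so a larger X makes the 0.5 to 1.5 test more reliable at the cost of more attack queries.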
We repeat the above X times and make a guess. For this query, use the askAttack() call, so that the system records it as an attack query. Once you have a guess, use the askClaim() call to record the guess. You can see examples of how these are used for other attacks in code/attacks.