gda-score / code

Tools for generating General Data Anonymity Scores (www.gda-score.org)
MIT License

Add Uber db type to gdaScore() #33

Open yoid2000 opened 5 years ago

yoid2000 commented 5 years ago

Once the Uber server is working on db001, I'd like to add an interface to the Uber server to the class gdaAttack() which can be found in common/gdaScore.py.

Currently there are two interfaces, postgres and aircloak. I want to add a third, called uber_dp. This will result in changes deep within gdaAttack(), upon which pretty much everything runs, so we need to be very careful here and validate that all of the examples in common/examples and attacks/examples work after the changes.

Note that the type of interface is specified in the common/config/master.json config file, as "type". Here is where a database is configured as uber_dp.

When gdaAttack() is called, it is handed a dict containing various parameters. An example of the config file for these parameters is for instance attacks/dumbList_Infer.py.json. For uber_dp, we need to add two additional parameters, "budget" and "epsilon".
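As a rough illustration only, a check for the two new parameters might look like the sketch below. The helper name checkUberParams, the 'name' key, and the rule that both values must be positive numbers are assumptions made for this example, not part of the existing gdaAttack() code.

```python
# Hypothetical sketch: not the actual gdaAttack() parameter handling.
def checkUberParams(params, dbType):
    """Return a list of problems found in params for the given interface type."""
    problems = []
    if dbType != 'uber_dp':
        return problems  # postgres and aircloak need no extra parameters
    for key in ('budget', 'epsilon'):
        if key not in params:
            problems.append(f"missing '{key}'")
            continue
        val = params[key]
        # Assumption: both values must be positive numbers (bool is rejected).
        if not isinstance(val, (int, float)) or isinstance(val, bool) or val <= 0:
            problems.append(f"'{key}' must be a positive number")
    return problems

# The 'name' key is illustrative; only 'budget' and 'epsilon' come from the issue.
params = {'name': 'exampleAttack', 'budget': 3.0, 'epsilon': 0.5}
print(checkUberParams(params, 'uber_dp'))  # prints []
```

Returning a list of problems rather than raising lets the caller decide whether a missing value is fatal or should fall back to a default.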

The tricky part will be establishing the connection itself and making the queries. This all happens in a method called _dbWorker(), which runs as a thread. _dbWorker() calls _processQuery(), which is the thing that makes the query and returns the answer.

_dbWorker() sets up its connection with this code:

        # Establish connection to database
        connStr = str(f"host={d['host']} port={d['port']} dbname={d['dbname']} user={d['user']} password={d['password']}")
        if self._vb: print(f"    {me}: Connect to DB with DSN '{connStr}'")
        conn = psycopg2.connect(connStr)
        cur = conn.cursor()

You'll need to add an if/else here, like if d['type'] == 'uber_dp': ..... else:, and put your connection setup there. (Note that both aircloak and postgres use the same underlying interface, which is why there isn't an if/else currently.)
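One possible shape for that branch is sketched below. The helper names buildDsn and openConnection are made up for this example, and reaching the uber_dp service through d['host'] is an assumption. The two connect functions are passed in so the sketch runs without a database; in _dbWorker() the postgres one would be psycopg2.connect.

```python
# Hypothetical sketch of the branch in _dbWorker(); connectUber is a stand-in.
def buildDsn(d):
    """Build the same DSN string the existing postgres/aircloak path uses."""
    return (f"host={d['host']} port={d['port']} dbname={d['dbname']} "
            f"user={d['user']} password={d['password']}")

def openConnection(d, connectPostgres, connectUber):
    """Pick the connection setup based on d['type']."""
    if d['type'] == 'uber_dp':
        # Assumption: the uber_dp service is addressed by d['host'], not a DSN.
        return connectUber(d['host'])
    else:
        # postgres and aircloak share the same underlying interface.
        return connectPostgres(buildDsn(d))

# Stand-in connect functions that just record what they were called with.
d = dict(type='uber_dp', host='example.org', port=5432,
         dbname='db', user='u', password='p')
conn = openConnection(d, lambda dsn: ('postgres', dsn), lambda url: ('uber_dp', url))
print(conn[0])  # prints uber_dp
```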

Note also that there is an interface to a cache sqlite database, with handles connInsert, curInsert, connRead, curRead. These will continue to run as is, so the new interface doesn't affect that.

In _processQuery(), the following code executes the query:

            try:
                cur.execute(query['sql'])
            except psycopg2.Error as e:
                reply = dict(error=e.pgerror)
            else:
                ans = cur.fetchall()
                numCells = self._computeNumCells(ans)
                reply = dict(answer=ans,cells=numCells)

You will need to add an if/else to do the uber_dp query instead. Note in particular that the cur.fetchall() call returns a data structure that is a list of lists like this:

[
[[a1],[b1],[c1],...],
[[a2],[b2],[c2],...],
....
[[aN],[bN],[cN],...]
]

where a, b, c, etc. are the columns returned by the query, and 1, 2, ..., N are the rows returned by the query. Your interface must replicate this structure. When the uber server returns an error, or returns an out-of-budget message, this will be encoded in the error type (i.e. reply = dict(error=e.pgerror)).
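A minimal sketch of that mapping follows. The response format is an assumption: a decoded dict with either an 'error' key or a 'rows' key holding one flat sequence of column values per row, which is the shape psycopg2's fetchall() gives. buildReply and countCells are hypothetical names, and countCells only stands in for self._computeNumCells(), whose real logic may differ.

```python
# Hypothetical sketch: the shape of the uber_dp response is an assumption.
def buildReply(response, computeNumCells):
    """Turn a decoded uber_dp response into the reply dict _processQuery() builds."""
    if 'error' in response:
        # Server errors and out-of-budget messages both go in the 'error' key.
        return dict(error=response['error'])
    # Normalise every row to a list so callers see one entry per column.
    ans = [list(row) for row in response['rows']]
    return dict(answer=ans, cells=computeNumCells(ans))

def countCells(ans):
    # Stand-in for self._computeNumCells(): counts one cell per returned value.
    return sum(len(row) for row in ans)

ok = buildReply({'rows': [(1, 'a'), (2, 'b')]}, countCells)
bad = buildReply({'error': 'out of budget'}, countCells)
print(ok['cells'])   # prints 4
print(bad['error'])  # prints out of budget
```

Keeping the reply dict identical for all interface types is what lets the code above _processQuery() stay unchanged.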

If this is done right, then all the code running above this should work unchanged.

Please regard this particular issue as a kind of master issue. You should make specific smaller issues that we can test one at a time as you go. Each of the smaller issues will have an associated push, where the example functions are all verified as running.

Note that there are a number of helper methods that currently work with both postgres and aircloak interfaces. These include getColNamesAndTypes() and getTableNames(). I don't expect these to work with uber_dp, so you don't need to worry about that.

As always, let me know if you have questions.

yoid2000 commented 5 years ago

@rbh-93 please start on this issue.

Be sure to create a new branch for this. Don't write to master.

rbh-93 commented 5 years ago

Hello, I have been working through the workflow in the gdaScore class, but you mentioned: "Note that the type of interface is specified in the common/config/master.json config file, as "type". Here is where a database is configured as uber_dp."
Am I supposed to create a new "type", or should I use the existing "postgres" type?

yoid2000 commented 5 years ago

If you don't make it a new type, how would gdaScore know to query uber?

PF

rbh-93 commented 5 years ago

Another question: will _dbWorker() send the parameters (query, epsilon, budget) to the Python simpleServer, which will then write the query to a file, with the UberTool reading from the file and sending back the result? This sending of parameters to simpleServer.py would be done in the following part of gdaScore:

        # Establish connection to database
        connStr = str(f"host={d['host']} port={d['port']} dbname={d['dbname']} user={d['user']} password={d['password']}")
        if self._vb: print(f"    {me}: Connect to DB with DSN '{connStr}'")
        conn = psycopg2.connect(connStr)
        cur = conn.cursor()

Is that correct?

rbh-93 commented 5 years ago

Right now the UberTool connects to the database like this: val con_str = "jdbc:postgresql://db001.gda-score.org:5432/" + dbName + "?ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory&user=<username>&password=<password>" So the UberTool is connecting to db001.gda-score.org:5432.

yoid2000 commented 5 years ago

yes, with the caveat that there'd be a third condition (i.e. if aircloak ... elif postgres ... elif uber ....)

PF

yoid2000 commented 5 years ago

Why do we care how the uber tool connects to the database?

PF

fraboeni commented 4 years ago

Hi @yoid2000, I am currently working on the uber_interface branch (https://github.com/gda-score/code/tree/uber_interface). I pushed my current state even though it is not working yet. I am facing two issues while trying to test it and get it working, and could benefit from your input.

1) Could you tell me the address of the server where the uber_dp is running?
2) Could you give me a hint on what I need to run in order to test my changes in the project? I cannot figure out how I would initialize a process in the code that would run the gdaAttack.

Thank you for your help.

yoid2000 commented 4 years ago

Could you tell me the address of the server where the uber_dp is running?

In this directory:

https://github.com/gda-score/anonymization-mechanisms/tree/master/uber/examples

you can find a file config.py that contains the URL of the uber DP service. It is:

https://db001.gda-score.org/ubertool

yoid2000 commented 4 years ago

Could you give me a hint on what I need to run in order to test my changes in the project? I cannot figure out how I would initialize a process in the code that would run the gdaAttack.

This is unfortunately rather complex.

This file:

https://github.com/gda-score/code/blob/master/gdascore/global_config/master.json

is a kind of master configuration. It contains all of the services, databases, and anonymization types.

Up to now, all anonymization types could be reached through two services, postgres and aircloak (https://github.com/gda-score/code/blob/9f4b2d0b600f546baf62874ea75b20d4461977cd/gdascore/global_config/master.json#L2-L13).

You need to add a new service, which could be called uber_dp or something like that. The master config would be updated with the new service and in other places where we link the anonymization scheme with the service, etc. I could help you with that.

Then, when you want to run a test, you could do something like you find here:

https://github.com/gda-score/attacks/blob/master/examples/testSinglingOut.py

In that example, you can find a config structure that tells the code what to get from the master config to run the attack (which ultimately generates queries to the service). The config is here:

https://github.com/gda-score/attacks/blob/master/examples/testSinglingOut.py#L25-L33