gda-score / code

Tools for generating General Data Anonymity Scores (www.gda-score.org)
MIT License

Add budget to Uber tool #23

Closed yoid2000 closed 4 years ago

yoid2000 commented 5 years ago

The Uber tool was written without any budget, so we need to add one. Please implement the following:

  1. Client can start a "session" with the server. At start of session, client can supply a budget, which is a real number greater than zero. Client can also supply a default epsilon. This epsilon is used for each query.
  2. For each session, server keeps track of the used_budget. The used_budget is initialized to zero at the beginning of the session. Before each query, the server adds epsilon to used_budget. If the resulting used_budget exceeds the session budget, then the query is rejected.
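
As an illustration of the accounting in step 2, here is a minimal sketch in Python. It is not the code pushed for this issue: the names `SessionStore`, `start_session`, `charge`, and `BudgetExceeded` are hypothetical, the random hex session ID is an assumption, and the sketch assumes that a rejected query leaves used_budget unchanged, which the description above does not specify.

```python
import uuid


class BudgetExceeded(Exception):
    """Raised when a query would push used_budget past the session budget."""


class SessionStore:
    """Hypothetical in-memory store holding one budget record per session."""

    def __init__(self):
        self._sessions = {}

    def start_session(self, budget, epsilon):
        # The budget must be a real number greater than zero.
        if budget <= 0:
            raise ValueError("budget must be greater than zero")
        sid = uuid.uuid4().hex  # assumption: any unique token would do
        self._sessions[sid] = {
            "budget": float(budget),
            "epsilon": float(epsilon),  # default epsilon, used for each query
            "used_budget": 0.0,         # initialized to zero at session start
        }
        return sid

    def charge(self, sid):
        """Add epsilon to used_budget before a query, or reject the query."""
        session = self._sessions[sid]
        new_used = session["used_budget"] + session["epsilon"]
        if new_used > session["budget"]:
            # Assumption: a rejected query does not consume any budget.
            raise BudgetExceeded(sid)
        session["used_budget"] = new_used
        return new_used
```

With budget=2 and epsilon=1, two calls to `charge` succeed and the third raises `BudgetExceeded`.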

rbh-93 commented 5 years ago

Should I incorporate these changes in the same code I pushed or should I create a separate folder for these changes? Also, should I pass the variables (budget, epsilon) as JSON from Client to Server or do you want me to do it via sockets? Do you have any preference?

yoid2000 commented 5 years ago

Push this as an update to the existing code (not a new folder).

Please pass the budget and epsilon in the JSON. Everything can be passed in the JSON.

rbh-93 commented 5 years ago

Okay. I have pushed the previous code with correct data flow (queryResult from server to client). I will start working on the privacy budget.

rbh-93 commented 5 years ago

Hi, I have implemented the budget and epsilon part. This currently serves a single client. Do you want me to include the case where multiple clients may access the system at the same time? For example, if a client sends a request to the server with budget=2 and epsilon=1, they can run at most 2 queries. Now, do you want to consider a scenario where another client sends a request to the server at the same time with a separate budget and epsilon value?

yoid2000 commented 5 years ago

I'd like to maintain budget per "session". This means that each query is associated with a session, and the server maintains a per-session budget. When the budget expires, no more queries for that session will be allowed. Are you doing it this way now?

rbh-93 commented 5 years ago

Right now, I have set the budget as a global variable. If you want a scenario where multiple different clients are accessing the server concurrently, the code can be modified to accommodate that as well.

rbh-93 commented 5 years ago

I will add a bit to the code I have written so that it functions on a per session basis.

rbh-93 commented 5 years ago

I have pushed the code which handles queries on a per session basis. Please take a look and let me know if everything is okay.

yoid2000 commented 5 years ago

Please move the code so that it sits in the anon-methods directory.

Please change the README.md to indicate that this is derived from a fork of the Uber code.

In the meantime, I'll have a look

yoid2000 commented 5 years ago

I'm not sure what to do. Can you update the README.md with basic info on how to run etc.?

rbh-93 commented 5 years ago

I am writing the README.md but I just remembered, the schema hasn't been generated yet. I am waiting on the changes of the tool you had mentioned for schema generation. You can run it locally on your machine but that will require you to have a database on your local machine and put in the schema manually. If you are free I can go tomorrow and run it on my machine and show it to you.

yoid2000 commented 5 years ago

why is this closed?

rbh-93 commented 5 years ago

Sorry, I closed the issue by mistake. I have pushed the related code to git. I will update the README accordingly and start making the schemas.

yoid2000 commented 5 years ago

I don't find sid in src/main/scala/examples/simpleClient.py

rbh-93 commented 5 years ago

Sorry for the late reply. 'sid' is in the JSON request in simpleClient.py. For the first request from a particular client, send 'sid'= ' ' in the JSON payload. For subsequent requests, put the Session ID returned by simpleServer.py in the JSON.

yoid2000 commented 5 years ago

Yes I know that... It is in the readme. Do I need to add that to the json myself? I guess in saying that it would make sense if simpleClient already had an example that includes the sid.

rbh-93 commented 5 years ago

Yes, after the Session ID is generated and sent back to the Client, you need to add it yourself in the second request. From the second request onwards, until the budget is exceeded, it will remain unchanged, i.e., the Server will always take the sid sent in the second request. This way, even if someone tries to change the sid, it won't work. When your budget is exceeded, you need to set it to Null again to indicate you want to start a new session.

yoid2000 commented 5 years ago

I know all that too. My point is that the code in simpleClient.py should already have the sid in it, so that it will be clear to someone who wants to use it. In fact, it would be good if simpleClient.py contained two queries, one that starts with a null sid, then reads the sid from the response and puts this sid in the next query. That would make a more complete example. That's all I'm saying.
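
A minimal sketch of such a two-query example, in Python, might look like the following. It is not the actual simpleClient.py: apart from `sid`, the JSON field names (`budget`, `epsilon`, `query`), the assumption that the response carries the session ID under a `sid` key, and the helper names `post_json`, `build_payload`, and `run_two_queries` are all hypothetical.

```python
import json
import urllib.request

# Assumption: the server listens on localhost at this port and path.
URL = "http://127.0.0.1:5890/data"


def post_json(payload, url=URL):
    """POST a JSON payload and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def build_payload(query, sid=None, budget=2.0, epsilon=1.0):
    """Build one request; every field name except 'sid' is an assumption."""
    return {"sid": sid, "budget": budget, "epsilon": epsilon, "query": query}


def run_two_queries(query, send=post_json):
    """Start a session with a null sid, then reuse the sid the server returns."""
    first = send(build_payload(query, sid=None))
    sid = first["sid"]  # assumption: the response exposes the session ID as 'sid'
    second = send(build_payload(query, sid=sid))
    return sid, first, second
```

Passing the transport in as `send` keeps the example testable without a running server; against a real server, the default `post_json` would be used.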

rbh-93 commented 5 years ago

Then what I can do is make two separate JSON payloads (data1 and data2) in the Client: one with sid set to Null, and the other with sid automatically set to the incoming Session ID generated by the Server. For the first request the user sends data1, and for subsequent requests data2. This would be clearer as an example. Will this be alright?

yoid2000 commented 5 years ago

That'd be great. Thanks.

rbh-93 commented 5 years ago

Please check. I have pushed the code and updated the README.md

yoid2000 commented 5 years ago

I'm unable to connect. For the URL, I'm using this:

url = 'http://db001.gda-score.org:5890/data'

Is the server running on db001.gda-score.org ?

rbh-93 commented 5 years ago

Do you mean the connection from the Uber Tool (ElasticSenstivityExample.scala/QueryRewritingExample.scala) to the DB? If it is the connection from simpleClient.py to simpleServer.py, then you need to give url = 'http://127.0.0.1:5890/data', assuming you are running simpleServer.py on the same machine, i.e., the localhost.

yoid2000 commented 5 years ago

I'm not sure what I mean. My expectation is that the uber process (I guess simpleServer.py) is running on the server (db001.gda-score.org), and simpleQuery.py can be used as-is to query the uber process on db001. Is this not the case?

PF

yoid2000 commented 5 years ago

Yes, so I'm assuming that everything to the right of simpleServer.py is running on db001.gda-score.org. If this is not the case, could you please set it up that way?

On Thu, Dec 20, 2018 at 11:52 AM Rohan notifications@github.com wrote:

[image: dataflow] https://user-images.githubusercontent.com/30060044/50280464-e525dc00-044c-11e9-9eaa-55b1e206c387.jpg

This is an old diagram which I had uploaded in another issue (now closed). It doesn't include the Session ID protocol dataflow, but the general idea is the same. simpleClient.py sends a JSON to simpleServer.py, which creates a JSON file with the data sent to it. The Uber Tool reads this JSON file from the directory (ClientJson). The Uber Tool extracts the query, connects to db001.gda-score.org, and runs the query on the DB. The actual and noisy results are written to a result.txt file. simpleServer.py reads this file and sends the Noisy Result back to simpleClient.py along with the response code.

rbh-93 commented 5 years ago

Currently the Databases are on db001.gda-score.org. If you want the Uber Tool to run on db001.gda-score.org then I guess it has to be packaged and deployed on that server. If you are free can I go over to MPI now?

yoid2000 commented 5 years ago

I won't be free until 2:30 or so. But definitely the uber tool is supposed to run on some public machine, and it may as well be db001.gda-score.org. That is the whole point of this effort.

rbh-93 commented 5 years ago

Okay. I will go after 2:30 P.M.

rbh-93 commented 5 years ago

I wanted to clarify one thing before deploying on the server. Do you want both the Uber Tool and simpleServer.py, or only the Uber Tool, to be deployed on db001.gda-score.org?

yoid2000 commented 5 years ago

simpleServer must be on db001 as well. Everything but the client should be on db001.

PF

yoid2000 commented 5 years ago

I'd like to leave this open until we have a full demonstration of the client working with the server on db001. In the meantime I'll make another task for you.

yoid2000 commented 4 years ago

finished