gda-score / code

Tools for generating General Data Anonymity Scores (www.gda-score.org)
MIT License
7 stars 1 forks source link

remove passwords from code package #53

Closed yoid2000 closed 4 years ago

yoid2000 commented 4 years ago

Currently we distribute usernames and passwords in the gda-score package.

Even though these are publicly known values, this is overall a bad practice.

Please modify the code and the package so that the username and password are expected in the following env variables:

GDA_SCORE_DIFFIX_USER GDA_SCORE_DIFFIX_PASS GDA_SCORE_DIFFIX_HOST GDA_SCORE_RAW_USER GDA_SCORE_RAW_PASS GDA_SCORE_RAW_HOST

yoid2000 commented 4 years ago

@frzmohammadali let me know if you have questions

frzmohammadali commented 4 years ago

Does that mean we eventually going to get rid of myCredentials.json and the first part of master.json which describe services?

yoid2000 commented 4 years ago

Yes this will replace myCredentials.json.

Let's leave master.json the same. In other words, we won't need GDA_SCORE_DIFFIX_HOST or GDA_SCORE_RAW_HOST. That information can stay in the services part of master.json.

frzmohammadali commented 4 years ago

this is done. here I put some codes which may look controversial to get your approval before updating the package: (changes are in development branch for now)

tested with both attacks and utility example scripts.

frzmohammadali commented 4 years ago

@yoid2000 have you found some time to review my last comment?

yoid2000 commented 4 years ago

Hi Ali,

My concern here is that it is possible that someone wants to run the code without using Aircloak (i.e. on some other anonymization scheme).

In that case, it shouldn't be necessary for them to have environ variables for Aircloak, and therefore the check in setupGdaAttackParameters(...) should not take place.

Is it not possible to raise the error in getDatabaseInfo(...) at the point where you try to read them?

PF

On Fri, May 8, 2020 at 10:32 PM Mohammadali Forouzesh < notifications@github.com> wrote:

this is done. here I put some codes which may look controversial to get your approval before updating the package: (changes are in development branch for now)

  • in gdaTools.py > getDatabaseInfo(...) :

... if theDb['type'] == "postgres": theDb['password'] = os.environ.get("GDA_SCORE_RAW_PASS") theDb['user'] = os.environ.get("GDA_SCORE_RAW_USER")

elif theDb['type'] == "aircloak":
    theDb['password'] = os.environ.get("GDA_SCORE_DIFFIX_PASS")
    theDb['user'] = os.environ.get("GDA_SCORE_DIFFIX_USER")

else:
    raise ValueError("[invalid dbtype value] type of database in .json conf file should be "
                     "either 'postgres' or 'aircloak'.")

assert theDb["user"] and theDb["password"], "db user and password has not been set."...
  • then there should be a point in the execution flow to check for existence of environment variables, I chose setupGdaAttackParameters(...) because it's basically the first thing that is called in both attacks and utility scripts in my opinion:

... userDIFFIX, passDIFFIX, userRAW, passRAW = os.environ.get("GDA_SCORE_DIFFIX_USER"), \ os.environ.get("GDA_SCORE_DIFFIX_PASS"), \ os.environ.get("GDA_SCORE_RAW_USER"), \ os.environ.get("GDA_SCORE_RAW_PASS")if not (userDIFFIX and passDIFFIX and userRAW and passRAW): raise ValueError("database user and password environment variables not found. " "check out Readme.md to see how to set them.")...

tested with both attacks and utility example scripts.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/gda-score/code/issues/53#issuecomment-626007647, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAQP5KNMJ5WAARTEX7NFKE3RQRT5LANCNFSM4M2LQOEA .

frzmohammadali commented 4 years ago

if I raise error in getDatabaseInfo(..) it will be raised by a background thread cause the function it being called by a background thread so the main program won't exit but only that thread and it doesn't look nice. but that's not an issue, what I can do is to show a warning and call sys.exit() afterwards to cause the program to exit. setupGdaAttackParameters(...) is the entry point and no background thread has been initiated at that point so raising error would just simply exit the whole program.

but then here is the catch: what should happen if they try to run the code without using Aircloak? reading myCredentials.json the old way?

if the answer is yes, in getDatabaseInfo(...) I will check for db type, if it was Aircloak or postgres I'll attempt to retrieve the user and password from environment variables and if not exist I'll print a warning and call sys.exit() to exit the whole program. otherwise for other db types getDatabaseInfo(...) will read myCredentials.json for user and password.

yoid2000 commented 4 years ago

I haven't looked at the code recently, so I'm not sure of this, but is there some point at which the software on one hand knows what services are needed, but on the other hand hasn't yet made threads? This would be an ideal place to check for whether Aircloak credentials are needed or not.

If they try to run the code without Aircloak, nothing special should happen. Whatever code makes database connections will not try to connect to Aircloak and everything should be fine.

PF

On Tue, May 12, 2020 at 12:51 PM Mohammadali Forouzesh < notifications@github.com> wrote:

if I raise error in getDatabaseInfo(..) it will be raised by a background thread cause the function it being called by a background thread so the main program won't exit but only that thread and it doesn't look nice. but that's not an issue, what I can do is to show a warning and call sys.exit() afterwards to cause the program to exit. setupGdaAttackParameters(...) is the entry point and no background thread has been initiated at that point so raising error would just simply exit the whole program.

but then here is the catch: what should happen if they try to run the code without using Aircloak? reading myCredentials.json the old way?

if the answer is yes, in getDatabaseInfo(...) I will check for db type, if it was Aircloak or postgres I'll attempt to retrieve the user and password from environment variables and if not exist I'll print a warning and call sys.exit() to exit the whole program. otherwise for other db types getDatabaseInfo(...) will read myCredentials.json for user and password.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/gda-score/code/issues/53#issuecomment-627266321, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAQP5KK7T4NPUVUREIEHXOTRRESZHANCNFSM4M2LQOEA .

frzmohammadali commented 4 years ago

OK. So there is this internal methodgdaAttack._setupThreadsAndQueues(...) where the threads get created and this method is called from gdaAttack.__init__(...) so either of them is the point right before thread creation and by then the program should have the params received via gdaAttack.__init__(...) so required services should be decidable at that point but let me try that and I'll inform you.

yoid2000 commented 4 years ago

That sounds right.

In which case you would need to do the same for gdaUtility() probably

PF

On Tue, May 12, 2020 at 3:26 PM Mohammadali Forouzesh < notifications@github.com> wrote:

OK. So there is this internal methodgdaAttack._setupThreadsAndQueues(...) where the threads get created and this method is called from gdaAttack.init(...) so either of them is the point right before thread creation and by then the program should have the params received via gdaAttack.init(...) so required services should be decidable at that point but let me try that and I'll inform you.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/gda-score/code/issues/53#issuecomment-627344023, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAQP5KI2B2CDO3QFCVSQFO3RRFFBFANCNFSM4M2LQOEA .

frzmohammadali commented 4 years ago

so, there is also another internal method gdaAttack._doParamChecks(...) which is called before thread creation and I think it also semantically fit the purpose of this checking. this one actually calls getDatabaseInfo(...) so I put the check there and call sys.exit(msg) whenever env var needed but not exist.

if you remeber we also have this gdascore_init script in the package which was made for creating config folder and propagating default credentials. so I updated that one as well to not only archive things related to myCredentials.json but also now it can even set those environment variables automatically so no need to go through hassle of adding env vars for user. I hope you are in favor of that.

for gdaUtility, since it also leverages gdaAttack interface in the beginning of its process, there is no need to changed much stuff there.

doing the task, I noticed some improvement that could be done to thread management and background threads termination which essentially was related to queue.get() blocking some threads that don't have data in their queue maybe in the beginning which also could stop them from terminating at the moment. so I put a timeout for that and that process should now work smother as well.

package is updated on pip to version 2.3.5

therefore I think this issue is done if you approve all mentioned above.

yoid2000 commented 4 years ago

Sounds good to me.

I'll try it out later today.

PF

On Tue, May 12, 2020 at 10:32 PM Mohammadali Forouzesh < notifications@github.com> wrote:

so, there is also another internal method gdaAttack._doParamChecks(...) which is called before thread creation and I think it also semantically fit the purpose of this checking. this one actually calls getDatabaseInfo(...) so I put the check there and call sys.exit(msg) whenever env var needed but not exist.

if you remeber we also have this gdascore_init script in the package which was made for creating config folder and propagating default credentials. so I updated that one as well to not only archive things related to myCredentials.json but also now it can even set those environment variables automatically so no need to go through hassle of adding env vars for user. I hope you are in favor of that.

for gdaUtility, since it also leverages gdaAttack interface in the beginning of its process, there is no need to changed much stuff there.

doing the task, I noticed some improvement that could be done to thread management and background threads termination which essentially was related to queue.get() blocking some threads that don't have data in their queue maybe in the beginning which also could stop them from terminating at the moment. so I put a timeout for that and that process should now work smother as well.

package is updated on pip to version 2.3.5

therefore I think this issue is done if you approve all mentioned above.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/gda-score/code/issues/53#issuecomment-627577489, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAQP5KLFCXXFXLL5RVPEU3TRRGW5VANCNFSM4M2LQOEA .

yoid2000 commented 4 years ago

So I did pip install gda-score-code.

My expectation was that this would install the new code that expects to find the env variables, and that running an attack for instance would fail until I set those env variables.

But that is not what happened. The code still works...

Why is this?

PF

ps. I updated the README.md. You had the credentials listed there so I removed them, which was the whole point of this exercise.

On Tue, May 12, 2020 at 10:32 PM Mohammadali Forouzesh < notifications@github.com> wrote:

so, there is also another internal method gdaAttack._doParamChecks(...) which is called before thread creation and I think it also semantically fit the purpose of this checking. this one actually calls getDatabaseInfo(...) so I put the check there and call sys.exit(msg) whenever env var needed but not exist.

if you remeber we also have this gdascore_init script in the package which was made for creating config folder and propagating default credentials. so I updated that one as well to not only archive things related to myCredentials.json but also now it can even set those environment variables automatically so no need to go through hassle of adding env vars for user. I hope you are in favor of that.

for gdaUtility, since it also leverages gdaAttack interface in the beginning of its process, there is no need to changed much stuff there.

doing the task, I noticed some improvement that could be done to thread management and background threads termination which essentially was related to queue.get() blocking some threads that don't have data in their queue maybe in the beginning which also could stop them from terminating at the moment. so I put a timeout for that and that process should now work smother as well.

package is updated on pip to version 2.3.5

therefore I think this issue is done if you approve all mentioned above.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/gda-score/code/issues/53#issuecomment-627577489, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAQP5KLFCXXFXLL5RVPEU3TRRGW5VANCNFSM4M2LQOEA .

frzmohammadali commented 4 years ago

Oh I think you forgot to put "-U" switch in the command to update the package, in this case pip won't update. Have you also tried "pip install -U gda-score-code"? If no, please give it a try. Make sure you are on version 2.3.5 of the gda-score-code package and if still you see no env var effect let me know.Regards,Mohammadali On May 14, 2020 9:09 AM, Paul Francis notifications@github.com wrote:

So I did pip install gda-score-code.

My expectation was that this would install the new code that expects to

find the env variables, and that running an attack for instance would fail

until I set those env variables.

But that is not what happened. The code still works...

Why is this?

PF

ps. I updated the README.md. You had the credentials listed there so I

removed them, which was the whole point of this exercise.

On Tue, May 12, 2020 at 10:32 PM Mohammadali Forouzesh <

notifications@github.com> wrote:

so, there is also another internal method gdaAttack._doParamChecks(...)

which is called before thread creation and I think it also semantically fit

the purpose of this checking. this one actually calls getDatabaseInfo(...)

so I put the check there and call sys.exit(msg) whenever env var needed

but not exist.

if you remeber we also have this gdascore_init script in the package

which was made for creating config folder and propagating default

credentials. so I updated that one as well to not only archive things

related to myCredentials.json but also now it can even set those

environment variables automatically so no need to go through hassle of

adding env vars for user. I hope you are in favor of that.

for gdaUtility, since it also leverages gdaAttack interface in the

beginning of its process, there is no need to changed much stuff there.

doing the task, I noticed some improvement that could be done to thread

management and background threads termination which essentially was related

to queue.get() blocking some threads that don't have data in their queue

maybe in the beginning which also could stop them from terminating at the

moment. so I put a timeout for that and that process should now work

smother as well.

package is updated on pip to version 2.3.5

therefore I think this issue is done if you approve all mentioned above.

You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub

https://github.com/gda-score/code/issues/53#issuecomment-627577489, or

unsubscribe

https://github.com/notifications/unsubscribe-auth/AAQP5KLFCXXFXLL5RVPEU3TRRGW5VANCNFSM4M2LQOEA

.

—You are receiving this because you were mentioned.Reply to this email directly, view it on GitHub, or unsubscribe.

yoid2000 commented 4 years ago

ok, I updated properly with pip.

Now I get the following error:

Traceback (most recent call last):
  File "bountySinglingOut.py", line 129, in <module>
    paramsList = setupGdaAttackParameters(config)
  File "C:\Users\francis\Miniconda3\lib\site-packages\gdascore\gdaTools.py", line 274, in setupGdaAttackParameters
    master = getMasterConfig()
  File "C:\Users\francis\Miniconda3\lib\site-packages\gdascore\gdaTools.py", line 122, in getMasterConfig
    path = try_for_config_file(os.path.join("common", "config", "master.json"))
  File "C:\Users\francis\Miniconda3\lib\site-packages\gdascore\gdaTools.py", line 23, in try_for_config_file
    with open(os.path.join(os.path.dirname(os.path.abspath(__file__)), 'global_config', 'config_var.json'), 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\francis\\Miniconda3\\lib\\site-packages\\gdascore\\global_config\\config_var.json'
frzmohammadali commented 4 years ago

so I shipped a new update, current version is 2.3.6

bug you mentioned is fixed, then I noticed some issue with cache thread blocking the code form termination after attack is finished so attempted to fixed that one too. and small features such as colored messages in the terminal are added. would be great if you could try this one out and see if you face any issue (in my case I was able to run attack scripts except for bountySinglingOut.py which throws a Division by Zero exception which seems to be quite irrelevant to these changes).

things under work are now gdaUtility and a subtle bug that cacheDB file doesn't get removed properly when interrupting the script in the middle (otherwise it does). I've already found some clues on that and will send you another update in that regard soon.

one thing that I've learned through the past week is that maybe it's better to always use a timeout when calling queue.get() cause it can unintentionally block the execution if something goes wrong and the queue become empty forever. in which case debugging it is also not simple. so i would use this piece of code whenever working with queues:

while True:
    try:
        res = q.get(block=True, timeout=5)  #  this code waits maximum for 5 seconds for an item to arrive at the queue, otherwise raise empty queue exception
    except queue.Empty:
        # keep track of how long the queue has been empty and decide whether to try again or stop waiting
        {continue | break}
yoid2000 commented 4 years ago

Working now. Thanks! You can close this.

PF

On Mon, May 18, 2020 at 6:46 AM Mohammadali Forouzesh < notifications@github.com> wrote:

so I shipped a new update, current version is 2.3.6

bug you mentioned is fixed, then I noticed some issue with cache thread blocking the code form termination after attack is finished so attempted to fixed that one too. and small features such as colored messages in the terminal are added. would be great if you could try this one out and see if you face any issue (in my case I was able to run attack scripts except for bountySinglingOut.py which throws a Division by Zero exception which seems to be quite irrelevant to these changes).

things under work are now gdaUtility and a subtle bug that cacheDB file doesn't get removed properly when interrupting the script in the middle (otherwise it does). I've already found some clues on that and will send you another update in that regard soon.

one thing that I've learned through the past week is that maybe it's better to always use a timeout when calling queue.get() cause it can unintentionally block the execution if something goes wrong and the queue become empty forever. in which case debugging it is also not simple. so i would use this piece of code whenever working with queues:

while True: try: res = q.get(block=True, timeout=5) # this code waits maximum for 5 seconds for an item to arrive at the queue, otherwise raise empty queue exception except queue.Empty:

keep track of how long the queue has been empty and decide whether to try again or stop waiting

    {continue | break}

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/gda-score/code/issues/53#issuecomment-629942914, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAQP5KPDDFV3D7FMOV7HBXTRSC4QPANCNFSM4M2LQOEA .