aws / sagemaker-python-sdk

A library for training and deploying machine learning models on Amazon SageMaker
https://sagemaker.readthedocs.io/
Apache License 2.0
2.1k stars 1.14k forks source link

Mask creds in docker compose file for local mode #2118

Closed jmahlik closed 3 years ago

jmahlik commented 3 years ago

Describe the feature you'd like Currently, line 673 in sagemaker/local/image.py logs AWS creds in the docker compose file. This could cause a credentials leak if it is running in a ci pipeline, during presentations/demos and for live streamers or youtubers creating sagemaker content.

I don't feel this is a outright security issue, just unexpected logging that could have adverse impact in certain situations.

One has to explicitly provide fake creds and local data to get around this. This limits the ability to use data from s3 and use other aws services in jobs.

How would this feature be used? Please describe. Credentials should be masked in the logs to prevent a possible leak. This could be accomplished with a regex replace of the yaml content. I'm happy to submit a PR for this, but wanted to get thoughts because there may be better alternatives to regex. I'm not a fan of regex but it was the best I could come up with since the container needs the creds.

Describe alternatives you've considered Currently patching out this logging in a ci pipeline.

ajaykarpur commented 3 years ago

Credentials should be masked in the logs to prevent a possible leak. This could be accomplished with a regex replace of the yaml content. I'm happy to submit a PR for this, but wanted to get thoughts because there may be better alternatives to regex. I'm not a fan of regex but it was the best I could come up with since the container needs the creds.

Thanks for the suggestion! Masking the credentials definitely makes sense. It may be possible to use PyYAML for this purpose instead of a regex-- I'd recommend looking into this. Please feel free to submit a PR and we'll be happy to review.

jmahlik commented 3 years ago

Credentials should be masked in the logs to prevent a possible leak. This could be accomplished with a regex replace of the yaml content. I'm happy to submit a PR for this, but wanted to get thoughts because there may be better alternatives to regex. I'm not a fan of regex but it was the best I could come up with since the container needs the creds.

Thanks for the suggestion! Masking the credentials definitely makes sense. It may be possible to use PyYAML for this purpose instead of a regex-- I'd recommend looking into this. Please feel free to submit a PR and we'll be happy to review.

Excellent suggestion, that is way better. I'll take a stab at it over the weekend.

jmahlik commented 3 years ago

Think I have something working. The environment turned out to be a list of strings so the simple solution was to set each item in the "environment" list to "[Masked]" for all services (on a separate deepcopy for logging).

One question: where is logging configured for local mode/test suite? I can't seem to track it down. I'm only getting WARNING level logging when running the test suite. When I run jobs in local mode, it returns INFO level logging. Makes using caplog to check the INFO logs a bit hard in the tests.

Any thoughts appreciated. Will submit PR once I can get assert on the info logs.