Azure / azure-sdk-for-python

This repository is for active development of the Azure SDK for Python. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/python/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-python.
MIT License
4.61k stars 2.82k forks source link

[engsys] Batch Sanitizers have the potential to mangle JSON responses and make them invalid #35395

Closed kdestin closed 6 months ago

kdestin commented 6 months ago

Describe the bug

35196 introduced a collection of "global" sanitizers that scrub secrets from recordings as they are written to disk.

The sanitization is Regex based and not "file format aware". Some of the sanitizers as defined have the potential to leave responses in a corrupt state (e.g. as invalid json).

The sanitizers for query strings were the source of my issue, but this might be an issue with other sanitizers too: https://github.com/Azure/azure-sdk-for-python/blob/434a248f351284ac839a25f6ba582cbbb999b45a/tools/azure-sdk-tools/devtools_testutils/proxy_startup.py#L394-L400

To Reproduce Steps to reproduce the behavior:

  1. Write a test that receives an HTTP response which'll get sanitized. Example (&sig= query param)
{
    "secretsType": "Sas",
    "sasToken": "sv=2021-10-04&si=azureml-system-datastore-policy&sr=c&sig=XXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
}
  1. Observe that your test passes when run in live mode (with the following environment variable set AZURE_TEST_RUN_LIVE=true).

  2. Try to replay the test in recording mode (with the following environment variable set AZURE_TEST_RUN_LIVE=false).

Expected behavior

The test should pass, using the sanitized responses from the record.

Actual Behavior

This sanitizer mangles the JSON response:

https://github.com/Azure/azure-sdk-for-python/blob/434a248f351284ac839a25f6ba582cbbb999b45a/tools/azure-sdk-tools/devtools_testutils/proxy_startup.py#L397

            except ValueError as err:
>               raise DecodeError(
                    message="JSON is invalid: {}".format(err),
                    response=response,
                    error=err,
E                   azure.core.exceptions.DecodeError: JSON is invalid: Invalid control character at: line 3 column 78 (char 103)
E                   Content: {
E                     "secretsType": "Sas",
E                     "sasToken": "sv=2021-10-04&si=azureml-system-datastore-policy&sr=cSanitized
E                   }

../../../../venv/lib/python3.8/site-packages/azure/core/pipeline/policies/_universal.py:616: DecodeError

Screenshots If applicable, add screenshots to help explain your problem.

Additional context

ML team has a regex for query parameters that might be helpful in finding a fix

https://github.com/Azure/azure-sdk-for-python/blob/434a248f351284ac839a25f6ba582cbbb999b45a/sdk/ml/azure-ai-ml/tests/conftest.py#L72-L87

xiangyan99 commented 6 months ago

@mccoyp can you take a look?

kdestin commented 6 months ago

Resolved by #35419