Azure / azure-sdk-for-python

This repository is for active development of the Azure SDK for Python. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/python/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-python.
MIT License

[engsys] Global Sanitizers inconsistently sanitize storage account names, recordings unreplayable #35447

Closed kdestin closed 4 months ago

kdestin commented 4 months ago

Describe the bug

https://github.com/Azure/azure-sdk-for-python/pull/35196 introduced a collection of "global" sanitizers that scrub secrets from recordings as they are written to disk.

I'm currently writing a test, where the code path involves:

  1. Fetching details about a storage account

  2. Using those details to build the URI for the next request

The sanitizer linked below redacts the storage account name from the recorded response in Step 1:

https://github.com/Azure/azure-sdk-for-python/blob/511aef315bf6919f52c90adb1803a3b9079cbb05/tools/azure-sdk-tools/devtools_testutils/proxy_startup.py#L379

There is no "global" sanitizer that sanitizes storage account names from request urls.

This leaves my recording un-replayable.

In playback mode, the code receives the sanitized response and builds the URL for the subsequent request from the sanitized values: https://sanitized.blob.core.windows.net. But the recording stored an unsanitized URL for that subsequent request, https://account-name.blob.core.windows.net, so the proxy is unable to find a match.
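The mismatch can be reproduced with a small model. This is only a sketch: `sanitize_body` is a hypothetical stand-in for the global body-key sanitizer, `build_blob_url` is a hypothetical stand-in for the client code, and the `sanitized` placeholder is taken from the URL in the error output rather than from the proxy's actual configuration.

```python
import json

PLACEHOLDER = "sanitized"  # assumed placeholder, as seen in the failing request URL


def sanitize_body(body: str) -> str:
    """Hypothetical stand-in for the body-key sanitizer: redacts accountName."""
    data = json.loads(body)
    data["properties"]["accountName"] = PLACEHOLDER
    return json.dumps(data)


def build_blob_url(body: str) -> str:
    """Hypothetical client code: builds the next request URL from the response."""
    props = json.loads(body)["properties"]
    return (
        f'{props["protocol"]}://{props["accountName"]}.blob.'
        f'{props["endpoint"]}/{props["containerName"]}'
    )


live_body = json.dumps(
    {
        "properties": {
            "accountName": "account-name",
            "containerName": "container",
            "endpoint": "core.windows.net",
            "protocol": "https",
        }
    }
)

# Recording: the response body is sanitized on disk, but the follow-up
# request URL is stored as sent, because no sanitizer touches the URL.
recorded_body = sanitize_body(live_body)
recorded_url = build_blob_url(live_body)

# Playback: the client only sees the sanitized body, so it builds a
# different URL from the one stored in the recording.
playback_url = build_blob_url(recorded_body)

print(recorded_url)
print(playback_url)
print("match:", recorded_url == playback_url)
```

The two URLs differ only in the host, which is the same difference the proxy reports under "Uri doesn't match".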

To Reproduce

Steps to reproduce the behavior:

  1. Successfully record a test in live mode that:

    1. Fetches some response with details about a storage account
    // Example response
    {
            "id": "/subscriptions/00000000-0000-0000-0000-000000000/resourceGroups/00000/providers/Microsoft.MachineLearningServices/workspaces/00000/datastores/workspaceblobstore",
            "name": "workspaceblobstore",
            "type": "Microsoft.MachineLearningServices/workspaces/datastores",
            "properties": {
              ...,
              "subscriptionId": "00000000-0000-0000-0000-000000000",
              "resourceGroup": "resource-group",
              "datastoreType": "AzureBlob",
              "accountName": "account-name",
              "containerName": "d49eda6a-ab96-4d00-b108-33768a3d0aee-azureml-blobstore",
              "endpoint": "core.windows.net",
              "protocol": "https",
              "serviceDataAccessAuthIdentity": "WorkspaceSystemAssignedIdentity"
            },
            "systemData": {
                ...
            }
    
    }
    2. Uses that response to build the URL for a subsequent request

    https://account-name.blob.core.windows.net/d49eda6a-ab96-4d00-b108-33768a3d0aee-azureml-blobstore/path/to/files

  2. Attempt to re-run the test in playback mode, off the recording

Expected behavior

The test should run off the recording and pass.

Actual behavior

The test fails

ERROR    root:proxy_fixtures.py:312 

-----Test proxy playback error:-----

Unable to find a record for the request PUT https://sanitized.blob.core.windows.net/d49eda6a-ab96-4d00-b108-33768a3d0aee-azureml-blobstore/LocalUpload/0e7abff4dcb2ddd489d3e72fa2039bf6/README.md?sv=2021-10-04&si=azureml-system-datastore-policy&sr=c&sig=Sanitized
Method doesn't match, request <PUT> record <HEAD>
Uri doesn't match:
    request <https://sanitized.blob.core.windows.net/d49eda6a-ab96-4d00-b108-33768a3d0aee-azureml-blobstore/LocalUpload/0e7abff4dcb2ddd489d3e72fa2039bf6/README.md?sv=2021-10-04&si=azureml-system-datastore-policy&sr=c&sig=Sanitized>
    record  <https://account-name.blob.core.windows.net/d49eda6a-ab96-4d00-b108-33768a3d0aee-azureml-blobstore/LocalUpload/0e7abff4dcb2ddd489d3e72fa2039bf6/README.md?sv=2021-10-04&si=azureml-system-datastore-policy&sr=c&sig=Sanitized>


mccoyp commented 4 months ago

Hi @kdestin, I have good news: Paul merged an update to our tooling that adds the ability to remove sanitizers (https://github.com/Azure/azure-sdk-for-python/pull/35385), so there should now be a better solution than your current workaround.

Some sanitizers were already disabled for ML as part of the update: https://github.com/Azure/azure-sdk-for-python/blob/a61a8e2a934c75648a974b458a67e954436b8011/sdk/ml/azure-ai-ml/tests/conftest.py#L134-L138

If you do the same with the accountName body key sanitizer, you should be able to remove the additional sanitizer you added as a workaround. The ID for the central sanitizer is "AZSDK3478"; here's where it's registered by the test proxy.
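A minimal sketch of what that could look like, assuming the removal helper added by the PR is `remove_batch_sanitizers` from `devtools_testutils` and that it takes a list of sanitizer IDs. The function name `remove_account_name_sanitizer` and the constant are hypothetical, introduced only for illustration.

```python
# ID of the central accountName body-key sanitizer, as given in the comment above.
ACCOUNT_NAME_SANITIZER_ID = "AZSDK3478"


def remove_account_name_sanitizer() -> None:
    """Disable the central accountName body-key sanitizer for this test session."""
    # Imported inside the function so the sketch can be loaded outside the
    # azure-sdk-for-python dev environment; in a real conftest.py this would
    # normally be a top-level import.
    from devtools_testutils import remove_batch_sanitizers

    # Assumption: the helper accepts a list of sanitizer IDs to remove.
    remove_batch_sanitizers([ACCOUNT_NAME_SANITIZER_ID])
```

The call would presumably go in the same session-scoped sanitizer fixture in `conftest.py` that the linked lines already use, so that it runs once the test proxy is available. Recordings made before the removal would likely need to be re-recorded so the response body and the request URL agree.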

kdestin commented 4 months ago

Awesome, thank you so much. I'll close this issue