New Sanitizer not being properly applied to AML SDK e2e tests

MilesHolland commented 1 month ago

azure-ai-ml:
dev:
windows 11:
3.10.8:

Describe the bug

There seem to be 2 separate sanitizers in use for AML SDK tests, one when recording tests, and one when playing them back. These sanitizers do not align, which causes failures in playback mode when the recordings do not match expected values. As an example, I ran a test today (6/5/2024), had it run successfully, pushed the recording, then re-ran the test in playback mode and got the following failure:

Unable to find a record for the request PUT https://Sanitized.azure.com/subscriptions/00000000-0000-0000-0000-000000000/resourcegroups/00000/providers/Microsoft.Resources/deployments/e2etest_test_863132474581-7664666?api-version=2020-06-01 Uri doesn't match: request <https://Sanitized.azure.com/subscriptions/00000000-0000-0000-0000-000000000/resourcegroups/00000/providers/Microsoft.Resources/deployments/e2etest_test_863132474581-7664666?api-version=2020-06-01> record <https://Sanitized.azure.com/subscriptions/<actual subscription id>/resourcegroups/<actual RG name>/providers/Microsoft.Resources/deployments/e2etest_test_863132474581-7664666?api-version=2020-06-01> Header differences: Body differences: Request and record bodies do not match at index 31568: request: "{"value":"00000"},"description" record: "{"value":"<actual RG name>"},"

To Reproduce Steps to reproduce the behavior:

Setup AML SDK test infra, making sure that a default subscription ID and workspace are set.
CD to the following directory with the azure-sdk-for-python repo: sdk/ml/azure-ai-ml/
Set e2e tests to live mode and run the following test: 'pytest tests/component/ -k test_automl_component`
CD to the top directory of the repo and push the new recording of this test with python ./scripts/manage_recordings.py push -p sdk/ml/azure-ai-ml/assets.json
Set e2e tests to playback mode and rerun the test.

Expected behavior I expect the test when run in playback mode with the new recording to simply pass, assuming the live test passed. I do not expect the sanitizers to fail for seemingly refusing to perform their primary purpose.

Screenshots Screenshots not provided to avoid including private info.

Additional context n/a

mccoyp commented 3 weeks ago

Hi @MilesHolland, thank you for the report. My first thought is that the tests could be missing a sanitizer that scrubs these values during recording, since you're seeing false values in playback requests but real values in recordings. This usually indicates that false values are correctly being provided to tests in playback, but that a sanitizer needs to be added that aligns with these values.

It's also worth ensuring that your branch is up to date with the latest main branch so that your tests use the latest version of the test proxy.

Since there have been central changes to sanitizing recently, it's also possible that this is caused by something outside of ML's setup. I'll try to reproduce today and get back to you as soon as possible.

mccoyp commented 3 weeks ago

I was able to reproduce this issue, and it does look like test proxy behavior is causing the failure. As you may have noticed, this test was already identified as problematic while adapting ML tests to the recent sanitizer overhaul; the test is marked as live-only in main to avoid this playback error: https://github.com/Azure/azure-sdk-for-python/blob/a61a8e2a934c75648a974b458a67e954436b8011/sdk/ml/azure-ai-ml/tests/component/e2etests/test_component.py#L209-L210

As for what's actually causing the failure, it looks like the test proxy is failing to apply some sanitizers during recording. I added a breakpoint to the start of the test and confirmed that sanitizers are being registered for the subscription ID, resource group, etc. that are still showing up in recordings. I confirmed at https://localhost:5001/Info/Active that the the test proxy had string sanitizers registered for each of these values as expected, but after recording, the sanitizers evidently weren't being applied despite this.

The only sanitizer that seems to be applying to URIs in this test's recording is AZSDK4001, a common sanitizer that's applied universally by the test proxy to replace host names with Sanitized. My thought was that having multiple sanitizers targeting the same URI could have been causing a conflict, but removing the sanitizer still didn't cause the ML sanitizers to apply correctly.

I'll have to continue investigating with @scbedd to sort out how this is happening, but for the time being I would recommend keeping the test as live-only so it doesn't cause playback failures.

Azure / azure-sdk-for-python

New Sanitizer not being properly applied to AML SDK e2e tests #35930