Open mattsp1290 opened 1 year ago
The issue is reported in Python SDK but SDK is just tool to "send / receive" http "request / response" "from / to" service. According to log, the issue is caused by JSON is invalid: Expecting property name enclosed in double quotes:
. It is common issue in json string decode: https://stackoverflow.com/questions/39491420/python-jsonexpecting-property-name-enclosed-in-double-quotes. I infer the reason is service returns json string which contains '
so hope service team could help on this issue.
Adding service team to look into this.
The issue seems to be that the JSON ends with "name": {
meaning it is opening a new object but not populating it. The response from the service simply cuts off.
We are currently assessing this to see where the problem is coming from. I will keep this issue updated with the status.
Hi @mattsp1290, according to log, we know the exception happens in https://github.com/Azure/azure-sdk-for-python/blob/c82f016de68ccfa49b2234f6c4258db5b32baaa9/sdk/core/azure-core/azure/core/pipeline/policies/_universal.py#L613-L621. If we could know value of data_as_str
and data
, the root cause could be found. In view of that SDK team can't reproduce the error, could you add try-catch logic to catch more log info?
from azure.core.pipeline.policies import ContentDecodePolicy
from azure.core.exceptions import DecodeError
import logging
...
try:
result = list(compute_client.usage.list())
except DecodeError as e:
logging.error("DecodeError: %s", e.response.text())
Hi @mattsp1290, according to log, we know the exception happens in https://github.com/Azure/azure-sdk-for-python/blob/c82f016de68ccfa49b2234f6c4258db5b32baaa9/sdk/core/azure-core/azure/core/pipeline/policies/_universal.py#L613-L621. If we could know value of
data_as_str
anddata
, the root cause could be found. In view of that SDK team can't reproduce the error, could you add try-catch logic to catch more log info?
The result of data_as_str should be the same as the string in the original stack trace. The trace itself is not cut off. We have cached this data in the past to verify and it consistently ends around that point.
from azure.core.pipeline.policies import ContentDecodePolicy from azure.core.exceptions import DecodeError import logging ... try: result = list(compute_client.usage.list()) except DecodeError as e: logging.error("DecodeError: %s", e.response.text())
I don't see a convenient way to add this logging without creating a custom built package for azure-mgmt-compute
. Are there options built in to the library to get this request logging happening?
I don't see a convenient way to add this logging without creating a custom built package for azure-mgmt-compute.
-> You just need to embed try/catch
logic into your code to catch the DecodeError
so that we can get the raw response from server when error happens. Or is there other way for us to reproduce the error locally?
I don't see a convenient way to add this logging without creating a custom built package for azure-mgmt-compute.
-> You just need to embedtry/catch
logic into your code to catch theDecodeError
so that we can get the raw response from server when error happens. Or is there other way for us to reproduce the error locally?
We have tried this previously. the results are the same as stack trace attached to this issue
@mattsp1290 Can you open detailed logging switch of SDK logging_enable
like:
from azure.core.pipeline.policies import ContentDecodePolicy
from azure.core.exceptions import DecodeError
import logging
...
try:
result = list(compute_client.usage.list(logging_enable=True))
except DecodeError as e:
logging.error("DecodeError: %s", e.response.text())
After you catch error, please provide the detailed log (better with file) for service team so that they can analyze for root cause. nit: please remember to conceal sensitive info of log. (e.g. token/subscription id/tenant id/secret/...)
@mattsp1290 Can you open detailed logging switch of SDK
logging_enable
like:from azure.core.pipeline.policies import ContentDecodePolicy from azure.core.exceptions import DecodeError import logging ... try: result = list(compute_client.usage.list(logging_enable=True)) except DecodeError as e: logging.error("DecodeError: %s", e.response.text())
After you catch error, please provide the detailed log (better with file) for service team so that they can analyze for root cause. nit: please remember to conceal sensitive info of log. (e.g. token/subscription id/tenant id/secret/...)
We are continuing to investigate how to implement this. There are some challenges where we have purposely lowered Azure SDK logging in the general case and we need to re-enable logging for a handful of places in our code. Will provide another update when we have a chance to address this.
@mattsp1290 Can you open detailed logging switch of SDK
logging_enable
like:from azure.core.pipeline.policies import ContentDecodePolicy from azure.core.exceptions import DecodeError import logging ... try: result = list(compute_client.usage.list(logging_enable=True)) except DecodeError as e: logging.error("DecodeError: %s", e.response.text())
After you catch error, please provide the detailed log (better with file) for service team so that they can analyze for root cause. nit: please remember to conceal sensitive info of log. (e.g. token/subscription id/tenant id/secret/...)
@msyyc Apologies for the long tail on the reply here. Unfortunately we are unable to implement this change as it bleeds too much client information into our productions logs. We do have a custom transport client for our SDK clients and can implement a lot of custom logging there.
Are there specific pieces of information we could provide that would tell you the same things you were. looking for from changing the logging level to a more verbose level?
@lmazuel @iscai-msft Do you have any suggestion on this issue?
Unfortunately we are unable to implement this change as it bleeds too much client information into our productions logs.
I understand your concern @mattsp1290. After offline discussion, could you follow the up logic to provide reaponse header
only? Then service team @uonie may go for next step with header info.
from azure.core.pipeline.policies import ContentDecodePolicy
from azure.core.exceptions import DecodeError
import logging
...
try:
result = list(compute_client.usage.list(logging_enable=True))
except DecodeError as e:
logging.error("DecodeError: %s", e.response.headers)
@msyyc and @uonie I have implemented the above request. Here are some example header responses when we encounter these errors
{'cache-control': 'no-cache', 'pragma': 'no-cache', 'content-length': '32473', 'content-type': 'application/json; charset=utf-8', 'expires': '-1', 'x-ms-ratelimit-remaining-resource': 'Microsoft.Compute/GetSubscriptionInfo3Min;359', 'strict-transport-security': 'max-age=31536000; includeSubDomains', 'x-ms-request-id': '0c1a9290-4228-4853-92e3-fa20e3b03a6b', 'server': 'envoy', 'x-ms-ratelimit-remaining-subscription-reads': '14912', 'x-ms-correlation-request-id': '200f3231-184e-44e0-9b18-e9519697386e', 'x-ms-routing-request-id': 'CHINANORTH3:20231213T052026Z:200f3231-184e-44e0-9b18-e9519697386e', 'x-content-type-options': 'nosniff', 'date': 'Wed, 13 Dec 2023 05:20:25 GMT', 'x-envoy-upstream-service-time': '5733'}
{'cache-control': 'no-cache', 'pragma': 'no-cache', 'content-length': '32946', 'content-type': 'application/json; charset=utf-8', 'expires': '-1', 'x-ms-ratelimit-remaining-resource': 'Microsoft.Compute/GetSubscriptionInfo3Min;359', 'strict-transport-security': 'max-age=31536000; includeSubDomains', 'x-ms-request-id': '01fc2010-6f34-40a0-8883-bbfc7be859dd', 'server': 'envoy', 'x-ms-ratelimit-remaining-subscription-reads': '11999', 'x-ms-correlation-request-id': 'cdd2758c-e627-4008-8d8b-f4a81cbfadd7', 'x-ms-routing-request-id': 'EASTUS:20231213T020742Z:cdd2758c-e627-4008-8d8b-f4a81cbfadd7', 'x-content-type-options': 'nosniff', 'date': 'Wed, 13 Dec 2023 02:07:41 GMT', 'x-envoy-upstream-service-time': '508'}
{'cache-control': 'no-cache', 'pragma': 'no-cache', 'content-length': '32478', 'content-type': 'application/json; charset=utf-8', 'expires': '-1', 'x-ms-ratelimit-remaining-resource': 'Microsoft.Compute/GetSubscriptionInfo3Min;359', 'strict-transport-security': 'max-age=31536000; includeSubDomains', 'x-ms-request-id': 'd5314159-dce4-4171-8b0f-ffaa784f2e23', 'server': 'envoy', 'x-ms-ratelimit-remaining-subscription-reads': '11962', 'x-ms-correlation-request-id': 'd9a19797-f3c2-4d05-a378-2ab2787b3d72', 'x-ms-routing-request-id': 'SOUTHCENTRALUS:20231212T174209Z:d9a19797-f3c2-4d05-a378-2ab2787b3d72', 'x-content-type-options': 'nosniff', 'date': 'Tue, 12 Dec 2023 17:42:08 GMT', 'x-envoy-upstream-service-time': '2737'}
{'cache-control': 'no-cache', 'pragma': 'no-cache', 'content-length': '32478', 'content-type': 'application/json; charset=utf-8', 'expires': '-1', 'x-ms-ratelimit-remaining-resource': 'Microsoft.Compute/GetSubscriptionInfo3Min;359', 'strict-transport-security': 'max-age=31536000; includeSubDomains', 'x-ms-request-id': 'c87bdf0b-bf01-4711-ab33-acb18d30aa95', 'server': 'envoy', 'x-ms-ratelimit-remaining-subscription-reads': '11969', 'x-ms-correlation-request-id': '5288d19c-cc14-4e0c-b43f-3ac522433751', 'x-ms-routing-request-id': 'SOUTHCENTRALUS:20231212T173839Z:5288d19c-cc14-4e0c-b43f-3ac522433751', 'x-content-type-options': 'nosniff', 'date': 'Tue, 12 Dec 2023 17:38:38 GMT', 'x-envoy-upstream-service-time': '508'}
{'cache-control': 'no-cache', 'pragma': 'no-cache', 'content-length': '32478', 'content-type': 'application/json; charset=utf-8', 'expires': '-1', 'x-ms-ratelimit-remaining-resource': 'Microsoft.Compute/GetSubscriptionInfo3Min;359', 'strict-transport-security': 'max-age=31536000; includeSubDomains', 'x-ms-request-id': 'a1f40c7d-c27d-4c3a-9fc3-e85213223440', 'server': 'envoy', 'x-ms-ratelimit-remaining-subscription-reads': '11969', 'x-ms-correlation-request-id': 'd47f2f64-9d48-43e8-bfb3-70c1c7f9eef9', 'x-ms-routing-request-id': 'SOUTHCENTRALUS:20231212T173414Z:d47f2f64-9d48-43e8-bfb3-70c1c7f9eef9', 'x-content-type-options': 'nosniff', 'date': 'Tue, 12 Dec 2023 17:34:13 GMT', 'x-envoy-upstream-service-time': '2810'}
In offline discussion, @uonie from service team said they can't store body data because of private policy. So the issue still depends on users report the body content when error happens. @mattsp1290 as I claimed in https://github.com/Azure/azure-rest-api-specs/issues/25455#issuecomment-1809411528, I understand your concern. Is it possible to record the body content in a safe and private place which is not exposed? like:
from azure.core.pipeline.policies import ContentDecodePolicy
from azure.core.exceptions import DecodeError
import logging
...
try:
result = list(compute_client.usage.list(logging_enable=True))
except DecodeError as e:
with open("../safe_private_file.txt", "a+") as file:
file.write(f"error body: {e.response.text()}")
@msyyc and @uonie I have implemented your request. It gives back logs similar to what is in the original body of this issue.
Here are the headers we get back
{'cache-control': 'no-cache', 'pragma': 'no-cache', 'content-length': '32479', 'content-type': 'application/json; charset=utf-8', 'expires': '-1', 'x-ms-ratelimit-remaining-resource': 'Microsoft.Compute/GetSubscriptionInfo3Min;359', 'strict-transport-security': 'max-age=31536000; includeSubDomains', 'x-ms-request-id': '1b1e44e1-b350-4938-937b-cb355d12df4b', 'server': 'envoy', 'x-ms-ratelimit-remaining-subscription-reads': '11996', 'x-ms-correlation-request-id': '674ad3ac-2731-43de-a709-d179bcdbc08b', 'x-ms-routing-request-id': 'SOUTHCENTRALUS:20240119T153230Z:674ad3ac-2731-43de-a709-d179bcdbc08b', 'x-content-type-options': 'nosniff', 'date': 'Fri, 19 Jan 2024 15:32:29 GMT', 'x-envoy-upstream-service-time': '353'}
And the corresponding response body
{
"value": [
{
"limit": 2500,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "availabilitySets",
"localizedValue": "Availability Sets"
}
},
{
"limit": 350,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "cores",
"localizedValue": "Total Regional vCPUs"
}
},
{
"limit": 25000,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "virtualMachines",
"localizedValue": "Virtual Machines"
}
},
{
"limit": 2500,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "virtualMachineScaleSets",
"localizedValue": "Virtual Machine Scale Sets"
}
},
{
"limit": 3000,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "dedicatedVCpus",
"localizedValue": "Dedicated vCPUs"
}
},
{
"limit": 350,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "lowPriorityCores",
"localizedValue": "Total Regional Low-priority vCPUs"
}
},
{
"limit": 350,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "basicAFamily",
"localizedValue": "Basic A Family vCPUs"
}
},
{
"limit": 350,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardA0_A7Family",
"localizedValue": "Standard A0-A7 Family vCPUs"
}
},
{
"limit": 350,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardA8_A11Family",
"localizedValue": "Standard A8-A11 Family vCPUs"
}
},
{
"limit": 350,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardDFamily",
"localizedValue": "Standard D Family vCPUs"
}
},
{
"limit": 350,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardDv2Family",
"localizedValue": "Standard Dv2 Family vCPUs"
}
},
{
"limit": 350,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardDSFamily",
"localizedValue": "Standard DS Family vCPUs"
}
},
{
"limit": 350,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardDSv2Family",
"localizedValue": "Standard DSv2 Family vCPUs"
}
},
{
"limit": 350,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardGFamily",
"localizedValue": "Standard G Family vCPUs"
}
},
{
"limit": 350,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardGSFamily",
"localizedValue": "Standard GS Family vCPUs"
}
},
{
"limit": 350,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardFFamily",
"localizedValue": "Standard F Family vCPUs"
}
},
{
"limit": 350,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardFSFamily",
"localizedValue": "Standard FS Family vCPUs"
}
},
{
"limit": 200,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardNVFamily",
"localizedValue": "Standard NV Family vCPUs"
}
},
{
"limit": 200,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardNCFamily",
"localizedValue": "Standard NC Family vCPUs"
}
},
{
"limit": 32,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardHFamily",
"localizedValue": "Standard H Family vCPUs"
}
},
{
"limit": 350,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardAv2Family",
"localizedValue": "Standard Av2 Family vCPUs"
}
},
{
"limit": 350,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardLSFamily",
"localizedValue": "Standard LS Family vCPUs"
}
},
{
"limit": 350,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardDv2PromoFamily",
"localizedValue": "Standard Dv2 Promo Family vCPUs"
}
},
{
"limit": 350,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardDSv2PromoFamily",
"localizedValue": "Standard DSv2 Promo Family vCPUs"
}
},
{
"limit": 0,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardMSFamily",
"localizedValue": "Standard MS Family vCPUs"
}
},
{
"limit": 350,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardDv3Family",
"localizedValue": "Standard Dv3 Family vCPUs"
}
},
{
"limit": 350,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardDSv3Family",
"localizedValue": "Standard DSv3 Family vCPUs"
}
},
{
"limit": 350,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardEv3Family",
"localizedValue": "Standard Ev3 Family vCPUs"
}
},
{
"limit": 350,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardESv3Family",
"localizedValue": "Standard ESv3 Family vCPUs"
}
},
{
"limit": 350,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardDv4Family",
"localizedValue": "Standard Dv4 Family vCPUs"
}
},
{
"limit": 350,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardDDv4Family",
"localizedValue": "Standard DDv4 Family vCPUs"
}
},
{
"limit": 350,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardDSv4Family",
"localizedValue": "Standard DSv4 Family vCPUs"
}
},
{
"limit": 350,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardDDSv4Family",
"localizedValue": "Standard DDSv4 Family vCPUs"
}
},
{
"limit": 350,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardEv4Family",
"localizedValue": "Standard Ev4 Family vCPUs"
}
},
{
"limit": 350,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardEDv4Family",
"localizedValue": "Standard EDv4 Family vCPUs"
}
},
{
"limit": 350,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardESv4Family",
"localizedValue": "Standard ESv4 Family vCPUs"
}
},
{
"limit": 350,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardEDSv4Family",
"localizedValue": "Standard EDSv4 Family vCPUs"
}
},
{
"limit": 350,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardBSFamily",
"localizedValue": "Standard BS Family vCPUs"
}
},
{
"limit": 350,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardFSv2Family",
"localizedValue": "Standard FSv2 Family vCPUs"
}
},
{
"limit": 100,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardNDSFamily",
"localizedValue": "Standard NDS Family vCPUs"
}
},
{
"limit": 100,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardNCSv2Family",
"localizedValue": "Standard NCSv2 Family vCPUs"
}
},
{
"limit": 100,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardNCSv3Family",
"localizedValue": "Standard NCSv3 Family vCPUs"
}
},
{
"limit": 350,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardLSv2Family",
"localizedValue": "Standard LSv2 Family vCPUs"
}
},
{
"limit": 24,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardPBSFamily",
"localizedValue": "Standard PBS Family vCPUs"
}
},
{
"limit": 350,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardEIv3Family",
"localizedValue": "Standard EIv3 Family vCPUs"
}
},
{
"limit": 350,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardEISv3Family",
"localizedValue": "Standard EISv3 Family vCPUs"
}
},
{
"limit": 8,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardDCSFamily",
"localizedValue": "Standard DCS Family vCPUs"
}
},
{
"limit": 0,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardNVSv2Family",
"localizedValue": "Standard NVSv2 Family vCPUs"
}
},
{
"limit": 0,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardMSv2Family",
"localizedValue": "Standard MSv2 Family vCPUs"
}
},
{
"limit": 240,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardHBSFamily",
"localizedValue": "Standard HBS Family vCPUs"
}
},
{
"limit": 176,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardHCSFamily",
"localizedValue": "Standard HCS Family vCPUs"
}
},
{
"limit": 0,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardNVSv3Family",
"localizedValue": "Standard NVSv3 Family vCPUs"
}
},
{
"limit": 200,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardNVPromoFamily",
"localizedValue": "Standard NV Promo Family vCPUs"
}
},
{
"limit": 200,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardNCPromoFamily",
"localizedValue": "Standard NC Promo Family vCPUs"
}
},
{
"limit": 32,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardHPromoFamily",
"localizedValue": "Standard H Promo Family vCPUs"
}
},
{
"limit": 350,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardDAv4Family",
"localizedValue": "Standard DAv4 Family vCPUs"
}
},
{
"limit": 175,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardDASv4Family",
"localizedValue": "Standard DASv4 Family vCPUs"
}
},
{
"limit": 350,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardEAv4Family",
"localizedValue": "Standard EAv4 Family vCPUs"
}
},
{
"limit": 175,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardEASv4Family",
"localizedValue": "Standard EASv4 Family vCPUs"
}
},
{
"limit": 0,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardNDSv3Family",
"localizedValue": "Standard NDSv3 Family vCPUs"
}
},
{
"limit": 8,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardDCSv2Family",
"localizedValue": "Standard DCSv2 Family vCPUs"
}
},
{
"limit": 0,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardNVSv4Family",
"localizedValue": "Standard NVSv4 Family vCPUs"
}
},
{
"limit": 0,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardNDSv2Family",
"localizedValue": "Standard NDSv2 Family vCPUs"
}
},
{
"limit": 350,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardNPSFamily",
"localizedValue": "Standard NPS Family vCPUs"
}
},
{
"limit": 0,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardHBrsv2Family",
"localizedValue": "Standard HBrsv2 Family vCPUs"
}
},
{
"limit": 0,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "Standard NCASv3_T4 Family",
"localizedValue": "Standard NCASv3_T4 Family vCPUs"
}
},
{
"limit": 0,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "Standard NDASv4_A100 Family",
"localizedValue": "Standard NDASv4_A100 Family vCPUs"
}
},
{
"limit": 350,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardXEIDSv4Family",
"localizedValue": "Standard EIDSv4 Family vCPUs"
}
},
{
"limit": 350,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardXEISv4Family",
"localizedValue": "Standard XEISv4 Family vCPUs"
}
},
{
"limit": 350,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardEIASv4Family",
"localizedValue": "Standard EIASv4 Family vCPUs"
}
},
{
"limit": 0,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardHBv3Family",
"localizedValue": "Standard HBv3 Family vCPUs"
}
},
{
"limit": 0,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardMDSMediumMemoryv2Family",
"localizedValue": "Standard MDSMediumMemoryv2 Family vCPUs"
}
},
{
"limit": 0,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardMIDSMediumMemoryv2Family",
"localizedValue": "Standard MIDSMediumMemoryv2 Family vCPUs"
}
},
{
"limit": 0,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardMSMediumMemoryv2Family",
"localizedValue": "Standard MSMediumMemoryv2 Family vCPUs"
}
},
{
"limit": 0,
"unit": "Count",
"currentValue": 0,
"name": {
"value": "standardMISMediumMemoryv2Family",
"localizedValue": "Standard MISMediumMemoryv2 Family vCPUs"
}
},
{
"limit": 350,
Hi @mattsp1290, response.text()
shall return string like:
However, the response body you pasted seems already formatted like:
Could you confirm whether you follow https://github.com/Azure/azure-rest-api-specs/issues/25455#issuecomment-1886242344 to get the response? And if convenient, please send the original file instead of paste it in comment.
@msyyc 👋 Hi, I'm a colleague of @mattsp1290's.
I'm hoping that you could take another look on the Azure side given the request IDs we've provided, or if those have aged out we could provide new ones. One particular question I'd love to have answered is whether the body length actually matches the content-length being reported in those headers at various points in your egress chain. I'd think we could get the length of the body without running into sensitivity issues around the exact context of the requests. Please let us know if you can verify that the request looks good within Azure infrastructure, if so then we can at least know that the problem is somewhere on our end.
Additionally, we were taking another look at the problem requests here and we noticed a pattern where each of the malformed requests seems to cut off at a breakpoint in the JSON grammar. The body never seems to get cut off in the middle of a word, but instead it's usually after a comma, an open bracket, a colon, etc. This strongly suggests to me that whatever is cutting off the request is JSON-aware and not simply treating the body as arbitrary bytes. In particular, I'd ask if your APIs are using streaming JSON encoders and if there may be an issue there. There are no JSON-aware proxies on our end that are likely to cause the type of issue we are seeing here.
To provide some more info here, even when filtering down to a particular subscription that is exhibiting this problem, we are only seeing malformed payloads on about ~1% of requests. This makes it harder to reproduce, but it also provides a clue as to the class of problem we may be dealing with (maybe a race condition or something?). We'll keep looking for correlations on our end and let you know if we discover anything else. Thanks!
From SDK side, it is very hard for us to find root cause if we can't reproduce it locally when there is no detailed logs. So we still hope users can provide the raw response which has right format( as https://github.com/Azure/azure-rest-api-specs/issues/25455#issuecomment-1903006559) with https://github.com/Azure/azure-rest-api-specs/issues/25455#issuecomment-1886242344. For service side, @uonie may provide more thoughts here.
@msyyc @uonie we have raw responses available now, but they do include customer subscription ids. is there a secure way we could share this information with you?
However the responses do contain formatting information. These are the raw responses we are getting back from calling e.response.text()
as described in https://github.com/Azure/azure-rest-api-specs/issues/25455#issuecomment-1903006559
Hi @mattsp1290 Please replace the secret ids with "0000" then share it. I think e-mail is more safe and you can find my e-mail address in my github profile
@msyyc an e-mail has been sent. Please feel free to follow up either via email or here.
@uonie I forward the e-mail to you. According to log provided by @mattsp1290, response received by SDK is incomplete. So I advise service team query the log of service side with correlation id (e-mail contains it) to check the response.
There is ICM for this issue: https://portal.microsofticm.com/imp/v5/incidents/details/495066284
To find the root cause, I think we need collaboration among users, SDK, ARM and could service. Each part is necessary.
for users @mattsp /SDK, please use the following
from azure.core.exceptions import DecodeError
...
try:
result = list(compute_client.usage.list(logging_enable=True))
except DecodeError as e:
with open("../safe_private_file.txt", "a+") as file:
file.write(f"error header: {e.response.headers}\n")
file.write(f"error content: {e.response.content}\n")
headers contain all the necessary http header info and response.content is the raw bytes from http.
After users/SDK get the headers and response.content, ARM / Cloud service could use header info (e.g. x-ms-request-id) of SDK find related response.
the key point is ARM / could service shall also provide response headers and raw response body content
, then SDK / ARM / could service compare the header info and response body content. Since the flow is cloud service => ARM => SDK, let's call response from cloud RA
, call ARM RB
and call SDK RC
.
There is ICM for this issue: https://portal.microsofticm.com/imp/v5/incidents/details/495066284
To find the root cause, I think we need collaboration among users, SDK, ARM and could service. Each part is necessary.
- for users @mattsp /SDK, please use the following
from azure.core.exceptions import DecodeError ... try: result = list(compute_client.usage.list(logging_enable=True)) except DecodeError as e: with open("../safe_private_file.txt", "a+") as file: f.write(f"error header: {e.response.headers}\n") f.write(f"error content: {e.response.body()}\n")
headers contain all the necessary http header info and response.content is the raw bytes from http.
- After users/SDK get the headers and response.content, ARM / Cloud service could use header info (e.g. x-ms-request-id) of SDK find related response.
- the key point is
ARM / could service shall also provide response headers and raw response body content
, then SDK / ARM / could service compare the header info and response body content. Since the flow is cloud service => ARM => SDK, let's call response from cloudRA
, call ARMRB
and call SDKRC
.
- if RA.headers == RB.headers and RA.content == RB.content
- if RC.headers == RB.headers and RC.content == RB.content As long as all the RA, RB and RC provided, we can find which part has issue, then it is possible to find the root cause.
I am unable to access the ICM.
We have provided this information multiple times throughout this case over the past year. Happy to do it once more. Is there a secure way to send this information over to Microsoft?
@mattsp1290 Thanks for understanding. And to avoid misunderstanding, I want to emphasize that please follow https://github.com/Azure/azure-rest-api-specs/issues/25455#issuecomment-2147863985, logging code of which is different with before
, to provide original log file so that we can get raw http content that is very important to find the root cause. Without the raw http content and headers, we can't move to next step.
@mattsp1290 Thanks for understanding. And to avoid misunderstanding, I want to emphasize that please
follow https://github.com/Azure/azure-rest-api-specs/issues/25455#issuecomment-2147863985, logging code of which is different with before
, to provide original log file so that we can get raw http content that is very important to find the root cause. Without the raw http content and headers, we can't move to next step.
Is there a secure way we can share the raw http content and headers with your team?
@mattsp1290 I ping you in e-mail and you can share the http content/headers in that e-mail thread.
Thanks @msyyc! Got some examples sent over
With https://github.com/Azure/azure-rest-api-specs/issues/25455#issuecomment-2147863985, I expect the response body format is like: However, the examples are still:
It shows that the example is not recorded as I ask in https://github.com/Azure/azure-rest-api-specs/issues/25455#issuecomment-2147863985.
@msyyc We record the raw bytes to a cache and then write the raw bytes to a file. We are not modifying between. As mentioned previously the responses do contain formatting information so they are getting written to a file with formatting. https://github.com/Azure/azure-rest-api-specs/issues/25455#issuecomment-2024116520
If it would be helpful we can automate removing all the formatting and send that version of the responses over.
I tries locally that the raw response body content of compute service really contain format info (e.g. `\r \n and space) like: compute_response_body_content_contain_format_info.txt
And I also tries other service (e.g. storage), their raw response body content doesn't contain format info: storage_response_body_content_contain_no_format_info.txt
Actually service response content SHOULD not contain format info since those format info are useless and takes extra net speed cost. Not sure whether it is root cause, but I think service team shall make investigation about it.
I've run into something very similar:
22:50:21 E azure.core.exceptions.DecodeError: JSON is invalid: Unterminated string starting at: line 1 column 15714 (char 15713)
22:50:21 E Content: {"value":[{"name":"scylla-5.4.0-dev-x86_64-2023-09-01T10-42-02","id":"/subscriptions/6c268694-47ab-43ab-b306-3c5514bc4112/resourceGroups/SCYLLA-IMAGES/providers/Microsoft.Compute/images/scylla-5.4.0-dev-x86_64-2023-09-01T10-42-02","type":"Microsoft.Compute/images","location":"eastus","tags":{"arch":"x86_64","branch":"master","build_id":"402","build_mode":"release","build_tag":"jenkins-scylla-master-azure-image-402","operating_system":"ubuntu22.04","scylla_build_sha_id":"f44ecb6140c40cc466146304eb075258461d5dd7","scylla_jmx_version":"5.4.0~dev-0.20230901.3bdbe620aa1e-1","scylla_machine_image_version":"5.4.0~dev-20230901.a6f7f2e-1","scylla_python3_version":"5.4.0~dev-0.20230901.3bdbe620aa1e-1","scylla_tools_version":"5.4.0~dev-0.20230901.3bdbe620aa1e-1","scylla_version":"5.4.0~dev-0.20230901.3bdbe620aa1e-1","user_data_format_version":"3","kernel-version":"5.15.0-1045-azure"},"properties":{"sourceVirtualMachine":{"id":"/subscriptions/6c268694-47ab-43ab-b306-3c5514bc4112/resourceGroups/scylla-images/providers/Microsoft.Compute/virtualMachines/pkrvm7jicu9y1tn"},"storageProfile":{"osDisk":{"osType":"Linux","osState":"Generalized","diskSizeGB":30,"managedDisk":{"id":"/subscriptions/6c268694-47ab-43ab-b306-3c5514bc4112/resourceGroups/scylla-images/providers/Microsoft.Compute/disks/pkros7jicu9y1tn"},"caching":"ReadWrite","storageAccountType":"Standard_LRS"},"dataDisks":[],"zoneResilient":false},"provisioningState":"Succeeded","hyperVGeneration":"V2"}},{"name":"scylla-5.4.0-dev-x86_64-2023-09-04T14-04-50","id":"/subscriptions/6c268694-47ab-43ab-b306-3c5514bc4112/resourceGroups/SCYLLA-IMAGES/providers/Microsoft.Compute/images/scylla-5.4.0-dev-x86_64-2023-09-04T14-04-50","type":"Microsoft.Compute/images","location":"eastus","tags":{"arch":"x86_64","branch":"master","build_id":"403","build_mode":"release","build_tag":"jenkins-scylla-master-azure-image-403","operating_system":"ubuntu22.04","scylla_build_sha_id":"4c60ca8c4afe862855242a77
22:50:21
22:50:21 /usr/local/lib/python3.10/site-packages/azure/core/pipeline/policies/_universal.py:615: DecodeError
using this set of python sdk versions:
azure-identity==1.6.1
azure-mgmt-compute==30.5.0
azure-mgmt-network==19.0.0
azure-mgmt-resource==20.0.0
azure-mgmt-subscription==1.0.0
azure-mgmt-resourcegraph==8.0.0
and i'm iterating over response from azure_service.compute.images.list_by_resource_group()
call
so maybe it's something off with the pagination ?
```
22:50:21
22:50:21 scylla_version = 'master:latest', region_name = 'eastus'
22:50:21 arch =
Describe the bug Calling
.usage.list()
on aComputeManagementClient
in the Python SDK will inconsistently result in a DecodeError. This method call will result in calls to the Compute Usage List API. We see this happening in less than 10% of calls for affected subscriptions.Originally opened against azure-sdk-for-python in #issue-31643
Sample tail of stack trace:
To Reproduce We have been unable to reliably reproduce the issue. However we do continually see a handful of these errors each hour in our logs. Attempts at cataloging the response from the API have confirmed the API does indeed cut off the response as seen in the example above.
Expected behavior The response from azure side should be properly decoded instead of a JSON error. This would appear to require the underlying API to return a complete response.