Azure / azure-sdk-for-java

This repository is for active development of the Azure SDK for Java. For consumers of the SDK we recommend visiting our public developer docs at https://docs.microsoft.com/java/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-java.
MIT License
2.35k stars 1.99k forks source link

[FEATURE REQ] Enable the Java SDK to return a cumulative RU for operations with retries #32694

Open frmedjo opened 1 year ago

frmedjo commented 1 year ago

Hello Team, I'm a CosmosDB support engineer and I'm submitting this issue on behalf of a customer. See details below:

Is your feature request related to a problem? Please describe. The Java SDK appears to not return the cumulative RU charge when an operation has retries. The SDK returns the RU usage of the first retry only (0.38 RU) no matter how many retries occur.

Describe the solution you'd like The getRequestCharge() method should return a cumulative RU usage when there are retries

Describe alternatives you've considered None

Additional context The getRequestCharge() method returns the RU usage of the first retry which is 0.38 RU instead of a cumulative RU usage

To Reproduce

  1. Send a request (read or insert) that will have retries due to throttling as an example
  2. Attempt to retrieve the request charge of that operation using getRequestCharge() method

Environment summary SDK Version: Java JDK version: 4.39.0 OS Version (e.g. Windows, Linux, MacOSX): Windows

Code snippet

catch (CosmosException e) { double charge = e.getRequestCharge(); cost = cost + charge; if (charge != 0.0) { System.out.println("Charge " + charge + "--" + cost + "--" + e.getDiagnostics()); } }

Result return (I had to trim the entire diagnostic string because it was too long for me to submit the issue)

Charge 0.38 RU -- cost -- {"ClassName":"RequestRateTooLargeException","userAgent":"azsdk-java-cosmos/4.39.0 MacOSX/10.16 JRE/11.0.9.1","statusCode":429,"resourceAddress":"rntbd://cdb-ms-prod-northcentralus1-fd2.documents.azure.com:14366/apps/2c855c97-f894-4f78-b82e-7d2da642b013/services/025eb31b-5470-4b84-ae21-f53373e29c3c/partitions/1fb867a2-79f1-4f3a-ac36-9b826240eb09/replicas/133084250031423993p/","error":"{"Errors":"Request rate is large. More Request Units may be needed, so no changes were made. Please retry this request later. Learn more: [http://aka.ms/cosmosdb-error-429"]}","innerErrorMessage":"["Request](https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Furldefense.com%2Fv3%2Fhttps%3A%2Fnam06.safelinks.protection.outlook.com%2F%3Furl%3Dhttp3A2F2Faka.ms2Fcosmosdb-error-4292F2522255d257d25222C2522innerErrorMessage25223A2522255b2F2522Request%26data%3D057C017CAjay.Dasari40microsoft.com7C5e0ce629c5704d56dcf508dae38f49007C72f988bf86f141af91ab2d7cd011db477C17C07C6380725018591610757CUnknown7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn03D7C30007C7C7C%26sdata%3DNVegbCzvpuMxmeoGdCLCUQJ21pl8hW96h4yAWsDzMr0*3D%26reserved%3D0%3BJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUl!!GjvTz_vk!Ro60prP7hILOIkuOK4EkjIw2n5UAf9RhtOQk4x9QXP0kbu7CjEhjO2SMi87h0JCY0TahBQ0wx0wY0TWdCH4-tOw%24&data=05%7C01%7CFrank.Medjo%40microsoft.com%7C23ec0823d3804ac8f4ce08dae429689d%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C638073163836708516%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=C6AiRzeIEmnMnX9phyh%2FE2g4CB2Y9pcgL8wr%2B2Jxtvo%3D&reserved=0) rate is large. More Request Units may be needed, so no changes were made. Please retry this request later. Learn more: [http://aka.ms/cosmosdb-error-429"]","causeInfo":null,"responseHeaders":"{x-ms-current-replica-set-size=4, lsn=200727, x-ms-request-charge=0.38, x-ms-schemaversion=1.15, x-ms-transport-request-id=11065, x-ms-number-of-read-regions=2, x-ms-current-write-quorum=3, x-ms-cosmos-quorum-acked-llsn=200727, x-ms-quorum-acked-lsn=200727, x-ms-activity-id=c5a590fa-8167-11ed-8465-6f1363c1918e, x-ms-xp-role=1, x-ms-global-Committed-lsn=200726, x-ms-retry-after-ms=500, x-ms-cosmos-llsn=200727, x-ms-serviceversion= version=2.14.0.0, x-ms-substatus=3200}","cosmosDiagnostics":{"userAgent":"azsdk-java-cosmos/4.39.0 MacOSX/10.16 JRE/11.0.9.1","activityId":"c5a590fa-8167-11ed-8465-6f1363c1918e","requestLatencyInMs":3496,"requestStartTimeUTC":"2022-12-21T19:43:38.423821Z","requestEndTimeUTC":"2022-12-21T19:43:41.920282Z","responseStatisticsList":[{"storeResult":{"storePhysicalAddress":"rntbd://cdb-ms-prod-northcentralus1-fd2.documents.azure.com:14366/apps/2c855c97-f894-4f78-b82e-7d2da642b013/services/025eb31b-5470-4b84-ae21-f53373e29c3c/partitions/1fb867a2-79f1-4f3a-ac36-9b826240eb09/replicas/133084250031423993p/","lsn":200518,"globalCommittedLsn":200516,"partitionKeyRangeId":null,"isValid":true,"statusCode":429,"subStatusCode":3200,"isGone":false,"isNotFound":false,"isInvalidPartition":false,"isThroughputControlRequestRateTooLarge":false,"requestCharge":0.38,"itemLSN":-1,"sessionToken":null,"backendLatencyInMs":null,"exceptionMessage":""Request rate is large. More Request Units may be needed, so no changes were made. Please retry this request later. Learn more: [http://aka.ms/cosmosdb-error-429"]","exceptionResponseHeaders":"{x-ms-current-replica-set-size=4, lsn=200518, x-ms-request-charge=0.38, x-ms-schemaversion=1.15, x-ms-transport-request-id=9790, x-ms-number-of-read-regions=2, x-ms-current-write-quorum=3, x-ms-cosmos-quorum-acked-llsn=200518, x-ms-quorum-acked-lsn=200518, x-ms-activity-id=c3c1dca2-8167-11ed-8465-6f1363c1918e, x-ms-xp-role=1, x-ms-global-Committed-lsn=200516, x-ms-retry-after-ms=320, x-ms-cosmos-llsn=200518, x-ms-serviceversion= version=2.14.0.0, x-ms-substatus=3200}","replicaStatusList":["14366:Connected"],"transportRequestTimeline":[{"eventName":"created","startTimeUTC":"2022-12-21T19:43:38.424075Z","durationInMilliSecs":0.003},{"eventName":"queued","startTimeUTC":"2022-12-21T19:43:38.424078Z","durationInMilliSecs":0.0},{"eventName":"channelAcquisitionStarted","startTimeUTC":"2022-12-21T19:43:38.424078Z","durationInMilliSecs":0.177},{"eventName":"pipelined","startTimeUTC":"2022-12-21T19:43:38.424255Z","durationInMilliSecs":0.118},{"eventName":"transitTime","startTimeUTC":"2022-12-21T19:43:38.424373Z","durationInMilliSecs":60.776},{"eventName":"decodeTime","startTimeUTC":"2022-12-21T19:43:38.485149Z","durationInMilliSecs":0.009},{"eventName":"received","startTimeUTC":"2022-12-21T19:43:38.485158Z","durationInMilliSecs":0.106},{"eventName":"completed","startTimeUTC":"2022-12-21T19:43:38.485264Z","durationInMilliSecs":0.014}],"rntbdRequestLengthInBytes":630,"rntbdResponseLengthInBytes":329,"requestPayloadLengthInBytes":195,"responsePayloadLengthInBytes":329,"channelTaskQueueSize":1,"pendingRequestsCount":1,"serviceEndpointStatistics":{"availableChannels":9,"acquiredChannels":0,"executorTaskQueueSize":1,"inflightRequests":19,"lastSuccessfulRequestTime":"2022-12-21T19:43:38.411Z","lastRequestTime":"2022-12-21T19:43:38.423Z","createdTime":"2022-12-21T19:43:06.325062Z","isClosed":false,"cerMetrics":{}}},"requestResponseTimeUTC":"2022-12-21T19:43:38.485288Z","requestStartTimeUTC":"2022-12-21T19:43:38.423821Z","requestResourceType":"Document","requestOperationType":"Upsert","requestSessionToken":null},{"storeResult":{"storePhysicalAddress":"rntbd://cdb-ms-prod-northcentralus1-fd2.documents.azure.com:14366/apps/2c855c97-f894-4f78-b82e-7d2da642b013/services/025eb31b-5470-4b84-ae21-f53373e29c3c/partitions/1fb867a2-79f1-4f3a-ac36-9b826240eb09/replicas/133084250031423993p/","lsn":200545,"globalCommittedLsn":200541,"partitionKeyRangeId":"0","isValid":true,"statusCode":449,"subStatusCode":0,"isGone":false,"isNotFound":false,"isInvalidPartition":false,"isThroughputControlRequestRateTooLarge":false,"requestCharge":1.24,"itemLSN":-1,"sessionToken":"0:3#200545#1=-1#7=-1","backendLatencyInMs":0.77,"exceptionMessage":"["Conflicting request to resource has been attempted. Retry to avoid conflicts."]","exceptionResponseHeaders":"{x-ms-current-replica-set-size=4, x-ms-last-state-change-utc=Mon, 19 Dec 2022 06:02:56.888 GMT, x-ms-request-duration-ms=0.77, x-ms-session-token=0:3#200545#1=-1#7=-1, lsn=200545, x-ms-request-charge=1.24, x-ms-schemaversion=1.15, x-ms-transport-request-id=9890, x-ms-number-of-read-regions=2, x-ms-current-write-quorum=3, x-ms-cosmos-quorum-acked-llsn=200545, x-ms-quorum-acked-lsn=200545, x-ms-activity-id=c3fc26c0-8167-11ed-8465-6f1363c1918e, x-ms-xp-role=1, x-ms-global-Committed-lsn=200541, x-ms-cosmos-llsn=200545, x-ms-documentdb-partitionkeyrangeid=0, x-ms-serviceversion= version=2.14.0.0}","replicaStatusList":["14366:Connected"],"transportRequestTimeline":[{"eventName":"created","startTimeUTC":"2022-12-21T19:43:38.806053Z","durationInMilliSecs":0.002},{"eventName":"queued","startTimeUTC":"2022-12-21T19:43:38.806055Z","durationInMilliSecs":0.0},{"eventName":"channelAcquisitionStarted","startTimeUTC":"2022-12-21T19:43:38.806055Z","durationInMilliSecs":0.081},{"eventName":"pipelined","startTimeUTC":"2022-12-21T19:43:38.806136Z","durationInMilliSecs":0.117},{"eventName":"transitTime","startTimeUTC":"2022-12-21T19:43:38.806253Z","durationInMilliSecs":50.17},{"eventName":"decodeTime","startTimeUTC":"2022-12-21T19:43:38.856423Z","durationInMilliSecs":0.014},{"eventName":"received","startTimeUTC":"2022-12-21T19:43:38.856437Z","durationInMilliSecs":0.098},{"eventName":"completed","startTimeUTC":"2022-12-21T19:43:38.856535Z","durationInMilliSecs":0.015}],"rntbdRequestLengthInBytes":630,"rntbdResponseLengthInBytes":310,"requestPayloadLengthInBytes":195,"responsePayloadLengthInBytes":310,"channelTaskQueueSize":1,"pendingRequestsCount":1,"serviceEndpointStatistics":

kushagraThapar commented 1 year ago

@TheovanKraay - please take a look at this, thanks!

ghost commented 1 year ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @kushagraThapar, @TheovanKraay

TheovanKraay commented 1 year ago

Hi @frmedjo - I am not sure I understand the ask. There is no RU charge for a request that was throttled (indeed, that is why it has been throttled - because there are no RUs available to service the request at that time, so there will not be any charge to the customer). As such, the value of having "cumulative charge" in this case does not seem to make sense - in case of throttling (or any other retriable requests), the only request that will have incurred an RU charge at the backend is the first successful retry. Can you please expand a bit on why customer would require this? This may be a proxy for another question, e.g. perhaps they are uncertain on exactly why they are being throttled, etc.

myurchen commented 1 year ago

Hi @TheovanKraay, Thank you for your answer. Your explanation makes sense. Wanted to double check something here. In diagnostics there are several occurrences where backendLatencyInMs is not null. Does that mean the server was reached? If so - does that mean charge will be applied for customer? Here is one example: requestCharge":1.24,"itemLSN":-1,"sessionToken":"0:3#198472#1=-1#7=-1","backendLatencyInMs":0.881

kushagraThapar commented 1 year ago

@rinatmini can you please take a look at this, thanks!

TheovanKraay commented 1 year ago

@myurchen same explanation applies to your last question as applied to RUs. In case of 429 error, there is a value for backendLatencyInMs or requestCharge in diagnostics it can be ignored, as customer will not have been charged (that is, RU quota will not have been consumed in the period of time it took to execute the retry). That said, these non-zero values may cause confudsion, so we will investigate tidying this up in upcoming semester work.

kushagraThapar commented 1 year ago

@jeet1995 please take a look at this issue, thanks!