[FEATURE REQ] Improvement: batch requests to avoid throttling

weidongxu-microsoft commented 4 years ago

Reported from Databricks: throttling was encountered.

weidongxu-microsoft commented 4 years ago

There is an unpublished batch API: https://github.com/Azure/azure-sdk-for-python/issues/9271

weidongxu-microsoft commented 4 years ago

Sample batch client:

package com.microsoft.azure.azuresdktest;

import com.azure.core.annotation.BodyParam;
import com.azure.core.annotation.Host;
import com.azure.core.annotation.HostParam;
import com.azure.core.annotation.Post;
import com.azure.core.annotation.QueryParam;
import com.azure.core.annotation.ServiceInterface;
import com.azure.core.http.HttpPipeline;
import com.azure.core.http.rest.RestProxy;
import com.azure.core.management.AzureEnvironment;
import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonProperty;
import reactor.core.publisher.Mono;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class BatchClient {

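    // Resource manager endpoint of the public Azure cloud.
    private static final String ENDPOINT = AzureEnvironment.AZURE.getResourceManagerEndpoint();
    // API version of the unpublished batch API used in this sample.
    private static final String API_VERSION = "2015-11-01";

    private final BatchService client;

    public BatchClient(HttpPipeline httpPipeline) {
        client = RestProxy.create(BatchService.class, httpPipeline);
    }

    // Sends all requests in a single POST and blocks until the batch response arrives.
    public Responses batch(Requests requests) {
        return client.batch(ENDPOINT, requests, API_VERSION).block().getValue();
    }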

    @Host("{$host}")
    @ServiceInterface(name = "BatchService")
    private interface BatchService {
        @Post("batch")
        Mono<com.azure.core.http.rest.Response<Responses>> batch(@HostParam("$host") String endpoint,
                                                                 @BodyParam("application/json") Requests requests,
                                                                 @QueryParam("api-version") String apiVersion);
    }

    public static class Requests {
        private List<Request> requests = new ArrayList<>();

        public List<Request> getRequests() {
            return requests;
        }

        public void setRequests(List<Request> requests) {
            this.requests = requests;
        }
    }

    public static class Request {
        private final String url;
        private final String httpMethod;

        @JsonCreator
        public Request(@JsonProperty(value = "url", required = true) String url,
                       @JsonProperty(value = "httpMethod", required = true) String httpMethod) {
            this.url = url;
            this.httpMethod = httpMethod;
        }

        public String getUrl() {
            return url;
        }

        public String getHttpMethod() {
            return httpMethod;
        }
    }

    public static class Responses {
        private List<Response> responses;

        public List<Response> getResponses() {
            return responses;
        }
    }

    public static class Response {
        private int httpStatusCode;
        private Map<String, String> headers;
        private Object content;
        private long contentLength;

        public int getHttpStatusCode() {
            return httpStatusCode;
        }

        public void setHttpStatusCode(int httpStatusCode) {
            this.httpStatusCode = httpStatusCode;
        }

        public Map<String, String> getHeaders() {
            return headers;
        }

        public void setHeaders(Map<String, String> headers) {
            this.headers = headers;
        }

        public Object getContent() {
            return content;
        }

        public void setContent(Object content) {
            this.content = content;
        }

        public long getContentLength() {
            return contentLength;
        }

        public void setContentLength(long contentLength) {
            this.contentLength = contentLength;
        }
    }
}

weidongxu-microsoft commented 4 years ago

I think I need to investigate more.

Test: for 3 VMs, alternate between 3 single GETs and 1 batch (containing 3 GETs), for 10 loops. The value recorded is x-ms-ratelimit-remaining-subscription-reads (for a single GET it is in the response header; for a batch POST it is in the JSON response).

Result

single: 11999
single: 11998
single: 11997
single: 11996
single: 11995
single: 11994
batch: 11968
batch: 11998
batch: 11968
single: 11993
single: 11992
single: 11991
batch: 11967
batch: 11987
batch: 11990
single: 11990
single: 11989
single: 11988
batch: 11995
batch: 11966
batch: 11986
single: 11987
single: 11986
single: 11985
batch: 11975
batch: 11989
batch: 11999
single: 11984
single: 11983
single: 11982
batch: 11998
batch: 11997
batch: 11965
single: 11981
single: 11980
single: 11979
batch: 11998
batch: 11997
batch: 11974
single: 11978
single: 11977
single: 11976
batch: 11988
batch: 11991
batch: 11994
single: 11975
single: 11974
single: 11973
batch: 11997
batch: 11964
batch: 11996
single: 11972
single: 11971
single: 11970
batch: 11996
batch: 11987
batch: 11987
single: 11969
single: 11968
single: 11967
batch: 11985
batch: 11986
batch: 11990

single: 11966
single: 11965
single: 11964
single: 11963
single: 11962
single: 11961
batch: 11997
batch: 11984
batch: 11985
single: 11960
single: 11959
single: 11958
batch: 11989
batch: 11993
batch: 11999
single: 11957
single: 11956
single: 11955
batch: 11995
batch: 11995
batch: 11973
single: 11954
single: 11953
single: 11952
batch: 11986
batch: 11996
batch: 11995
single: 11951
single: 11950
single: 11949
batch: 11998
batch: 11994
batch: 11984
single: 11948
single: 11947
single: 11946
batch: 11972
batch: 11999
batch: 11945
single: 11944
single: 11943
single: 11942
batch: 11996
batch: 11971
batch: 11996
single: 11941
single: 11940
single: 11939
batch: 11996
batch: 11994
batch: 11970
single: 11938
single: 11937
single: 11936
batch: 11993
batch: 11999
batch: 11993
single: 11935
single: 11934
single: 11933
batch: 11992
batch: 11988
batch: 11983

Apparently, for single GET requests the rate limit goes down one by one, monotonically. However, the rate limit reported by batch POST is pretty chaotic (and it appears that the count in batch POST does not affect the count on single GET).

I guess I need to check with the ARM people for more details.
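
The observation above can be checked mechanically. A minimal sketch, using only the first few values from the first run; the class name `RateLimitTrend` and the helper `decreasesByOne` are hypothetical names made up for this illustration:

```java
import java.util.List;

final class RateLimitTrend {
    /** Returns true if every value is exactly one less than the value before it. */
    static boolean decreasesByOne(List<Integer> remaining) {
        for (int i = 1; i < remaining.size(); i++) {
            if (remaining.get(i) != remaining.get(i - 1) - 1) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // First nine "single" values and first six "batch" values from the first run.
        List<Integer> single = List.of(11999, 11998, 11997, 11996, 11995, 11994, 11993, 11992, 11991);
        List<Integer> batch = List.of(11968, 11998, 11968, 11967, 11987, 11990);
        System.out.println("single decreases by one: " + decreasesByOne(single));
        System.out.println("batch decreases by one: " + decreasesByOne(batch));
    }
}
```

This only confirms the pattern in the recorded numbers; it does not explain why the batch values jump around.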

weidongxu-microsoft commented 4 years ago

It is much clearer for Compute:

{
   "responses":[
      {
         "httpStatusCode":202,
         "headers":{
            "Pragma":"no-cache",
            "Retry-After":"10",
            "Azure-AsyncOperation":"https://management.azure.com/subscriptions/ec0aa5f7-9e78-40c9-85cd-535c6305b380/providers/Microsoft.Compute/locations/westus2/operations/488a9db2-60d1-4169-9355-42ef98084422?api-version=2019-12-01",
            "Azure-AsyncNotification":"Enabled",
            "x-ms-ratelimit-remaining-resource":"Microsoft.Compute/DeleteVM3Min;237,Microsoft.Compute/DeleteVM30Min;1197",
            "Strict-Transport-Security":"max-age=31536000; includeSubDomains",
            "x-ms-request-id":"488a9db2-60d1-4169-9355-42ef98084422",
            "Cache-Control":"no-cache",
            "Location":"https://management.azure.com/subscriptions/ec0aa5f7-9e78-40c9-85cd-535c6305b380/providers/Microsoft.Compute/locations/westus2/operations/488a9db2-60d1-4169-9355-42ef98084422?monitor=true&api-version=2019-12-01",
            "Server":"Microsoft-HTTPAPI/2.0,Microsoft-HTTPAPI/2.0",
            "x-ms-ratelimit-remaining-subscription-deletes":"14999",
            "x-ms-correlation-request-id":"9c150f5f-f930-4854-8f1a-ecc666e84d17",
            "x-ms-routing-request-id":"SOUTHEASTASIA:20200812T044349Z:8c93ea7d-7fba-4fc7-9665-b4732aecbcb8",
            "X-Content-Type-Options":"nosniff",
            "Date":"Wed, 12 Aug 2020 04:43:48 GMT"
         },
         "contentLength":0
      },
      {
         "httpStatusCode":202,
         "headers":{
            "Pragma":"no-cache",
            "Retry-After":"10",
            "Azure-AsyncOperation":"https://management.azure.com/subscriptions/ec0aa5f7-9e78-40c9-85cd-535c6305b380/providers/Microsoft.Compute/locations/westus2/operations/d9795c5e-d708-4302-83d6-1b35c4806677?api-version=2019-12-01",
            "Azure-AsyncNotification":"Enabled",
            "x-ms-ratelimit-remaining-resource":"Microsoft.Compute/DeleteVM3Min;239,Microsoft.Compute/DeleteVM30Min;1199",
            "Strict-Transport-Security":"max-age=31536000; includeSubDomains",
            "x-ms-request-id":"d9795c5e-d708-4302-83d6-1b35c4806677",
            "Cache-Control":"no-cache",
            "Location":"https://management.azure.com/subscriptions/ec0aa5f7-9e78-40c9-85cd-535c6305b380/providers/Microsoft.Compute/locations/westus2/operations/d9795c5e-d708-4302-83d6-1b35c4806677?monitor=true&api-version=2019-12-01",
            "Server":"Microsoft-HTTPAPI/2.0,Microsoft-HTTPAPI/2.0",
            "x-ms-ratelimit-remaining-subscription-deletes":"14999",
            "x-ms-correlation-request-id":"9c150f5f-f930-4854-8f1a-ecc666e84d17",
            "x-ms-routing-request-id":"SOUTHEASTASIA:20200812T044348Z:78bbe286-5117-4bd8-948a-5d371f60ca4b",
            "X-Content-Type-Options":"nosniff",
            "Date":"Wed, 12 Aug 2020 04:43:47 GMT"
         },
         "contentLength":0
      },
      {
         "httpStatusCode":202,
         "headers":{
            "Pragma":"no-cache",
            "Retry-After":"10",
            "Azure-AsyncOperation":"https://management.azure.com/subscriptions/ec0aa5f7-9e78-40c9-85cd-535c6305b380/providers/Microsoft.Compute/locations/westus2/operations/b57d104a-87a6-4576-9054-30371f9f9efd?api-version=2019-12-01",
            "Azure-AsyncNotification":"Enabled",
            "x-ms-ratelimit-remaining-resource":"Microsoft.Compute/DeleteVM3Min;238,Microsoft.Compute/DeleteVM30Min;1198",
            "Strict-Transport-Security":"max-age=31536000; includeSubDomains",
            "x-ms-request-id":"b57d104a-87a6-4576-9054-30371f9f9efd",
            "Cache-Control":"no-cache",
            "Location":"https://management.azure.com/subscriptions/ec0aa5f7-9e78-40c9-85cd-535c6305b380/providers/Microsoft.Compute/locations/westus2/operations/b57d104a-87a6-4576-9054-30371f9f9efd?monitor=true&api-version=2019-12-01",
            "Server":"Microsoft-HTTPAPI/2.0,Microsoft-HTTPAPI/2.0",
            "x-ms-ratelimit-remaining-subscription-deletes":"14999",
            "x-ms-correlation-request-id":"9c150f5f-f930-4854-8f1a-ecc666e84d17",
            "x-ms-routing-request-id":"SOUTHEASTASIA:20200812T044348Z:f27bbe47-27d5-4671-8b85-d506ab964199",
            "X-Content-Type-Options":"nosniff",
            "Date":"Wed, 12 Aug 2020 04:43:48 GMT"
         },
         "contentLength":0
      }
   ]
}

            "x-ms-ratelimit-remaining-resource":"Microsoft.Compute/DeleteVM3Min;237,Microsoft.Compute/DeleteVM30Min;1197",
            "x-ms-ratelimit-remaining-resource":"Microsoft.Compute/DeleteVM3Min;239,Microsoft.Compute/DeleteVM30Min;1199",
            "x-ms-ratelimit-remaining-resource":"Microsoft.Compute/DeleteVM3Min;238,Microsoft.Compute/DeleteVM30Min;1198",

One batch (with 3 DELETE requests inside) consumes 3 units of the DeleteVM quota.
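
To compare these values in code rather than by eye, the header can be split into its counters. A rough sketch, assuming the value always has the `name;count,name;count` shape seen in the sample above (that format is inferred from this one response, not from documentation); `ResourceRateLimitHeader` is a hypothetical name:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ResourceRateLimitHeader {
    /** Parses "name;count,name;count" into an ordered map of counter name to remaining count. */
    static Map<String, Integer> parse(String headerValue) {
        Map<String, Integer> remaining = new LinkedHashMap<>();
        for (String entry : headerValue.split(",")) {
            String[] parts = entry.trim().split(";");
            // Skip entries that do not match the assumed "name;count" shape.
            if (parts.length == 2) {
                remaining.put(parts[0].trim(), Integer.parseInt(parts[1].trim()));
            }
        }
        return remaining;
    }

    public static void main(String[] args) {
        // The three header values returned inside the single batch response.
        List<String> values = List.of(
            "Microsoft.Compute/DeleteVM3Min;237,Microsoft.Compute/DeleteVM30Min;1197",
            "Microsoft.Compute/DeleteVM3Min;239,Microsoft.Compute/DeleteVM30Min;1199",
            "Microsoft.Compute/DeleteVM3Min;238,Microsoft.Compute/DeleteVM30Min;1198");
        for (String value : values) {
            System.out.println(parse(value));
        }
    }
}
```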

weidongxu-microsoft commented 4 years ago

@ChenTanyi Please check ResourceManagerThrottlingPolicy in track2 first.

I think the begin operations in track2 return the full headers through the Response interface (though polling does not). Since the current problem is with PUT/DELETE, this should be enough for now. We could try adding a utility that parses the response by checking the "x-ms-ratelimit-remaining-resource" and "x-ms-ratelimit-remaining-subscription-{method}" headers; if the number is low, the user could obtain a warning (maybe with a settable threshold) through some method.
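
A possible shape for such a utility, as a sketch only. It works on a plain `Map<String, String>` of headers rather than any SDK type, and the class name `RateLimitWarnings`, the method `check`, and the warning text are all hypothetical; nothing here is an existing track2 API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class RateLimitWarnings {
    private static final String RESOURCE_HEADER = "x-ms-ratelimit-remaining-resource";
    private static final String SUBSCRIPTION_PREFIX = "x-ms-ratelimit-remaining-subscription-";

    /** Returns one warning for each remaining-quota value that is below the threshold. */
    static List<String> check(Map<String, String> headers, int threshold) {
        List<String> warnings = new ArrayList<>();
        for (Map.Entry<String, String> header : headers.entrySet()) {
            String name = header.getKey().toLowerCase(Locale.ROOT);
            if (name.equals(RESOURCE_HEADER)) {
                // Assumed shape "name;count,name;count", as in the sample response.
                for (String entry : header.getValue().split(",")) {
                    String[] parts = entry.trim().split(";");
                    if (parts.length == 2) {
                        addIfLow(warnings, parts[0].trim(), parts[1].trim(), threshold);
                    }
                }
            } else if (name.startsWith(SUBSCRIPTION_PREFIX)) {
                // Covers the "-reads", "-deletes", ... variants with one prefix match.
                addIfLow(warnings, name, header.getValue().trim(), threshold);
            }
        }
        return warnings;
    }

    private static void addIfLow(List<String> warnings, String name, String count, int threshold) {
        try {
            int remaining = Integer.parseInt(count);
            if (remaining < threshold) {
                warnings.add(name + ": remaining " + remaining + " is below " + threshold);
            }
        } catch (NumberFormatException e) {
            // Ignore values that are not plain integers.
        }
    }
}
```

A single threshold for every counter is the simplest choice; since the resource counters (hundreds) and subscription counters (thousands) differ in scale, a per-header threshold may be more useful in practice.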

Azure / azure-libraries-for-java

[FEATURE REQ] Improvement: batch requests to avoid throttling #1222