Azure / azure-sdk-for-go

This repository is for active development of the Azure SDK for Go. For consumers of the SDK we recommend visiting our public developer docs at:
https://docs.microsoft.com/azure/developer/go/
MIT License
1.64k stars 844 forks source link

Compute: ResourceSkus_List returns 30K+ results in one API call #20470

Open erezrokah opened 1 year ago

erezrokah commented 1 year ago

Hello πŸ‘‹

This is an investigation request, as I'm not sure this falls under a bug or a feature.

Problem

I'm an engineer on the CloudQuery team, and we've been getting reports on high memory usage when using the Azure Go SDK github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/compute/armcompute/v4 module, specifically the ResourceSKUsClient client.

What I think I know

After doing some investigation and profiling, I discovered the following:

  1. That specific client, while having a paginating API, returns all items in a single response (seems like the REST API works like that so not specific to the Go SDK). It's common to have around 30,000 items returned by that API, hence a spike in memory usage when retrieving it.

  2. The Azure Go SDK recursively implements a custom UnmarshalJSON method for each struct that's a part of an API client response. See example: https://github.com/Azure/azure-sdk-for-go/blob/8ba0f8054d505231ca97a35484c6e1fe371e6787/sdk/resourcemanager/compute/armcompute/models_serde.go#L10024 https://github.com/Azure/azure-sdk-for-go/blob/8ba0f8054d505231ca97a35484c6e1fe371e6787/sdk/resourcemanager/compute/armcompute/models_serde.go#L9711

The custom UnmarshalJSON first converts each object into a map of string->byte[] , and then iterates over the map keys and unmarshalls each key value pair into the relevant type.

I think that since each API struct already has JSON tags, for example: https://github.com/Azure/azure-sdk-for-go/blob/8ba0f8054d505231ca97a35484c6e1fe371e6787/sdk/resourcemanager/compute/armcompute/models.go#L5145 having those custom UnmarshalJSON is not needed and creates additional redundant allocations. It seems the built in Go JSON decoder knows how to unmarshall without allocating additional memory.

I tried commenting out the UnmarshalJSON methods for the SKUs structs and it seems to improve memory and CPU (since the GC is doing less work) usage quite a bit (I can share before/after pprof images if needed).

What I'm requesting

  1. Feedback on this issue
  2. Explanation on the need to have both custom UnmarshalJSON and JSON tags
  3. If everything makes sense, removal of the custom UnmarshalJSON methods. I can even try to submit a PR for it πŸ˜…

Other context

The code that creates the custom UnmarshalJSON is in another repository, see here https://github.com/Azure/autorest.go/blob/db51bb93bef7af3b3a96e0e0550533a50fe5d29a/src/generator/models.ts#L289

ghost commented 1 year ago

Hi @erezrokah. Thank you for your feedback and we will look into it soon. Meanwhile, feel free to share your experience using the Azure SDK in this survey.

jhendrixMSFT commented 1 year ago

Thanks for reporting this. We've been aware of perf problems with the current implementation, see also https://github.com/Azure/azure-sdk-for-go/issues/19356.

The API returning 30K+ results in one API call, even though the API is marked as paginated, appears to be a bug. We're following up internally with the compute folks to get clarity on the behavior.

You are correct that the JSON tags are unused. Removing them has been on the TODO list, maybe it's time to clean that up to remove any confusion.

We generate custom marshallers/unmarshallers to handle data not supported by the standard library including RFC1128 time formats and polymorphic values just to name a few. Granted, not all data structures contain such values. However, we realized early on that this might change over time, and in order to avoid breaking changes, we decided to always emit them (there is some debate about whether or not removing a custom JSON marshaller/unmarshaller constitutes a breaking change).

erezrokah commented 1 year ago

Thanks for the quick response @jhendrixMSFT πŸš€

Glad to know you folks are on top on things. How about we keep this issue open until the SKUs bug is fixed, and then the memory/performance issue can be covered by https://github.com/Azure/azure-sdk-for-go/issues/19356?

jhendrixMSFT commented 1 year ago

Sounds great. I'll reply when I have an update on the 30K+ results.

ghost commented 1 year ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @Drewm3, @TravisCragg-MSFT, @nikhilpatel909, @sandeepraichura, @hilaryw29, @GabstaMSFT, @ramankumarlive, @ushnaarshadkhan.

Issue Details
Hello πŸ‘‹ This is an investigation request, as I'm not sure this falls under a bug or a feature. ## Problem I'm an engineer on the [CloudQuery](https://www.cloudquery.io/) team, and we've been getting reports on high memory usage when using the Azure Go SDK `github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/compute/armcompute/v4` module, specifically the [`ResourceSKUsClient` client](https://github.com/Azure/azure-sdk-for-go/blob/8ba0f8054d505231ca97a35484c6e1fe371e6787/sdk/resourcemanager/compute/armcompute/resourceskus_client.go#L39). ## What I think I know After doing some investigation and profiling, I discovered the following: 1. That specific client, while having a [paginating API](https://github.com/Azure/azure-sdk-for-go/blob/8ba0f8054d505231ca97a35484c6e1fe371e6787/sdk/resourcemanager/compute/armcompute/resourceskus_client.go#L64), returns all items in a single response (seems like the REST API works like that so not specific to the Go SDK). It's common to have around 30,000 items returned by that API, hence a spike in memory usage when retrieving it. 2. The Azure Go SDK recursively implements a custom `UnmarshalJSON` method for each struct that's a part of an API client response. See example: https://github.com/Azure/azure-sdk-for-go/blob/8ba0f8054d505231ca97a35484c6e1fe371e6787/sdk/resourcemanager/compute/armcompute/models_serde.go#L10024 https://github.com/Azure/azure-sdk-for-go/blob/8ba0f8054d505231ca97a35484c6e1fe371e6787/sdk/resourcemanager/compute/armcompute/models_serde.go#L9711 The custom `UnmarshalJSON` first converts each object into a map of `string->byte[]` , and then iterates over the map keys and unmarshalls each key value pair into the relevant type. I think that since each API struct already has JSON tags, for example: https://github.com/Azure/azure-sdk-for-go/blob/8ba0f8054d505231ca97a35484c6e1fe371e6787/sdk/resourcemanager/compute/armcompute/models.go#L5145 having those custom `UnmarshalJSON` is not needed and creates additional redundant allocations. It seems the built in Go JSON decoder knows how to unmarshall without allocating additional memory. I tried commenting out the `UnmarshalJSON` methods for the SKUs structs and it seems to improve memory and CPU (since the GC is doing less work) usage quite a bit (I can share before/after `pprof` images if needed). ## What I'm requesting 1. Feedback on this issue 2. Explanation on the need to have both custom `UnmarshalJSON` and JSON tags 3. If everything makes sense, removal of the custom `UnmarshalJSON` methods. I can even try to submit a PR for it πŸ˜… ## Other context The code that creates the custom `UnmarshalJSON` is in another repository, see here https://github.com/Azure/autorest.go/blob/db51bb93bef7af3b3a96e0e0550533a50fe5d29a/src/generator/models.ts#L289
Author: erezrokah
Assignees: jhendrixMSFT
Labels: `Compute`, `Service Attention`, `Mgmt`, `customer-reported`, `Service`
Milestone: -
jhendrixMSFT commented 1 year ago

For CRP folks, the operation in question is this one: https://github.com/Azure/azure-rest-api-specs/blob/main/specification/compute/resource-manager/Microsoft.Compute/Skus/stable/2021-07-01/skus.json#L43

Seeing as how the operation is marked as pageable, shouldn't the result set be spread out over multiple pages?

github-actions[bot] commented 10 months ago

Hi @erezrokah. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.

erezrokah commented 10 months ago

Hi @erezrokah. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.

Dear bot, I'm not sure which information is missing. To my understanding this issue awaits a response from the CRP folks

tadelesh commented 10 months ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @Drewm3, @TravisCragg-MSFT, @nikhilpatel909, @sandeepraichura, @hilaryw29, @GabstaMSFT, @ramankumarlive, @ushnaarshadkhan.

Issue Details

kindly ping again.

erezrokah commented 10 months ago

kindly ping again.

Another ping

TravisCragg-MSFT commented 3 months ago

@erezrokah We are aware of the shortcomings of this API and are currently in the process of revamping it. We will be allowing KQL queries to be submitted with the request to help return only the data needed. We expect this to become fully available sometime next year.

erezrokah commented 1 month ago

Hi πŸ‘‹ So the original issue I reported was about memory issues due to how the Azure SDK serializes and de-serializes JSON. I renamed this one, since the memory issues were supposed to be covered by https://github.com/Azure/azure-sdk-for-go/issues/19356 which is now closed as stale and locked for comments.

What should we do about the JSON marshallings memory issues? Open a new issue?

TravisCragg-MSFT commented 1 month ago

@jhendrixMSFT Can you weigh in on the current status for JSON marshallings memory issues? Would it be best to open a new issue or use this one to track it going forward?

jhendrixMSFT commented 1 month ago

Thanks all for bringing this to my attention. I've reopened the issue and labeled it so the bot won't close it again.