Closed j82w closed 4 years ago
Can you explain what "AutoUpgradeMaxThroughputIncrementPercentage" does? I haven't seen any feature described in Autopilot docs that matches that terminology.
@j82w - can you add a sample that shows how the conversion between autoscale and manual throughput would go?
I see there being the following options.
1. Implicit conversion with the replace API

```C#
Database database = await this.cosmosClient.CreateDatabaseAsync(
    "testDb",
    ThroughputProperties.CreateFixedThroughput(5000));

ThroughputResponse throughput = await database.ReadThroughputAsync(requestOptions: null);
Assert.AreEqual(5000, throughput.Resource.Throughput);

ThroughputResponse autoscaleReplaced = await database.ReplaceThroughputPropertiesAsync(
    ThroughputProperties.CreateAutoScaleProvisionedThroughput(10000));
```
2. Explicit conversion with explicit APIs
```C#
Database database = await this.cosmosClient.CreateDatabaseAsync(
    "testDb",
    ThroughputProperties.CreateFixedThroughput(5000));

ThroughputResponse autoscale = await database.UpdateThroughputTypeToAutoscale();
ThroughputResponse autoscaleReplaced = await database.ReplaceThroughputPropertiesAsync(
    ThroughputProperties.CreateAutoScaleProvisionedThroughput(10000));
ThroughputResponse backToFixed = await database.UpdateThroughputTypeToFixed();
```
3. Explicit conversion with a generic API

```C#
Database database = await this.cosmosClient.CreateDatabaseAsync(
    "testDb",
    ThroughputProperties.CreateFixedThroughput(5000));

ThroughputResponse autoscale = await database.UpdateThroughputType(ThroughputType.Autoscale);
ThroughputResponse autoscaleReplaced = await database.ReplaceThroughputPropertiesAsync(
    ThroughputProperties.CreateAutoScaleProvisionedThroughput(10000));
ThroughputResponse backToFixed = await database.UpdateThroughputType(ThroughputType.Fixed);
```
I am in favor of an explicit conversion. When a user does the conversion from manual -> autoscale, the backend will set a max RU/s. The user can change that in a follow-up call, as in options 2 and 3.
With option 1, it's possible the user could try to set an invalid value. Is it possible that the conversion to autoscale succeeds but setting the new value doesn't? Or is it atomic?
@jahmai I removed AutoUpgradeMaxThroughputIncrementPercentage from the current design. It's a new autoscale feature that is still being discussed. The feature is meant for advanced users: it allows Cosmos DB to scale the MaxThroughput by a specified percentage so a database/container would never get stuck being throttled.
One other option is to overload Update/Replace with a forceMigrate flag. That keeps the models to the data contracts only, and the operations stay generic.
@kirankumarkolli The problem with that design is that the backend does not allow the properties to be updated in the same call that does the conversion. This would require two network calls, and it gets ugly if the conversion isn't instant. We would also have to handle the scenario where the conversion succeeds but the new value the user supplied is not valid. Do we convert back to fixed, or leave it in autoscale with the wrong value?
Two network calls for migration is not bad.
Migration and scale-up/down are not guaranteed to be instant and can be async. Even the existing synchronous behaviors are implementation details and are bound to change.
This would require the SDK to keep polling until the conversion is complete and then make another call to update the value. So it could be a lot more than two network calls.
@deborahc What if only the portal and CLI supported migration scenarios? Is there really a scenario where the user would do the conversion with the SDK? It's a one-time operation. Doing a check or calling the migration every time their application starts would be bad. The better user experience would be to do it through the portal or CLI.
Just for a small piece of user input on this discussion:
I can't imagine a scenario where we would migrate between manual <-> auto as part of the standard operation of our application (that is, in response to some application condition). To my mind, Autopilot is there to avoid the need for realtime adjustments in throughput capacity.
I can see the need for occasional bulk changing of containers through in-house CLI tools, and in those cases I think it's perfectly fine for the operation to be multiple calls, or even polling for status until the migration completes (initiate migration, then poll for completion, then adjust RU/s). If it were a one-off conversion of a single container, we would probably just use the portal.
To be perfectly selfish, the MVP for me is creating a container that is already configured for Autopilot, and being able to retrieve the container properties that tell me it is configured as an Autopilot container and what the RU/s are (whether that's both min/max, or just min, assuming the range is well defined and static).
@deborahc I'm kind of in line with @jahmai.
The CLI is a better fit for migration and also more practical. Can migration be limited to, and started with, just the CLI?
The GA has two main features: 1) custom max RU/s, and 2) the ability to convert an existing database/container from manual -> autoscale.
It's a fair point that it will likely be a one-time operation, mostly manual -> autoscale. However, for completeness, I think it makes sense to support the migration scenario in the SDK. For developers who are already doing container/database management with the SDK, it will be more convenient than switching to the CLI and scripting. Otherwise, we have to say that feature 2) is available everywhere (Portal, CLI, PS, ARM) except the SDK.
I'm reaching out to a few customers who asked for the migration scenario to see if they would be blocked by it not being in the SDK.
Converting an existing database/container to autoscale is conceptually a two-step process: first you convert; then, after it succeeds, you change the max RU/s if you want (the portal also does it like this).
Is the worry with SDK support that the conversion operation will not be instant and will require a lot of polling? Based on how the system calculates the new max RU/s, I do not see a scenario where a scale-up/split is required - but I will add Padma or Rav to comment here.
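For illustration, a minimal sketch of what that two-step flow could look like from the SDK, using the hypothetical API names from the proposals above (UpdateThroughputTypeToAutoscale is not a shipped API, and IsReplacePending is assumed here as the flag for an in-progress offer replace):

```C#
// Hypothetical two-step migration sketch; API names are taken from the
// proposals above and are not a confirmed SDK surface.
Database database = cosmosClient.GetDatabase("testDb");

// Step 1: convert fixed -> autoscale. The backend picks the initial max RU/s.
ThroughputResponse converted = await database.UpdateThroughputTypeToAutoscale();

// The conversion may be async; poll until the offer replace completes.
while (converted.IsReplacePending == true)
{
    await Task.Delay(TimeSpan.FromSeconds(1));
    converted = await database.ReadThroughputAsync(requestOptions: null);
}

// Step 2: set the desired max RU/s in a follow-up call.
await database.ReplaceThroughputPropertiesAsync(
    ThroughputProperties.CreateAutoScaleProvisionedThroughput(10000));
```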
+1 to Deborah's comment - there are customers still using the SDK for CI and container/database management.
But I also agree that CLI support for migration would be more important than SDK support.
I would expect that, long term, more and more customers (enterprise customers at least) will move management tasks to the control plane to be able to use AAD integration, audit logs, etc.
> Is the worry with SDK support that the conversion operation will not be instant and will require a lot of polling? Based on how the system calculates the new max RU/s, I do not see a scenario where a scale-up/split is required - but I will add Padma or Rav to comment here.
There are some scenarios which may require split in some corner cases during migration from manual to autopilot.
My 2c: currently we don't support any such billing-impacting migrations via the SDK (multi-master to single-master, free tier to non-free tier). Limiting the migration support to CLI/PS and the portal sounds good to me.
As Padma said, yes, there will be a small subset of containers/databases that may encounter splits when migration is triggered. Max RU/s can't be changed while splits are in progress.
In that case, how about we go with supporting the create-new scenario for now, and add the conversion later based on feedback?
@j82w
I tried this out using ThroughputProperties.CreateAutoscaleThroughput(4000) against a new account/database/container in the Australia East region, and it just created a manual-throughput container with 400 RU/s. Is this expected at this time? Is there some server update that needs to happen first?
@jahmai How are you checking whether it's a manual-throughput container? The portal changes are currently being deployed, which is why we haven't released 3.9.0-preview yet. The emulator has a bug in reading the container throughput, which is being fixed.
Use ThroughputProperties.Throughput, which is the currently provisioned throughput, and ThroughputProperties.MaxAutoscaleThroughput, which is the maximum the container will scale to.
@j82w Using the portal, after creating the container through code. Looking at the provisioned throughput in the Data Explorer "Scale and Settings" tab, it is set to Manual with 400 RU/s. It doesn't allow me to select Autopilot (I know manual -> Autopilot is currently a missing feature). If I delete and re-create the container manually in the portal, then Autopilot is selected as expected, and I can change the Autopilot max RU/s through the portal as expected.
@jahmai The autoscale offer resource that the service creates from the preview SDK is slightly different from the offer the portal creates today for an autoscale container with the same autoscale max throughput. The portal doesn't understand this difference yet, and that's why it shows no way to change the max throughput or switch back to manual (it's not that it thinks this container was configured with manual throughput and there is no way to make it autoscale today). If you see a banner asking you to refresh the browser cache under the "Scale and Settings" tab in the portal, that's an indication the collection is autoscale-enabled and was created using the preview SDK. As @j82w mentioned, the portal experience will be updated early this week.
@ravgill Ahhh, ok, makes sense :) We're just doing migration testing at the moment, and data will likely get blown away in the next week or so before we do production testing, so I'll probably wait for that update. Thanks!
@ravgill Any updated release date for the portal update? Containers created with the preview SDK still appear as Manual in the portal (but the RU/s automatically go up and down with load). Also, I have a container that didn't automatically switch from 4000 RU/s to 20,000 RU/s with Autopilot when I exceeded the 50 GB mark. I have a case open with Azure support, but could this be related to the fact that I created the container with the preview SDK?
@jahmai - thanks for trying this out! When you create the container with the preview SDK, the way the system handles the max RU/s in relation to storage is a bit different from the Autopilot tiers model, which was a step function. Based on the max RU/s, the storage (in GB) you can have before the max RU/s increases is max RU/s / 100. For example, if you start with a maximum of 50,000 RU/s (scales between 5,000 - 50,000 RU/s), you can store up to 500 GB of data. If you exceed 500 GB - e.g. storage is now 600 GB - the new maximum will be 60,000 RU/s (scales between 6,000 - 60,000 RU/s).
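As a sanity check on that rule, here is a small sketch. The helper names are mine; only the "storage allowance = max RU/s / 100" relationship comes from the comment above:

```C#
using System;

static class AutoscaleStorageRule
{
    // Storage (in GB) allowed before the max RU/s increases: max RU/s / 100.
    public static double StorageAllowanceGb(int maxThroughput)
        => maxThroughput / 100.0;

    // If storage exceeds the allowance, the new max RU/s is storageGb * 100;
    // otherwise the configured maximum is kept.
    public static int EffectiveMaxThroughput(int configuredMaxThroughput, double storageGb)
        => (int)Math.Max(configuredMaxThroughput, storageGb * 100);
}
```

With the numbers from the comment: StorageAllowanceGb(50000) gives 500 GB, and at 600 GB of storage EffectiveMaxThroughput(50000, 600) gives 60,000 RU/s, scaling between 6,000 and 60,000 RU/s since the minimum is a tenth of the maximum.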
This will all be documented in an upcoming release, which will include the portal. Please stay tuned :)
@deborahc Ahhh, that actually sounds better than the tiering model. Very cool!
@deborahc @j82w Just wanted to provide some feedback from a vendor/developer perspective.
We currently have a number of clients with Cosmos DB deployed, and for quite a while we have provided custom application scaling of throughput to allow customers to control their cost. Given that our product is often deployed in a customer's tenant and we often do not have portal access, our only options for pushing infrastructure changes are initialization code (via SDK) or ARM/CLI scripts (via a locked-down service principal in an Azure DevOps release pipeline).
It would be awesome to have a migration path that could be performed programmatically so that we could roll these out to customers (not fussed if it's ARM/SDK/CLI; we are used to using a combination of these tools quite often now!).
Migration will be supported in CLI by late June.
The current design of autoscale throughput for the .NET v3 SDK.
Design: