Clarify ctor/Creation/Verification lifecycle

bartelink commented 2 years ago

Replaces TableContext.Create and associated APIs with a clearer set of APIs that facilitate better separation of what I see as multiple phases of processing, depending on how one deploys one's app (informed by how Equinox's tooling and conventions separates this in larger applications).

The split is:

Being able to create a TableContext with the assumption that the underlying table and requisite indexes are in place (without any external calls), via the constructor
VerifyTableAsync to check that the tableName specified by the constructor is in place and has the correct indices in place (Scripting.Initialize combines this step with the constructor for scripting scenarios). Does not create a table; throws if there is a problem. Should ideally only be called once per process startup.
VerifyOrCreateTableAsync - as per VerifyTableAsync but will create the table if it happens not to be present. For small single EXE apps, or test scenarios, calling this once on startup may make sense. Can also be used from separated configuration logic; useful for any application scenario where the creation of tables is something that is managed via a configuration phase of a deployment strategy (DDL vs DML separation in SQL DB tech parlance)
UpdateTableIfRequiredAsync - can apply adjustments to Provisioning or Streaming configuration via the UpdateTable API if the table's configuration is not as specified. Used to implement Equinox's CLI tooling wrt establishing and/or changing throughput an/or streaming configuration.

resolves #35

bartelink commented 2 years ago

see also related pre-discussion thread with @samritchie

bartelink commented 2 years ago

@samritchie I ended up havig to put an ugly guard into https://github.com/fsprojects/FSharp.AWS.DynamoDB/blob/6423028fbbe149d8cec981d074f98893614e308b/src/FSharp.AWS.DynamoDB/TableContext.fs#L947-L951 If you search for CreateOrUpdateThroughput in that file, you can see its relatively low impact, but I definitely don't like the the coupling that guard adds. Pending a commit now to remove it (checking if it makes sense to expose the Describe API, but I guess that's pretty debatable too...)

bartelink commented 2 years ago

@samritchie Refactored to put my expose my semantics in TableContext.ProvisionTableAsync without exposing that DU I had.

The choices now from my perspective are:

give FSharp.AWS.DynamoDb the burden of maintaining ProvisionTableAsync's checking of the TableDescription (arguably that makes sense as it's kinda the inverse of what it owns via InternalCreateTableRequest)
make InternalCreateOrValidateTableAsync public
provide an even more general solution by exposing InternalProvision

I'm happy to do any more cleanup you think makes sense, as long as I wind up with some code (that does not involve that cut and pasting the retry loop!) to to pop into my provisioning impl in https://github.com/jet/equinox/blob/acccec32fabb27ec03b731a9a1c340fb341e3ae7/src/Equinox.DynamoStore/DynamoStore.fs#L472-L473 (That will ultimately be exposed via something like dotnet tool install Equinox.Tool && eqx init dynamo -rru 10 -wru 100 -s https://localhost:8000 -t my-table-name)

bartelink commented 2 years ago

@samritchie I worked in support for OnDemand mode - the throughput argument for the new Initialize/Provision APIs is now a DU Let me know if you want me to shift that (or any other aspect) into a separate PR (The question of whether provisioning logic belongs in here remains - InternalCreateCreateTableRequest and ProvisionAsync are where the mapping is managed now)

bartelink commented 2 years ago

@samritchie added support for configuring Streaming mode to manage an Equinox feature. High level:

eqx initAws -rru 10 -wru 20 dynamo creates or applies streaming mode to the table
eqx initAws -D dynamo changes the table to PAY_PER_REQUEST (and updates the streaming mode if it has drifted)
(re-issuing initial command switches it back)

In each of the above cases, you want to achieve a desired state, which means you need to

inspect the current state
only do an UpdateTable request if one or both of the settings has changed (it'll throw otherwise)
wait for the correct state to be established, i.e. that retry loop
report exceptions (if you exceed the maximum changes within a time window you want the operator to know via eqx returning a non-zero exit code)

The exact way in which the Throughput and Streaming DUs are exposed should can be changed and/or reconsidered, but I think the high level flow of having CreateTableAsync / VerifyTableAsync + UpdateTableAsync + ProvisionTableAsync seems to make sense regardless of whether both Throughput and Streaming belong in the box as such.

bartelink commented 2 years ago

@samritchie I've topped and tailed this and can't think of anything else significant I need in the context of https://github.com/jet/equinox/pull/321 so am happy to clean up as you see fit now

samritchie commented 2 years ago

@bartelink I'll make some time tomorrow to pull this branch and have a good run through. One of the main things I'm trying to work out is GSI provisioning - currently GSIs are set to the table provisioning levels on Create, but not modified on Update (I don’t think you’ve changed that). GSIs will often have the same write capacity and a different read capacity to the table. This is going to be a really low usage feature though, maybe it’s enough to offer the customize callback and people can modify as they see fit.

bartelink commented 2 years ago

Thanks @samritchie I've just completed local testing now and it's happy (I had a flipped condition in the td.Table.TableStatus <> TableStatus.ACTIVE bit 🤦‍♂️)

I have no immediate plans to use GSIs so definitely can't help beyond agreeing that yes, having it provisioned as per the actual table smells. One could add a case to

type Throughput =
    | Provisioned of ProvisionedThroughput
    | OnDemand

If you actually had a plan ;)

samritchie commented 2 years ago

@bartelink I had a bit of a go at the new initialisation APIs including some testing on a real account. Some of the notable behaviour I found with cloud Dynamo was:

Attempting to update provisioning on an existing table with the same values as currently defined on that table will throw an Amazon.DynamoDBv2.AmazonDynamoDBException.
Attempting to update provisioning on an existing table while a provisioning update is already underway will throw a Amazon.DynamoDBv2.Model.ResourceInUseException (note changing capacity on a large table can be very time-consuming, applying OnDemand also seems to be a lengthy op).
Decreasing provisioning levels is rate-limited (once per hour after the first 4 in a day) and will throw a Amazon.DynamoDBv2.Model.LimitExceededException if exceeded.
There are a few other rules that will trigger Amazon.DynamoDBv2.Model.LimitExceededException - exceeding max capacity, switching to OnDemand appears to be rate-limited to once per day etc.

The upshot is I'm not sure how useful/usable it is to update provisioning on an existing table as part of table initialisation - ie ProvisionTableAsync. It’s highly likely that this method will throw (especially in a concurrent environment), but it’s also likely the provisioning exception is not critical and could be caught and logged/ignored - as opposed to a create/verify exception which will normally be fatal. IMO it’s going to be better to leave this decision to the consumer via an InitializeTableAsync/try UpdateTableAsync pair. I think it’s also worth adding some doc comments to UpdateTableAsync re the exceptions.

Initialising a context via constructor then VerifyAsync/InitialiseAsync definitely feels more painful to use from code (for people currently using those options) - I guess the gamble is the majority of people also found it as useless as I have in a real-world application. Hopefully the comments on the obsolete Create make the trade-offs clear.

I know I complained about ambiguous use of the term 'Create' but I'm wondering if CreateTableAsync may be clearer regarding its behaviour than InitializeTableAsync (particularly as it’s no longer a factory)? I suspect people would be fine with the semantics of it being 'create if needed'. Happy to hear your opinions there.

bartelink commented 2 years ago

Attempting to update provisioning on an existing table while a provisioning update is already underway will throw a Amazon.DynamoDBv2.Model.ResourceInUseException (note changing capacity on a large table can be very time-consuming, applying OnDemand also seems to be a lengthy op). Decreasing provisioning levels is rate-limited (once per hour after the first 4 in a day) and will throw a Amazon.DynamoDBv2.Model.LimitExceededException if exceeded.

Those exceptions and/or wait periods are also the case with Cosmos - the eqx tool surfaces them by printing the messages (and setting an exit code). I don't think adding a delay loop would be useful for that context, but it's still useful as-is IMO. Anyone building a UI experience will want the exception and/or can build their own (having said that, atm each of Initialize, Verify and Provision will wait for an unbounded time for the Describe retry loop to return to Active state, but I guess we'd know if that was a problem).

I guess I'll add one liners to the xmldoc mentioning them though

Attempting to update provisioning on an existing table with the same values as currently defined on that table will throw an Amazon.DynamoDBv2.AmazonDynamoDBException.

Yes, had noticed that, and this is one of the reasons for the Provision API design - it checks if its already correct and does not do it if it is (but see the BillingMode default inference in the other review comment for the gap in that - if we infer null to mean Provisioned mode then this goes away)

Decreasing provisioning levels is rate-limited (once per hour after the first 4 in a day) and will throw a Amazon.DynamoDBv2.Model.LimitExceededException if exceeded.

Will add one liner descriptions

The upshot is I'm not sure how useful/usable it is to update provisioning on an existing table as part of table initialisation - ie ProvisionTableAsync. It’s highly likely that this method will throw (especially in a concurrent environment), but it’s also likely the provisioning exception is not critical and could be caught and logged/ignored - as opposed to a create/verify exception which will normally be fatal. IMO it’s going to be better to leave this decision to the consumer via an InitializeTableAsync/try UpdateTableAsync pair. I think it’s also worth adding some doc comments to UpdateTableAsync re the exceptions.

For me the UpdateTable is what it is - a light wrapper over the existing API with its foibles intact (complaining about null updates to throughput spec etc that there is no easy way to circumvent without dangerous try/catches to selectively ignore etc) - you're doing a blind write without reference to a DescribeTableResponse. I can't picture a use case for it tbh, especially without having a DescribeTable API (which is not hard to do yourself, but for that to be truly useful, we might need to expose the requiresUpdate / applyToUpdateRequest API calls)

The high level problem is I cant see a good way to define an API that solves

my need: Make sure indexes, streaming and provisioning meet some defined need (the actual need I have for eqx initAws) (including a handy way to change the provisioning as a one stop shop), without a redundant and/or explicit Describe call that CreateIfNotExists already does
random other initialization/provisioning/updating requirements that others ma have: don't really know the other use cases .... so was thinking the best thing was to punt on them

Any real answer to these probably means doing a more complete job (which was something I was trying to avoid at the outset, but probably makes sense in terms of what this has become), i.e have two phases:

CreateTableIfNotExistsAsync / DescribeTableAsync / VerifyTableAsync get you a description
- Create/Verify includes the existing describe loop
- do we/how do we provide a Verify and/or Create without you having to Async.Ignore / discard the result?
UpdateTableIfRequiredAsync(?throughput, ?streaming, ?customize : TableDescription -> UpdateTableRequest -> bool, ?desc : TableDescription)
- Maybe it can be called something like Provision
- if no desc supplied, it could call DescribeTable for you to get it (should it also be expose separately? I assume it should not do a (unbounded) wait loop?)
- If customize returns false, and neither throughput or streaming generate a change requirement, no call happens

There's a strong argument for removing UpdateTableAsync if we do the above.

As it would remove the need for a ProvisionTable, that makes the Initialize/Verify/Provision concept go away)

(aside: the use of Async suffices for Async methods is no longer the norm in F#, so ideally we would not have them, but then it becomes inconsistent - maybe a future V2 can drop the scripting stuff and the suffices?)

Initialising a context via constructor then VerifyAsync/InitialiseAsync definitely feels more painful to use from code (for people currently using those options) - I guess the gamble is the majority of people also found it as useless as I have in a real-world application. Hopefully the comments on the obsolete Create make the trade-offs clear.

Well for a start the two phases means we don't have metricsCollector and/or other such things leaking into factory methods. It also meant defaulting of RU limits etc had to be mentioned in lots of places in the xmldoc.

I know I complained about ambiguous use of the term 'Create' but I'm wondering if CreateTableAsync may be clearer regarding its behaviour than InitializeTableAsync (particularly as it’s no longer a factory)? I suspect people would be fine with the semantics of it being 'create if needed'. Happy to hear your opinions there.

For me the value of Initialize/Verify/Provision is that they are single words that don't sound like any of the underlying DDB APIs - for me that's in keeping with the fact that they have semantics on top of the basic calls they do. If it was called Create/CreateTable, there'd thus be an argument to call it CreateTableIfNotExists to signal the different semantics. The other thing thing here is that, unlike Provision, it will only apply your settings if the Create is happening, and won't apply the throughput (or streaming) config otherwise (which nudges me toward calling it CreateTableIfNotExists). I guess this translates to me saying I dont think [CreateAsync or] CreateTableAsync is a better name than Initialize, but maybe CreateTableIfNotExists is if you like it?

I'm going to spike:

having CreateTableIfRequiredAsync / VerifyTableAsync return the TableDescription
replacing ProvisionTableAsync and UpdateTableAsync with UpdateTableIfRequiredAsync

But will also be watching out for your reactions to any of the above as I do that.

Another question: Based on the above naming, should Scripting.TableContext.Initialize split into Scripting.TableContext.Verify + Scripting.TableContext.CreateIfNotExists as that term is no longer in use?

samritchie commented 2 years ago

Hmm, I'm not overly enthusiastic about passing around TableDescriptions, I'm a bit concerned about API surface area and complexity for something that I suspect is only relevant to a small number of consumers - "I can’t run because you have too much table capacity" is a fairly unusual requirement. It would make more sense to me to just change the internals of UpdateTableAsync so that it updates if needed - I'm aware this will result in a redundant Describe call unless the TableDescription is cached in the context.

I could live with CreateTableIfRequired, but I think CreateTable is better and I don't think the fact it doesn’t create an existing table is terribly surprising.

Unsure about scripting, I think it’s important to keep fairly ergonomic one-liners but I'm not sure what it/they should be called. I guess it makes sense to align with the members, alternatively maybe something closer to the existing single Create factory.

bartelink commented 2 years ago

I could live with CreateTableIfRequired, but I think CreateTable is better and I don't think the fact it doesn’t create an existing table is terribly surprising.

Pushed a rename of InitializeTableAsync to CreateTableIfNotExistsAsync - not sure I like the name as it also does an implicit verify of the table schema. At present it does not return a TableDescription...

note sure if CreateTable is better as it has an implicit verify? maybe CreateOrVerifyTable ? (this quandary is what got me to Initialize...)

Hmm, I'm not overly enthusiastic about passing around TableDescriptions, I'm a bit concerned about API surface area and complexity for something that I suspect is only relevant to a small number of consumers - "I can’t run because you have too much table capacity" is a fairly unusual requirement. It would make more sense to me to just change the internals of UpdateTableAsync so that it updates if needed - I'm aware this will result in a redundant Describe call unless the TableDescription is cached in the context.

The immediate update is only for me and my provisioning tool, and I'm conscious of polluting the normal API. I can either

make InternalCreateOrValidate public and call it VerifyOrCreateTableAsync
expose Provisioning.createOrValidate I'm thinking the latter as I demonstrate it in the Script.fsx - we can review after

"I can’t run because you have too much table capacity" is a fairly unusual requirement

I think that's a bit of a strawman.

The intent here is to be able to provide an one-stop-shop Equinox CLI tool which can create a table (with correct schema and/or streaming config), and lets you flip between OnDemand to Provisioned. The only reason it gets into RU adjustments is as you cant create a table without it (unless one defaulted to OnDemand). While one would expect any real prod system to have other ways e.g. Pulumi or AWS tooling to adjust provisioning, playing with the settings via a CLI is something one might do during experimentation and/or load testing phases.

I'm about to do the UpdateIfRequired change now - will do a single built in Describe without retries. Am thinking to remove the current UpdateTable and not to expose a public Describe - thoughts?

Unsure about scripting, I think it’s important to keep fairly ergonomic one-liners but I'm not sure what it/they should be called. I guess it makes sense to align with the members, alternatively maybe something closer to the existing single Create factory.

I guess step one is to get the names to make sense for the Async/top level API and then worry about that. The problems with Create for me were:

confusing semantics - hiding the cost /existence of the Verify
providing a "yeah feel free to create it, I dont really think/care about provisioning"
having to tunnel through all the ctor args e.g. metricsCollector
having to put in defaulting or make things noisy when you are doing a Create

At one point I had a DU to define the modes, i.e. Verify | Create of ProvisionedThroughput, but that's no oil painting. I think the current semantics make sense (complain if it does not exist or create if the throughput was supplied) - in a scripting case this gives decent feedback and you can just retry with the throughput spec having thought about it

samritchie commented 2 years ago

note sure if CreateTable is better as it has an implicit verify? maybe CreateOrVerifyTable ? (this quandary is what got me to Initialize...)

Yes, maybe VerifyOrCreateAsync sums it up best. Initialize is a bit less clear as to what it does and will require looking at comments.

I think that's a bit of a strawman.

Sorry, yes, I should have said that failed provisioning updates will rarely indicate a table that’s not usable by the application, and I think for 99% of cases would warrant a warning only. This is why I'm trying hard to ensure it’s not included implicitly as part of any initialisation, I think there’s a very high likelihood that injudicious use of something like the ProvisionTableAsync model would end up taking down someone’s server instances.

The intent here is to be able to provide an one-stop-shop Equinox CLI tool which can create a table (with correct schema and/or streaming config), and lets you flip between OnDemand to Provisioned.

I guess this is driving the mismatch, I don’t think the library was ever really intended for use as a provisioning tool - as I’d said previously I think table creation was only included as a convenience.

I'm about to do the UpdateIfRequired change now - will do a single built in Describe without retries. Am thinking to remove the current UpdateTable and not to expose a public Describe - thoughts?

Yes, that sounds good.

Re: Scripting, I think what’s bothering me is trying to mirror the initialisation instance methods as static factory methods without making it clear that’s what they are. I'm not sure I have a good solution though - TableContext<'a>.ContextWithCreateOrVerify feels very cumbersome?

bartelink commented 2 years ago

note sure if CreateTable is better as it has an implicit verify? maybe CreateOrVerifyTable ? (this quandary is what got me to Initialize...)

Yes, maybe VerifyOrCreateAsync sums it up best. Initialize is a bit less clear as to what it does and will require looking at comments.

It's now VerifyTableAsync and VerifyOrCreateTableAsync - would be ok with dropping the TableAsync bits if you want

Sorry, yes, I should have said that failed provisioning updates will rarely indicate a table that’s not usable by the application, and I think for 99% of cases would warrant a warning only. This is why I'm trying hard to ensure it’s not included implicitly as part of any initialisation, I think there’s a very high likelihood that injudicious use of something like the ProvisionTableAsync model would end up taking down someone’s server instances.

Have pushed a change that replaces Provision with UpdateTableIfRequiredAsync. I've also let go of the desire to reuse the TableDescription from the Create/Verify - in practice if you are interested in adjusting config in a given context, you can afford a roundtrip (though you can supply the value if you have it) - the eqx initAws API will let it do that.

The intent here is to be able to provide an one-stop-shop Equinox CLI tool which can create a table (with correct schema and/or streaming config), and lets you flip between OnDemand to Provisioned.

I guess this is driving the mismatch, I don’t think the library was ever really intended for use as a provisioning tool - as I’d said previously I think table creation was only included as a convenience.

Yeah, though a relatively proven retry loop that can serve: a) the test rig (and anyone else's) b) eqx initAws

Is for me a very valuable thing (which is why I've spent the time!).

I'm about to do the UpdateIfRequired change now - will do a single built in Describe without retries. Am thinking to remove the current UpdateTable and not to expose a public Describe - thoughts?

One interesting/debatable thing is to make a Provisioning.tryDescribe - UpdateIfRequiredAsync will throw a specific InvalidOperationException if the state of the table is not Active. Normal usage should not trip on this as I assume people will guard app startup and/or reprovisioning with either a VerifyTable or VerifyOrCreateTable call, which each wait for state.

Re: Scripting, I think what’s bothering me is trying to mirror the initialisation instance methods as static factory methods without making it clear that’s what they are. I'm not sure I have a good solution though - TableContext<'a>.ContextWithCreateOrVerify feels very cumbersome?

Based on the rest of the names, maybe replacing Initialize with

Scripting.TableContext.Create (ctor + VerifyOrCreate)
Scripting.TableContext.Verify (ctor + Verify)

Might make sense. I think I'd leave off the metricsCollector to reduce the noise; let them drop to the ctor if they need it.

But that would still leave me with the following concerns:

TableContext.Create does not sound to me like it will async create (or validate) a table. It sounds like an in-memory/lazy creation of a Context.
people porting from Create will first be annoyed that it is forcing you to mention RUs (and Verify is not going to be an obvious thing to look for) and/or confused and will likely not get the right nudges about separating provisioning and/or validation from the mechanics of 🙈 ops will worry about that
If you open Scripting, then you have more noise:
- Create vs the ctor (if I see a ctor I might not look for a Create)
- Verify vs (admittedly non static) VerifyAsync and VerifyOrCreate

(these concerns are slightly reduced if we stick with the TableAsync suffices)

All in all I'm pretty happy with the outcome as it stands - thanks for the excellent review feedback that led to it.

Let me know if you want to flip any naming etc. I've spent too long looking at that xmldoc to be able to meaningfully review it for copy-editing now though! (have also update the overview, but similarly it may have missing changes from the refactoring)

bartelink commented 2 years ago

Right, have implemented as above - let me know your thoughts on how to proceed on the open conversations and I'll get to them asap...

bartelink commented 2 years ago

Apologies for the confusion re those breaking changes (which were intentional on my part, even if in retrospect pretty ill-judged 😊) @samritchie

I've integrated with 0.10.1-beta in https://github.com/jet/equinox/pull/321/commits/dcbc03681c54ec5433125e09c936dab2630aca8c and it all seems happy - thanks for getting it out there!

Arguably https://github.com/fsprojects/FSharp.AWS.DynamoDB/issues/35 should be re-opened? (I had intended to remove the resolves ref to it in the overview of this PR after our discussion there regarding the need to provide mapping for epoch times re TTL)

fsprojects / FSharp.AWS.DynamoDB

Clarify ctor/Creation/Verification lifecycle #43