Closed — insasho closed this issue 7 years ago.
Thanks for contacting us @insasho with this feature request. I think adding the ability to stream paginated results would be helpful to have.
I think the ideas you proposed in 1 and 2 would most likely flow pretty well with how the SDK would implement them. 4 takes advantage of a DSL, but like you mentioned it's string based, which could cause a significant loss of the type safety the other two methods would preserve. For idea 1 we'd need to figure out how to ensure goroutines and channels would not be easily leaked. Because of that, option 2 might be the strongest. Also thanks for the iterators link, I'll read that over.
Idea 3 could actually be built on top of 1 or 2 fairly straightforwardly, I think, providing both a streamer and optional filter functionality.
We're also always glad to review PRs and discuss designs for how this could be implemented.
As your team has probably already done, I've experimented with various approaches to using async channels for paginating results. I'll share some findings here.
The Go blog post on Pipelines and Cancellation lays out the minimum requirements for correctly applying channels to build a pipeline. Applying that model to KMS's ListAliases call results in something like this:
```go
// Generated code inside package kms (a method on *KMS can only be declared
// within the kms package, so the kms. qualifiers are dropped here).
type AliasListEntryStream chan *AliasListEntry

func (c *KMS) ListAliasesStream(listAliasesInput *ListAliasesInput, done <-chan struct{}) (AliasListEntryStream, <-chan error) {
	out := make(chan *AliasListEntry)
	errc := make(chan error, 1) // buffered so the goroutine never blocks on the final send
	go func() {
		defer close(out)
		errc <- c.ListAliasesPages(listAliasesInput, func(p *ListAliasesOutput, _ bool) bool {
			for _, v := range p.Aliases {
				v := v // capture the loop variable for the send below
				select {
				case out <- v:
				case <-done:
					return false // caller signaled done; stop paginating
				}
			}
			return true
		})
	}()
	return out, errc
}
```
This pattern is also compatible with the golang.org/x/net/context pattern, as the done channel acts as a cancellation signal, which would allow the generalized timeout behavior.
However, this approach (whether using context or not) requires the user to write a few more lines of code and carries a higher cognitive burden than the existing pagination method, which may not be worth the extra typing except in complex scenarios. Example:
```go
done := make(chan struct{}) // user must create and may close this channel
defer close(done)           // Pages method will block until done is closed, which could hold server resources
out, errc := client.ListAliasesStream(&kms.ListAliasesInput{}, done) // user must pass done to all pipeline methods
for entry := range out {
	if *entry.AliasName == "aws/aws/acm" {
		return true // the deferred close(done) signals the generated code to stop iterating
	}
}
if err := <-errc; err != nil { // extra error handling
	panic(err)
}
return false
```
This pattern does allow for some cute uses of channels (such as filters, tees, and methods to materialize the collection). However, KMS's ListAliases may be like most of the paginated methods in that the result set is small enough that the streaming nature does not yield benefits commensurate with the maintenance burden of including these methods in the core SDK. cloudwatchlogs.GetLogEvents may be the counter-example, since it can return millions of entries, but it is atypical of AWS APIs in my experience.
I experimented with implementing this pipeline-style method in the SDK code generator, which revealed a few more issues:
- Some paginated APIs appear to declare multiple result_key fields. I'm unfamiliar with the iteration behavior of these APIs, but this could complicate the generated interface.
- The result_key field from the paginators-1.json file is not deserialized by paginators.go, perhaps for the reason above.

Thanks for the information @insasho, and for looking into this further. The AWS SDK for Go uses the paginators-1.json file to determine how to generate paginators for the associated service client APIs. Currently the SDK does not use result_key, but may need to if streaming pagination were to be added. Only the AWS CLI and the AWS SDK for Python currently use this field. result_key is complicated because it is not required to be a single field of the response; there are a few cases where an API's paginator has multiple fields that are grouped or merged together. In the SDK this may require generating custom shapes for these multi-field result_keys.
I think streamed paginators in the SDK would probably look very similar to what you described.
For the result_key fields, correct: the field is the name in the context of the API operation's output type. For input/output tokens the SDK uses a bit of reflection to search for the field when deciding whether the request should be paged again. To do this in generated code with result_key, the SDK would have to search the associated API's output type for the fields. This is generally available via a ShapeRef's Shape.MemberRefs field, which enumerates all fields of a struct. Context is a good pattern to follow, but the standard library context package won't be introduced until Go 1.7, and if the SDK were to implement streams we'd want it to be compatible with at least Go 1.5 and 1.6 as well.
There are several APIs where streams make a lot of sense, like the ones you mentioned, and others like DynamoDB queries and listing S3 objects.
Hi, in PR #1132 I merged a change that adds a request.Pagination type to the SDK. This type allows you to control pagination yourself much more easily, in a pattern similar to bufio.Scanner.
The pagination methods are thin helpers around the underlying Input/Output types, and they invoke the callback once per "page" of results rather than providing a stream of unknown length.
This differs from Boto3's approach of providing a richer collection model for paginated results. The richer collection model lets many problems (especially filtering) be handled in far fewer lines of code. aws-sdk-go could benefit from a similar filterable-stream-based API for results that can be paginated. This is useful for API calls that don't natively support filtering parameters but do support pagination (EC2 DescribeInstances does, KMS ListAliases does not, etc.).
Some ideas:
- Using channels (async):
- Filter builder: use the API metadata to build a field-aware filter API: