Return per-item iterator for page operation

tadelesh commented 2 years ago

We'd like to have item iterator support for page operation. Other languages' SDK (e.g., .Net/JS) either use a self-defined iterator or language-supported iterator to support both iterate on items and pages. This will help customer to handle list result more friendly.

jhendrixMSFT commented 2 years ago

We could provide an iterator wrapper for *runtime.Pager[T any]. However, it would require adding a method to the paged response envelope to satisfy a type constraint that allows access to the page of items. The wrapping code would look as follows.

type PageConstraint[TItem any] interface {
    Items() []*TItem
}

type Iterator[TItem any, TPage PageConstraint[TItem]] struct {
    pager *runtime.Pager[TPage]
    cur   []*TItem
    index int
}

func (iter *Iterator[TItem, TPage]) More() bool {
    return iter.pager.More() || iter.index < len(iter.cur)
}

func (iter *Iterator[TItem, TPage]) NextItem(ctx context.Context) (*TItem, error) {
    if iter.index == len(iter.cur) && !iter.pager.More() {
        return nil, errors.New("no more items")
    }
    if iter.cur == nil || iter.index == len(iter.cur) {
        // first page or page exhausted
        page, err := iter.pager.NextPage(ctx)
        if err != nil {
            return nil, err
        }
        iter.cur = page.Items()
        iter.index = 0
    }
    item := iter.cur[iter.index]
    // advance item
    iter.index++
    return item, nil
}

func NewIterator[TItem any, TPage PageConstraint[TItem]](pager *runtime.Pager[TPage]) *Iterator[TItem, TPage] {
    return &Iterator[TItem, TPage]{
        pager: pager,
    }
}

We then add the Items() method to the response envelopes.

func (p PagingClientGetMultiplePagesResponse) Items() []*Product {
    return p.Values
}

Which then allows construction of the iterator.

func TestIterateGetMultiplePages(t *testing.T) {
    client := newPagingClient()
    iter := NewIterator[Product](client.NewGetMultiplePagesPager(nil))
    count := 0
    for iter.More() {
        item, err := iter.NextItem(context.Background())
        require.NoError(t, err)
        require.NotZero(t, item)
        fmt.Println(*item.Properties.Name)
        count++
    }
    if r := cmp.Diff(count, 10); r != "" {
        t.Fatal(r)
    }
    if _, err := iter.NextItem(context.Background()); err == nil {
        t.Fatal("unexpected nil error")
    }
}

JeffreyRichter commented 2 years ago

I'm opposed to having some kind of item iterator in our Go SDKs (we also don't do it in C & C++). There are multiple problems with an item iterator:

It is unclear when I/O operations are occurring and so customer code can't be aware of when long latencies may occur in the loop.
Because the I/Os are "hidden", there is no clean way to specify a context for timeout to prevent the app from hanging indefinitely.
Because the I/Os are "hidden", there is no clean way of doing error recovery should the I/O fail. Even if you could do some error recovery, it is exceedingly difficult to somehow pick up from where you left off since you don't know where you are in the iteration.
When paging through a distributed collection of resources being modified by multiple clients while paging, items may be skipped, and some items may be returned multiple times. An iterator gives the impression that this cannot happen, but it can. Customer code must be written to be tolerant of this, which they are more likely to do WITHOUT an iterator.
Some Azure operations return multiple collections for each page. For example, Azure Files returns a page with some directories and some files. These are heterogeneous data types and so the page REQUIRES 2 loops to walk through the 2 sets of item types.
Go (like C++) is a close-to-the-metal language and our SDK should not do magical conveniences under the covers like some other higher-level languages (like C#, Python) do. Azure Kubernetes Service engineers have specifically told us that they can't use our SDKs if they can't control when I/O & latencies occur.
Go doesn't have an iterator language feature (unlike C#) and this is because Go's philosophy/culture is to be close-to-the-metal and not do magical conveniences. Go code should be easy for engineers to look at to understand ALL that is happening. To be a part of this culture, our Go SDK does not offer many of the "conveniences" that some other languages do - this is by-design.

tadelesh commented 2 years ago

Can we just add Items() method as Joel's solution to reserve possibility to have a decorator to do so?

jhendrixMSFT commented 2 years ago

Because the I/Os are "hidden", there is no clean way to specify a context for timeout to prevent the app from hanging indefinitely.

A context is specified when fetching the next item from the iterator. We do this same thing in track 1 and it's worked great.

Because the I/Os are "hidden", there is no clean way of doing error recovery should the I/O fail. Even if you could do some error recovery, it is exceedingly difficult to somehow pick up from where you left off since you don't know where you are in the iteration.

If the I/O operation fails when fetching the next page, the iterator's state isn't updated.

When paging through a distributed collection of resources being modified by multiple clients while paging, items may be skipped, and some items may be returned multiple times.

I think this is true for paged operations in general. Also, this assumes that all operations are distributed with multiple clients operating on the collection which isn't true in all cases.

Some Azure operations return multiple collections for each page.

Agreed this is a problem. However, I believe this is uncommon.

Go (like C++) is a close-to-the-metal language

Agreed and is why I suggest we make this available as an add-on, and not part of the generated API surface area, so customers can make their own choice based on their problem domain.

Azure / autorest.go

Return per-item iterator for page operation #812