Azure / azure-sdk-for-js

This repository is for active development of the Azure SDK for JavaScript (NodeJS & Browser). For consumers of the SDK we recommend visiting our public developer docs at https://docs.microsoft.com/javascript/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-js.
MIT License
2.1k stars 1.21k forks source link

[storage] Blob Change Feed Client returns empty continuation token for final page in iterator #14229

Closed chrnola closed 8 months ago

chrnola commented 3 years ago

Describe the bug

When iterating the PagedAsyncIterableIterator<BlobChangeFeedEvent, BlobChangeFeedEventPage> returned by BlobChangeFeedClient.listChanges().byPage() the SDK returns "" as the value for continuationToken when yielding the final page in the iteration. This makes it difficult to write applications that attempt to treat the change feed as an infinite sequence. See "additional context" section below for more info on my particular use case.

I suspect the cause of the bug is this check: https://github.com/Azure/azure-sdk-for-js/blob/c52af93c7c718f8aec19630568bfd7c3d7798120/sdk/storage/storage-blob-changefeed/src/BlobChangeFeedClient.ts#L225-L227

which only sets the continuationToken if the wrapped instance of ChangeFeed reports that it has remaining changes.

To Reproduce

Iterate a blob change feed, by page, to the end of the change feed and observe that the final page in the sequence has no continuation token. Or, more specifically, an empty string is returned instead of the normal JSON-serialized continuation token:

const changeFeedClient = new BlobChangeFeedClient(...);

const pageSettings: PageSettings = {
    maxPageSize: 20,
};

for await (const page of changeFeedClient.listChanges().byPage(pageSettings)) {
    console.log(`Got page, events=${page.events.length}, token=${page.continuationToken}`);
}
Got page, events=20, token={"CursorVersion":1,"UrlHost":...}
Got page, events=20, token={"CursorVersion":1,"UrlHost":...}
Got page, events=20, token={"CursorVersion":1,"UrlHost":...}
Got page, events=4, token=

Note that the .NET SDK behaves differently and always returns a continuation token, even for the final page:

var client = new BlobChangeFeedClient(...);
await foreach (var page in client.GetChangesAsync(token).AsPages(pageSizeHint: 20))
{
    Console.WriteLine($"Got page, events={page.Values.Count}, token={page.ContinuationToken}");
}           
Got page, events=20, token={"CursorVersion":1,"UrlHost":...}
Got page, events=20, token={"CursorVersion":1,"UrlHost":...}
Got page, events=20, token={"CursorVersion":1,"UrlHost":...}
Got page, events=4, token={"CursorVersion":1,"UrlHost":...}

Expected behavior

As a consumer of the blob change feed SDK, I expect the SDK to provide me with a mechanism for restoring the state of a prior iteration, even if that iteration reached the end of the change feed. In other words, I don't think it is valid for the SDK to return an empty continuation token when an iterator reaches the end of the change feed.

Screenshots

N/A

Additional context

I am attempting to use the SDK to create an application that continuously reads the change feed and aims to process all of the events from the change feed as soon as they are made available. When it reaches the end of the change feed, it waits a little while, and then attempts to iterate the change feed again using the most recently persisted continuation token.

Therefore when the SDK returns an empty token on the last page it means that my application actually has to persist the token from the second to last page instead. As a consequence, my application code now needs to handle de-duplicating change feed events since there's always a risk that I'll receive events from the final page multiple times. This problem would be avoided entirely if instead I could receive a continuation token from the SDK on the final page of change feed events.

Here's a simplified excerpt of my application code:

import { PageSettings } from "@azure/core-paging";
import { BlobChangeFeedClient } from "@azure/storage-blob-changefeed";

const changeFeedClient = new BlobChangeFeedClient(...);

while (true) {
    const prevContinuationToken = await getTokenFromDatabase();

    const pageSettings: PageSettings = {
        continuationToken: prevContinuationToken,
        maxPageSize: 100,
    };

    for await (const page of changeFeedClient.listChanges().byPage(pageSettings)) {
        // Process `page.events` ...
        if (page.continuationToken !== "") { // I would really like to be able to remove this check and not have to check for duplicate events
            await saveTokenToDatabase(page.continuationToken);
        }
    }

    sleep(60000);
}
ljian3377 commented 3 years ago

Hmm, I agree with you. We could return the continuationToken even if it's ended for now. Thanks for the feedback. Will try to fix it.

ljian3377 commented 3 years ago

Also, are you comfortable with using the alpha i.e. daily build versions? We can get the fixes into master branch ASAP, but the release may come later.

chrnola commented 3 years ago

@ljian3377 yes I'm definitely comfortable with using the daily builds. Thanks for picking this up so quickly!

chrnola commented 3 years ago

Apologies for accidentally closing. 🤦‍♂️ This issue is definitely still relevant.

EmmaZhu commented 2 years ago

https://github.com/Azure/azure-sdk-for-js/pull/20079

ghost commented 1 year ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @xgithubtriage.

Issue Details
- **Package Name**: @azure/storage-blob-changefeed - **Package Version**: 12.0.0-alpha.20210210.1 - **Operating system**: Linux (Debian) - [x] **nodejs** - **version**: 12.13.0 - [ ] **browser** - **name/version**: - [x] **typescript** - **version**: 3.9.8 - Is the bug related to **documentation** in - [ ] README.md - [ ] source code documentation - [ ] SDK API docs on https://docs.microsoft.com **Describe the bug** When iterating the `PagedAsyncIterableIterator` returned by `BlobChangeFeedClient.listChanges().byPage()` the SDK returns `""` as the value for `continuationToken` when yielding the final page in the iteration. This makes it difficult to write applications that attempt to treat the change feed as an infinite sequence. See "additional context" section below for more info on my particular use case. I suspect the cause of the bug is this check: https://github.com/Azure/azure-sdk-for-js/blob/c52af93c7c718f8aec19630568bfd7c3d7798120/sdk/storage/storage-blob-changefeed/src/BlobChangeFeedClient.ts#L225-L227 which only sets the `continuationToken` if the wrapped instance of `ChangeFeed` reports that it has remaining changes. **To Reproduce** Iterate a blob change feed, by page, to the end of the change feed and observe that the final page in the sequence has no continuation token. Or, more specifically, an empty string is returned instead of the normal JSON-serialized continuation token: ```typescript const changeFeedClient = new BlobChangeFeedClient(...); const pageSettings: PageSettings = { maxPageSize: 20, }; for await (const page of changeFeedClient.listChanges().byPage(pageSettings)) { console.log(`Got page, events=${page.events.length}, token=${page.continuationToken}`); } ``` ``` Got page, events=20, token={"CursorVersion":1,"UrlHost":...} Got page, events=20, token={"CursorVersion":1,"UrlHost":...} Got page, events=20, token={"CursorVersion":1,"UrlHost":...} Got page, events=4, token= ``` Note that the .NET SDK behaves differently and _always_ returns a continuation token, even for the final page: ```csharp var client = new BlobChangeFeedClient(...); await foreach (var page in client.GetChangesAsync(token).AsPages(pageSizeHint: 20)) { Console.WriteLine($"Got page, events={page.Values.Count}, token={page.ContinuationToken}"); } ``` ``` Got page, events=20, token={"CursorVersion":1,"UrlHost":...} Got page, events=20, token={"CursorVersion":1,"UrlHost":...} Got page, events=20, token={"CursorVersion":1,"UrlHost":...} Got page, events=4, token={"CursorVersion":1,"UrlHost":...} ``` **Expected behavior** As a consumer of the blob change feed SDK, I expect the SDK to provide me with a mechanism for restoring the state of a prior iteration, even if that iteration reached the end of the change feed. In other words, I don't think it is valid for the SDK to return an empty continuation token when an iterator reaches the end of the change feed. **Screenshots** N/A **Additional context** I am attempting to use the SDK to create an application that continuously reads the change feed and aims to process all of the events from the change feed as soon as they are made available. When it reaches the end of the change feed, it waits a little while, and then attempts to iterate the change feed again using the most recently persisted continuation token. Therefore when the SDK returns an empty token on the last page it means that my application actually has to persist the token from the _second to last_ page instead. As a consequence, my application code now needs to handle de-duplicating change feed events since there's always a risk that I'll receive events from the final page multiple times. This problem would be avoided entirely if instead I could receive a continuation token from the SDK on the final page of change feed events. Here's a simplified excerpt of my application code: ```typescript import { PageSettings } from "@azure/core-paging"; import { BlobChangeFeedClient } from "@azure/storage-blob-changefeed"; const changeFeedClient = new BlobChangeFeedClient(...); while (true) { const prevContinuationToken = await getTokenFromDatabase(); const pageSettings: PageSettings = { continuationToken: prevContinuationToken, maxPageSize: 100, }; for await (const page of changeFeedClient.listChanges().byPage(pageSettings)) { // Process `page.events` ... if (page.continuationToken !== "") { // I would really like to be able to remove this check and not have to check for duplicate events await saveTokenToDatabase(page.continuationToken); } } sleep(60000); } ```
Author: chrnola
Assignees: EmmaZhu
Labels: `bug`, `customer-reported`, `Client`, `Storage`, `Service Attention`, `needs-team-attention`
Milestone: -
github-actions[bot] commented 8 months ago

Hi @chrnola, we deeply appreciate your input into this project. Regrettably, this issue has remained inactive for over 2 years, leading us to the decision to close it. We've implemented this policy to maintain the relevance of our issue queue and facilitate easier navigation for new contributors. If you still believe this topic requires attention, please feel free to create a new issue, referencing this one. Thank you for your understanding and ongoing support.