Azure / azure-sdk-for-js

This repository is for active development of the Azure SDK for JavaScript (NodeJS & Browser). For consumers of the SDK we recommend visiting our public developer docs at https://docs.microsoft.com/javascript/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-js.
MIT License

[@azure/data-tables] - [13.2.2] - TableClient listEntities byPage - undefined continuationToken #26006

Open ag-nhs opened 1 year ago

ag-nhs commented 1 year ago

PR: https://github.com/Azure/azure-sdk-for-js/pull/18179/files
Main: https://github.com/Azure/azure-sdk-for-js/blob/main/sdk/tables/data-tables/samples/v13/typescript/src/usingContinuationToken.ts

Describe the bug

When following the example in the PR/sample linked above, the continuation token is undefined, even though there are more records in the table.

To Reproduce

Steps to reproduce the behavior:

  const tableClient = new TableClient('The URL', 'The Table', credential)
  const iterator = tableClient.listEntities<BatchesTableRow>().byPage()

  let interestingPage: string | undefined
  for await(const page of iterator){
    console.log(page)
    for await (const value of page.values()){
      records.push(value)
    }
    interestingPage = page.continuationToken
  }

  const page = await tableClient.listEntities().byPage({continuationToken: interestingPage}).next()

  if(!page.done){
    console.log(page.value[0])

    for(const entity of page.value){
      records.push(entity)
    }
  }

The console.log output shows that the first records in both calls are the same. The credential is an InteractiveBrowserCredential.

Expected behavior

The above code should load the next 1000 records.

Additional context

I am trying to retrieve more than 1000 records from a table. The table currently has around 1800 records.

xirzec commented 1 year ago

@joheredi do you know if there is a service limitation at play here?

xirzec commented 1 year ago

@ag-nhs I looked into this a bit more myself and I think there is some confusion in your above example code.

Since you are doing interestingPage = page.continuationToken when looping over all pages in the page iterator, you will overwrite the value of this variable with the continuationToken for each page, and of course the final page has an undefined continuationToken.
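To make the overwrite concrete, here is a minimal sketch that does not touch the SDK at all. `fakePages` is a hypothetical in-memory stand-in for `listEntities().byPage()`, written under the assumption that every page except the last carries a token; the token format is made up for illustration.

```typescript
// Hypothetical in-memory stand-in for listEntities().byPage(); not the SDK itself.
type Page<T> = T[] & { continuationToken?: string | undefined };

async function* fakePages<T>(all: T[], pageSize: number): AsyncGenerator<Page<T>> {
  for (let start = 0; start < all.length; start += pageSize) {
    const end = start + pageSize;
    // Only pages that have a successor carry a token; the last page has none.
    const token = end < all.length ? String(end) : undefined;
    yield Object.assign(all.slice(start, end), { continuationToken: token });
  }
}

// Records the token of every page in the order the loop sees them.
async function tokensSeen(): Promise<(string | undefined)[]> {
  const tokens: (string | undefined)[] = [];
  for await (const page of fakePages([1, 2, 3, 4, 5], 2)) {
    tokens.push(page.continuationToken);
  }
  return tokens;
}
```

Here `tokensSeen()` resolves to `["2", "4", undefined]`, so a variable reassigned on every iteration ends up holding `undefined` once the loop has consumed the last page.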

If all you want is to retrieve all entities in a table, you can do so directly with listEntities:

  for await(const entity of tableClient.listEntities()) {
    records.push(entity);
  }
  console.log(records.length);

Or you can iterate by pages if you prefer:

  for await(const page of tableClient.listEntities().byPage()) {
    records.push(...page.values());
    console.log(page.length);
  }
  console.log(records.length);

There's really not a reason to use the continuation token unless you are doing something like retrieving a single page and then waiting to fetch the next one until some external event happens (e.g. you render a first page to the user and then wish to render the next page when they click a button.)
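For that "load more on click" case, a rough sketch could look like the following. `PagedSource` and `fetchOnePage` are hypothetical names introduced here; the interface only mirrors the `byPage({ continuationToken })` shape used elsewhere in this thread, on the assumption that the object returned by `tableClient.listEntities()` fits it.

```typescript
// Structural types mirroring the byPage() shape used in this thread (assumed, not imported from the SDK).
type Page<T> = T[] & { continuationToken?: string | undefined };

interface PagedSource<T> {
  byPage(settings?: {
    continuationToken?: string | undefined;
    maxPageSize?: number;
  }): AsyncIterableIterator<Page<T>>;
}

// Fetch exactly one page. Pass undefined to get the first page,
// or the token returned by the previous call to get the page after it.
async function fetchOnePage<T>(
  source: PagedSource<T>,
  token?: string
): Promise<{ items: T[]; next: string | undefined }> {
  const result = await source.byPage({ continuationToken: token }).next();
  if (result.done) {
    return { items: [], next: undefined };
  }
  return { items: [...result.value], next: result.value.continuationToken };
}
```

A UI would keep the returned `next` value in its state and pass it back in on the next click; when `next` is undefined there is nothing further to load.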

Does this help? Sorry for the delay in responding.

github-actions[bot] commented 1 year ago

Hi @ag-nhs. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.

ag-nhs commented 1 year ago

@xirzec I've used listEntities as you suggested and only get 1000 records. On some other tables I also sometimes get a seemingly random number of records. It's almost as if Azure returns early, before retrieving all records in the table, when there are too many. The most I've seen is 1000, but I've also seen 750. To get around this I've limited the results, but I'd like to be able to have a nice table UI with filters and searches for ease of use for the end users.

I figured there was a limitation with using listEntities, saw that there was a continuation token process, and used it, but it does not provide me with a token in any of the responses from listEntities and byPage. I can almost get around this by storing the last partition key and row key and using a filter with those values, but that got messy.

Our table has 5893 records, but the React app is only loading 594 of them.

 const credential = new InteractiveBrowserCredential({
   tenantId: TENANT_ID,
   clientId: CLIENT_ID,
   authorityHost: 'https://login.microsoft.com'
 })

 const tableClient = new TableClient(URL, 'tableName', credential)

 const records: any[] = []
 for await (const entity of tableClient.listEntities()){
   records.push(entity)
 }
 console.log(records.length)

ag-nhs commented 1 year ago

I ran this using TableClient.fromConnectionString in a local node app and retrieved all of the records.

We do not want to embed hard-coded connection strings in a public-facing web app, so we connect using the TableClient with credentials to ensure that the user has access in our subscription with the proper roles.

import { TableClient } from '@azure/data-tables'
import { EnvVar } from '../common/common.entities'

const asyncFn = async () => {
  const tableClient = TableClient.fromConnectionString(process.env[EnvVar.DataStorage], 'theTable')

  const records: any[] = []
  for await (const entity of tableClient.listEntities()){
    records.push(entity)
  }
  console.log(records.length)
}
asyncFn()

xirzec commented 1 year ago

Interesting that it is working from Node. I did find some interesting documentation here:

https://learn.microsoft.com/en-us/rest/api/storageservices/query-timeout-and-pagination

Note that the total time allotted to the request for scheduling and processing the query is 30 seconds, including the five seconds for query execution.

It is possible for a query to return no results but to still return a continuation header.

The async iterator should take care of continuing to retry as long as it takes to fetch all entities though, so I'm not sure what would cause it to stop in the middle. I'd be curious to debug a repro if you can share some code that has the issue.
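As an illustration of what "keep going while a token is present" means, here is a hedged sketch of a manual drain loop. It is not the SDK's implementation; `PagedSource` and `collectAll` are hypothetical names, and the only assumption is the `byPage({ continuationToken })` shape used in this thread. The point is that the loop condition is the token, not the page size, so an empty page with a token does not end the loop.

```typescript
// Structural types mirroring the byPage() shape used in this thread (assumed, not imported from the SDK).
type Page<T> = T[] & { continuationToken?: string | undefined };

interface PagedSource<T> {
  byPage(settings?: {
    continuationToken?: string | undefined;
  }): AsyncIterableIterator<Page<T>>;
}

// Follow continuation tokens until none is returned, tolerating empty pages.
async function collectAll<T>(source: PagedSource<T>): Promise<T[]> {
  const all: T[] = [];
  let token: string | undefined;
  do {
    // Explicit annotation keeps TypeScript from inferring `result` circularly through `token`.
    const result: IteratorResult<Page<T>> = await source
      .byPage({ continuationToken: token })
      .next();
    if (result.done) {
      break;
    }
    all.push(...result.value); // an empty page simply adds nothing
    token = result.value.continuationToken;
  } while (token !== undefined);
  return all;
}
```

If a loop like this stops early against the real service, the continuation token was missing from a response, which narrows the problem to how the token is read rather than how the pages are iterated.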

The authentication mechanism shouldn't affect which entities are returned unless you are using a SAS token that has a scoped access range: https://learn.microsoft.com/en-us/rest/api/storageservices/create-service-sas#specify-table-access-ranges

For your manual pagination scenario I'm curious what happens if you try to unwrap each page like so:

  const iterator = await tableClient.listEntities().byPage().next();
  if (iterator.done) {
    console.log("no entities in first page!");
    return;
  }
  const firstPage = iterator.value;
  console.log(firstPage.length);
  console.log(firstPage.continuationToken);

  const iterator2 = await tableClient.listEntities().byPage({ continuationToken: firstPage.continuationToken }).next();
  if (iterator2.done) {
    console.log("no entities in second page!");
    return;
  }
  const secondPage = iterator2.value;
  console.log(secondPage.length);
  console.log(secondPage.continuationToken);

ag-nhs commented 1 year ago

I used your exact code.

xirzec commented 1 year ago

Ah, I suppose that since the token is undefined, it just re-fetched the first page.

It's very odd that it's behaving differently in node than in your browser app. Is there any chance you could create a public repository with a small repro app that shows the problem?

ag-nhs commented 1 year ago

Will do, thanks!

ntziolis commented 9 months ago

Same issue when using SAS tokens (continuationToken always undefined in browser):

    const sasConnectionString = 'GENERATEINPORTAL';
    const tableName = 'INSERTYOURTABLENAMEHERE';

    const tableClient = new TableClient(
      sasConnectionString,
      tableName
    );

    const entities = tableClient.listEntities({
      queryOptions: {
      },
    });

    const pageSize = 1;

    for await (const page of entities.byPage({
      maxPageSize: pageSize,
    })) {
      console.log(page.continuationToken);
      for (let index = 0; index < page.length; index++) {
        const entity = page[index];
        yield entity;
      }

      if (!page.continuationToken) {
        break;
      }
    }

joheredi commented 9 months ago

@ntziolis which version of the @azure/data-tables are you using?

I'm trying latest with Azure Storage Tables and the continuation token is always set except the last page. Here's the code I'm using to test.

    const entities = tableClient.listEntities();
    const pageSize = 1;

    let myData: string[] = [];

    for await (const page of entities.byPage({maxPageSize: pageSize})) {
      console.log(page.continuationToken);
      myData = [...myData, ...page.map((entity) => {
        return `${entity.name} - ${entity.area}`;
      })];
    }

ntziolis commented 9 months ago

I'm using 13.2.2, please note:

Are you using a SAS token for auth? Maybe the issue is triggered by a combination of platform and auth type?

joheredi commented 9 months ago

@ntziolis I still can't reproduce. I'm running the function below in Chrome using @azure/data-tables 13.2.2 and have tried all combinations of SAS token (CS, token and url+token). In all my attempts the continuation token is set on every page except the last one.

async function query(): Promise<void>{

    const entities = tableClient.listEntities();
    const pageSize = 1;

    let myData: string[] = [];

    for await (const page of entities.byPage({maxPageSize: pageSize})) {
      console.log(page.continuationToken);
      myData = [...myData, ...page.map((entity) => {
        return `${entity.name} - ${entity.area}`;
      })];
    }
  }

Would you be able to share more code around your repro? Something I can paste and run locally to get the repro would help a lot.

jeremymeng commented 9 months ago

I wonder whether this is related to CORS settings. We've seen an issue in the storage SDK where some expected headers are not exposed by the CORS settings: https://github.com/Azure/azure-sdk-for-js/issues/27913#issuecomment-1831515266.

If x-ms-continuation-nextpartitionkey is not coming back in the browser, we would return an undefined continuation token. @ntziolis could you please check the response headers in browser dev tool to see whether this header is there?
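Besides looking at the dev tools, the same check can be sketched in code. The helper below is hypothetical (`HeaderReader` and `visibleContinuationHeaders` are names made up here); it reports which continuation headers script code can actually read. The second header name, `x-ms-continuation-NextRowKey`, is not mentioned in this thread and is taken from the Table service REST documentation.

```typescript
// Continuation headers the Table service uses for entity queries.
const CONTINUATION_HEADERS = [
  "x-ms-continuation-NextPartitionKey",
  "x-ms-continuation-NextRowKey",
];

// Minimal shape shared by the fetch API's Headers object.
interface HeaderReader {
  get(name: string): string | null;
}

// Returns only the continuation headers that are readable from script.
// In a browser, a header left out of the CORS exposed-headers list reads as null
// even if the service sent it.
function visibleContinuationHeaders(headers: HeaderReader): Record<string, string> {
  const visible: Record<string, string> = {};
  for (const name of CONTINUATION_HEADERS) {
    const value = headers.get(name);
    if (value !== null) {
      visible[name] = value;
    }
  }
  return visible;
}
```

In a browser this could be called with `response.headers` from a `fetch` against the table endpoint. An empty result for a query that should have more pages would point at the storage account's CORS rule, specifically its exposed headers, rather than at the SDK.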