Just noticed this issue was never closed. Closing because the original perf issue has been addressed, but let me know if anyone feels it needs to be re-opened.
Hi All, @misaunde @Zlatkovsky @LanceEa @sameera @deinspanjer @tong-1324 @JuaneloJuanelo @Hongbo-Miao @Danwakeem
I am using "@microsoft/office-js": "^1.1.43" within Angular CLI project ("@angular/common": "~10.2.0").
Before this post, I was using the straightforward approach, i.e., dataTable.rows.add(rows). That worked fine for up to 10k records; it was time-consuming, but it didn't crash. With 50k records, the add-in crashed with the message "Sorry, we had to restart because this add-in wasn't responding.".
I have gone through all the steps mentioned above and finally tried the untrack() method, following the Stack Overflow post shared by @misaunde and @Zlatkovsky: https://stackoverflow.com/q/51735674/3806701
Using that technique, the add-in now runs fine (without crashing) up to 57k records (around 20 columns). This is the plain output I get from an API; there is no formatting involved, nor any Excel styling or formula-related work.
However, a few reports go up to 150k rows. With the same technique, I tried to execute 97k rows, and unfortunately the add-in crashes with the same "not responding" message.
I see that this post was created long ago; I was just wondering if there are any updates, or whether I need to try something different so that the add-in doesn't crash (it's OK if it takes time, but it shouldn't crash).
Following is the code I currently use:
createTable3() {
  let rows; // filled from API service; length ranges from 5k to 150k
  let cols; // filled from API service; length ranges from 3 to 25
  Excel.run(async (context) => {
    console.log('Creating table...');
    const sheet = context.workbook.worksheets.add('MiReport');
    sheet.getRange().clear();
    // Create a one-row table and write the headers.
    const range = sheet.getRange("A1").getResizedRange(0, cols.length - 1);
    sheet.tables.add(range, false).set({ name: "myTable" });
    const table = sheet.tables.getItem("myTable");
    table.getHeaderRowRange().values = [cols];
    // Untrack the ranges so proxy objects don't pile up (see the
    // Stack Overflow post linked above).
    const rangeToWriteTo = table.getDataBodyRange().untrack()
      .getCell(0, 0).untrack().getResizedRange(rows.length - 1, cols.length - 1).untrack();
    const fullTableRange = table.getRange().untrack().getCell(0, 0).untrack()
      .getBoundingRect(rangeToWriteTo).untrack();
    // Convert the table back to a range, write the data, then re-create
    // the table over the full range.
    table.convertToRange();
    rangeToWriteTo.values = rows;
    sheet.tables.add(fullTableRange, true /* hasHeaders */);
    console.log('Created table...');
    await context.sync();
  }).catch(er => console.log('Excel.run() error', er));
}
Hi @LanceEa, were you able to scale it to render 100k rows, or did you have to limit the number of rows?
@megharajdeepak - We have users that download 750k rows x 60+ columns, and it can do this relatively quickly, under 2 minutes, depending on column width and size.
Our add-in is in a stable place (mostly maintenance at this point), so it's been a while since I looked at the Excel API. But I can outline some techniques we used and how we try to maximize concurrent work.
Note: Web sockets were chosen because of browsers' lack of support for fetch with a readable stream on the response body, which is necessary to get the same effect.
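As an illustration only (the endpoint, message format, and batch size below are assumptions, not the actual implementation), buffering socket messages inside a Web Worker could look roughly like this:

// Runs inside a Web Worker: parse incoming rows off the UI thread and
// hand the main thread ready-to-write 2D arrays in fixed-size batches.
// The endpoint and JSON shape are hypothetical.
const socket = new WebSocket('wss://example.com/report-rows');
let buffer = [];
socket.onmessage = (event) => {
  buffer.push(...JSON.parse(event.data)); // each message carries an array of rows
  if (buffer.length >= 10000) {
    postMessage(buffer); // main thread writes this batch via Excel.run
    buffer = [];
  }
};
socket.onclose = () => { if (buffer.length) postMessage(buffer); };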
Download Modes
As for your code (again, it's been a while since I looked at the Excel API), it appears to me that you are not batch-loading the data, so you might try smaller batches. By streaming and buffering in the Worker thread, I can keep Excel busy loading while the processing happens off the main UI thread.
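For example, a minimal sketch of batched writes (not the code from this thread; the batch size and use of the active worksheet are assumptions):

// Write `rows` in fixed-size batches, syncing between batches so each
// payload stays small instead of one huge request.
async function writeInBatches(rows, batchSize = 10000) {
  await Excel.run(async (context) => {
    const sheet = context.workbook.worksheets.getActiveWorksheet();
    for (let start = 0; start < rows.length; start += batchSize) {
      const batch = rows.slice(start, start + batchSize);
      sheet.getRangeByIndexes(start, 0, batch.length, batch[0].length)
        .untrack() // don't accumulate proxy objects across batches
        .values = batch;
      await context.sync(); // flush this batch before queuing the next
    }
  });
}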
Second, it looks like you are generating the table and then converting it back to a range. I would instead try writing all the data to a range within the sheet first, and only create the Table at the end, over that range, after all the data has been loaded.
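A sketch of that ordering, reusing `rows` and `cols` from the code above:

// Write headers and data to a plain range first; only then wrap the
// populated range in a table, so the table machinery never sees the
// bulk writes.
await Excel.run(async (context) => {
  const sheet = context.workbook.worksheets.getActiveWorksheet();
  sheet.getRangeByIndexes(0, 0, 1, cols.length).values = [cols];
  sheet.getRangeByIndexes(1, 0, rows.length, cols.length).values = rows;
  await context.sync();
  const full = sheet.getRangeByIndexes(0, 0, rows.length + 1, cols.length);
  sheet.tables.add(full, true /* hasHeaders */);
  await context.sync();
});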
Other things you can try are turning off calculation, sorting, or anything else that might increase the write time.
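For example, a minimal sketch of wrapping the writes in manual calculation mode (the same manual/automatic pattern appears in a later snippet in this thread):

// Switch calculation to manual around a large write, then restore it.
await Excel.run(async (context) => {
  context.application.calculationMode = Excel.CalculationMode.manual;
  await context.sync();
  // ...large range writes go here...
  context.application.calculationMode = Excel.CalculationMode.automatic;
  await context.sync();
});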
Cheers, Lance
Hi @LanceEa, thank you for the comprehensive steps; they seem to have worked great for your use case. Before trying out web sockets, web workers, and batch processing, I wanted to give the 3rd download mode a try, i.e., no Excel table, only writing the data to an Excel range.
After struggling for 2 days, I couldn't believe my eyes; it looked truly magical that the add-in stopped crashing, with the cherry on top of excellent rendering performance! I tested up to 300k rows and the render time was around 25 seconds, which is much, much better, and most importantly the add-in isn't crashing anymore, which was a show-stopper for us. Honestly, I just wanted to dump the raw data as-is into the Excel sheet (users can take it from there to filter, sort, and so on), so a table isn't really required for our use case. I'd trade a table for a range any day given the performance gain.
If the records increase further and it starts to feel slow in terms of user experience, I'll take the websocket/batching route you shared. Thank you so much once again :)
This is what worked easily for 10 MB of data:
// Excel is a global in Office add-ins; import it only if your setup requires it.
export const resetScenarioRecords = async (apiData, localContext, tableName) => {
  if (!apiData) {
    console.error('No data provided to reset scenario records');
    return;
  }
  await Excel.run(async (context) => {
    const sheet = context.workbook.worksheets.getItem(tableName);
    const table = sheet.tables.getItem(tableName);
    // Disable recalculation and unprotect the sheet before writing.
    context.application.calculationMode = Excel.CalculationMode.manual;
    sheet.protection.unprotect('Welcome123!');
    await context.sync();
    // Shrink the table to a single data row before refilling it.
    table.getDataBodyRange().clear();
    table.resize("A5:I6");
    await context.sync();
    const newValues = apiData.map(item => [
      item.ScenarioCode,
      item.ScenarioOpen,
      item.ScenarioName,
      item.ScenarioDescription,
      item.UD1,
      item.UD2,
      item.UD3,
      item.DocAttachments,
      '=IF(COUNTIF([ScenarioName],[@ScenarioName])>1,"No","Yes")'
    ]);
    console.log('newValues', newValues.length);
    // Grow the table to fit the new rows, then write them in one shot.
    table.resize(`A5:I${newValues.length + 5}`);
    sheet.getRange(`A6:I${newValues.length + 5}`).values = newValues;
    await context.sync();
    // Highlight selected columns, then restore protection and calculation.
    [0, 2, 7, 8].forEach(col =>
      table.columns.getItemAt(col).getDataBodyRange().format.fill.color = "#FFBE33");
    sheet.protection.protect({ allowAutoFilter: true, allowSort: true }, 'Welcome123!');
    context.application.calculationMode = Excel.CalculationMode.automatic;
    await context.sync();
  });
};
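For reference, a hypothetical call site (the data variable and the 'Scenarios' sheet/table name are assumptions):

// Hypothetical usage: the sheet and table share the name 'Scenarios'.
await resetScenarioRecords(dataFromApi, null, 'Scenarios');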
Summary
I'm creating a table in Excel using the new Promise-based APIs and inserting data into it, and the performance isn't what I would have expected.
Expected Behavior
I would expect 2,000 records to be fast, taking only a second or so, and that I would be able to insert significantly more data. We are in the early phases of developing our add-in, so we are using small datasets, but in the future we want to support hundreds of thousands of records. I recognize that to insert that many records we will probably need to chunk the data and insert it in batches, but I would think those batches could be bigger than 2,000 records?
Current Behavior
When I create a table and insert 100 records, it is snappy and feels fast: less than 1 second. However, when I load 2,000 records x 20 columns, it is much slower; at worst I have seen 20 seconds, and at best around 5 seconds.
Steps to Reproduce, or Live Example
See the code snippet above. It is basically the same as what is provided in Script Lab, but the data is 2,000 records x 20 columns.
Context
As mentioned above, I would expect 2,000 records x 20 columns to be faster than it is, because I want to allow users to download large datasets (100,000+ records), assuming we can provide a good user experience.
Your Environment
Useful logs
I'm adding some Application Insights events right now and will update the issue with some performance observations.
Questions: