Tested on Windows 10
Attempted 50k insert
As I mentioned before, we would like to support larger data volumes. So, to see if it was possible, I tested pushing 50k records into a table. It would crash Excel and never finish.
My next step is chunking the inserts into the table.
Hi @LanceEa, thanks for the info. I am looking into this and will try to reproduce it first. If you can provide a repro, that will help me save some time.
@Hongbo-Miao - Thanks!
Here is a Gist I created that can be used with Script Lab. I mocked data for 17 columns and 2,000 records. It takes about 12 seconds, which is in line with what I was seeing in my application.
https://gist.github.com/LanceEa/da13ce3f425941e43a6472c957765db6
@Hongbo-Miao - Also, I was able to test an alternative method today:
const t0 = performance.now();

// Add the new table, then set its name and headers.
const myTable = sheet.tables.add(tableRange, true /*hasHeaders*/);
myTable.name = tableName;
myTable.getHeaderRowRange().values = [headers];

// Adding data through the RowCollection seems slow,
// but writing data to a Range within the table is much faster.
myTable.getDataBodyRange().values = data;

// Activate the sheet.
sheet.activate();

await context.sync();
const t1 = performance.now();
console.log(`Took ${(t1 - t0).toFixed(4)} milliseconds to generate: ${tableName}`);
Doing it this way, I was able to scale all the way to 100k records with 20 columns, and it took about 1 minute 30 seconds; 50k records took about 40-50 seconds.
This is better since it doesn't crash Excel, but it still feels slow... Are the timings I'm seeing within Microsoft's expectations for the Office.js APIs? Is this to be expected?
Hi @LanceEa, the gist is not complete; it ends at line 26854. Maybe create a repo on GitHub?
Sorry about that... looks like Script Lab or Gist doesn't like such a big file :(.
Here is a quick sample I threw together. https://github.com/LanceEa/Excel-Table-Perf
Steps to run it:
npm install
ng serve --ssl
Example 1 is the way that Script Lab and the Office.js docs show how to create a table and add data to it.
Example 2 is the approach I experimented with today, which got "ok" results: still not great, but it can create a large table without crashing Excel.
All the logic can be found in app.component.ts, and the mocked data is in data.ts.
Sorry if you are already familiar with the process; I didn't want to assume that you know Angular.
Let me know if you have any other questions or need more information.
@LanceEa, on a side note: feel free to file the Script Lab bug on the Script Lab repo. If you can include an attachment of the text file to repro the issue, that would be great.
Thanks!
@Zlatkovsky - No problem. I took a second look at it today and I do not think there is a bug, but let me confirm with @Hongbo-Miao. If there is a bug, I will file it in the Script Lab repo.
@Hongbo-Miao - I took a second look at the Gist to see what the issue could be. Your screenshot appears to be of the file from the GitHub page, correct? GitHub truncates large files when rendering them, and I see the same truncation at the bottom of the page for me too; the notice at the top of the page says the file has been truncated.
Regardless, the repo I posted has the same code, but I wanted to rule out a bug in Script Lab 😃
Thanks, I reproduced it. I will have a look at the code and see what we can improve.
@Hongbo-Miao - I just wanted to check in to see if there was anything else you needed from me?
This really is a showstopper for us, and my team needs to decide whether we stick with the Office Add-in model or go back to VSTO.
Is this performance in line with what you would expect? Is there something wrong with my code and how I'm loading data into tables? Or is there potentially a bug that can be addressed here?
Any information on this would be helpful and appreciated.
Thanks!
@LanceEa Personally, I don't think your code has an issue; it is a performance issue on our side. If you can let us know what you are trying to achieve, maybe we can figure out a way. I will leave contact information here tomorrow, or you can direct message me on Twitter and I will send you my email.
@Hongbo-Miao - Sounds good. I sent you a follow request on Twitter so we can DM. I can share more details on what we are trying to accomplish.
@Hongbo-Miao Has there been any resolution on this performance issue? We have a use case where we need to write tens of thousands of rows to a table, but even at around 500 records we are waiting around 2 minutes just for the append to the table to complete. We tried both the table.rows.add method and simply writing to the range. While the latter seems slightly better, the time taken for the operation is just too long. We have formulas in the rows we are writing, which could be adding to the problem. Any help in overcoming the problem is appreciated.
Hi @sameera, my name is Tong. I am a software engineer on the Office Extension Platform team, and I am currently investigating the performance issue you mentioned above.
Here are some quick workarounds to improve the performance:
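(The specific workarounds were not captured here; based on later replies in this thread, one of them was Application.suspendApiCalculationUntilNextSync(). A minimal sketch of how it might be used, where the range address and `data` array are placeholder assumptions, not from the thread:)

```typescript
await Excel.run(async (context) => {
    // Suspend formula recalculation until the next sync, so each
    // write doesn't trigger a full recalc of dependent formulas.
    context.workbook.application.suspendApiCalculationUntilNextSync();

    const sheet = context.workbook.worksheets.getActiveWorksheet();
    sheet.getRange("A2:T5001").values = data; // `data`: a hypothetical 5000x20 array

    await context.sync();
});
```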
We've already made some progress in our investigation and have addressed some of the key issues we are facing. I really appreciate your feedback. Please feel free to let me know if you have any other questions.
Hi @LanceEa, my name is Tong. I am a software engineer on the Office Extension Platform team, and I am currently investigating the performance issue you mentioned above.
According to my colleague Sudhi, you have already tried the quick workarounds I introduced to the other customer above. I know the API performance still has not reached your expectations yet. I want to let you know that we are actively working on it and have already made some progress. I will let you know as soon as we have an update. I really appreciate your feedback and help!
Hi @tong-1324, thank you for your response.
We already use application.suspendApiCalculationUntilNextSync but have not seen much improvement with it. One reason could be that we are writing the data in smaller chunks of 25-100 rows; maybe because of that the impact of the suspension isn't that noticeable?
As I also mentioned, we tried writing to the Range object. It wasn't significantly better than Table.rows.add.
We are currently trying another alternative: the TableBinding.addRowsAsync method. This is working out very well for us at the moment (each append takes less than 1 second). We preferred this over writing to range.formulas because we also have a use case where we place different formulas on different rows of the same column. If we do this inside a table using rows.add or range.formulas, Excel auto-copies the last formula (if I observed correctly) to the entire column. addRowsAsync doesn't seem to have this (undesirable) behavior. It is just a bit weird to be using the old callback-style async API.
I take back what I said about addRowsAsync not being affected by the calculated-columns behavior; apparently it behaves the same. While unrelated to the original issue, @tong-1324, is there no way to turn off calculated columns? Perhaps if there were a way to turn off all such table behavior until we are done, that would benefit write performance as well.
@tong-1324 - Thanks! Let me know if there is anything you need me to test.
@sameera Thanks for your feedback! Good to know that you have a temporary workaround for the perf issue.
To follow up on the questions you mentioned:
In our performance test results, writing tens of thousands of rows to a Range should take just one or two seconds. Writing the data in smaller chunks could be the reason in your case. There are some known performance issues when data is broken into multiple chunks, and we already have a solution for that. Do you mind sharing a snippet of your data chunking with me so that I can provide more details?
To my understanding, TableBinding.addRowsAsync should be very similar to writing data into Table.getDataBodyRange().values. I will investigate what causes the performance difference between these two methods.
For the formula copy within a column, I think you are right; that is a by-design feature of Excel tables. I don't know of any way to turn it off in the UI currently. Do you think keeping the workbook in manual calculation mode would solve the problem for you?
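(For reference, a minimal sketch of switching to manual calculation mode through the API. Note that the Application.calculationMode setter is only available in newer API sets; at the time of this thread it may have required the UI or VBA:)

```typescript
await Excel.run(async (context) => {
    // Requires an API set where Application.calculationMode is settable.
    context.workbook.application.calculationMode = Excel.CalculationMode.manual;
    await context.sync();

    // ... perform the large table writes here ...

    // Restore automatic calculation afterwards.
    context.workbook.application.calculationMode = Excel.CalculationMode.automatic;
    await context.sync();
});
```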
@tong-1324
Yes, it is most probably related to chunking. We get the data from a REST API and basically write what we get to the sheet. More on this below.
You can turn that off from the AutoCorrect options in the UI. Also, if you enter a formula manually, Excel doesn't apply it across the column. While the feature is by design, I cannot help but feel there's a bug in here somewhere because of the way it behaves over the API. Can you please take a look at https://github.com/OfficeDev/office-js-docs/issues/1212 and the screengrab linked there, and perhaps share your thoughts on that thread?
As for the code snippets and some perf data:
Range.formulas Approach:
appendTable = <T extends TableReference>(table: T, data: any[][]): Common.Thennable<T> => {
    if (!data.length) return OfficeExtension.Promise.resolve(table);
    let perfCounter = Date.now();

    return Excel.run(ctx => {
        let xlTable = ctx.workbook.tables.getItem(table.id);

        // Find the last occupied row so we know where to start appending.
        let lastRow = xlTable.getDataBodyRange().getLastRow();
        lastRow.load("rowIndex");

        return ctx.sync()
            .then(() => {
                let startRow = lastRow.rowIndex > 2 ? lastRow.rowIndex + 1 : 2;
                let targetRangeAddr = RangeRef.from(startRow, 0, startRow + data.length - 1, data[0].length - 1).getRelativePath();

                // Write the formulas directly to the range below the table.
                let targetRange = xlTable.worksheet.getRange(targetRangeAddr);
                targetRange.formulas = data;
                return ctx.sync();
            }).then(() => {
                let span = (Date.now() - perfCounter) / 1000;
                this.log.info("Time taken for append (s): " + span);
                return table;
            });
    }).catch(err => {
        return this.rejectWithFormattedError(err);
    });
}
addRowsAsync Approach:
appendTable = <T extends TableReference>(table: T, data: any[][]): Common.Thennable<T> => {
    if (!data.length) return OfficeExtension.Promise.resolve(table);
    let perfCounter = Date.now();
    let d = this.$q.defer<T>();

    // Look up the table binding by id, then append rows through the
    // older callback-style API.
    Office.context.document.bindings.getByIdAsync(table.key, null, findBindingResult => {
        if (findBindingResult.status === Office.AsyncResultStatus.Succeeded && findBindingResult.value) {
            let binding: Office.TableBinding = findBindingResult.value;
            binding.addRowsAsync(
                data,
                { coercionType: Office.CoercionType.Table },
                addRowsResult => {
                    if (addRowsResult.status === Office.AsyncResultStatus.Succeeded) {
                        let span = (Date.now() - perfCounter) / 1000;
                        this.log.info("Time taken for append (s): " + span);
                        d.resolve(table);
                    } else {
                        d.reject(addRowsResult.error);
                    }
                }
            );
        } else {
            d.reject(findBindingResult.error);
        }
    });

    return d.promise;
};
Writing 2,627 rows in chunks. The chunks are differently sized depending on the data coming from the REST API, so comparing chunk to chunk on the same approach may not be useful. The times logged, in seconds, are (for selected chunks):
| Chunk # | Range approach (s) | Async approach (s) |
| --- | --- | --- |
| batch 1 | 5.46 | 0.192 |
| batch 2 | 12.844 | 0.191 |
| batch 12 | 126.96 | 0.924 |
| batch 14 | 134.6 | 1.05 |
| batch 15 | 160.57 | 1.12 |
| batch 19 | 256.17 | 1.359 |
| batch 20 | 278.79 | 1.436 |
Hope this gives you enough information to troubleshoot.
@sameera
Sorry for the late reply. I would like to let you know that we have made some great progress on the performance issue in the past two weeks.
We optimized some critical parts of our API pipeline, which saves more than 80% of the time on average for each range write operation. We also optimized the way we manage temporary ranges, which should give you a better experience when you are chunking data. We are still in the middle of testing and verifying our changes. Most likely we will finalize these changes in the December fork, which will be available for Insider Fast in January 2018. I will let you know once we have confirmation on that.
The way you do the data chunking looks good. However, even with all the changes mentioned above, I would still recommend you get all the data first and write one large range at once, if possible. This will save a lot of network/API overhead and will make sure you get the best experience.
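(For illustration, a minimal sketch of that pattern. The `chunksFromApi` source is hypothetical, and Worksheet.getRangeByIndexes assumes a newer API set; an A1-style address built from the array dimensions would work the same way:)

```typescript
// Accumulate all chunks from the REST API first...
const allRows: any[][] = [];
for (const chunk of chunksFromApi) { // `chunksFromApi` is a hypothetical source
    allRows.push(...chunk);
}

// ...then issue one large write instead of many small ones.
await Excel.run(async (context) => {
    const sheet = context.workbook.worksheets.getActiveWorksheet();
    // Write everything below the header row in a single operation.
    const target = sheet.getRangeByIndexes(1, 0, allRows.length, allRows[0].length);
    target.values = allRows;
    await context.sync();
});
```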
Once I finish the current work on ranges, I will start investigating the issues with the table object, like the slow performance of table.add() and the performance differences with addRowsAsync(). I will also try to see if there is any way to turn off the calculated-columns option and whether that helps performance.
Thank you very much for your feedback and help!
@LanceEa I would like to give you some updates about our recent work on performance. We optimized some critical parts of our API pipeline, which saves more than 80% of the time on average for each range write operation. We also optimized the way we manage temporary ranges, which should give you a better experience when you are chunking data. We are still in the middle of testing and verifying our changes. Most likely we will finalize these changes in the December fork, which will be available for Insider Fast in January 2018. I will let you know once we have confirmation on that.
We are also working on creating a public dataset and a benchmark script sample, so that we can better compare the performance experience on the customers' side and in our lab environment, and have a better way to quantitatively describe our improvement work.
Hi Tong,
I was trying to write data using Office.select('bindings#' + namedRange, function (callback) {}) with setDataAsync(). I have 5,000 rows with 30 columns, plus 10+ columns based on formulas. It is taking more than 1 minute to write the data across three different sheets. Could the formulas we are writing to Excel be causing the perf issue?
Please advise!
Thanks!
@tong-1324 - Thanks for the update. Let me know once it is available and I will give it a try!
Has anyone else tried this in Excel Online?
I have a similar situation where I am trying to load 60,000+ rows into an Excel spreadsheet. When I run it in the Excel client for Mac it works alright, but when I run it online it doesn't insert the data into the table on the spreadsheet at all.
@tong-1324 @Hongbo-Miao - I just wanted to check in on this to see what the status of the fixes is. Have they been pushed into the Insider builds?
I have given the update in the email. @tong-1324, please reply here with more details when you are back.
@LanceEa & @Danwakeem & @malay2012: In the meantime, I've found a workaround that seems to work quite well: removing the original table, writing in the values, and then re-creating the table on top. The only issue I can think of is that inter-column formulas within the table might cause havoc. But otherwise, please give it a try.
You can import this snippet into Script Lab, and then see both the repro behavior as well as the workaround: https://gist.github.com/Zlatkovsky/d36d977668afde97fd567746dffa5c2b
The salient bit of code is:
await Excel.run(async (context) => {
    const sheet = context.workbook.worksheets.getActiveWorksheet();
    const table = sheet.tables.getItem(TABLE_NAME);

    // Compute the range to write to, and the full footprint of the table.
    const rangeToWriteTo = table.getDataBodyRange()
        .getCell(0, 0).getResizedRange(ROWS - 1, COLUMNS - 1);
    const fullTableRange = table.getRange().getCell(0, 0).getBoundingRect(rangeToWriteTo);

    // Convert the table to a plain range, write the values,
    // then re-create the table on top of the written data.
    table.convertToRange();
    rangeToWriteTo.values = createArray();
    sheet.tables.add(fullTableRange, true /*hasHeaders*/);
});
We'll keep you posted once you no longer need the workaround.
@Zlatkovsky - Thanks for the snippet. That is actually how I'm doing it currently. It definitely is faster than adding/resetting the values of the table, but once you start using larger datasets, it gets really slow. I have my add-in limited to 5,000 records for my early testers until we can test with the new fixes.
@Zlatkovsky I can also confirm @LanceEa's observation. We've been doing the same in our product for the last year, and we've seen that the range <-> table conversion can take up to 3 minutes when the table/range spans 20-30,000 rows. (Our tables typically contain a few formula columns as well, which could also contribute to the latency.) It's also quite unfeasible to buffer this amount of data in memory and write it at once, so we have to make multiple range <-> table conversions.
Our scenario also has to account for the fact that we get our data in batches from a web service. Not being able to use inter-column formulas like you said, pivot tables/charts driven by the table data set breaking during downloads, and the latency itself forced us to move away from this approach. We currently use binding.addRowsAsync with relatively better performance.
The range <-> table conversion also leads to Excel freezing up while the conversion happens (especially noticeable on the desktop), and the formatting changes that happen during the conversion are visually "disturbing" as well :) Of course, all of these problems become noticeable once you start working with large datasets in excess of several thousand rows.
Yep, I totally understand. The workaround is not meant to be a permanent solution. We are absolutely working on finding something we can do within the product itself. It's just that there are actually several compounding issues that contribute to this, so we don't have an ETA for a full solution yet.
My intent is for anyone new stumbling onto the thread to see one possible workaround (for a subset of cases), until we have a permanent solution in place.
I'm happy to say that we do have a solution to the issue. I don't have an ETA yet for when it will get to Insider Fast and to the general public, but I did just try the latest internal build on my laptop, and the result is much faster. E.g., the snippet that I sent above, which was taking ~4 seconds to execute, now executes in 0.3 seconds.
I will keep you posted once there is something you can try out publicly. But the good news is that good news is coming :-)
Great news @Zlatkovsky! :) Just curious: 1) does the fix account for having formulas on the table that recalculate on write? 2) would appending to the table in batches also work as fast?
@sameera, so that I can try it out with the two parameters you mentioned: do you want to take my "slow" snippet, modify it to your scenario, and send it back to me? I can then run it on my end and check. Thanks!
@Zlatkovsky Please see https://gist.github.com/sameera/9c71b79aa354a4724da4a897d2d254c3. You can simulate the batched writing by clicking the Issue or Workaround buttons repeatedly.
I didn't really see a performance drop due to formulas, but you'll see that the appending is extremely slow.
@Zlatkovsky @tong-1324 - I'm just checking back in to see how this work is going.
Also, which hosts will gain from these perf enhancements: Excel Online, Windows, Mac, or all? The reason I ask is that a majority of our initial users will be on Excel Online, because unfortunately our IT org hasn't rolled out Windows 10/Office 2016 yet.
The improvement we are working on will be available on all platforms. You will likely get it for online sooner than other platforms as online has a faster release cadence.
Has the general performance been tested on other platforms such as Mac?
I've got a use case where I'm trying to generate between 200 and 1,000 rows with about 65 columns. I frequently get errors or crashes, or, on a good run, it takes 2 minutes with one core pegged at 100% CPU.
I have tried modifying the code to convert the table to a range as shown above, removed formulas, removed styling, and also tried setting calculation to manual. (I can't try the application.suspendApiCalculationUntilNextSync API since it isn't available for Mac yet.)
I'm at my wits' end trying to port fairly simple and straightforward code over from Google Sheets / Apps Script to Excel. :/ Any guidance would be appreciated.
I just tried it on a little ThinkPad, and the performance is significantly better than on the Mac, even though Excel is only 32-bit on Windows. :/
Also, it appears that the workaround above doesn't work at all if the table has structured references in it. Is this something that will continue to be a problem after the improvement you are releasing?
The general performance improvement is not targeting any specific platform, and it is not blocked by any specific feature such as structured references either. Although Mac and Win32 share the same code, the builds between the two platforms are not always fully synced, which may explain why some Mac builds are slower than the Win32 builds. In the end, we expect the two platforms to be fairly close in terms of performance.
@weidezhong Has the improvement been tested against the Script Lab snippet I posted above? I'm curious to know the outcome :)
@sameera Yes, the improvement is slightly more effective than the workaround.
+1 Any updates?
Hi @misaunde, we have two major fixes to improve performance on all platforms. The first improves the range.values API and is available starting from build 9021. The second fixes an object-binding issue, from which all the range- and table-related APIs should benefit; it is available starting from build 9208. You just need to update your Excel to the latest build and you should be able to see those improvements.
In addition to the fixes above, we are currently working on another API to let developers temporarily suspend screen updating. Basically, it will work like suspendApiCalculationUntilNextSync(). Both of these can help you improve overall performance. Also, we are going to publish a new document very soon about how to make Excel add-ins perform better.
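(For readers landing here later: the screen-updating API that eventually shipped is Application.suspendScreenUpdatingUntilNextSync(), in newer API sets. A minimal sketch, assuming a build where it is available; the range address and `data` array are placeholders:)

```typescript
await Excel.run(async (context) => {
    // Suspend repainting until the next sync so a large write
    // doesn't trigger repeated screen refreshes.
    context.workbook.application.suspendScreenUpdatingUntilNextSync();

    const sheet = context.workbook.worksheets.getActiveWorksheet();
    sheet.getRange("A2:T5001").values = data; // `data`: a hypothetical 5000x20 array

    await context.sync();
});
```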
@misaunde @sameera Hey guys, Juan, PM for Excel.js here. What's the latest on this issue? Are you happy with the perf now?
@tong-1324 can you please share our latest updates and our document on perf?
I would like to get feedback on whether our perf improvements are awesome now.
@JuaneloJuanelo At the moment we are unable to port our application to API 1.7, due to many of our customers being stuck on the MSI version of Excel. We plan to start the move in our next major release, around mid-Q4. Sorry, I cannot give you any feedback at the moment.
@JuaneloJuanelo I just tested my own gist sample with Script Lab and see a significant improvement. I tested appending up to 5,000+ rows (in chunks of 100) and the elapsed time was under 0.5s (it started at around 0.1s and grew very slowly). This is now much faster than the workarounds! Thank you!
@JuaneloJuanelo I have definitely noticed improvements writing plain text to Excel. Thank you. I just wrote 25,000 rows x 3 columns in under 0.3 seconds. That's fantastic.
My only concern is that as soon as we add more than plain text (formatting, formulas, cell coloring, conditional formatting, columnWidth adjustments, etc.), the performance gains seem to disappear.
For example, I wrote the same 25,000 rows (x 1 column) with range.getCell(i, 0).format.fill.color = 'yellow' and it took 31.223 seconds, even using suspendApiCalculationUntilNextSync().
This remains our biggest challenge with Office.js, as almost none of our tools write just plain text. In fact, we're in the process of switching our largest tool over to the server now due to this limitation. I would love to hear more about temporarily suspending screen updating during writes, and a timeline for that feature.
If it's any help to anyone else, one thing we discovered that was crucial for performance was setting range.numberFormat before assigning anything to range.values: https://stackoverflow.com/a/49616210
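(A minimal sketch of that ordering; the range address and formats here are illustrative, not taken from the linked answer:)

```typescript
await Excel.run(async (context) => {
    const sheet = context.workbook.worksheets.getActiveWorksheet();
    const range = sheet.getRange("A1:B2"); // illustrative address

    // Set the number format first...
    range.numberFormat = [
        ["0.00", "0.00"],
        ["0.00", "0.00"],
    ];
    // ...then assign the values, so Excel doesn't spend time
    // re-deriving a format for every cell during the write.
    range.values = [
        [1.234, 5.678],
        [9.101, 11.213],
    ];

    await context.sync();
});
```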
@misaunde , for the formatting case, could you please open up a separate question on https://stackoverflow.com/questions/tagged/office-js and link to it here? There is a workaround I can suggest that will likely do a lot of good for you -- but I want to make sure that other folks can also see it without it being lost as the 90th comment to an older issue. :-)
@Zlatkovsky - Great news, here you go: https://stackoverflow.com/q/51735674/3806701
Summary
I'm creating a table in Excel using the new promise-based APIs and inserting data into it, and the performance isn't what I would have expected.
Expected Behavior
I would expect that 2,000 records would be fast, taking only a second or so, and that I would be able to insert significantly more data. We are in the early phases of developing our add-in, so we are using small datasets, but in the future we want to support hundreds of thousands of records. I recognize that to insert that many records we will probably need to chunk the data and insert it in batches, but I would think those batches could be bigger than 2,000 records?
Current Behavior
When I create a table and insert 100 records, it is snappy and feels fast (less than 1 second). However, when I load 2,000 records and 20 columns, it is much slower: at worst I have seen 20 seconds, and at best around 5 seconds.
Steps to Reproduce, or Live Example
See the code snippet above. It is basically the same as what is provided in Script Lab, but the data is 2,000 records x 20 columns.
Context
As mentioned above, I would expect 2,000 records x 20 columns to be faster than it is, because I want to allow users to download large datasets (100,000+ records), assuming we can provide a good user experience.
Your Environment
Useful logs
I'm adding some Application Insights events right now and will update the issue with some performance observations.
Questions: