Open · BharathIO opened this issue 1 month ago
@BharathIO Thank you for contacting us. We will investigate.
One more point I observed.
Whenever we try to create a new row or modify an existing cell value, CPU utilization spikes to 99% and stays there.
Ex: new_book.insert_new_row("Sheet1", &3, &1);
I would appreciate it if you could share which functions I can use to read/write lazily.
Ex: let mut new_book = umya_spreadsheet::reader::xlsx::lazy_read(path1).unwrap();
The lazy_read above somehow did not eat memory or extra CPU.
Is there a problem with the CPU going to 100%? In a compute-heavy scenario, you'd want the CPU to max out so the result is returned quickly... Now, I don't think this repo is multi-threaded, so I assume you're talking about maxing out one CPU core?
Excessive usage of RAM is another matter. I must ask: have you compiled in release mode? It makes a huge difference.
Also, your XLSX is not particularly small. It may contain only one row of real data, but it is 818KB in size and actually contains 17528 rows, which makes it larger than many large spreadsheet models. I suspect there is a lot of formatting on individual cells? I can't tell without analyzing the file in depth.
So you have an 800KB file, which is close to 1MB. Parsing this into a Rust structure, it is not unreasonable to expect memory usage several times the size of the data, especially when every record is small and there are a LOT of them. I'd guess 1GB is a bit too much, but I would definitely expect at least a few hundred MB of RAM consumed just by reading the file into memory.
For example, an entry like <foo /> is only a few bytes in the Excel file, but the foo structure may contain hundreds of fields, all of which are default for this entry. The actual type in Rust would be HUGE.
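To illustrate, here is a schematic stand-in (the struct and its field names are made up for illustration and are NOT the crate's actual types):

// Schematic only: a hypothetical stand-in, not a real umya-spreadsheet type.
// An empty element like <foo /> costs a handful of bytes on disk, but the
// in-memory structure representing it may carry many defaulted fields.
#[derive(Default)]
struct Foo {
    alignment: Option<String>,
    border_style: Option<String>,
    fill_pattern: Option<String>,
    font_name: Option<String>,
    font_size: Option<f64>,
    number_format: Option<String>,
    // ...and potentially dozens more, allocated once per record.
}

fn main() {
    let foo = Foo::default();
    // Even an "empty" entry occupies the full size of the struct in memory.
    println!("size of Foo: {} bytes", std::mem::size_of_val(&foo));
}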
Removing the extra blank rows shrinks the file down to 14KB.
Thank you for your update and detailed explanation.
I am currently facing an issue where CPU utilization spikes to 100%. According to the configuration, when CPU usage exceeds a threshold of 60%, the process is shut down, and an email alert is sent to the admin.
I am using the cargo build --release option for compilation. While the file size itself isn't a concern, the CPU and RAM usage during read/write operations is much higher than expected.
I will attach a sample code with CPU and memory statistics for your reference.
I understand that larger files can naturally consume more resources, but in this case, the files are relatively small. However, even with the code below, which only removes a row at index 2, CPU consumption reaches 99%.
Could you kindly provide suggestions or steps to optimize resource usage and reduce the CPU/memory footprint during file read/write operations? I am not concerned about increasing the time it takes for these operations if that helps in reducing resource consumption.
let template_path = "../lov-template-new.xlsx";
let output_path = "../1_style_issue_output.xlsx";
// Read the workbook from the template.
let mut book = match umya_spreadsheet::reader::xlsx::read(template_path) {
    Ok(book) => book,
    Err(e) => {
        println!("Error: {:?}", e);
        return;
    }
};
// Remove the row at index 2 from the sheet at index 1.
if let Some(sheet) = book.get_sheet_mut(&1) {
    let row_index = 2;
    sheet.remove_row(&row_index, &1);
    println!("Removed row: {}", row_index);
}
// Write the modified workbook back out.
match umya_spreadsheet::writer::xlsx::write(&book, output_path) {
    Ok(_) => println!("Success writing to file: {}", output_path),
    Err(e) => println!("Error: {:?}", e),
};
Must you keep all those blank rows?
Or can you use column styles instead?
I would suggest reducing the size of the xlsx files. Excel files are not the easiest to parse and this repo does a very thorough job in exposing very detailed structure. You have to give it some slack by helping it with less work.
I appreciate the power of this repo; it has all the features I have wanted so far.
Can you please help me understand how to reduce the size of an xlsx file without losing any data or styles?
Also, I saw that the lazy_read and write_light functions are very helpful. Can you suggest similar functions that I can use for my functionality?
I would start by finding out why it is so large in the first place. You most likely duplicated formatting styles over all 17000+ rows.
Maybe you should just define some column styles, and they'll be in effect for all cells in those columns.
EDIT: Strange, your XLSX file's data is 6MB (!!!) uncompressed. However, I don't see any specific formatting in any of those cells.
Try this one instead.
Thanks, I will try it and let you know my results.
May I know how you compressed it without losing styles, formatting, etc.? As per my understanding, the new file you shared has the following: highest column - 13, highest row - 2.
All the blank rows are gone now. Please share the steps to do this for any other files in the future.
Simple. Select all the blank columns up to the right end and delete the columns: select the first blank column, press Shift-Ctrl-Right-Arrow to select all the columns to the right, then Edit-Delete.
Select all the blank rows down to the very bottom and delete the rows: select the first blank row, press Shift-Ctrl-Down-Arrow to select all the rows below, then Edit-Delete.
Thanks for the update. I am aware of the manual steps to remove blank rows. Is there any way to remove them programmatically?
There is API to delete rows and columns...
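For example, something along these lines (a sketch only; remove_row is the call from your own snippet above, and remove_column is assumed to be the column counterpart in your crate version, so please verify it):

if let Some(sheet) = book.get_sheet_mut(&1) {
    // Delete 10 rows starting at row 3.
    sheet.remove_row(&3, &10);
    // Delete 3 columns starting at column "D" (assumed counterpart API).
    sheet.remove_column("D", &3);
}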
Is it sheet.remove_row() you are talking about?
I mean, I need a way to remove/delete all empty rows from a given input file programmatically with Rust.
I believe you can loop through the Excel sheet data structures and simply delete all cell-specific data if the cell is empty. However, that'll require iterating over a huge number of cells. I would not recommend it.
You are not doing simple manipulation here. You're rewriting Excel.
Alternatively, if all your rows are guaranteed to contain either all data or all blanks, then just search downwards until you find a blank row, then remove all rows from there down to the maximum row in one single call.
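A minimal sketch of that second approach (the sheet index mirrors your earlier snippet, the "first cell empty means blank row" check is an assumption, and I'm assuming the usual highest-row helper here):

fn remove_trailing_blank_rows(book: &mut umya_spreadsheet::Spreadsheet) {
    if let Some(sheet) = book.get_sheet_mut(&1) {
        let highest_row = sheet.get_highest_row();
        // Scan downwards for the first row whose first cell is empty
        // (assumes rows are either fully populated or fully blank).
        let mut first_blank: Option<u32> = None;
        for row in 1..=highest_row {
            let coord = format!("A{}", row);
            if sheet.get_value(coord.as_str()).is_empty() {
                first_blank = Some(row);
                break;
            }
        }
        // Remove everything from the first blank row to the bottom in one call.
        if let Some(start) = first_blank {
            let count = highest_row - start + 1;
            sheet.remove_row(&start, &count);
        }
    }
}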
Thanks for the info. I am trying a couple of approaches and have a few questions.
Can you please explain the performance differences between the following?
use umya_spreadsheet::{reader::xlsx::lazy_read, writer::xlsx::write_light};
@BharathIO
read reads a file and then deserializes it. lazy_read only reads the file; it does not deserialize, so it is limited in what it can do, but it is faster.
The difference between write and write_light is in the compression format. write_light is faster than write, but it increases the size of the generated files.
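For reference, a small usage sketch of the four entry points (file paths are placeholders; error handling is trimmed to unwrap for brevity):

fn main() {
    let path = "./input.xlsx";

    // read: parses and deserializes the whole workbook up front.
    let book = umya_spreadsheet::reader::xlsx::read(path).unwrap();

    // lazy_read: only reads the file, deferring deserialization, so it is
    // faster but limited in what it can do afterwards.
    let _lazy_book = umya_spreadsheet::reader::xlsx::lazy_read(path).unwrap();

    // write: slower, but compresses better, so the output file is smaller.
    umya_spreadsheet::writer::xlsx::write(&book, "./output.xlsx").unwrap();

    // write_light: faster, at the cost of a larger generated file.
    umya_spreadsheet::writer::xlsx::write_light(&book, "./output_light.xlsx").unwrap();
}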
We have prepared a function cleanup() that handles this process. However, there is still some difficulty with processing speed.
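Roughly, the intended usage is along these lines (a sketch only; the exact receiver and signature of cleanup() may differ in the released version):

let mut book = umya_spreadsheet::reader::xlsx::read("./lov-template-new.xlsx").unwrap();
if let Some(sheet) = book.get_sheet_mut(&1) {
    // Drop the empty rows/cells so later reads and writes have less to process.
    sheet.cleanup();
}
umya_spreadsheet::writer::xlsx::write(&book, "./lov-template-clean.xlsx").unwrap();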
Great. I will definitely use cleanup() for my use case. May I know how long it took to clean up all the empty rows (around 17K) in the file I shared?
@BharathIO
Try this one instead.
We performed a cleanup on this file. It was completed in an instant.
I would like everybody to try out my PR on: https://github.com/MathNya/umya-spreadsheet/pull/242
Let me know if memory requirements drop.
Great, I can see a big change now in terms of memory and CPU utilization. I will validate a few more use cases and post my observations here.
Hi, while reading and writing an Excel file, CPU utilization peaks above 100% and RAM consumption goes beyond 1 GB. Can you please check and do the needful?
Attaching the Excel template: Vendor_Details.xlsx
Steps to reproduce: read the attached Vendor_Details.xlsx with umya_spreadsheet::reader::xlsx::lazy_read.
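A minimal snippet for the reproduction (the path mirrors the attachment; CPU/RAM can be watched with an external monitor such as top while it runs):

fn main() {
    // Lazily read the attached template; the resource spike is observed here.
    let _book = umya_spreadsheet::reader::xlsx::lazy_read("./Vendor_Details.xlsx").unwrap();
    println!("lazy_read finished");
}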