catamphetamine / read-excel-file

Read *.xlsx files in a browser or Node.js. Parse to JSON with a strict schema.
https://catamphetamine.gitlab.io/read-excel-file/
MIT License

(Performance) Limit the number of rows being read for huge datasets #69

Closed freinet12 closed 4 years ago

freinet12 commented 4 years ago

How can I set a limit on the number of rows to read, for example a maximum row count? Currently it reads the entire file, and if the file is huge, that leads to memory issues.

catamphetamine commented 4 years ago

Yes, perhaps I should even add it to the readme as the first paragraph: FOR SMALL FILES ONLY. There have been numerous complaints about the parsing time: https://github.com/catamphetamine/read-excel-file/issues/66 What number of rows is comfortable in your case? At what number of rows do you consider a file "huge", and how many seconds does the freeze last?

freinet12 commented 4 years ago

In my case, I only need to read the first 10 rows in order to generate a file preview in the UI. I have no control over the files that clients upload, which means they could be uploading files with 100k+ rows of data. Setting a limit would avoid potential memory issues.

catamphetamine commented 4 years ago

Since it's a DOM parser, not a SAX parser, it parses the whole XML file. A SAX parser would most likely fix the RAM overflow. How much RAM does it use in your case? For a file with how many rows and columns?
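
To illustrate the difference, here is a rough sketch of what an early-stopping read could look like. It is not part of read-excel-file and it is not a real SAX parser: `readFirstRows` is a hypothetical helper that scans already-extracted worksheet XML one `<row>` at a time and stops once `maxRows` rows have been collected, instead of building a tree for the whole document. It assumes the simple `<row>`/`<c>`/`<v>` layout of a worksheet, returns raw `<v>` text (shared-string indices are not resolved, inline strings are ignored), and still requires the sheet XML to be unzipped first, so it only shows the idea.

```javascript
// Hypothetical helper, not part of read-excel-file.
// Scans worksheet XML row by row and stops after `maxRows` rows,
// so rows beyond the limit are never tokenized or stored.
function readFirstRows(sheetXml, maxRows) {
  const rows = [];
  // Matches either a self-closing <row/> or a <row>...</row> pair.
  const rowPattern = /<row\b[^>]*?(?:\/>|>([\s\S]*?)<\/row>)/g;
  // Matches either a self-closing <c/> or a <c>...</c> pair.
  const cellPattern = /<c\b[^>]*?(?:\/>|>([\s\S]*?)<\/c>)/g;
  const valuePattern = /<v>([\s\S]*?)<\/v>/;

  let rowMatch;
  while (rows.length < maxRows && (rowMatch = rowPattern.exec(sheetXml)) !== null) {
    const rowXml = rowMatch[1] ?? '';
    const cells = [];
    for (const cellMatch of rowXml.matchAll(cellPattern)) {
      const valueMatch = cellMatch[1] ? valuePattern.exec(cellMatch[1]) : null;
      // Raw <v> text, or null for an empty cell.
      cells.push(valueMatch ? valueMatch[1] : null);
    }
    rows.push(cells);
  }
  return rows;
}
```

For the preview use case above, a call such as `readFirstRows(sheetXml, 10)` would return at most 10 rows regardless of how many rows the sheet contains. A real fix would additionally need to stream the zip entry rather than load it whole.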