PHPOffice / PHPExcel

ARCHIVED
Other
11.46k stars 4.2k forks source link

Total Rows not correct #1302

Open gamevnlc opened 6 years ago

gamevnlc commented 6 years ago

Hi, current i am using "Maatwebsite/Laravel-Excel" package for laravel framework . It use "phpoffice/phpexcel" package . I got the problem when try to import excel file. My excel file has only 10 rows, but I debug in and it show 320 row. Does anybody have any idea?

My excel file KzT1pYIonyKI99aTey7wSAPIVthqUM.xlsx

CJDennis commented 6 years ago

If you originally had 320 rows and deleted 310 of them, you'd think you only have 10 remaining, but Excel can reserve the space previously used. See if the scroll bar (click on the scroll bar and drag it down as far as it will go, don't use page up/page down or the mouse wheel) takes you to row 320 as the last row.

Your Excel file shows that you originally had 332 rows and deleted most of them. This is the way Excel works, and not a bug in PHPExcel.

gamevnlc commented 6 years ago

Oh thank you,In my case some cells have property like border or background so It seem that PHPExcel detects cell contain property background color/ border is not empty. Is it right

derekrprice commented 6 years ago

Per issue #874 (I am commenting here because that ticket is closed already), I understand what @MarkBaker is saying about backwards compatibility, but I have to agree with @PowerGamer1 (and a number of StackOverflow and other net users), that what getHighestDataRow actually does is practically useless. I don't care if I've applied formatting to the row, I want an efficient way to grab the last one that contains non-blank data without calling getCell 1000 x 26 times myself. I want the row that a user opening the spreadsheet and just visually looking at it would say is the last row.

Surely there is an efficient way to do this. At load time, you must already know if each cell is blank or not, so increment a MAXNONEMPTYROW entry appropriately there. Also easy to update when setting a cell's data. When clearing cell data, it's a little trickier. Perhaps set a RECHECKMAXNONEMPTYROW flag and lazily calculate a new value for MAXNONEMPTYROW when RECHECKMAXNONEMPTYROW is set and getHighestNonEmptyRow is called? And similar for getMaxNonEmptyColumn, of course.

CJDennis commented 6 years ago

@derekrprice The most efficient way is to ask Excel what the maximum row is. It has a time of O(1). Looping over each row has a time of O(n^2) because you have to check every cell in every row. If you know with 100% confidence that the file you're reading is compatible with the first way, you probably don't want to sacrifice the speed. However, if you could get the wrong result, you will want accuracy over speed.

As far as I'm aware, PHPExcel does not read every cell when you open the file (because that would add an unnecessary overhead), and instead relies on Excel's meta-data embedded within the file.

derekrprice commented 6 years ago

@CJDennis Actually, PHPExcel does read the entire file when opened, as I surmised, and we’ve managed to use that fact to write a patch which implements the algorithm I’ve described above to cache the maximum non-empty row. As far as asking PHPExcel for the maximum row using the old way goes, it may be O(1), but it counts empty cells which only have formatting applied, which, I believe, is the problem that this bug report and the other I linked originally attempted to describe.

Anyhow, my algorithm is O(1) most of the time (all the time in our use case) and only O(n^2) in the worst (and unavoidable case). Even then, unless the worksheet is totally empty, working backwards from the previous maximum means that it should use many fewer resources than that.

We’re still testing, but we should be able to submit a pull request next week sometime. Our initial results are that it speeds up our use case by a factor of 10 since we only need to parse cells in a much smaller section of each sheet when we can ignore cells with only formatting that was applied to empty ranges. Previously, profiling revealed that 60% of our app’s time was spent in getCell and 5/6 of getCell’s time was spent in the Cell constructor. Anyhow, not calling getCell to check whether a cell was empty saved a LOT of time.

On Fri, Sep 29, 2017 at 6:46 PM CJDennis notifications@github.com wrote:

@derekrprice https://github.com/derekrprice The most efficient way is to ask Excel what the maximum row is. It has a time of O(1). Looping over each row has a time of O(n^2) because you have to check every cell in every row. If you know with 100% confidence that the file you're reading is compatible with the first way, you probably don't want to sacrifice the speed. However, if you could get the wrong result, you will want accuracy over speed.

As far as I'm aware, PHPExcel does not read every cell when you open the file (because that would add an unnecessary overhead), and instead relies on Excel's meta-data embedded within the file.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/PHPOffice/PHPExcel/issues/1302#issuecomment-333256232, or mute the thread https://github.com/notifications/unsubscribe-auth/AFKbAK4r_zQOzTGY4gBAesNL9zwH4DeAks5snXM8gaJpZM4PG_Pp .

PowerKiKi commented 6 years ago

@derekrprice if you are going to submit a PR, please submit it against PhpSpreadsheet develop branch, not PHPExcel. Nothing will be merged for PHPExcel anymore. And since PhpSpreadsheet is not released yet, it might be our best chance to break compatibility, if you are able to explain the reason for the break and provide a patch that is covered by unit tests.