SpartnerNL / Laravel-Excel

🚀 Supercharged Excel exports and imports in Laravel
https://laravel-excel.com
MIT License
12.29k stars · 1.92k forks

Why do queued exports take a lot of time? #2742

Closed · okmkey45 closed this issue 3 years ago

okmkey45 commented 4 years ago

Description

I have been using this package for over a year now. Recently I started to have problems with memory and performance: users are trying to download a lot of data, sometimes historical data. So I read the docs and found the queued exports section. At first it seemed like the perfect solution: with queued exports, multiple jobs are dispatched to generate the file in chunks. But I noticed something. Each dispatched job (AppendQueryToSheet) took more time to finish than the last one, and the final job (CloseSheet) took even longer.

My question is: is this how queued exports are supposed to work, or am I missing something?

Additional Information

I am using Redis and Horizon. I even tried multiple workers, but it seems that a new job is only dispatched after the previous one finishes, so adding workers did not solve my problem.

I attached this image so you can see the recorded times.

[Screenshot: recorded job times]

patrickbrouwers commented 4 years ago

Every append has to reopen the spreadsheet file, which takes PhpSpreadsheet some time, so each append indeed takes longer than the previous one.
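To see why each append gets slower, here is a small stdlib-only Python sketch (not Laravel Excel or PhpSpreadsheet code, just a hypothetical cost model): if every append re-reads and re-writes the whole file, the work per append grows with the file size, so the total work grows quadratically with the number of chunks, while a streaming writer does constant work per chunk.

```python
# Hypothetical cost model: count the rows each strategy has to touch.
# Not real Laravel Excel code; an illustration of the reopen-per-append cost.

def reopen_per_append(num_chunks: int, rows_per_chunk: int) -> int:
    """Each append re-reads every existing row, then re-writes the whole file."""
    total_rows_touched = 0
    file_rows = 0
    for _ in range(num_chunks):
        total_rows_touched += file_rows    # re-read the rows already in the file
        file_rows += rows_per_chunk        # append the new chunk
        total_rows_touched += file_rows    # re-write the whole file
    return total_rows_touched

def streaming_append(num_chunks: int, rows_per_chunk: int) -> int:
    """A streaming writer only ever touches the new rows of each chunk."""
    return num_chunks * rows_per_chunk

if __name__ == "__main__":
    chunks, rows = 100, 1_000
    print("reopen model, rows touched:", reopen_per_append(chunks, rows))
    print("streaming, rows touched:   ", streaming_append(chunks, rows))
```

With 100 chunks of 1,000 rows, the reopen model touches 100x more rows than the streaming model, which matches the pattern in the screenshot: each AppendQueryToSheet job is slower than the last.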

I've got some changes planned for 3.2 that will improve this.

aakarim commented 4 years ago

I'm experiencing the same issue here. The files aren't massive either, only 5–10 MB, and I'm not seeing any spikes in CPU or memory usage.

okmkey45 commented 4 years ago

I decided to build a service with Python and pandas, and it improved performance a lot, and I mean A LOT. Laravel Excel is an excellent library, but for this job it may not be the right tool.

aakarim commented 4 years ago

@okmkey45 To be fair, it's not really Laravel Excel's fault: it relies on PhpSpreadsheet, which is an inefficient library for reading/writing files. It's a shame, because the abstractions around chunking queries and queueing jobs are really handy here and save me a lot of time over writing a service in Go/Python and hooking it up to my queue.

There are quite a few alternatives to PhpSpreadsheet that are much faster on large files; maybe one of them could be swapped in when ->queue or ShouldQueue is used?

patrickbrouwers commented 4 years ago

@aakarim that's actually something I've considered, but I currently don't have the time or use case to implement it. I would be open to a PR as long as it is kept simple and opt-in.

aakarim commented 4 years ago

I'll have a bash at it after this deadline. Are there any packages you prefer/recommend, @patrickbrouwers? https://github.com/box/spout seems like a good place to start.

patrickbrouwers commented 4 years ago

@aakarim yes, box/spout. I've even considered making it possible to use league/csv if you only need CSV; that's by far the fastest option for large bulk imports/exports.
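Libraries like box/spout and league/csv get their speed from streaming writes: the file handle is opened once and each row is written exactly once, so earlier rows are never reloaded. A stdlib Python sketch of that same pattern (illustrative only, with made-up data; not the PHP APIs):

```python
import csv

def write_rows_streaming(path: str, headings, rows) -> int:
    """Open the file once and stream rows to it; cost per row is constant."""
    count = 0
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(headings)
        for row in rows:
            writer.writerow(row)
            count += 1
    return count

if __name__ == "__main__":
    # A generator, so no rows are buffered in memory before writing.
    rows = ((i, f"user-{i}") for i in range(1_000))
    print(write_rows_streaming("users.csv", ["id", "name"], rows))
```

Because the handle stays open for the whole export, memory use stays flat and the per-row cost does not grow with the file, which is exactly what the reopen-per-append flow loses.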

stale[bot] commented 3 years ago

This bug report has been automatically closed because it has not had recent activity. If this is still an active bug, please comment to reopen. Thank you for your contributions.

vitorccs commented 2 years ago

@aakarim I created a Laravel project, https://github.com/vitorccs/laravel-csv/, inspired by Laravel-Excel and implementing some of the same features like FromQuery, WithMapping, WithHeadings, etc.

E.g. exporting a 50k-record CSV file dropped from about 45 minutes to about 10 minutes.

Hope this can be useful to anyone reading this issue.

aakarim commented 2 years ago

That looks good @vitorccs but unfortunately my use case is mostly XLSX.

In the end I switched to a custom Go system to process my Excel exports for anything with over 2,500 rows. It was less work overall than modifying this library.