SpartnerNL / Laravel-Excel

🚀 Supercharged Excel exports and imports in Laravel
https://laravel-excel.com
MIT License
12.24k stars 1.91k forks source link

[Bug]: Memory issues on eloquent - 2600 results (queues exports) #3972

Closed jackkitley closed 11 months ago

jackkitley commented 1 year ago

Is the bug applicable and reproducable to the latest version of the package and hasn't it been reported before?

What version of Laravel Excel are you using?

3.1.48

What version of Laravel are you using?

10.16.1

What version of PHP are you using?

8.2.6

Describe your issue

I create a eloquent query with 4 relationships. I return the data as a collection and pass this through to laravel export construct to be processed. I have switched to queuing the data and chunking the export but the export seems to fail on receiving the data to process.

Here is the start of my export:

        $stock = $this->stockRepository->export($filters);

        $uniqueFile = 'stock' . uniqid() . '.xlsx';
        (new StockExport($stock))->queue("exports/$uniqueFile", 's3')->chain([
            new NotifyUserOfCompleteExport(request()->user(), $uniqueFile),
        ]);

        return $this->backWithSuccessFlash(__('Export started'));

Here is the config:


use Maatwebsite\Excel\Excel;

return [
    'exports' => [

        /*
        |--------------------------------------------------------------------------
        | Chunk size
        |--------------------------------------------------------------------------
        |
        | When using FromQuery, the query is automatically chunked.
        | Here you can specify how big the chunk should be.
        |
        */
        'chunk_size'             => 100,

        /*
        |--------------------------------------------------------------------------
        | Pre-calculate formulas during export
        |--------------------------------------------------------------------------
        */
        'pre_calculate_formulas' => false,

        /*
        |--------------------------------------------------------------------------
        | Enable strict null comparison
        |--------------------------------------------------------------------------
        |
        | When enabling strict null comparison empty cells ('') will
        | be added to the sheet.
        */
        'strict_null_comparison' => false,

        /*
        |--------------------------------------------------------------------------
        | CSV Settings
        |--------------------------------------------------------------------------
        |
        | Configure e.g. delimiter, enclosure and line ending for CSV exports.
        |
        */
        'csv'                    => [
            'delimiter'              => ',',
            'enclosure'              => '"',
            'line_ending'            => PHP_EOL,
            'use_bom'                => false,
            'include_separator_line' => false,
            'excel_compatibility'    => false,
            'output_encoding'        => '',
        ],

        /*
        |--------------------------------------------------------------------------
        | Worksheet properties
        |--------------------------------------------------------------------------
        |
        | Configure e.g. default title, creator, subject,...
        |
        */
        'properties'             => [
            'creator'        => '',
            'lastModifiedBy' => '',
            'title'          => '',
            'description'    => '',
            'subject'        => '',
            'keywords'       => '',
            'category'       => '',
            'manager'        => '',
            'company'        => '',
        ],
    ],

    'imports'            => [

        /*
        |--------------------------------------------------------------------------
        | Read Only
        |--------------------------------------------------------------------------
        |
        | When dealing with imports, you might only be interested in the
        | data that the sheet exists. By default we ignore all styles,
        | however if you want to do some logic based on style data
        | you can enable it by setting read_only to false.
        |
        */
        'read_only'    => true,

        /*
        |--------------------------------------------------------------------------
        | Ignore Empty
        |--------------------------------------------------------------------------
        |
        | When dealing with imports, you might be interested in ignoring
        | rows that have null values or empty strings. By default rows
        | containing empty strings or empty values are not ignored but can be
        | ignored by enabling the setting ignore_empty to true.
        |
        */
        'ignore_empty' => false,

        /*
        |--------------------------------------------------------------------------
        | Heading Row Formatter
        |--------------------------------------------------------------------------
        |
        | Configure the heading row formatter.
        | Available options: none|slug|custom
        |
        */
        'heading_row'  => [
            'formatter' => 'slug',
        ],

        /*
        |--------------------------------------------------------------------------
        | CSV Settings
        |--------------------------------------------------------------------------
        |
        | Configure e.g. delimiter, enclosure and line ending for CSV imports.
        |
        */
        'csv'          => [
            'delimiter'        => null,
            'enclosure'        => '"',
            'escape_character' => '\\',
            'contiguous'       => false,
            'input_encoding'   => 'UTF-8',
        ],

        /*
        |--------------------------------------------------------------------------
        | Worksheet properties
        |--------------------------------------------------------------------------
        |
        | Configure e.g. default title, creator, subject,...
        |
        */
        'properties'   => [
            'creator'        => '',
            'lastModifiedBy' => '',
            'title'          => '',
            'description'    => '',
            'subject'        => '',
            'keywords'       => '',
            'category'       => '',
            'manager'        => '',
            'company'        => '',
        ],

    ],

    /*
    |--------------------------------------------------------------------------
    | Extension detector
    |--------------------------------------------------------------------------
    |
    | Configure here which writer/reader type should be used when the package
    | needs to guess the correct type based on the extension alone.
    |
    */
    'extension_detector' => [
        'xlsx'     => Excel::XLSX,
        'xlsm'     => Excel::XLSX,
        'xltx'     => Excel::XLSX,
        'xltm'     => Excel::XLSX,
        'xls'      => Excel::XLS,
        'xlt'      => Excel::XLS,
        'ods'      => Excel::ODS,
        'ots'      => Excel::ODS,
        'slk'      => Excel::SLK,
        'xml'      => Excel::XML,
        'gnumeric' => Excel::GNUMERIC,
        'htm'      => Excel::HTML,
        'html'     => Excel::HTML,
        'csv'      => Excel::CSV,
        'tsv'      => Excel::TSV,

        /*
        |--------------------------------------------------------------------------
        | PDF Extension
        |--------------------------------------------------------------------------
        |
        | Configure here which Pdf driver should be used by default.
        | Available options: Excel::MPDF | Excel::TCPDF | Excel::DOMPDF
        |
        */
        'pdf'      => Excel::MPDF,
    ],

    /*
    |--------------------------------------------------------------------------
    | Value Binder
    |--------------------------------------------------------------------------
    |
    | PhpSpreadsheet offers a way to hook into the process of a value being
    | written to a cell. In there some assumptions are made on how the
    | value should be formatted. If you want to change those defaults,
    | you can implement your own default value binder.
    |
    | Possible value binders:
    |
    | [x] Maatwebsite\Excel\DefaultValueBinder::class
    | [x] PhpOffice\PhpSpreadsheet\Cell\StringValueBinder::class
    | [x] PhpOffice\PhpSpreadsheet\Cell\AdvancedValueBinder::class
    |
    */
    'value_binder'       => [
        'default' => Maatwebsite\Excel\DefaultValueBinder::class,
    ],

    'cache'        => [
        /*
        |--------------------------------------------------------------------------
        | Default cell caching driver
        |--------------------------------------------------------------------------
        |
        | By default PhpSpreadsheet keeps all cell values in memory, however when
        | dealing with large files, this might result into memory issues. If you
        | want to mitigate that, you can configure a cell caching driver here.
        | When using the illuminate driver, it will store each value in a the
        | cache store. This can slow down the process, because it needs to
        | store each value. You can use the "batch" store if you want to
        | only persist to the store when the memory limit is reached.
        |
        | Drivers: memory|illuminate|batch
        |
        */
        'driver'     => 'batch',

        /*
        |--------------------------------------------------------------------------
        | Batch memory caching
        |--------------------------------------------------------------------------
        |
        | When dealing with the "batch" caching driver, it will only
        | persist to the store when the memory limit is reached.
        | Here you can tweak the memory limit to your liking.
        |
        */
        'batch'      => [
            'memory_limit' => 60000,
        ],

        /*
        |--------------------------------------------------------------------------
        | Illuminate cache
        |--------------------------------------------------------------------------
        |
        | When using the "illuminate" caching driver, it will automatically use
        | your default cache store. However if you prefer to have the cell
        | cache on a separate store, you can configure the store name here.
        | You can use any store defined in your cache config. When leaving
        | at "null" it will use the default store.
        |
        */
        'illuminate' => [
            'store' => null,
        ],
    ],

    /*
    |--------------------------------------------------------------------------
    | Transaction Handler
    |--------------------------------------------------------------------------
    |
    | By default the import is wrapped in a transaction. This is useful
    | for when an import may fail and you want to retry it. With the
    | transactions, the previous import gets rolled-back.
    |
    | You can disable the transaction handler by setting this to null.
    | Or you can choose a custom made transaction handler here.
    |
    | Supported handlers: null|db
    |
    */
    'transactions' => [
        'handler' => 'db',
        'db'      => [
            'connection' => null,
        ],
    ],

    'temporary_files' => [

        /*
        |--------------------------------------------------------------------------
        | Local Temporary Path
        |--------------------------------------------------------------------------
        |
        | When exporting and importing files, we use a temporary file, before
        | storing reading or downloading. Here you can customize that path.
        |
        */
        // 'local_path'          => storage_path('framework/cache/laravel-excel'),
        'local_path'          => public_path('imported'),

        /*
        |--------------------------------------------------------------------------
        | Remote Temporary Disk
        |--------------------------------------------------------------------------
        |
        | When dealing with a multi server setup with queues in which you
        | cannot rely on having a shared local temporary path, you might
        | want to store the temporary file on a shared disk. During the
        | queue executing, we'll retrieve the temporary file from that
        | location instead. When left to null, it will always use
        | the local path. This setting only has effect when using
        | in conjunction with queued imports and exports.
        |
        */
        'remote_disk'         => 's3',
        'remote_prefix'       => 'temp_exports/',

        /*
        |--------------------------------------------------------------------------
        | Force Resync
        |--------------------------------------------------------------------------
        |
        | When dealing with a multi server setup as above, it's possible
        | for the clean up that occurs after entire queue has been run to only
        | cleanup the server that the last AfterImportJob runs on. The rest of the server
        | would still have the local temporary file stored on it. In this case your
        | local storage limits can be exceeded and future imports won't be processed.
        | To mitigate this you can set this config value to be true, so that after every
        | queued chunk is processed the local temporary file is deleted on the server that
        | processed it.
        |
        */
        'force_resync_remote' => true,
    ],
];

Here is my Export:


namespace App\Exports;

use App\Enums\Listing\Status;
use Illuminate\Contracts\Support\Responsable;
use Illuminate\Database\Eloquent\Collection;
use Maatwebsite\Excel\Concerns\Exportable;
use Maatwebsite\Excel\Concerns\FromCollection;
use Maatwebsite\Excel\Concerns\WithHeadings;
use Maatwebsite\Excel\Concerns\WithMapping;
use Maatwebsite\Excel\Excel;

class StockExport implements WithHeadings, FromCollection, Responsable, WithMapping
{
    use Exportable;

    /**
     * It's required to define the fileName within
     * the export class when making use of Responsable.
     */
    private $fileName = 'stock.xlsx';

    /**
     * Optional Writer Type
     */
    private $writerType = Excel::XLSX;

    /**
     * Optional headers
     */
    private $headers = [
        'Content-Type' => 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
    ];

    /**
     * Stock data from repository
     *
     * @param Collection $stockData
     */
    public function __construct(private Collection $stockData)
    {
    }

    public function headings(): array
    {
        return [
            __('Stock ID'),
            __('Stock Number'),
            __('Vehicle Description'),
            __('Department'),
            __('Colour'),
            __('Year'),
            __('Mileage'),
            __('Listing Price (Incl. VAT)'),
            __('Dealership'),
            __('Days in Stock'),
            __('Status'),
            __('Active Listing'),
            __('No. of Images'),
        ];

    }

    public function collection(): Collection
    {
        return $this->stockData;
    }

    public function map($row): array
    {
        return [
            $row->id,
            $row->stock_number,
            $row->vehicle->vehicle_description,
            __($row->department->value),
            $row->vehicle->colour,
            $row->vehicle->year,
            $row->mileage,
            $row->listing?->price,
            $row->dealership->company->trading_name,
            $row->stock_age,
            __($row->status),
            ($row->listing?->status === Status::ACTIVE->value) ? __('Yes') : __('No'),
            $row->listing?->listingImages?->count(),
        ];
    }
}

How can the issue be reproduced?

Seems it can be produced with not alot of data. I have 256M of memory in php.ini. I dont think i should be going more than that.

Im not sure what else to do. I cant set infinite memory.

My goal would be to export over 500k records for example...

What should be the expected behaviour?

I shouldnt get memory issues and the data should be processed and exported fine.

patrickbrouwers commented 1 year ago

For starters, passing the entire collection through the constructor defeats the purpose of using the queue feature. It will already have loaded everything into memory and the payload of the queue job will be enormous.

It's better to do the query within the export object.

I would further advise against using the built-in queue feature, it's better to wrap the export inside of an normal Laravel job and queue it on a long running queue. (Longer timeout)

jackkitley commented 1 year ago

For starters, passing the entire collection through the constructor defeats the purpose of using the queue feature. It will already have loaded everything into memory and the payload of the queue job will be enormous.

It's better to do the query within the export object.

I would further advise against using the built-in queue feature, it's better to wrap the export inside of an normal Laravel job and queue it on a long running queue. (Longer timeout)

The ->query() worked thanks. i was able to run the export. It just took alot of time but i think its because it needed to interact with s3 and im on my local.

I further went to remove shouldQueue and put the export process into a Job. This is what it looks like

Controller:


        QueueStockExport::dispatch($this->stockRepository, $filters, request()->user())->onQueue('exports');

        return $this->backWithSuccessFlash(__('Export started'));

Job:

class QueueStockExport implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    /**
     * Create a new job instance.
     */
    public function __construct(private readonly StockRepository $stockRepository, private array $filters, private User $user)
    {
        //
    }

    /**
     * Execute the job.
     */
    public function handle(): void
    {
        $uniqueFile = 'stock' . uniqid() . '.xlsx';
        $export = new StockExport($this->stockRepository);
        $export->setFilters($this->filters);
        $complete = $export->store("exports/$uniqueFile", 's3');
        if($complete) {
            NotifyUserOfCompleteExport::dispatch($this->user, $uniqueFile);
        }
    }
}

Now i dont get an AppendToSheet job that executes. I presume its doing the chunking in the background? Its just running my job in my queue container called QueueStockExport.

Let me know if this is the correct way please.

Thanks

jackkitley commented 1 year ago

@patrickbrouwers Would avoiding the default queue and using a laravel queue still chunk the file? I dont see any chunked files being added to S3 anymore.

Im running multiple containers so would prefer it to be on S3.

patrickbrouwers commented 1 year ago

It chunks the query to keep memory low. It doesn't chunk the writing to the file, which is a feature that unfortunately doesn't work because of how phpspreadsheet works with re-loading the entire sheet back into memory.

jackkitley commented 1 year ago

It chunks the query to keep memory low. It doesn't chunk the writing to the file, which is a feature that unfortunately doesn't work because of how phpspreadsheet works with re-loading the entire sheet back into memory.

ok, thank you

stale[bot] commented 11 months ago

This bug report has been automatically closed because it has not had recent activity. If this is still an active bug, please comment to reopen. Thank you for your contributions.