aVadim483 / fast-excel-laravel

Lightweight and very fast XLSX Excel Spreadsheet Export/Import for Laravel
MIT License
16 stars 5 forks source link

Invalid xlsx generated because write buffer gets destroyed midway through #3

Closed lrakauskas closed 7 months ago

lrakauskas commented 11 months ago

When generating Excel files synchronously everything is working perfect, but if you generate Excel files within a queue worker, it's very likely that some of the files will turn out be invalid.

Issue steams from vendor/avadim/fast-excel-writer/src/FastExcelWriter/Writer.php, when running in a queue, __destruct() is not called straight away after job is finished, it could be called later on. In fact, this usually will end up called in a middle of a subsequent exports, which would destroy all buffers, including the ones that are being actively used, because buffers are stored in static variable.

Repro:

  1. Add queue job that would generate Excel file (use any example from documentation)
  2. Generate pretty large Excel files. The larger the file, the easier it is to replicate this. I personally would have it create 1MB excel file. For the sake of simplicity, don't use random data, hardcode some string arrays to be written. It will be easier to see by the generated xlsx file size which file is invalid, without needing to open it.
  3. Somewhere in your code, queue up few of these jobs. The smaller the files, the more jobs you'll need to queue.
  4. See all files that have been generated. You should see that some files are smaller than the other ones even though they should all be identical. First file, will always be correct size, rest is a matter of "luck". If you try to open one of the smaller files, MS Excel would give you an error that some XML tags are not closed.

To illustrate, if we were to add the following logging:

// vendor/avadim/fast-excel-writer/src/FastExcelWriter/Sheet.php::2599
    public function writeDataBegin($writer)
    {
        Log::info("Begin");
        ...
    }
    public function writeDataEnd()
    {
        Log::info("End");
        ...
    }

// vendor/avadim/fast-excel-writer/src/FastExcelWriter/Writer.php
    public function __destruct()
    {
        foreach (self::$buffers as $name => $buffer) {
            if ($buffer) {
                Log::info("Destorying buffer: " . $name);
                $buffer->close();
                self::$buffers[$name] = null;
            }
        }
        ...
     }

And we were to run export twice, we would get the following log:

[2023-12-04 12:14:55] local.INFO: Begin  
[2023-12-04 12:14:57] local.INFO: End  
[2023-12-04 12:15:03] local.INFO: Begin  
[2023-12-04 12:15:03] local.INFO: Destorying buffer: /var/www/storage/app/tmp/fast-excel/xlsx_writer_ptcJpi  
[2023-12-04 12:15:03] local.INFO: Destorying buffer: /var/www/storage/app/tmp/fast-excel/xlsx_writer_FSRWai  
[2023-12-04 12:15:03] local.INFO: Destorying buffer: /var/www/storage/app/tmp/fast-excel/xlsx_writer_cm4QRl  
[2023-12-04 12:15:05] local.INFO: End 

You can see that destruct has been called in a middle of a second export, which destroyed all write buffers and second export obviusly ended up missing some of the data and closing XML tags.

Quick solution is to destroy only the buffers that belonged to the object we are destructing. Something along the lines of:

 public function __destruct()
    {
        foreach (self::$buffers as $name => $buffer) {
            if($this->tempDir && !str_starts_with($name, $this->tempDir)){
                continue;
            }
            if ($buffer) {
                $buffer->close();
                self::$buffers[$name] = null;
            }
        }
        ...
    }

Note that for this to work, temp directory should be unique for each Excel file being generated, so either when creating Excel with Excel::create() you would want to pass temp_dir option like Excel::create(options: ['temp_dir' => $tempDir]);, or in vendor/avadim/fast-excel-laravel/src/FastExcelLaravel/ExcelWriter.php create() you would want to make a unique subfolder.

In general, I don't think it's a good practice to store buffers in static variables, but I would leave that for you to decide.

P.S. sorry for a limited example, don't have much time to create extensive Issue.

aVadim483 commented 10 months ago

I couldn't reproduce the problem, but you are right, storing buffers in a static variable is a bad idea. There was a reason for this, but now I moved them to a protected variable

aVadim483 commented 8 months ago

I moved the buffers from static to protected variables. Hope the problem is resolved. Thank you very much for your work and your analysis