collectiveaccess / providence

Cataloguing and data/media management application
GNU General Public License v3.0

Generation of xlsx and docx files creating corrupt files #1468

Closed Monica-Wood closed 10 months ago

Monica-Wood commented 1 year ago

Using dev/php8 and ubuntu 22.04 with default libreoffice package.

It's my assumption that the issue is with the newer libreoffice package on Ubuntu 22.04 rather than the php8 branch.

Will update when I gather more information.

collectiveaccess commented 1 year ago

There's a fix for this in the dev/parameterized-reports branch. This branch tracks dev/php8 but includes a substantial rewrite of the XLSX/DOCX/PDF export system, including:

• Support for specification of parameters for an export at download time.
• Background processing of large reports.
• Some streamlining of how reports are written.
• Some reduction in redundant code - exports go through a more limited set of functions.
• Various bug fixes, including for the corruption issue you've noted above.

I can move the corruption fixes into dev/php8 without merging the rest of these changes, but I'm looking to merge dev/parameterized-reports into dev/php8 very soon – I already have a few intensive users working with it day-to-day, so far successfully, so I am not sure it's worth the effort.

Monica-Wood commented 11 months ago

Hi Seth,

I have noticed that this branch has been merged into the php8 branch now and I am finding that I'm still having the issue with the corrupt files, except on one server, which is my development server using a newer version of PHP (8.1.22 and 8.2) where the others are at 8.1.12/13. Everything else I believe is set up the same.

Is there something in particular I might be missing here?

Thanks, Monica

collectiveaccess commented 11 months ago

With what versions of PHP do you have the problem?

Monica-Wood commented 11 months ago

The servers with the problem are on 8.1.12 and 8.1.13, using the Ubuntu 22.04 default PHP packages.

The one that is working was on 8.2.* and I downgraded it to the 8.1 version that was available which was 8.1.22 and that one worked too. That one is using the Ondrej packages.

Monica-Wood commented 10 months ago

Just discovered this is also happening to .zip files when downloading all media associated with objects from the object lot. So this might have something to do with the library generating the compression that xlsx and docx would also use?

collectiveaccess commented 10 months ago

Are you running this under nginx? If Apache, is it php-fpm or mod_php (Apache module)?

Monica-Wood commented 10 months ago

Using nginx with php-fpm for both. Comparing the output of a corrupt file with the output of a working file shows that the corrupt files have a blank line inserted at the top. If you open one in a text editor, remove this line and save it, it will then open correctly.
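For anyone wanting to check a download for the same symptom, here is a minimal sketch. It assumes the file is a ZIP container (XLSX, DOCX and .zip all are) and should therefore start with the ZIP local file header signature `PK\x03\x04`. The function names are hypothetical and not part of CollectiveAccess.

```python
# Signature that opens a non-empty ZIP archive (XLSX and DOCX are ZIP containers).
ZIP_MAGIC = b"PK\x03\x04"


def leading_junk(data: bytes) -> bytes:
    """Return the bytes found before the first ZIP signature.

    Returns b"" when the data already starts with the signature or when
    no signature is found at all.
    """
    pos = data.find(ZIP_MAGIC)
    return data[:pos] if pos > 0 else b""


def strip_leading_whitespace(data: bytes) -> bytes:
    """Drop stray whitespace before the ZIP signature.

    Anything other than pure whitespace is left untouched, since that would
    point to a different kind of corruption.
    """
    junk = leading_junk(data)
    if junk and junk.strip() == b"":
        return data[len(junk):]
    return data
```

Reading a suspect export in binary mode and passing its bytes to `leading_junk` shows whether something was emitted before the archive; a result of `b"\n"` matches the blank line described above.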

collectiveaccess commented 10 months ago

Tested with PHP 8.2.9 and 8.1.22 and see no problems.

Monica-Wood commented 10 months ago

Yes, this is the problem. We've got one server where it works fine and many others using different PHP versions where it's not. Tracking down what is different between them is proving hard. We will continue our picking it apart and report back when we work it out.

Can you please point me to the commit where you mentioned you fixed it?

Thanks for looking.

collectiveaccess commented 10 months ago

Are you using the same setup.php file on all of these systems? Maybe you have a newline outside of "PHP mode" in it.

There isn't a single commit that fixes the old issue. It was a general rewrite.

Monica-Wood commented 10 months ago

Hey, I was just coming back to say we have found the issue and you have hit the nail on the head. Part of the automation was creating the setup.php with our configuration and there was an end tag in there (?>) with a blank line after it.

Either removing the blank line or just removing the end tag fixed it all up.
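The reason a single blank line matters: PHP swallows one newline directly after a closing `?>` tag, but anything beyond that is sent to the client as output, ahead of the binary download. The following is a rough heuristic sketch for spotting this in generated config files such as setup.php. The function name is hypothetical, and the check only looks at the last `?>` in the file, so a `?>` inside a string or comment could confuse it.

```python
def stray_output_after_close_tag(source: bytes) -> bytes:
    """Return whitespace that PHP would emit after the final closing tag.

    Returns b"" when there is no closing tag, when nothing follows it, or
    when only the single newline that PHP swallows follows it.
    """
    idx = source.rfind(b"?>")
    if idx == -1:
        return b""  # no closing tag, so nothing can leak after it
    tail = source[idx + 2:]
    # PHP consumes exactly one newline immediately after "?>".
    for newline in (b"\r\n", b"\n", b"\r"):
        if tail.startswith(newline):
            tail = tail[len(newline):]
            break
    # Only report pure whitespace; other trailing content is a different case.
    return tail if tail and tail.strip() == b"" else b""
```

A non-empty result for a file that is included on every request would explain a corrupted download. Leaving out the closing tag altogether, as mentioned above, avoids the problem entirely.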

It took me figuring out last night that an extra line was being inserted; then this morning I went hunting for what the Ansible config was changing and found it.

Thank you once again for looking into it, even though it wasn't your problem to solve.