JayXon / Leanify

lightweight lossless file minifier/optimizer
MIT License

Leanify removes spaces it shouldn't in .pptx #49

Closed jul059 closed 5 years ago

jul059 commented 5 years ago

Leanify removes spaces it shouldn't in the attached .pptx file. The title looks like this in the original:

[image: the title as displayed in the original file]

And gets transformed to this with leanify on default settings:

[image: the title after running Leanify, with spaces missing]

This may be the result of poor formatting in the original document, but it is still a valid .pptx that was created by someone who was not specifically trying to find bugs in leanify.

MMD1019DiaposOligoéléments 2019 (2) - Copie - Copie.pptx

JayXon commented 5 years ago

Thanks for the report, this is an interesting one.

The issue is caused by a few <a:t> </a:t> elements in slide1.xml being minimized to <a:t/>. This is working as intended from an XML perspective: xml:space="preserve" was not specified, so it should be safe to strip whitespace. If PowerPoint relies on the spaces for formatting, then xml:space="preserve" should be added to the XML.
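To illustrate the rule described above, here is a minimal Python sketch of whitespace stripping that honors xml:space. It is not Leanify's actual code (Leanify is written in C++); the function names are made up for this example, and the only assumption about the standard is that the attribute value that disables stripping is "preserve".

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the predefined xml: prefix under this namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
XML_WS = " \t\r\n"  # the four whitespace characters defined by XML


def strip_whitespace_only_text(elem, preserve=False):
    """Drop whitespace-only text unless xml:space="preserve" is in scope."""
    value = elem.get(XML_SPACE)
    if value == "preserve":
        preserve = True
    elif value == "default":
        preserve = False

    if not preserve and elem.text is not None and not elem.text.strip(XML_WS):
        elem.text = None

    for child in elem:
        strip_whitespace_only_text(child, preserve)
        # A child's tail is part of this element's content, so this
        # element's setting applies to it.
        if not preserve and child.tail is not None and not child.tail.strip(XML_WS):
            child.tail = None


def minify(xml_text):
    """Hypothetical helper: parse, strip, and serialize again."""
    root = ET.fromstring(xml_text)
    strip_whitespace_only_text(root)
    return ET.tostring(root, encoding="unicode")
```

With this sketch, minify('<p><t> </t></p>') loses the space inside t, while minify('<p><t xml:space="preserve"> </t></p>') keeps it, which matches the difference between the PowerPoint and Word cases discussed in this thread.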

What's weird is that Word documents actually have xml:space="preserve" in their XML, so Leanify can detect it and not strip spaces in that case (details at #3). Maybe Microsoft forgot to do this for PowerPoint? Does anyone know how to file a bug report for PowerPoint?

@jul059 Which PowerPoint version was used to create this document?

I can probably implement a workaround for this in Leanify, but I think this is actually PowerPoint's fault.
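One possible shape for such a workaround, sketched in Python rather than Leanify's C++ and not taken from the actual fix: treat DrawingML text runs (a:t) as if whitespace were always significant, and keep stripping everywhere else. The function name is hypothetical; the namespace URI is the standard DrawingML main namespace that the a: prefix is normally bound to in slide XML.

```python
import xml.etree.ElementTree as ET

XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
# <a:t> in the DrawingML main namespace, as used in ppt/slides/slideN.xml.
DRAWINGML_T = "{http://schemas.openxmlformats.org/drawingml/2006/main}t"
XML_WS = " \t\r\n"


def strip_except_text_runs(elem, preserve=False):
    """Strip whitespace-only text, but never inside <a:t> elements."""
    value = elem.get(XML_SPACE)
    if value == "preserve":
        preserve = True
    elif value == "default":
        preserve = False

    # Assumption for this sketch: <a:t> content is always significant,
    # even when xml:space="preserve" is absent.
    keep = preserve or elem.tag == DRAWINGML_T

    if not keep and elem.text is not None and not elem.text.strip(XML_WS):
        elem.text = None

    for child in elem:
        strip_except_text_runs(child, preserve)
        if not keep and child.tail is not None and not child.tail.strip(XML_WS):
            child.tail = None
```

The trade-off is that this hard-codes knowledge of one file format into a generic XML minifier, which is why it is a workaround rather than a fix.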

jul059 commented 5 years ago

@JayXon the original file was handed to me by a teacher, so I have no idea where it came from. It might originally have been a .ppt file that was later converted to .pptx, or perhaps a Keynote file that was converted to .pptx, since I'm studying in an environment where Apple products are the norm. I have no way of knowing unless there is some hidden information inside the original file that I can look for. Please tell me if there is.
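On the question of hidden information: a .pptx is a ZIP archive, and Office Open XML packages usually carry a docProps/app.xml part with Application and AppVersion fields. The sketch below reads them with Python's standard library; the function name is made up for this example. These fields are typically written by whichever application last saved the file, so they may not reveal where the presentation originally came from.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the extended-properties part (docProps/app.xml).
EXT_NS = "{http://schemas.openxmlformats.org/officeDocument/2006/extended-properties}"


def creating_application(pptx):
    """Return (Application, AppVersion) from docProps/app.xml.

    `pptx` may be a path or a binary file-like object. Either value is
    None when the part or the field is missing.
    """
    with zipfile.ZipFile(pptx) as archive:
        try:
            data = archive.read("docProps/app.xml")
        except KeyError:
            return None, None
    root = ET.fromstring(data)
    return (
        root.findtext(EXT_NS + "Application"),
        root.findtext(EXT_NS + "AppVersion"),
    )
```

Usage would look like creating_application("slides.pptx"), where slides.pptx stands in for the file in question.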

I used the latest PowerPoint for Windows (Version 1808 build 10730.20334 or 16.0.10730.20334) to create this new, single-slide file by deleting all the other slides.

You're right that it is probably PowerPoint's fault (or the converter's fault if it was originally in a different format). But since PowerPoint displays the file properly, it seems to somehow "know" that the spaces are significant. It might therefore be that PowerPoint's XML files are not strict XML, even though this is not mentioned anywhere. We've seen this with older versions of Internet Explorer, where you had to include a bunch of IE-specific hacks to make sure the web page was displayed properly.

In any case, I think a workaround should be implemented even if a bug is fixed in PowerPoint since these faulty files already exist and are functional.