ASML-Labs / PPTX.jl

Generate PowerPoint PPTX files from Julia
https://asml-labs.github.io/PPTX.jl/
MIT License

ERROR: PowerPoint couldn't read some content in {PowerPoint File} - Repaired and removed it. #63

Closed ysaereve closed 6 hours ago

ysaereve commented 3 days ago

The error

The following error appears when PowerPoint opens a generated PowerPoint file: PowerPoint couldn't read some content in {PowerPoint File} - Repaired and removed it.

After PowerPoint repairs the file, the TextBox links do not work as expected, i.e. they link to the wrong slide.

I used the following command and program for the test:

./MD2PPTX.jl Demo.md Demo.pptx > Demo.log

I tried to debug this error last weekend and this weekend. After the checks below, I suspect that either the write function or PowerPoint's repair causes the problem.

  1. Inspected the dump of the Presentation and Slide objects - links are created as expected.
  2. Compared Slide object when assigning the link - the current slide is the same as in Slide map and Slide array.

Could you help to provide some direction for troubleshooting this error? I did try to inspect the PPTX file (i.e. zip file) by unzipping it, but I didn't find any clue ...

If there is only one topic in the "Topics" slide in Demo.md, it works fine. When there is more than one topic in the "Topics" slide, it does not work as expected.

Excerpted from Demo.md:

# Topics
```julia eval
# Create a point of return to topic page
zr = TextBox(" ✨", PTxsRT..., cs)
xs = (PTcontent)
```
- Topic 1
- Topic 2
- Topic 3
```julia eval
xs = PTxsTP # PTxsTB: Work, PTxsTP: Doesn't work
```

Note:

  1. MD2PPTX.jl utilizes a Markdown file as a PPTX.jl script file to generate a PowerPoint file. Maybe it can be used to create test cases for PPTX.jl too. In fact, if a similar feature is implemented in PPTX.jl, it would be wonderful!
  2. The "sx = (PTcontent)" in Demo.md needs to be changed to "xs = (PTcontent)".

To reproduce the error

./MD2PPTX.jl Demo.md Demo.pptx > Demo.log

MD2PPTX.jl.txt Demo.md Demo.pptx Demo.log

Test environment

macOS version: Darwin hostname 23.6.0 Darwin Kernel Version 23.6.0: Wed Jul 31 20:49:39 PDT 2024; root:xnu-10063.141.1.700.5~1/RELEASE_ARM64_T6000 arm64

Julia Release: Version 1.11.0 (2024-10-07)

Julia Packages:

(@v1.11) pkg> status
[c7e460c6] ArgParse v1.2.0
[992eb4ea] CondaPkg v0.2.23
[a93c6f00] DataFrames v1.7.0
[85a47980] Dictionaries v0.4.2
[1fa38f19] Format v1.3.7
[cd3eb016] HTTP v1.10.8
[4ef9e186] HerbGrammar v0.4.0
[5bbddadd] HerbInterpret v0.1.4
[3008d8e8] HerbSearch v0.3.1
[6d54aada] HerbSpecification v0.1.0
[682c06a0] JSON v0.21.4
[98e50ef6] JuliaFormatter v1.0.62
[14a86994] PPTX v0.9.0
[9b87118b] PackageCompiler v2.1.20
[fae87a5f] ParserCombinator v2.2.1
[438e738f] PyCall v1.96.4
[de0858da] Printf v1.11.0

PowerPoint Version: Microsoft® PowerPoint for Mac Version 16.89.1 (24091630)

matthijscox-asml commented 1 day ago

Thanks for this extensive bug report.

Your code is already quite complicated, so I'm wondering if we can narrow it down to a simpler example. Perhaps multiple links to multiple slides currently go wrong in PPTX. (Linking is a pretty new feature, so I can imagine we missed some corner cases.) If I have time I'll try to reproduce the error with such a simpler example. But maybe you can already generate one?

If I understand correctly, every "topic" links to another slide? So would it be something like this?

using PPTX
p = Presentation()
s1 = Slide()
push!(p, s1)
s2 = Slide()
push!(p, s2)
s3 = Slide()
push!(p, s3)

text1 = TextBox("link to slide 2", hlink = s2, offset_y = 50)
text2 = TextBox("link to slide 3", hlink = s3, offset_y = 100)
push!(s1, text1)
push!(s1, text2)

write("example.pptx", p)

ysaereve commented 1 day ago

Thank you so much for the prompt reply. Yes, every topic links to another slide. Based on the program you provided, the following is a program that can reproduce the same error in the test environment as reported. I’ll keep my fingers crossed and hope for good news.

using PPTX

p = Presentation() # A Cover Page will be created
# Create other slides
s1 = Slide(title="Topics") # Topic Page
push!(p, s1)
s2 = Slide(title="Topic 1") # Topic 1
push!(p, s2)
s3 = Slide(title="Topic 2") # Topic 2
push!(p, s3)
s4 = Slide(title="Topic 3") # Topic 3
push!(p, s4)

text1 = TextBox("Link to Topic 1", hlink = s2, offset_y = 50)
text2 = TextBox("link to Topic 2", hlink = s3, offset_y = 100)
text3 = TextBox("link to Topic 3", hlink = s4, offset_y = 150)

push!(s1, text1)
push!(s1, text2)
push!(s1, text3)

write("example.pptx", p)

matthijscox-asml commented 1 day ago

Thanks for the example. I tried it, and after repairing the pptx, I see that both "Topic 2" and "Topic 3" links go to slide 4. Topic 1 still goes to slide 2 as expected. I'll see if I can troubleshoot this.

ysaereve commented 1 day ago

That's great! Thank you!

matthijscox-asml commented 1 day ago

Quick observation after unzipping the pptx before and after repair.

Before repair

Before repair we set the relationship ids in the slide2.xml.rels and the presentation.xml.rels with the same id.

slide2.xml.rels:

<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
    <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slideLayout" Target="../slideLayouts/slideLayout2.xml"/>
    <Relationship Id="rId8" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide" Target="slide3.xml"/>
    <Relationship Id="rId9" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide" Target="slide4.xml"/>
    <Relationship Id="rId10" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide" Target="slide5.xml"/>
</Relationships>

presentation.xml.rels

<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
    <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slideMaster" Target="slideMasters/slideMaster1.xml"/>
    <Relationship Id="rId2" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/theme" Target="theme/theme1.xml"/>
    <Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/presProps" Target="presProps.xml"/>
    <Relationship Id="rId4" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/viewProps" Target="viewProps.xml"/>
    <Relationship Id="rId5" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/tableStyles" Target="tableStyles.xml"/>
    <Relationship Id="rId6" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide" Target="slides/slide1.xml"/>
    <Relationship Id="rId7" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide" Target="slides/slide2.xml"/>
    <Relationship Id="rId8" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide" Target="slides/slide3.xml"/>
    <Relationship Id="rId9" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide" Target="slides/slide4.xml"/>
    <Relationship Id="rId10" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide" Target="slides/slide5.xml"/>
</Relationships>

After repair

But after repair, these have become unrelated. And in the slide2.xml.rels the ids are made consecutive again (1, 2, 3) instead of (1, 8, 9, 10).

slide2.xml.rels

<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
    <Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide" Target="slide5.xml"/>
    <Relationship Id="rId2" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide" Target="slide3.xml"/>
    <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slideLayout" Target="../slideLayouts/slideLayout2.xml"/>
</Relationships>

Slide 5 has rId3 above.

presentation.xml.rels

<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
    <Relationship Id="rId8" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/viewProps" Target="viewProps.xml"/>
    <Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide" Target="slides/slide2.xml"/>
    <Relationship Id="rId7" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/presProps" Target="presProps.xml"/>
    <Relationship Id="rId2" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide" Target="slides/slide1.xml"/>
    <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slideMaster" Target="slideMasters/slideMaster1.xml"/>
    <Relationship Id="rId6" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide" Target="slides/slide5.xml"/>
    <Relationship Id="rId5" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide" Target="slides/slide4.xml"/>
    <Relationship Id="rId10" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/tableStyles" Target="tableStyles.xml"/>
    <Relationship Id="rId4" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide" Target="slides/slide3.xml"/>
    <Relationship Id="rId9" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/theme" Target="theme/theme1.xml"/>
</Relationships>

Here slide 5 has rId6. So those relationship ids are not linked.
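Relationship ids are scoped to the part whose .rels file declares them, so the same slide can carry one id in presentation.xml.rels and a different one in slide2.xml.rels. A minimal sketch of that lookup, assuming the attribute order Id, Type, Target seen in the listings above; `relationship_targets` is a hypothetical helper (not part of PPTX.jl) and a regex stands in for a real XML parser:

```julia
# Hypothetical helper: map each relationship Id to its Target.
# Assumes Id appears before Target inside each <Relationship> tag.
function relationship_targets(rels_xml)
    pattern = r"<Relationship\b[^>]*?\bId=\"([^\"]+)\"[^>]*?\bTarget=\"([^\"]+)\""
    return Dict(String(m[1]) => String(m[2]) for m in eachmatch(pattern, rels_xml))
end

# One-line excerpts from the repaired files listed above.
presentation_rels = """
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
    <Relationship Id="rId6" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide" Target="slides/slide5.xml"/>
</Relationships>
"""
slide2_rels = """
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
    <Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide" Target="slide5.xml"/>
</Relationships>
"""

pres = relationship_targets(presentation_rels)
slide = relationship_targets(slide2_rels)
println("presentation.xml.rels: ", pres)
println("slide2.xml.rels:       ", slide)
```

Both tables point at slide 5, but under different ids, which is fine as long as each id is only looked up in the table of the part that uses it.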

Inside slide2.xml you see that the relationship id from slide2.xml.rels is used:

<p:cNvPr id="3" name="TextBox">
  <a:hlinkClick r:id="rId3" action="ppaction://hlinksldjump"/>
</p:cNvPr>
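The check implied here, that every r:id on a hyperlink resolves to an entry in the slide's own .rels file, can be sketched with Base Julia only. The helper names below are hypothetical and not part of PPTX.jl, and regexes stand in for a real XML parser:

```julia
# Hypothetical helpers; they only look at attribute text, not at full XML structure.
relationship_ids(rels_xml) =
    [String(m[1]) for m in eachmatch(r"<Relationship\b[^>]*\bId=\"([^\"]+)\"", rels_xml)]

hyperlink_ids(slide_xml) =
    [String(m[1]) for m in eachmatch(r"<a:hlinkClick\b[^>]*\br:id=\"([^\"]+)\"", slide_xml)]

# Return the hyperlink ids that have no matching <Relationship> entry.
function unresolved_links(slide_xml, rels_xml)
    known = Set(relationship_ids(rels_xml))
    return [id for id in hyperlink_ids(slide_xml) if !(id in known)]
end

# Excerpts from the repaired slide2.xml and slide2.xml.rels shown above.
slide_xml = """
<p:cNvPr id="3" name="TextBox">
  <a:hlinkClick r:id="rId3" action="ppaction://hlinksldjump"/>
</p:cNvPr>
"""
rels_xml = """
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
    <Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide" Target="slide5.xml"/>
</Relationships>
"""

println(unresolved_links(slide_xml, rels_xml))
```

An empty result means every hyperlink id in the slide has a matching relationship entry.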

Possible solution

I could try to make the relationship ids consecutive in the slide.xml.rels and completely unrelated to the presentation.xml.rels.

matthijscox-asml commented 1 day ago

I could try to make the relationship ids consecutive in the slide.xml.rels and completely unrelated to the presentation.xml.rels.

I tried this locally, but it still wants to repair.

I also found out that the repaired .pptx has these extra lines in the [Content_Types].xml file, maybe that's the problem:

    <Override PartName="/ppt/slides/slide1.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slide+xml"/>
    <Override PartName="/ppt/slides/slide2.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slide+xml"/>
    <Override PartName="/ppt/slides/slide3.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slide+xml"/>
    <Override PartName="/ppt/slides/slide4.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slide+xml"/>
    <Override PartName="/ppt/slides/slide5.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slide+xml"/>

matthijscox-asml commented 1 day ago

I also found out that the repaired .pptx has these extra lines in the [Content_Types].xml file, maybe that's the problem

I added those lines, but that doesn't fix the problem either... I'm out of ideas right now.

matthijscox-asml commented 1 day ago

Another observation: even having two links to the same slide triggers the repair. But after the repair, the .pptx works fine.

So in the example above, I just used this:

text1 = TextBox("Link to Topic 1", hlink = s3, offset_y = 50)
text2 = TextBox("link to Topic 2", hlink = s3, offset_y = 100)
text3 = TextBox("link to Topic 3", offset_y = 150)

In this case we added the link twice to the slide2.xml.rels:

<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
    <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slideLayout" Target="../slideLayouts/slideLayout2.xml"/>
    <Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide" Target="slide4.xml"/>
    <Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide" Target="slide4.xml"/>
</Relationships>

I guess I can at least fix that by not adding duplicates.
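Duplicate ids like the one above can be spotted mechanically. The following is only a sketch of such a check, not the fix that went into PPTX.jl; `duplicate_ids` is a hypothetical helper and the regex assumes the `Id="..."` attribute layout from the listing:

```julia
# Hypothetical helper: list relationship Ids that occur more than once.
function duplicate_ids(rels_xml)
    counts = Dict{String,Int}()
    for m in eachmatch(r"<Relationship\b[^>]*\bId=\"([^\"]+)\"", rels_xml)
        id = String(m[1])
        counts[id] = get(counts, id, 0) + 1
    end
    return sort([id for (id, n) in counts if n > 1])
end

# The slide2.xml.rels content from the two-links-to-one-slide experiment.
rels_xml = """
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
    <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slideLayout" Target="../slideLayouts/slideLayout2.xml"/>
    <Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide" Target="slide4.xml"/>
    <Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide" Target="slide4.xml"/>
</Relationships>
"""

println(duplicate_ids(rels_xml))
```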

ysaereve commented 1 day ago

Thank you for sharing your test methods and observations. I installed the "Diff Folders" VS Code extension and confirmed the same results as you. It seems the PowerPoint repair feature broke the links. The question now is: what triggered the repair?

I will do more tests to see if I can find anything. Here is what I have now.

Only two files differ between the two extracted archives:

example-NE/ppt/slides/slide2.xml vs. example-WE/ppt/slides/slide2.xml

example-NE/ppt/slides/_rels/slide2.xml.rels vs. example-WE/ppt/slides/_rels/slide2.xml.rels
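For anyone without a diff extension, a comparison like this can be scripted with Base Julia alone. This is a rough sketch that assumes both .pptx files have already been unzipped into directories; the function names are hypothetical and the comparison is byte-for-byte:

```julia
# Collect the paths of all files below `root`, relative to `root`.
function relative_files(root)
    paths = String[]
    for (dir, _, files) in walkdir(root)
        for f in files
            push!(paths, relpath(joinpath(dir, f), root))
        end
    end
    return paths
end

# Return the relative paths that are missing on one side or differ in content.
function differing_files(dir_a, dir_b)
    all_paths = sort(unique(vcat(relative_files(dir_a), relative_files(dir_b))))
    return filter(all_paths) do p
        a, b = joinpath(dir_a, p), joinpath(dir_b, p)
        !(isfile(a) && isfile(b)) || read(a) != read(b)
    end
end
```

Calling `differing_files("example-NE", "example-WE")` on the two extracted directories would then list the paths that differ.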

example-NE.jl (No Error)

using PPTX

p = Presentation() # A Cover Page will be created
# Create other slides
s1 = Slide(title="Topics") # Topic Page
push!(p, s1)
s2 = Slide(title="Topic 1") # Topic 1
push!(p, s2)
s3 = Slide(title="Topic 2") # Topic 2
push!(p, s3)
s4 = Slide(title="Topic 3") # Topic 3
push!(p, s4)

text1 = TextBox("Link to Topic 1", hlink = s2, offset_y = 50)
# text2 = TextBox("link to Topic 2", hlink = s3, offset_y = 100)
# text3 = TextBox("link to Topic 3", hlink = s4, offset_y = 150)

push!(s1, text1)
# push!(s1, text2)
# push!(s1, text3)

write("example-NE.pptx", p)

example-WE.jl (With Error)

using PPTX

p = Presentation() # A Cover Page will be created
# Create other slides
s1 = Slide(title="Topics") # Topic Page
push!(p, s1)
s2 = Slide(title="Topic 1") # Topic 1
push!(p, s2)
s3 = Slide(title="Topic 2") # Topic 2
push!(p, s3)
s4 = Slide(title="Topic 3") # Topic 3
push!(p, s4)

text1 = TextBox("Link to Topic 1", hlink = s2, offset_y = 50)
text2 = TextBox("link to Topic 2", hlink = s3, offset_y = 100)
# text3 = TextBox("link to Topic 3", hlink = s4, offset_y = 150)

push!(s1, text1)
push!(s1, text2)
# push!(s1, text3)

write("example-WE.pptx", p)

matthijscox-asml commented 12 hours ago

I have a branch now where I at least fixed the duplicate hlinks.

But I am not making any progress on fixing the multiple hlinks.

matthijscox-asml commented 11 hours ago

Okay, I think I found the problem: in the textbox hyperlink I should use r:id="rId2" instead of rId="rId2". It's weird that this doesn't fail for a single hyperlink...

Basically the XML was saying:

<a:hlinkClick action="ppaction://hlinksldjump" rId="rId2"/>

and it should say:

<a:hlinkClick action="ppaction://hlinksldjump" r:id="rId2"/>
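A regression check for this kind of slip could look roughly like the sketch below. It assumes every hyperlink in the slide is expected to carry an `r:id` attribute; `hlinks_missing_rid` is a hypothetical helper, not a PPTX.jl function, and it matches on raw text instead of parsing the XML:

```julia
# Hypothetical helper: return every <a:hlinkClick ...> tag that lacks an r:id attribute.
function hlinks_missing_rid(slide_xml)
    tags = [String(m.match) for m in eachmatch(r"<a:hlinkClick\b[^>]*>", slide_xml)]
    return [t for t in tags if !occursin(r"\br:id=\"", t)]
end

# The two variants quoted above.
bad = """<a:hlinkClick action="ppaction://hlinksldjump" rId="rId2"/>"""
good = """<a:hlinkClick action="ppaction://hlinksldjump" r:id="rId2"/>"""

println(hlinks_missing_rid(bad))   # the malformed tag is reported
println(hlinks_missing_rid(good))  # nothing is reported
```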

matthijscox-asml commented 10 hours ago

This issue should be fixed in v0.9.1. You can install it once the registry PR is merged. Please let me know if your code works now with the new version.

ysaereve commented 7 hours ago

It (v0.9.1) works! 👍 I just tested it. Thank you so much for your help!