PHPOffice / PHPWord

A pure PHP library for reading and writing word processing documents
https://phpoffice.github.io/PHPWord/

Html Reader Process Titles as Headings Not Paragraphs #2533

Open oleibman opened 11 months ago

oleibman commented 11 months ago

Fix #1692. Builds on work started some time ago by @0b10011, to whom primary credit is due.

The Html Reader does not process the head section of the document and, in particular, ignores its style section. It does process inline styles, so 0b10011's model of adding the title as a text run (with styles) works well once this change is applied. That model does not, however, cover the alternative of assigning a Title Style and adding the title as plain text. To accommodate that, I have removed the heading font style declarations from the head section and now generate them all inline in the body. An added benefit is that a document can be read as HTML and then saved as docx while preserving, at least in part, any user-defined font styles. Note that HTML has predefined title styles, but docx does not.
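To illustrate what "inline in the body" means for a heading, here is a minimal sketch. The helper `renderInlineHeading` and its parameters are hypothetical and are not taken from PHPWord's code; it only shows the shape of markup where the font styling travels with the heading element rather than in a style rule in the head section.

```php
<?php
/**
 * Hypothetical helper (not PHPWord's implementation): renders a heading
 * whose font styling lives in an inline style attribute instead of a
 * <style> rule in <head>.
 */
function renderInlineHeading(int $level, string $text, array $css): string
{
    // HTML only defines h1 through h6, so clamp the requested level.
    $level = max(1, min(6, $level));

    // Turn ['font-size' => '20pt'] into "font-size: 20pt;" declarations.
    $declarations = [];
    foreach ($css as $property => $value) {
        $declarations[] = $property . ': ' . $value . ';';
    }

    // Escape both the attribute value and the heading text.
    $style = htmlspecialchars(implode(' ', $declarations), ENT_QUOTES, 'UTF-8');
    $body = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    return sprintf('<h%d style="%s">%s</h%d>', $level, $style, $body, $level);
}

$heading = renderInlineHeading(1, 'Annual Report', [
    'font-size' => '20pt',
    'font-weight' => 'bold',
]);
echo $heading, PHP_EOL;
```

A reader that only looks at inline styles can recover the font size and weight from markup like this, which it could not do if the same rules were declared only in the head section.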

@constip suggests in the original issue that margin top and bottom are being applied too frequently; I believe that was addressed by the recently merged PR #2475. It is also suggested that the * CSS selector be dropped in favor of body. PR #2475 added the body selector. I agree that this makes the * selector unnecessary and, as stated in the issue, it can cause problems, so this PR drops it. Finally, it is suggested that loadHTML be used instead of loadXML. That is not as easy a change as it seems, because loadHTML defaults to the ISO-8859-1 charset rather than UTF-8, so I will not attempt it here.
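The charset concern can be reproduced with plain `DOMDocument`, independent of PHPWord. The sketch below parses the same UTF-8 paragraph three ways. The sample text and the helper `firstParagraph` are made up for illustration, and the meta tag in the third case is one commonly used workaround, not something this PR adopts.

```php
<?php
// Collect libxml warnings internally so they do not clutter the output.
libxml_use_internal_errors(true);

// Illustrative helper: text of the first <p> element in a parsed document.
function firstParagraph(DOMDocument $doc): string
{
    return $doc->getElementsByTagName('p')->item(0)->textContent;
}

$word = "Caf\u{00E9}"; // "Café", encoded as UTF-8
$body = '<body><p>' . $word . '</p></body>';

// 1. XML parser: input without an encoding declaration is read as UTF-8.
$xml = new DOMDocument();
$xml->loadXML('<html>' . $body . '</html>');
$xmlText = firstParagraph($xml);

// 2. HTML parser with no charset declared: non-ASCII bytes are typically
//    interpreted as a single-byte Latin charset, which garbles the text.
$bare = new DOMDocument();
$bare->loadHTML('<html>' . $body . '</html>');
$bareText = firstParagraph($bare);

// 3. HTML parser with the charset declared in a meta tag.
$declared = new DOMDocument();
$declared->loadHTML(
    '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>'
    . $body . '</html>'
);
$declaredText = firstParagraph($declared);

// Report which variants round-trip the original text.
var_dump($xmlText === $word, $bareText === $word, $declaredText === $word);
```

The second result depends on the libxml2 version PHP is built against, so it is worth running rather than assuming. The point is that switching parsers would require the reader to handle the encoding explicitly for every input, which is why the change is larger than a one-line swap.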


coveralls commented 11 months ago

Coverage Status

coverage: 97.21% (+0.002%) from 97.208% when pulling c0f23106c1c47b39562bc4ee51d615d6b7210c96 on oleibman:word1692 into 2daa50c6f34c9cb6c532f72350e4bd8d466d6c71 on PHPOffice:master.

oleibman commented 10 months ago

@Progi1984 I have made the code change and moved the change notes to the new log. But ...

> It seems that this PR is not finished. Isn't it?

I'm not sure what you mean. What work do you think is still undone?