Fix #1692. Builds on work started some time ago by @0b10011, to whom primary credit is due.
The Html Reader does not process the `head` section of the document and, in particular, does not process its `style` section. It will, however, process inline styles, so 0b10011's model of adding the title as a text run (with styles) will work well once this change is applied. That model, however, would not handle the alternative method of assigning a Title Style and just adding the title as text. To accommodate that, I have removed the declaration of heading font styles from the `head` section and now generate them all inline in the body. This has the added benefit that a document can be read as html and then saved as docx while preserving, at least in part, any user-defined font styles. Note that html does have pre-defined title styles, but docx does not.
@constip suggests in the original issue that margin top and bottom are being applied too frequently. I believe that was addressed by the recently merged PR #2475. It is also suggested that the `*` css selector be dropped in favor of `body`. PR #2475 added the `body` selector. I agree that this renders the `*` selector unnecessary and, as stated in the issue, it can cause problems. This PR drops that selector. It is also suggested that `loadHTML` be used instead of `loadXML`. That is not as easy a change as it seems, because `loadHTML` uses the ISO-8859-1 charset rather than UTF-8, so I will not attempt it here.
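The charset concern can be illustrated outside PHP. What follows is a minimal, language-neutral sketch in Python, not PHPWord code, and it does not call `loadHTML` itself. It only shows what generally happens when UTF-8 bytes are decoded as ISO-8859-1. The sample string and variable names are assumptions chosen for illustration.

```python
# Illustration only: UTF-8 bytes read under an ISO-8859-1 assumption.
# The sample text is hypothetical; nothing here comes from PHPWord.
text = "Café"

# "é" becomes two bytes (0xC3 0xA9) in UTF-8.
utf8_bytes = text.encode("utf-8")

# A reader that assumes ISO-8859-1 maps each byte to one character,
# so the two-byte sequence turns into two unrelated characters.
misread = utf8_bytes.decode("iso-8859-1")

# A reader that assumes UTF-8 recovers the original text.
correct = utf8_bytes.decode("utf-8")

print(repr(misread))
print(repr(correct))
```

Any non-ASCII text in a heading or title would be affected in the same way, which is why switching parsers would also require an explicit decision about how the charset is declared.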
Checklist:
[ ] I have run composer run-script check --timeout=0 and no errors were reported
[ ] The new code is covered by unit tests (check build/coverage for coverage report)
[ ] I have updated the documentation to describe the changes
coverage: 97.21% (+0.002%) from 97.208%
when pulling c0f23106c1c47b39562bc4ee51d615d6b7210c96 on oleibman:word1692
into 2daa50c6f34c9cb6c532f72350e4bd8d466d6c71 on PHPOffice:master.