PhilGale92 / docx

PHP Based Docx Parser
MIT License

Adding styles such as font names and fetching font size #50

Closed tayyabkazmi closed 6 years ago

tayyabkazmi commented 7 years ago

Hello Phil, great work you have here, it was really interesting. What I wanted to know is whether we can fetch the font family and the font size, and extend the set of styles this project knows about.

PhilGale92 commented 7 years ago

Ahh, nice idea. I've been toying with the idea of adding inline style support (font size, colours and all that) for a little while, but I'm wary of bloating the HTML output of the render...

I'm not averse to adding it, but it won't really be possible for me in the short term, I'm afraid. It will likely need to wait until I rewrite the code base (this project is about 4 years old and not exactly the cleanest to work with internally!).

Thanks for your interest in the project!

tayyabkazmi commented 7 years ago

You are most welcome :+1: Could you please share how the parser works? Parsing the whole Word XML seems nearly impossible for a single programmer. How did you manage to do that?

PhilGale92 commented 7 years ago

Sure. It extracts the docx file (it's basically a zip), which contains some embedded media and a load of different .xml files. One of these is the main structure XML (typically word/document.xml), which contains (almost) all of the content in OOXML format ( http://officeopenxml.com/anatomyofOOXML.php ).
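To illustrate the "it's basically a zip" point, here is a minimal Python sketch (the project itself is PHP, so this is not its code). It assumes the main part lives at `word/document.xml`, which is the conventional location. The helper names `list_parts`, `read_main_xml`, and `make_sample_docx` are hypothetical, and the sample archive is a tiny stand-in rather than a file Word would open.

```python
import io
import zipfile

# Assumption: the conventional path of the main document part in a docx.
MAIN_PART = "word/document.xml"


def list_parts(docx_bytes: bytes) -> list:
    """Return the names of every file stored inside the docx archive."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        return archive.namelist()


def read_main_xml(docx_bytes: bytes) -> str:
    """Return the raw XML text of the main document part."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        return archive.read(MAIN_PART).decode("utf-8")


def make_sample_docx() -> bytes:
    """Build a tiny stand-in archive (hypothetical, not a complete docx)."""
    main_xml = (
        '<w:document xmlns:w="http://schemas.openxmlformats.org/'
        'wordprocessingml/2006/main"><w:body/></w:document>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(MAIN_PART, main_xml)
        archive.writestr("word/media/image1.png", b"")  # embedded media slot
    return buffer.getvalue()


if __name__ == "__main__":
    sample = make_sample_docx()
    print(list_parts(sample))
    print(read_main_xml(sample))
```

For a real file, the same functions could be fed the bytes read from disk, for example `open("report.docx", "rb").read()` (the file name is only an example).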

The parser uses XPath + DOMDocument to pull in each node in sequence, turns it into an array that's easier to work with, and then renders that array into HTML.
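The node-to-array-to-HTML flow described above might look roughly like the following Python sketch. It uses `xml.etree.ElementTree` and its limited XPath subset instead of PHP's DOMDocument, and it only handles paragraphs, runs, and a bold flag. The function names, the `SAMPLE` fragment, and the shape of the intermediate structure are assumptions made for illustration, not the project's actual design.

```python
import html
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hypothetical, heavily trimmed stand-in for a main document part.
SAMPLE = f"""<w:document xmlns:w="{W}"><w:body>
<w:p><w:r><w:t>Hello </w:t></w:r><w:r><w:rPr><w:b/></w:rPr><w:t>world</w:t></w:r></w:p>
<w:p><w:r><w:t>Second &amp; last</w:t></w:r></w:p>
</w:body></w:document>"""


def parse_paragraphs(xml_text):
    """Walk paragraphs and runs in order; return a list of lists of run dicts."""
    root = ET.fromstring(xml_text)
    paragraphs = []
    for p in root.findall("./w:body/w:p", NS):
        runs = []
        for r in p.findall("./w:r", NS):
            text = "".join(t.text or "" for t in r.findall("./w:t", NS))
            # Simplification: any <w:b> counts as bold, even with w:val="0".
            bold = r.find("./w:rPr/w:b", NS) is not None
            runs.append({"text": text, "bold": bold})
        paragraphs.append(runs)
    return paragraphs


def render_html(paragraphs):
    """Render the intermediate structure as one <p> element per paragraph."""
    lines = []
    for runs in paragraphs:
        parts = []
        for run in runs:
            text = html.escape(run["text"])
            parts.append(f"<b>{text}</b>" if run["bold"] else text)
        lines.append("<p>" + "".join(parts) + "</p>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_html(parse_paragraphs(SAMPLE)))
```

Keeping the intermediate structure separate from the renderer means the HTML output can change without touching the XML walking code.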

The main chunk of code was table handling... I can't really explain how that works off the top of my head, but getting it to work with vertical + horizontal spans (rowspans and colspans) was a challenge.

And I managed it with a fair bit of coffee and a helping of stubbornness.
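The thread does not describe how the project handles spans, so the following Python sketch shows only one possible approach, not the author's. It relies on how OOXML marks merged cells: `w:gridSpan` gives the number of grid columns a cell covers, and `w:vMerge` marks a vertical merge (`w:val="restart"` on the first cell, no value or `continue` on the cells below it). The function name `table_to_grid` and the sample table are hypothetical, and nested tables and `w:gridBefore` are ignored.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}
VAL = f"{{{W}}}val"  # fully qualified name of the w:val attribute

# Hypothetical sample: a wide cell, plus a tall cell merged over two rows.
SAMPLE = f"""<w:tbl xmlns:w="{W}">
<w:tr>
  <w:tc><w:tcPr><w:gridSpan w:val="2"/></w:tcPr><w:p><w:r><w:t>Wide</w:t></w:r></w:p></w:tc>
  <w:tc><w:tcPr><w:vMerge w:val="restart"/></w:tcPr><w:p><w:r><w:t>Tall</w:t></w:r></w:p></w:tc>
</w:tr>
<w:tr>
  <w:tc><w:p><w:r><w:t>A</w:t></w:r></w:p></w:tc>
  <w:tc><w:p><w:r><w:t>B</w:t></w:r></w:p></w:tc>
  <w:tc><w:tcPr><w:vMerge/></w:tcPr><w:p/></w:tc>
</w:tr>
</w:tbl>"""


def table_to_grid(tbl):
    """Turn a w:tbl element into rows of cells carrying colspan and rowspan."""
    rows = []
    open_merges = {}  # grid column -> cell dict that started a vertical merge
    for tr in tbl.findall("./w:tr", NS):
        row, col = [], 0
        for tc in tr.findall("./w:tc", NS):
            span_el = tc.find("./w:tcPr/w:gridSpan", NS)
            colspan = int(span_el.get(VAL, "1")) if span_el is not None else 1
            vmerge = tc.find("./w:tcPr/w:vMerge", NS)
            continues = vmerge is not None and vmerge.get(VAL, "continue") == "continue"
            if continues and col in open_merges:
                # Continuation cell: grow the cell above, emit nothing here.
                open_merges[col]["rowspan"] += 1
            else:
                text = "".join(t.text or "" for t in tc.iter(f"{{{W}}}t"))
                cell = {"text": text, "colspan": colspan, "rowspan": 1}
                row.append(cell)
                if vmerge is not None:
                    open_merges[col] = cell  # start of a vertical merge
                else:
                    open_merges.pop(col, None)  # any earlier merge ends here
            col += colspan  # advance by the number of grid columns covered
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in table_to_grid(ET.fromstring(SAMPLE)):
        print(row)
```

Each resulting cell maps directly onto an HTML `<td>` with `colspan` and `rowspan` attributes, and continuation cells produce no `<td>` at all.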

PhilGale92 commented 6 years ago

Closing this issue out, but I am keeping extendability in mind for the recode (word-recode branch). I'm hoping to allow for extended word-run rendering, so that specified colours or font families could be brought in if needed.
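As a rough idea of what extended word-run rendering could involve, the Python sketch below reads direct run formatting from `w:rPr`: the font family from `w:rFonts` (its `w:ascii` attribute), the size from `w:sz` (stored in half-points), and the colour from `w:color`. This is not code from the word-recode branch. The names `run_style` and `render_run` are hypothetical, and formatting inherited from styles.xml is not resolved.

```python
import html
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}
VAL = f"{{{W}}}val"      # w:val attribute
ASCII = f"{{{W}}}ascii"  # w:ascii attribute on w:rFonts

# Hypothetical sample run with direct formatting only.
SAMPLE = f"""<w:r xmlns:w="{W}">
  <w:rPr>
    <w:rFonts w:ascii="Arial"/>
    <w:sz w:val="28"/>
    <w:color w:val="FF0000"/>
  </w:rPr>
  <w:t>Red text</w:t>
</w:r>"""


def run_style(run):
    """Collect CSS-like properties from the run's direct formatting."""
    style = {}
    rpr = run.find("./w:rPr", NS)
    if rpr is None:
        return style
    fonts = rpr.find("./w:rFonts", NS)
    if fonts is not None and fonts.get(ASCII):
        style["font-family"] = fonts.get(ASCII)
    size = rpr.find("./w:sz", NS)
    if size is not None and size.get(VAL):
        # w:sz is expressed in half-points, so 28 means 14pt.
        style["font-size"] = f"{int(size.get(VAL)) / 2:g}pt"
    color = rpr.find("./w:color", NS)
    if color is not None and color.get(VAL, "auto") != "auto":
        style["color"] = "#" + color.get(VAL)
    return style


def render_run(run):
    """Render a run as a <span>, adding a style attribute only when needed."""
    text = html.escape("".join(t.text or "" for t in run.findall("./w:t", NS)))
    style = run_style(run)
    if not style:
        return text  # unstyled runs add no extra markup
    css = "; ".join(f"{name}: {value}" for name, value in style.items())
    return f'<span style="{html.escape(css, quote=True)}">{text}</span>'


if __name__ == "__main__":
    print(render_run(ET.fromstring(SAMPLE)))
```

Emitting the `<span>` only when a run carries direct formatting is one way to limit the HTML bloat raised earlier in the thread.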