CenterForOpenScience / pydocx

An extendable docx file format parser and converter
Other

When converting Word to HTML, how do I get underlines as a <u> tag rather than <span class="pydocx-underline">? #241

Closed: DrJian closed this issue 6 years ago

DrJian commented 6 years ago

Hi pydocx team, when converting docx to HTML, I find that underlines are converted to <span class="pydocx-underline">, but I'd like the output to use <u> instead of the class-based span. Can I make some changes to get this? I parse the HTML string a second time, and the span markup is not easy for me to handle.

IuryAlves commented 6 years ago

Hi @DrJian

You can write a class that extends pydocx.export.PyDocXHTMLExporter and override the method export_run_property_underline.
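For readers new to Python: overriding works because Python looks a method up on the instance's own class first, so when the exporter calls export_run_property_underline on itself, the subclass version runs instead of the default. Below is a toy model of that dispatch. ToyExporter and ToyUnderlineExporter are hypothetical stand-ins, not pydocx classes, and their markup strings only mimic the output described in this issue.

```python
# Toy model of the override pattern. These classes are hypothetical and
# are NOT pydocx's real implementation.


class ToyExporter(object):
    """Stand-in for a base exporter with a hook for the underline property."""

    def export_underline(self, text):
        # Default behaviour: class-based markup.
        return '<span class="pydocx-underline">%s</span>' % text

    def export(self, text):
        # The hook is called through self, so a subclass override is
        # picked up automatically.
        return '<p>%s</p>' % self.export_underline(text)


class ToyUnderlineExporter(ToyExporter):
    def export_underline(self, text):
        # Override: emit a plain <u> tag instead of the span.
        return '<u>%s</u>' % text


print(ToyExporter().export('hi'))           # <p><span class="pydocx-underline">hi</span></p>
print(ToyUnderlineExporter().export('hi'))  # <p><u>hi</u></p>
```

Only the hook changes; everything else (here, the <p> wrapper) is inherited unchanged from the base class.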

I've made an example:

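# coding: utf-8

from pydocx.export import PyDocXHTMLExporter
from pydocx.export.html import HtmlTag


class PyDocXHTMLExporterUnderline(PyDocXHTMLExporter):
    """HTML exporter that renders underlined runs as <u> instead of
    <span class="pydocx-underline">."""

    def export_run_property_underline(self, run, results):
        # Wrap the run's content in a plain <u> tag rather than the
        # default class-based <span>.
        tag = HtmlTag('u')
        return self.export_run_property(tag, run, results)


# Convert test.docx and get the resulting HTML as a string.
html = PyDocXHTMLExporterUnderline('test.docx').export()

# Write as UTF-8 so non-ASCII text survives regardless of the system locale.
with open('output.html', 'w', encoding='utf-8') as output:
    output.write(html)
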
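If you already have HTML produced by the default exporter, another option is to rewrite the spans after export instead of subclassing. The sketch below uses only Python's standard html.parser. The function name underline_spans_to_u is hypothetical, and it assumes underlines appear as spans whose class attribute is exactly pydocx-underline (the markup described in this issue). It keeps a stack of open spans so that only the </span> matching an underline span becomes </u>, which a simple regex would get wrong when spans are nested.

```python
from html.parser import HTMLParser


class UnderlineSpanToU(HTMLParser):
    """Re-emits HTML, turning <span class="pydocx-underline"> into <u>."""

    def __init__(self):
        # Keep entities like &amp; as-is instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.span_stack = []  # True for spans rewritten to <u>

    def handle_starttag(self, tag, attrs):
        if tag == 'span':
            is_underline = dict(attrs).get('class') == 'pydocx-underline'
            self.span_stack.append(is_underline)
            if is_underline:
                self.out.append('<u>')
                return
        self.out.append(self.get_starttag_text())  # original text verbatim

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br /> are copied unchanged.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == 'span' and self.span_stack and self.span_stack.pop():
            self.out.append('</u>')
            return
        self.out.append('</%s>' % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append('&%s;' % name)

    def handle_charref(self, name):
        self.out.append('&#%s;' % name)

    def handle_comment(self, data):
        self.out.append('<!--%s-->' % data)

    def handle_decl(self, decl):
        self.out.append('<!%s>' % decl)


def underline_spans_to_u(html_text):
    parser = UnderlineSpanToU()
    parser.feed(html_text)
    parser.close()  # flush any trailing text
    return ''.join(parser.out)


print(underline_spans_to_u('<p><span class="pydocx-underline">Hi</span> there</p>'))
# <p><u>Hi</u> there</p>
```

Spans with any other class (or extra classes) are left untouched, so their closing tags stay </span>.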
DrJian commented 6 years ago

Thanks a lot, I will try it. I'm not very familiar with Python.

IuryAlves commented 6 years ago

No problem @DrJian.

If you encounter any problems, please let me know.

DrJian commented 6 years ago

Your solution is very useful. As a PHP coder, I will learn Python, and I may use pydocx in my company's product environment to help me convert docx to HTML. Thanks a lot again! We can close this issue.