CenterForOpenScience / pydocx

An extendable docx file format parser and converter

Question: How to turn off included style tag in html head element? #243

Closed: bitscompagnie closed this issue 6 years ago

bitscompagnie commented 6 years ago

Hello PyDocx team,

I have the code below to generate HTML files from Word files in a folder, and it is working fine. How can I turn off the embedded style in the head tag of the output HTML file, and possibly replace it with a link tag pointing to an external CSS file?

import codecs
import os

from pydocx import PyDocX

# Iterate over all docx files in the source directory
for file in os.listdir('sourcedocx'):
    # Convert the document; the with block closes the input file afterwards
    with open(os.path.join('sourcedocx', file), 'rb') as docx:
        html = PyDocX.to_html(docx)
    # Write the result to a new file in the output directory
    with codecs.open(os.path.join('outdocxhtml', file + '.html'), 'w', 'utf-8') as f:
        f.write(html)
print('Done writing html files')

I am new to Python and would like to apologize for any inconvenience. Thanks for your help.

jlward commented 6 years ago

Hello,

You can extend PyDocX and override the head method to use an external CSS stylesheet. As for removing styles from individual elements, I believe most handlers add classes rather than inline styles. For any element, though, you can override the handler for that tag and have it remove any attributes you don't want to include.
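
For illustration, here is a minimal sketch of a different route from the subclassing approach described above: it leaves PyDocX untouched and post-processes the HTML string it returns, removing the embedded style block and adding a link tag before the closing head tag. The function name use_external_css and the default file name styles.css are hypothetical, chosen only for this example, and the regex assumes the output contains an ordinary lowercase <style>...</style> block and a </head> tag. It is not part of PyDocX.

```python
import re

# Matches a whole <style ...>...</style> block, including newlines inside it.
STYLE_BLOCK = re.compile(r"<style\b[^>]*>.*?</style>", re.IGNORECASE | re.DOTALL)


def use_external_css(html, href="styles.css"):
    """Remove embedded <style> blocks and link an external stylesheet instead.

    `use_external_css` and the default `href` are hypothetical names for this
    sketch; they are not part of PyDocX.
    """
    link = '<link rel="stylesheet" href="{}">'.format(href)
    # Drop every embedded style block from the document.
    stripped = STYLE_BLOCK.sub("", html)
    # Insert the link just before the first </head>, if the document has one.
    if "</head>" in stripped:
        return stripped.replace("</head>", link + "</head>", 1)
    return stripped
```

In the loop from the question, this would be applied to the string returned by PyDocX.to_html before writing it out, for example f.write(use_external_css(html)). Inline style attributes on individual elements are not touched by this sketch; removing those would still need the handler overrides mentioned above.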

bitscompagnie commented 6 years ago

Thanks for your reply.

Is there any example code for inspiration?

jlward commented 6 years ago

Our documentation has some examples.

bitscompagnie commented 6 years ago

Thanks so much.