elapouya / python-docx-template

Use a docx as a jinja2 template
GNU Lesser General Public License v2.1

Differences in editors (Libre and Microsoft) #458

Closed: dmitryskachkov closed this issue 1 year ago

dmitryskachkov commented 2 years ago

Describe your problem

Hello! I generate a new .docx from a docxtpl template filled with JSON data. I need to use HTML in template variables, so I created a subdoc. But the resulting file looks different in LibreOffice and Microsoft Word: some text is missing. If you open the file in Microsoft Word you can see the text, but in LibreOffice it is lost.

When I convert the document to PDF with the soffice CLI, the text is missing too. Attached file: 17-3.docx

How can I keep the text imported from HTML?
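To reproduce the PDF conversion step outside the script, LibreOffice can be run headlessly. The sketch below only builds the command line; it assumes the LibreOffice binary is on `PATH` as `soffice`, and the helper name `soffice_pdf_command` and the output directory `out` are hypothetical.

```python
import shlex
from pathlib import Path


def soffice_pdf_command(docx_path: str, out_dir: str) -> list[str]:
    """Build a LibreOffice command line that converts a .docx file to PDF.

    Assumes the LibreOffice binary is available on PATH as "soffice".
    """
    return [
        "soffice",
        "--headless",           # run without opening a window
        "--convert-to", "pdf",  # target format
        "--outdir", out_dir,    # directory the PDF is written to
        str(Path(docx_path)),
    ]


if __name__ == "__main__":
    cmd = soffice_pdf_command("17-3.docx", "out")
    print(shlex.join(cmd))
    # To actually run it (requires LibreOffice to be installed):
    # import subprocess; subprocess.run(cmd, check=True)
```

Comparing this PDF with one exported from Microsoft Word shows whether the missing text is specific to LibreOffice's import of the file.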

More details about your problem

code:

```python
#!/usr/bin/python3

import io
import os
import sys
import time

import docxtpl
import requests
from docx import Document
from docx.shared import Inches, Mm
from htmldocx import HtmlToDocx

base_path = '/home/support/domains/'
document_id = sys.argv[1]    # 9-2022
hostname = sys.argv[2]       #
template_file = sys.argv[3]  # /sites/base/files.docs.docx

doc = docxtpl.DocxTemplate(template_file)
document_url = HTTP_URL  # placeholder: the real endpoint is not shown in the report

resp = requests.get(url=document_url)
data = resp.json()

if data:
    # Parse the body (it may contain HTML) and pre-save it to a temporary .docx
    desc_document = Document()
    new_parser = HtmlToDocx()
    new_parser.table_style = 'TableGrid'
    new_parser.paragraph_style = 'Body Text'
    new_parser.add_html_to_document(data['document']['body'], desc_document)
    desc_result_path = "/home/support/domains/tmp/part_" + document_id + ".docx"
    desc_document.save(desc_result_path)

    # Load the pre-saved .docx as a subdocument of the template
    sub_doc = doc.new_subdoc(desc_result_path)

    context = {
        'document_date': time.strftime('%d.%m.%Y', time.gmtime(int(data['document']['created']))),
        'document_text': sub_doc,
        'document_creator': data['creator'],
        'creator_sign': data['creator_sign'],
        'document_sender': data['creator'],
        'document_sender_role': '',
        'document_creator_role': data['document_creator_role'],
    }
    # Optional list of signatures; the last signer becomes the sender
    if 'signs_list' in data:
        context['document_sender'] = data['signs_list'][-1]['user_full_name']
        context['document_signs'] = data['signs_list']

    doc.render(context)

    # Create the output directory if it does not exist yet, then save into it
    preview_dir = base_path + hostname + "/preview/" + hostname
    os.makedirs(preview_dir, exist_ok=True)
    doc.save(preview_dir + "/" + document_id + ".docx")
```

elapouya commented 2 years ago

I do not understand what you want to do. Please provide a much simpler example that reproduces the problem.

dmitryskachkov commented 1 year ago

The file produced by the code opens differently in different editors. You can check this by opening the attached file in Microsoft Word and in LibreOffice: the data has disappeared in LibreOffice.

The 'document_text' variable should contain HTML such as '<p>text</p><h1>fooo</h1>'.

Simple example

Input JSON: {'document_text': '<p>text</p><h1>fooo</h1>', 'document_title': 'Hello world'}

After running the code, the resulting document shows:

In Microsoft Word:

Hello World text fooo

In LibreOffice:

Hello World

Example file: 17-3.docx

dmitryskachkov commented 1 year ago

For testing, I removed the HTML-to-docx part of the code and kept only the subdoc, so the problem is in this part:

```python
# doc is the DocxTemplate created in the script above
sub_doc = doc.new_subdoc("/path/to/document.docx")
context = {
    'document_text': sub_doc,
}
doc.render(context)
doc.save("/path/tonewfile/document.docx")
```
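One way to tell whether the subdoc text is really missing from the file, or only not displayed by LibreOffice, is to read `word/document.xml` directly. This is a standard-library-only sketch (the helper name `docx_text_runs` is hypothetical); it lists the text of every `<w:t>` element in document order.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace of the WordprocessingML elements (w:p, w:r, w:t, ...) in document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_text_runs(docx_file):
    """Return the text of every <w:t> element in word/document.xml, in order.

    docx_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    return [t.text or "" for t in root.iter(f"{{{W_NS}}}t")]
```

For example, `docx_text_runs("17-3.docx")` on the attached file: if "text" and "fooo" appear in the list while LibreOffice does not show them, the text is present and the difference lies in how the two editors interpret the markup around it.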

elapouya commented 1 year ago

HTML rendering is not supported by docxtpl. You can only display the raw (unrendered) HTML text by passing autoescape=True to render(). If you do not escape the HTML, it will corrupt document.xml inside the generated .docx, and a corrupted .docx may be interpreted differently by different editors.
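To illustrate why unescaped HTML damages document.xml, the sketch below substitutes a value into a heavily simplified stand-in for one text run, with and without escaping. It uses `xml.sax.saxutils.escape` as a rough stand-in for the Jinja2 autoescaping that `autoescape=True` enables; the template string and the helpers `render`, `parse_or_none` and `describe` are hypothetical, and a real document.xml is far larger.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
# Simplified stand-in for one paragraph/run/text element of word/document.xml
TEMPLATE = '<w:p xmlns:w="' + W_NS + '"><w:r><w:t>{}</w:t></w:r></w:p>'


def render(value, autoescape):
    """Substitute value into the XML, escaping &, < and > first if requested."""
    return TEMPLATE.format(escape(value) if autoescape else value)


def parse_or_none(xml_text):
    """Return the parsed root element, or None if the XML is not well-formed."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError:
        return None


def describe(value):
    """Report what happens when value is inserted without escaping."""
    root = parse_or_none(render(value, autoescape=False))
    if root is None:
        return "not well-formed XML"
    t = root.find(f".//{{{W_NS}}}t")
    if len(t):  # <w:t> should hold text only, not child elements
        return "parses, but <w:t> contains foreign elements: " + ", ".join(c.tag for c in t)
    return "ok"


for value in ["<p>text</p><h1>fooo</h1>", "Line 1<br>Line 2"]:
    print(repr(value), "->", describe(value))
```

Balanced HTML like the example in this issue still parses, but leaves `<p>` and `<h1>` elements inside `<w:t>`, which is not valid WordprocessingML; HTML with void elements such as `<br>` is not even well-formed XML. With escaping, the markup ends up as literal text inside `<w:t>`.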

ohplz commented 1 year ago

> HTML rendering is not supported by docxtpl. You can only display not rendered html text with autoescape=True in render(). If you do not escape html, it will destroy document.xml inside the generated docx. The corrupted docx may be interpreted differently with different editors.

Can this be supported, or is there a workaround? After spending an afternoon on it, I couldn't find a way to convert HTML to RichText; it seems tricky.

elapouya commented 1 year ago

Sorry, but there is no obvious way to render HTML with docxtpl.
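For very simple markup only, one possible partial workaround is to flatten the HTML into styled text runs with the standard library's `html.parser` and build a docxtpl `RichText` from them. The sketch below is an assumption-laden illustration, not a docxtpl feature: it understands bold, italic, headings (rendered as bold), `<br>` and paragraph-like block tags, and reduces everything else (tables, lists, images) to plain text. The names `RunCollector` and `html_to_runs` are hypothetical.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
BLOCK_TAGS = HEADINGS | {"p", "div", "li"}


class RunCollector(HTMLParser):
    """Flatten simple HTML into (text, bold, italic) runs."""

    def __init__(self):
        super().__init__()
        self.runs = []
        self.bold = 0    # nesting depth of bold-like tags
        self.italic = 0  # nesting depth of italic-like tags

    def handle_starttag(self, tag, attrs):
        if tag in ("b", "strong") or tag in HEADINGS:
            self.bold += 1
        elif tag in ("i", "em"):
            self.italic += 1
        elif tag == "br":
            self.runs.append(("\n", False, False))

    def handle_endtag(self, tag):
        if tag in ("b", "strong") or tag in HEADINGS:
            self.bold = max(0, self.bold - 1)
        elif tag in ("i", "em"):
            self.italic = max(0, self.italic - 1)
        if tag in BLOCK_TAGS:
            # end of a paragraph-like element becomes a line break
            self.runs.append(("\n", False, False))

    def handle_data(self, data):
        if data:
            self.runs.append((data, self.bold > 0, self.italic > 0))


def html_to_runs(html):
    """Parse html and return its runs without trailing line breaks."""
    parser = RunCollector()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    runs = parser.runs
    while runs and runs[-1][0] == "\n":
        runs.pop()
    return runs


print(html_to_runs("<p>text</p><h1>fooo</h1>"))
```

The runs could then be fed to docxtpl: create `rt = RichText()`, call `rt.add(text, bold=bold, italic=italic)` for each run, pass `rt` as `document_text`, and use `{{r document_text }}` in the template; RichText turns `\n` into a line break. Headings lose their Word heading style this way, so this only covers the simplest cases.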