ghiewa closed this issue 10 years ago.
Hello! Could you provide me with the template file to check it out?
It is just your own file, at this path:
https://github.com/christopher-ramirez/secretary/blob/master/simple_template.odt
Thanks for reporting. Actually, the control flow is not the origin of the issue; it is the markdown filter (see page 2 in simple_template.odt). Remove this field from the template and you should be able to create a rendered document. This happens because the variables queried in the markdown filter don't exist in the template data. Anyway, this is unexpected behaviour, and I have to debug it.
After switching branches, I could not reproduce the error again. This is kind of weird.
I tried to open 'rendered.odt' with zip, but I could not. I am not sure whether that is the root cause.
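If it helps to narrow this down, the check an unzip tool performs can be reproduced with Python's standard `zipfile` module (an .odt file is a ZIP archive). This is a minimal sketch, not part of Secretary, and the function name `check_archive` is hypothetical. It reports whether a file is a ZIP archive at all, and whether every member passes its CRC check:

```python
import zipfile


def check_archive(source):
    """Return "ok" if `source` (a path or a binary file object) is a readable
    ZIP archive whose members all pass their CRC check, else a short reason."""
    if not zipfile.is_zipfile(source):
        # No end-of-central-directory record was found.
        return "not a zip archive"
    if hasattr(source, "seek"):
        source.seek(0)  # is_zipfile() may have moved the file position
    try:
        with zipfile.ZipFile(source) as archive:
            bad_member = archive.testzip()  # None when every member is intact
    except (zipfile.BadZipfile, ValueError, EOFError, IOError) as exc:
        return "broken archive: %s" % exc
    if bad_member is not None:
        return "corrupt member: %s" % bad_member
    return "ok"
```

Usage: `print(check_archive('rendered.odt'))`. A result other than `"ok"` means the archive itself is damaged, independently of what the template contains.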
Meanwhile, I removed the 'title' and 'length' filters, but the error is still there.
I got some enlightenment from https://pypi.python.org/pypi/py3o.template. The only difference is that it uses Genshi syntax:
```python
import decimal
import os
import zipfile
from StringIO import StringIO

import lxml.etree
from genshi.template import MarkupTemplate

# Excerpt from py3o.template: get_secure_filename(), PY3O_IMAGE_PREFIX and the
# private __prepare_* / __handle_* / __replace_* / __add_* helpers are defined
# elsewhere in py3o.template and are not shown here.


class Template(object):
    templated_files = ['content.xml', 'styles.xml', 'META-INF/manifest.xml']

    def __init__(self, template, outfile):
        """A template object exposes the API to render it to an OpenOffice
        document.

        @param template: a py3o template file. ie: a OpenDocument with the
        proper py3o markups
        @type template: a string representing the full path name to a py3o
        template file.

        @param outfile: the desired file name for the resulting ODT document
        @type outfile: a string representing the full filename for output
        """
        self.template = template
        self.outputfilename = outfile
        self.infile = zipfile.ZipFile(self.template, 'r')

        self.content_trees = [
            lxml.etree.parse(StringIO(self.infile.read(filename)))
            for filename in self.templated_files
        ]
        self.tree_roots = [tree.getroot() for tree in self.content_trees]

        # self.py3ocontent = lxml.etree.parse(
        #     StringIO(self.infile.read("content.xml")))
        # self.py3oroot = self.py3ocontent.getroot()

        self.__prepare_namespaces()

        self.images = {}

    def render_flow(self, data):
        """render the OpenDocument with the user data

        @param data: the input stream of userdata. This should be a
        dictionary mapping, keys being the values accessible to your
        report.
        @type data: dictionary
        """

        newdata = dict(
            decimal=decimal,
            format_float=(lambda val: (
                isinstance(val, decimal.Decimal)
                or isinstance(val, float)
            ) and str(val).replace('.', ',') or val),
            format_percentage=(lambda val:
                ("%0.2f %%" % val).replace('.', ',')
            )
        )

        # first we need to transform the py3o template into a valid
        # Genshi template.
        starting_tags, closing_tags = self.__handle_instructions()
        for content_tree, link, py3o_base in starting_tags:
            self.__handle_link(
                content_tree,
                link,
                py3o_base,
                closing_tags[id(link)][1]
            )

        self.__prepare_userfield_decl()
        self.__prepare_usertexts()

        self.__replace_image_links()
        self.__add_images_to_manifest()

        # out = open("content.xml", "w+")
        # out.write(lxml.etree.tostring(self.py3ocontent.getroot()))
        # out.close()

        self.output_streams = list()
        for fnum, content_tree in enumerate(self.content_trees):
            template = MarkupTemplate(
                lxml.etree.tostring(content_tree.getroot())
            )
            # then we need to render the genshi template itself by
            # providing the data to genshi
            self.output_streams.append((
                self.templated_files[fnum],
                template.generate(**dict(data.items() + newdata.items())))
            )

        # then reconstruct a new ODT document with the generated content
        for status in self.__save_output():
            yield status

    def render(self, data):
        """render the OpenDocument with the user data

        @param data: the input stream of userdata. This should be a
        dictionary mapping, keys being the values accessible to your
        report.
        @type data: dictionary
        """
        for status in self.render_flow(data):
            if not status:
                raise ValueError("unknown error")

    def __save_output(self):
        """Saves the output into a native OOo document format.
        """
        out = zipfile.ZipFile(self.outputfilename, 'w')

        for info_zip in self.infile.infolist():

            if info_zip.filename in self.templated_files:
                # Template file - we have edited these.

                # get a temp file
                streamout = open(get_secure_filename(), "w+b")
                fname, output_stream = self.output_streams[
                    self.templated_files.index(info_zip.filename)
                ]

                # write the whole stream to it
                for chunk in output_stream.serialize():
                    streamout.write(chunk.encode('utf-8'))
                    yield True

                # close the temp file to flush all data and make sure we get
                # it back when writing to the zip archive.
                streamout.close()

                # write the full file to archive
                out.write(streamout.name, fname)

                # remove tempfile
                os.unlink(streamout.name)

            else:
                # Copy other files straight from the source archive.
                out.writestr(info_zip, self.infile.read(info_zip.filename))

        # Save images in the "Pictures" sub-directory of the archive.
        for identifier, data in self.images.iteritems():
            out.writestr(PY3O_IMAGE_PREFIX + identifier, data)

        # close the zipfile before leaving
        out.close()
        yield True
```
@ghiewa, are you still having this issue?
Yes, I am stuck. I found that I cannot open the ODT file generated by the script with zip. Is this the root cause?
Did you change the generated document's extension to .zip?
Yes, I used the command 'copy rendered.odt rendered.zip', but I failed to open the new file with zip.
Could you provide me with a copy of rendered.odt?
I have sent it from ghiewa [at] 126.com to chris.ramirezg [at] gmail [dot] com.
@christopher-ramirez, the system tells me it is a broken zip file when I open the ODT file your script generated with zip. After changing

```python
with zipfile.ZipFile(self.rendered, 'w') as packed_template:
```

to

```python
with zipfile.ZipFile('out.odt', 'w') as packed_template:
```

I get the right file, the one I wanted. I do not know why.
Here is another version I revised based on your code. It works well for me:
```python
#!/usr/bin/python
# -*- encoding: utf-8 -*-

"""
Secretary
    Take the power of Jinja2 templates to OpenOffice and LibreOffice.

This file implements Render. Render provides an interface to render
Open Document Format (ODF) documents to be used as templates using
the jinja2 template engine. To render a template:
    engine = Render(template_file, output_file)
    engine.render(template_var1=...)
"""

from __future__ import unicode_literals, print_function

import re
import sys
import zipfile
import io
import os
import tempfile
from cStringIO import StringIO
import lxml.etree
from xml.dom.minidom import parseString
from jinja2 import Environment, Undefined
import logging

logging.basicConfig(filename='log.log', level=logging.INFO)


def get_secure_filename():
    """creates a tempfile in the most secure manner possible,
    makes sure it is closed and returns the filename for
    easy usage.
    """
    file_handle, filename = tempfile.mkstemp()
    tmpfile = os.fdopen(file_handle, "r")
    tmpfile.close()
    return filename


# ---- Exceptions
class SecretaryError(Exception):
    pass


class UndefinedSilently(Undefined):
    # Silently undefined,
    # see http://stackoverflow.com/questions/6182498/jinja2-how-to-make-it-fail-silently-like-djangotemplate
    def silently_undefined(*args, **kwargs):
        return ''

    return_new = lambda *args, **kwargs: UndefinedSilently()

    __unicode__ = silently_undefined
    __str__ = silently_undefined
    __call__ = return_new
    __getattr__ = return_new


# ************************************************
#
#           SECRETARY FILTERS
#
# ************************************************

def pad_string(value, length=5):
    value = str(value)
    return value.zfill(length)


class Render(object):
    """
    Main engine to convert an ODT document into a jinja
    compatible template.

    Basic use example:
        engine = Render('template.odt', 'output.odt')
        engine.render()

    Render provides an environment variable which can be used
    to provide custom filters to the ODF render.

        engine = Render('template.odt', 'output.odt')
        engine.environment.filters['custom_filter'] = filter_function
        engine.render()
    """

    templated_files = ['content.xml', 'styles.xml', 'META-INF/manifest.xml']

    def __init__(self, template, outfile, **kwargs):
        """
        Builds a Render instance and initializes the internal environment.
        Params:
            template: Either the path to the file, or a file-like object.
                      If it is a path, the file will be opened with mode read 'r'.
            outfile: Path of the rendered ODF document to write.
        """
        self.template = template
        self.outputfilename = outfile
        self.environment = Environment(undefined=UndefinedSilently, autoescape=True)

        # Register provided filters
        self.environment.filters['pad'] = pad_string
        self.environment.filters['markdown'] = self.markdown_filter

    def unpack_template(self):
        """
        Loads the template into a ZIP file, allowing to make
        CRUD operations into the ZIP archive.
        """
        self.infile = zipfile.ZipFile(self.template, 'r')
        self.content_trees = [parseString(self.infile.read(filename)) for filename in self.templated_files]
        self.content = parseString(self.infile.read('content.xml'))

    def pack_document(self):
        # Save rendered content and headers
        out = zipfile.ZipFile(self.outputfilename, 'w')

        for info_zip in self.infile.infolist():
            if info_zip.filename in self.templated_files:
                streamout = open(get_secure_filename(), "w+b")
                fname, output_stream = self.output_streams[
                    self.templated_files.index(info_zip.filename)
                ]
                streamout.write(output_stream.encode('utf-8'))
                streamout.close()
                out.write(streamout.name, fname)
                os.unlink(streamout.name)
            else:
                # Copy other files straight from the source archive.
                out.writestr(info_zip, self.infile.read(info_zip.filename))

        out.close()

    def render(self, **kwargs):
        """
        Unpacks and renders the internal template and
        writes the rendered ODF document to outfile.
        """
        self.unpack_template()
        self.output_streams = list()
        for fnum, content_tree in enumerate(self.content_trees):
            self.prepare_template_tags(content_tree)
            template = self.environment.from_string(content_tree.toxml())
            result = template.render(**kwargs)
            self.output_streams.append((
                self.templated_files[fnum],
                result)
            )
        self.pack_document()

    def node_parents(self, node, parent_type):
        """
        Returns the first node's parent with name of parent_type
        If parent "text:p" is not found, returns None.
        """
        if hasattr(node, 'parentNode'):
            if node.parentNode.nodeName.lower() == parent_type:
                return node.parentNode
            else:
                return self.node_parents(node.parentNode, parent_type)
        else:
            return None

    def create_text_span_node(self, xml_document, content):
        span = xml_document.createElement('text:span')
        text_node = self.create_text_node(xml_document, content)
        span.appendChild(text_node)

        return span

    def create_text_node(self, xml_document, text):
        """
        Creates a text node
        """
        return xml_document.createTextNode(text)

    def prepare_template_tags(self, xml_document):
        """
        Search every field node in the inner template and
        replace them with a <text:span> field. Flow tags are
        replaced with a blank node and moved into the ancestor
        tag defined in description field attribute.
        """
        fields = xml_document.getElementsByTagName('text:text-input')

        for field in fields:
            if field.hasChildNodes():
                field_content = field.childNodes[0].data.replace('\n', '')

                jinja_tags = re.findall(r'(\{.*?\}*})', field_content)
                if not jinja_tags:
                    # Field does not contain jinja template tags
                    continue

                field_description = field.getAttribute('text:description')

                if re.findall(r'\|markdown', field_content):
                    # a markdown should take the whole paragraph
                    field_description = 'text:p'

                if not field_description:
                    new_node = self.create_text_span_node(xml_document, field_content)
                else:
                    if field_description in \
                            ['text:p', 'table:table-row', 'table:table-cell']:
                        field = self.node_parents(field, field_description)

                    new_node = self.create_text_node(xml_document, field_content)

                parent = field.parentNode
                parent.insertBefore(new_node, field)
                parent.removeChild(field)

    def get_style_by_name(self, style_name):
        """
        Search in <office:automatic-styles> for style_name.
        Return None if style_name is not found. Otherwise
        return the style node
        """
        auto_styles = self.content.getElementsByTagName('office:automatic-styles')[0]

        if not auto_styles.hasChildNodes():
            return None

        for style_node in auto_styles.childNodes:
            if style_node.hasAttribute('style:name') and \
                    (style_node.getAttribute('style:name') == style_name):
                return style_node

        return None

    def insert_style_in_content(self, style_name, attributes=None,
                                **style_properties):
        """
        Insert a new style into content.xml's <office:automatic-styles> node.
        Returns a reference to the newly created node
        """
        auto_styles = self.content.getElementsByTagName('office:automatic-styles')[0]
        style_node = self.content.createElement('style:style')

        style_node.setAttribute('style:name', style_name)
        style_node.setAttribute('style:family', 'text')
        style_node.setAttribute('style:parent-style-name', 'Standard')

        if attributes:
            for k, v in attributes.iteritems():
                style_node.setAttribute('style:%s' % k, v)

        if style_properties:
            style_prop = self.content.createElement('style:text-properties')
            for k, v in style_properties.iteritems():
                style_prop.setAttribute('%s' % k, v)

            style_node.appendChild(style_prop)

        return auto_styles.appendChild(style_node)

    def markdown_filter(self, markdown_text):
        """
        Convert a markdown text into a ODT formatted text
        """
        if not isinstance(markdown_text, basestring):
            return ''

        from xml.dom import Node
        from markdown_map import transform_map

        try:
            from markdown2 import markdown
        except ImportError:
            raise SecretaryError('Could not import markdown2 library. Install it using "pip install markdown2"')

        styles_cache = {}  # cache styles searching
        html_text = markdown(markdown_text)
        xml_object = parseString('<html>%s</html>' % html_text)

        # Transform HTML tags as specified in transform_map
        # Some tags may require extra attributes in ODT.
        # Additional attributes are indicated in the 'attributes' property
        for tag in transform_map:
            html_nodes = xml_object.getElementsByTagName(tag)
            for html_node in html_nodes:
                odt_node = xml_object.createElement(transform_map[tag]['replace_with'])

                # Transfer child nodes
                if html_node.hasChildNodes():
                    for child_node in html_node.childNodes:
                        odt_node.appendChild(child_node.cloneNode(True))

                # Add style-attributes defined in transform_map
                if 'style_attributes' in transform_map[tag]:
                    for k, v in transform_map[tag]['style_attributes'].iteritems():
                        odt_node.setAttribute('text:%s' % k, v)

                # Add defined attributes
                if 'attributes' in transform_map[tag]:
                    for k, v in transform_map[tag]['attributes'].iteritems():
                        odt_node.setAttribute(k, v)

                # copy original href attribute in <a> tag
                if tag == 'a':
                    if html_node.hasAttribute('href'):
                        odt_node.setAttribute('xlink:href',
                                              html_node.getAttribute('href'))

                # Does the node need to create a style?
                if 'style' in transform_map[tag]:
                    name = transform_map[tag]['style']['name']
                    if not name in styles_cache:
                        style_node = self.get_style_by_name(name)

                        if style_node is None:
                            # Create and cache the style node
                            style_node = self.insert_style_in_content(
                                name, transform_map[tag]['style'].get('attributes', None),
                                **transform_map[tag]['style']['properties'])
                            styles_cache[name] = style_node

                html_node.parentNode.replaceChild(odt_node, html_node)

        def node_to_string(node):
            result = node.toxml()

            # linebreaks in preformatted nodes should be converted to <text:line-break/>
            if (node.__class__.__name__ != 'Text') and \
                    (node.getAttribute('text:style-name') == 'Preformatted_20_Text'):
                result = result.replace('\n', '<text:line-break/>')

            # All double linebreaks should be replaced with an empty paragraph
            return result.replace('\n\n', '<text:p text:style-name="Standard"/>')

        return ''.join(node_as_str for node_as_str in map(node_to_string,
                       xml_object.getElementsByTagName('html')[0].childNodes))


def render_template(template, outfile, **kwargs):
    """
    Render a ODF template file into outfile
    """
    engine = Render(template, outfile)
    return engine.render(**kwargs)


if __name__ == "__main__":
    import os
    from datetime import datetime

    def read(fname):
        return open(os.path.join(os.path.dirname(__file__), fname)).read()

    document = {
        'datetime': datetime.now(),
        'md_sample': read('README.md')
    }

    countries = [
        {'country': 'United States', 'capital': 'Washington', 'cities': ['miami', 'new york', 'california', 'texas', 'atlanta']},
        {'country': 'England', 'capital': 'London', 'cities': ['gales']},
        {'country': 'Japan', 'capital': 'Tokio', 'cities': ['hiroshima', 'nagazaki']},
        {'country': 'Nicaragua', 'capital': 'Managua', 'cities': ['león', 'granada', 'masaya']},
        {'country': 'Argentina', 'capital': 'Buenos aires'},
        {'country': 'Chile', 'capital': 'Santiago'},
        {'country': 'Mexico', 'capital': 'MExico City', 'cities': ['puebla', 'cancun']},
    ]

    render = Render('simple_template.odt', 'simple_template_out.odt')
    result = render.render(countries=countries, document=document)

    print("Template rendering finished! Check simple_template_out.odt file.")
```
I will take a look at that.
I had to set up a test environment on Windows 7. The cause of the error is the open mode in `output = open('rendered_document.odt', 'w')`. It should be changed to `output = open('rendered_document.odt', 'wb')`.
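A minimal sketch of the fix, assuming the rendered document is available as a byte string; `build_sample_odt` and `save_document` are hypothetical names, and the sample archive only stands in for real rendered output. Passing a file name straight to `zipfile.ZipFile`, as in the 'out.odt' workaround earlier in this thread, avoids the problem for the same reason: `ZipFile` then opens the file itself in binary mode.

```python
import io
import zipfile


def build_sample_odt():
    """Hypothetical stand-in for the rendered document: a tiny ODT-like zip
    built in memory. Its content.xml contains newline bytes on purpose."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        archive.writestr("content.xml",
                         "<office:document-content>\n</office:document-content>\n")
    return buffer.getvalue()


def save_document(path, data):
    # "wb" writes the bytes exactly as given on every platform. With "w"
    # (text mode), Python 2 on Windows would turn each newline byte into
    # a carriage return plus newline and corrupt the archive.
    with open(path, "wb") as output:
        output.write(data)
```

Usage: `save_document('rendered_document.odt', build_sample_odt())`. Reading the file back with mode `'rb'` returns exactly the bytes that were written.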
The official `open` documentation states:

> The default is to use text mode, which may convert '\n' characters to a platform-specific representation on writing and back on reading. Thus, when opening a binary file, you should append 'b' to the mode value to open the file in binary mode, which will improve portability.
Somewhere, Python must be translating `\n` characters (byte 10) into `\r\n` (bytes 13 and 10) inside the zip file, thus corrupting the final ODT.
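The effect can be reproduced on any platform by applying the same translation by hand. The sketch below simulates the newline translation rather than using a real Windows text-mode file, and its helper names are hypothetical. It builds a small archive in memory, replaces every `b"\n"` with `b"\r\n"`, and checks whether the result still reads back cleanly:

```python
import io
import zipfile


def sample_archive():
    # Hypothetical ODT-like archive with a fixed timestamp; the member
    # text deliberately contains newline bytes (byte 10).
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        info = zipfile.ZipInfo("content.xml", date_time=(2014, 1, 1, 0, 0, 0))
        archive.writestr(info, "line one\nline two\n")
    return buffer.getvalue()


def simulate_text_mode_write(data):
    # What a Windows text-mode write does to the bytes: every LF (10)
    # becomes CR LF (13 10), shifting everything that follows it.
    return data.replace(b"\n", b"\r\n")


def archive_is_intact(data):
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            return archive.testzip() is None
    except Exception:
        # Any failure to read the archive counts as corruption here.
        return False
```

The untouched archive passes `archive_is_intact`, while the translated one fails: the extra CR bytes change the member data and shift the offsets that the local headers and the central directory rely on.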
I will update the sample code to force binary write mode.
Thanks for reporting.
I wrote a very simple test. I can successfully get an ODT file, but I cannot open it, and I get the error below:
ERROR
The file 'rendered_document.odt' is corrupt and therefore cannot be opened. LibreOffice can try to repair the file.
The corruption could be the result of document manipulation or of structural document damage due to data transmission.
We recommend that you do not trust the content of the repaired document. Execution of macros is disabled for this document.
Should LibreOffice repair the file?
Test environment: Python 2.7.6, Windows 7
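One package-level detail worth ruling out when LibreOffice reports a generated ODT as corrupt is the `mimetype` entry: the ODF packaging rules require it to be the first member of the archive and stored without compression. A minimal sketch of that check, where `odf_package_problems` is a hypothetical helper that covers only these two rules and says nothing about the XML contents:

```python
import zipfile


def odf_package_problems(source):
    """Return a list of mimetype-related packaging problems (empty if none).

    `source` may be a path or a binary file object.
    """
    problems = []
    with zipfile.ZipFile(source) as archive:
        members = archive.infolist()  # in archive order
        if not members or members[0].filename != "mimetype":
            problems.append("mimetype is not the first entry")
        for info in members:
            if info.filename == "mimetype" and info.compress_type != zipfile.ZIP_STORED:
                problems.append("mimetype is compressed")
    return problems
```

Usage: `print(odf_package_problems('rendered_document.odt'))`. An empty list only means these two rules hold; the document can still be broken elsewhere.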