Can you add image-insertion functionality to your code? #10

ghiewa commented 10 years ago

I have a draft for your reference.


# * Copyright (c) 2012-2014 Christopher Ramirez
# *
# * Licensed under the MIT license.

"""
    This project is a document engine which makes use of LibreOffice
    documents as templates and uses the semantics of jinja2 to control
    variable printing and control flow.

    To render a template:
        engine = Renderer(template_file)
        result = engine.render(template_var1=...)
"""

from __future__ import unicode_literals#, print_function

import io
import re
import sys
import logging
import zipfile
import uuid
from xml.dom.minidom import parseString
import lxml.etree
from cStringIO import StringIO
from jinja2 import Environment, Undefined

ODDA_IMAGE_PREFIX = 'Pictures/odda-'          ################

# Test python versions and normalize calls to basestring, unicode, etc.
try:
    unicode = unicode
except NameError:
    # 'unicode' is undefined, must be Python 3
    str = str
    unicode = str
    bytes = bytes
    basestring = (str, bytes)
else:
    # 'unicode' exists, must be Python 2
    str = str
    unicode = unicode
    bytes = str
    basestring = basestring

# Maps the aliases accepted in a field's `text:description` to ODF node types.
FLOW_REFERENCES = {
    'text:p':               'text:p',
    'paragraph':            'text:p',
    'before::paragraph':    'text:p',
    'after::paragraph':     'text:p',

    'table:table-row':      'table:table-row',
    'table-row':            'table:table-row',
    'row':                  'table:table-row',
    'before::table-row':    'table:table-row',
    'after::table-row':     'table:table-row',
    'before::row':          'table:table-row',
    'after::row':           'table:table-row',

    'table:table-cell':     'table:table-cell',
    'table-cell':           'table:table-cell',
    'cell':                 'table:table-cell',
    'before::table-cell':   'table:table-cell',
    'after::table-cell':    'table:table-cell',
    'before::cell':         'table:table-cell',
    'after::cell':          'table:table-cell',
}

# ---- Exceptions
class SecretaryError(Exception):
    pass

class UndefinedSilently(Undefined):
    # Silently undefined,
    # see
    def silently_undefined(*args, **kwargs):
        return ''

    return_new = lambda *args, **kwargs: UndefinedSilently()

    __unicode__ = silently_undefined
    __str__ = silently_undefined
    __call__ = return_new
    __getattr__ = return_new

# ************************************************
# ************************************************

def pad_string(value, length=5):
    value = str(value)
    return value.zfill(length)

class Renderer(object):
    """
        Main engine to convert an ODT document into a jinja
        compatible template.

        Basic use example:
            engine = Renderer()
            result = engine.render('template.odt')

        Renderer provides an environment variable which can be used
        to provide custom filters to the ODF render.

            engine = Renderer()
            engine.environment.filters['custom_filter'] = filter_function
            result = engine.render('template.odt')
    """

    def __init__(self, environment=None, **kwargs):
        """
        Create a Renderer instance.

            environment: Use this jinja2 environment. If not specified, we
                         create a new environment for this class instance.
        """
        self.log = logging.getLogger(__name__)
        self.log.debug('Initing a Renderer instance\nTemplate')

        self.images = {}      ###############

        if environment:
            self.environment = environment
        else:
            self.environment = Environment(undefined=UndefinedSilently, autoescape=True)
            # Register filters
            self.environment.filters['pad'] = pad_string
            self.environment.filters['markdown'] = self.markdown_filter

    def _unpack_template(self, template):
        # An OpenOffice/LibreOffice document is just a ZIP file. Here we
        # unarchive the file and return a dict with every file in the archive
        self.log.debug('Unpacking template file')

        archive_files = {}
        archive = zipfile.ZipFile(template, 'r')
        for zfile in archive.filelist:
            archive_files[zfile.filename] = archive.read(zfile.filename)

        self.log.debug('Unpack completed')

        return archive_files

    def _pack_document(self, files):
        # Store the files in `files` into a new ZIP archive
        self.log.debug('packing document')
        zip_file = io.BytesIO()

        zipdoc = zipfile.ZipFile(zip_file, 'a')
        for fname, content in files.items():
            if sys.version_info >= (2, 7):
                zipdoc.writestr(fname, content, zipfile.ZIP_DEFLATED)
            else:
                zipdoc.writestr(fname, content)

        # Save images in the "Pictures" sub-directory of the archive.
        for identifier, data in self.images.items():
            zipdoc.writestr(ODDA_IMAGE_PREFIX + identifier, data)

        # Close the archive so its central directory is written to zip_file.
        zipdoc.close()

        self.log.debug('Document packing completed')

        return zip_file

    def _prepare_template_tags(self, xml_document):
        # Here we search for every field node present in xml_document.
        # For each field we found we do:
        # * if field is a print field ({{ field }}), we replace it with a
        #   <text:span> node.
        # * if field is a control flow ({% %}), then we find immediate node of
        #   type indicated in field's `text:description` attribute and replace
        #   the whole node and its childrens with field's content.
        #   If `text:description` attribute starts with `before::` or `after::`,
        #   then we move field content before or after the node in description.
        #   If no `text:description` is available, find the immediate common
        #   parent of this and any other field and replace its child and 
        #   original parent of field with the field content.
        #   e.g.: original
        #   <table>
        #       <table:row>
        #           <field>{% for bar in bars %}</field>
        #       </table:row>
        #       <paragraph>
        #           <field>{{ bar }}</field>
        #       </paragraph>
        #       <table:row>
        #           <field>{% endfor %}</field>
        #       </table:row>
        #   </table>
        #   After processing:
        #   <table>
        #       {% for bar in bars %}
        #       <paragraph>
        #           <text:span>{{ bar }}</text:span>
        #       </paragraph>
        #       {% endfor %}
        #   </table>

        self.log.debug('Preparing template tags')
        fields = xml_document.getElementsByTagName('text:text-input')

        # First, count secretary fields
        for field in fields:
            if not field.hasChildNodes():
                continue

            field_content = field.childNodes[0].data.strip()

            if not re.findall(r'(?is)^{[{|%].*[%|}]}$', field_content):
                # Field does not contain jinja template tags
                continue

            is_block_tag = re.findall(r'(?is)^{%[^{}]*%}$', field_content)
            self.inc_node_fields_count(field.parentNode,
                    'block' if is_block_tag else 'variable')

        # Do field replacement and moving
        for field in fields:
            if not field.hasChildNodes():
                continue

            field_content = field.childNodes[0].data.strip()

            if not re.findall(r'(?is)^{[{|%].*[%|}]}$', field_content):
                # Field does not contain jinja template tags
                continue

            is_block_tag = re.findall(r'(?is)^{%[^{}]*%}$', field_content)
            discard = field
            field_reference = field.getAttribute('text:description').strip().lower()

            if re.findall(r'\|markdown', field_content):
                # a markdown field should take the whole paragraph
                field_reference = 'text:p'

            if field_reference:
                # User especified a reference. Replace immediate parent node
                # of type indicated in reference with this field's content.
                node_type = FLOW_REFERENCES.get(field_reference, False)
                if node_type:
                    discard = self._parent_of_type(field, node_type)

                jinja_node = self.create_text_node(xml_document, field_content)

            elif is_block_tag:
                # Find the common immediate parent of this and any other field.
                while discard.parentNode.secretary_field_count <= 1:
                    discard = discard.parentNode

                if discard is not None:
                    jinja_node = self.create_text_node(xml_document,
                                                       field_content)

            else:
                jinja_node = self.create_text_span_node(xml_document,
                                                        field_content)

            parent = discard.parentNode
            if not field_reference.startswith('after::'):
                parent.insertBefore(jinja_node, discard)
            else:
                if discard.isSameNode(parent.lastChild):
                    parent.appendChild(jinja_node)
                else:
                    parent.insertBefore(jinja_node, discard.nextSibling)

            if field_reference.startswith(('after::', 'before::')):
                # Do not remove whole field container. Just remove the
                # <text:text-input> parent node if field has it.
                discard = self._parent_of_type(field, 'text:p')
                parent = discard.parentNode

            parent.removeChild(discard)

    def _unescape_entities(self, xml_text):
        # unescape XML entities gt and lt
        unescape_rules = {
            r'(?is)({[{|%].*)(&gt;)(.*[%|}]})': r'\1>\3',
            r'(?is)({[{|%].*)(&lt;)(.*[%|}]})': r'\1<\3',
            r'(?is)({[{|%].*)(<.?text:s.?>)(.*[%|}]})': r'\1 \3',
        }

        for p, r in unescape_rules.items():
            xml_text = re.sub(p, r, xml_text)

        return xml_text

    def _encode_escape_chars(self, xml_text):
        # Replace line feed and/or tabs within text span entities.
        find_pattern = r'(?is)<text:([\S]+?)>([^>]*?([\n|\t])[^<]*?)</text:\1>'
        for m in re.findall(find_pattern, xml_text):
            replacement = m[1].replace('\n', '<text:line-break/>')
            replacement = replacement.replace('\t', '<text:tab/>')
            xml_text = xml_text.replace(m[1], replacement)

        return xml_text

    def _render_xml(self, xml_document, **kwargs):
        # Prepare the xml object to be processed by jinja2
        self.log.debug('Rendering XML object')

        try:
            self._prepare_template_tags(xml_document)
            template_string = self._unescape_entities(xml_document.toxml())
            jinja_template = self.environment.from_string(template_string)
            result = jinja_template.render(**kwargs)
            result = self._encode_escape_chars(result)

            try:
                return parseString(result.encode('ascii', 'xmlcharrefreplace'))
            except Exception:
                self.log.error('Error parsing XML result:\n%s', result, exc_info=True)
                raise

        except Exception:
            self.log.error('Error rendering template:\n%s',
                           xml_document.toprettyxml(), exc_info=True)
            raise
        finally:
            self.log.debug('Rendering xml object finished')

    def render(self, template, **kwargs):
        """
            Render a template

                template: A template file. Could be a string or a file instance
                **kwargs: Template variables. Similar to jinja2

                A binary stream which contains the rendered document.
        """

        self.log.debug('Initing a template rendering')
        self.files = self._unpack_template(template)

        # Keep content and styles object since many functions or
        # filters may work with then
        self.content = parseString(self.files['content.xml']) 
        self.styles = parseString(self.files['styles.xml'])
        self.manifest = parseString(self.files['META-INF/manifest.xml'])    ##############

        # Render content.xml
        self.content = self._render_xml(self.content, **kwargs)

        # Render styles.xml
        self.styles = self._render_xml(self.styles, **kwargs)

        # Render META-INF/manifest.xml
        self.manifest = self._render_xml(self.manifest, **kwargs)       ##############

        self.__prepare_namespaces()    ##############

        self.log.debug('Template rendering finished')

        self.files['content.xml'] = self.content.encode('ascii', 'xmlcharrefreplace')
        self.files['styles.xml'] = self.styles.encode('ascii', 'xmlcharrefreplace')
        self.files['META-INF/manifest.xml'] = self.manifest.encode('ascii', 'xmlcharrefreplace')
        document = self._pack_document(self.files)
        return document.getvalue()

    def _parent_of_type(self, node, of_type):
        # Returns the first immediate parent of type `of_type`.
        # Returns None if nothing is found.

        if hasattr(node, 'parentNode'):
            if node.parentNode.nodeName.lower() == of_type:
                return node.parentNode
            else:
                return self._parent_of_type(node.parentNode, of_type)
        else:
            return None

    def create_text_span_node(self, xml_document, content):
        span = xml_document.createElement('text:span')
        text_node = self.create_text_node(xml_document, content)
        span.appendChild(text_node)

        return span

    def create_text_node(self, xml_document, text):
        """Creates a text node"""
        return xml_document.createTextNode(text)

    def inc_node_fields_count(self, node, field_type='variable'):
        """ Increase field count of node and its parents """

        if node is None:
            return

        if not hasattr(node, 'secretary_field_count'):
            setattr(node, 'secretary_field_count', 0)

        if not hasattr(node, 'secretary_variable_count'):
            setattr(node, 'secretary_variable_count', 0)

        if not hasattr(node, 'secretary_block_count'):
            setattr(node, 'secretary_block_count', 0)

        node.secretary_field_count += 1
        if field_type == 'variable':
            node.secretary_variable_count += 1
        else:
            node.secretary_block_count += 1

        self.inc_node_fields_count(node.parentNode, field_type)

    def get_style_by_name(self, style_name):
        """
            Search in <office:automatic-styles> for style_name.
            Return None if style_name is not found. Otherwise
            return the style node
        """

        auto_styles = self.content.getElementsByTagName(
            'office:automatic-styles')[0]

        if not auto_styles.hasChildNodes():
            return None

        for style_node in auto_styles.childNodes:
            if style_node.hasAttribute('style:name') and \
               (style_node.getAttribute('style:name') == style_name):
               return style_node

        return None

    def insert_style_in_content(self, style_name, attributes=None,
                                style_properties=None):
        """
            Insert a new style into content.xml's <office:automatic-styles> node.
            Returns a reference to the newly created node
        """

        auto_styles = self.content.getElementsByTagName('office:automatic-styles')[0]
        style_node = self.content.createElement('style:style')

        style_node.setAttribute('style:name', style_name)
        style_node.setAttribute('style:family', 'text')
        style_node.setAttribute('style:parent-style-name', 'Standard')

        if attributes:
            for k, v in attributes.items():
                style_node.setAttribute('style:%s' % k, v)

        if style_properties:
            style_prop = self.content.createElement('style:text-properties')
            for k, v in style_properties.items():
                style_prop.setAttribute('%s' % k, v)

            style_node.appendChild(style_prop)

        return auto_styles.appendChild(style_node)

    def __prepare_namespaces(self):
        """create proper namespaces for our document"""
        # create needed namespaces
        self.namespaces = dict()

        def _(s):
            return lxml.etree.parse(StringIO(s.toxml('utf-8'))).getroot().nsmap

        # copy namespaces from original docs
        for doc in (self.content, self.styles, self.manifest):
            self.namespaces.update(_(doc))

        # remove any "root" namespace as lxml.xpath do not support them
        self.namespaces.pop(None, None)

        # declare the Jinja2 namespace
        self.namespaces['py'] = JINJA_URI

        #print self.namespaces

    def __replace_image_links(self):
        """Replace links of placeholder images (the name of which starts with "odda.")
        to point to a file saved in the "Pictures" directory of the archive.
        """
        if not len(self.images):
            return

        def _(s):
            image_expr = "//draw:frame[starts-with(@draw:name, 'odda.')]"

            content_tree = lxml.etree.parse(StringIO(s.toxml().encode('utf-8')))
            # Find draw:frame tags.
            draw_frames = content_tree.xpath(image_expr, namespaces=self.namespaces)
            for draw_frame in draw_frames:
                # Find the identifier of the image (odda.[identifier]).
                image_id = draw_frame.attrib['{%s}name' % self.namespaces['draw']]
                image_id = image_id[5:]
                if image_id not in self.images:
                    raise ValueError(
                        "Can't find data for the image named 'odda.%s'; make "
                        "sure it has been added with the set_image_path or "
                        "set_image_data methods."
                        % image_id
                    )

                # Replace the xlink:href attribute of the image to point to ours.
                image = draw_frame[0]
                image.attrib['{%s}href' % self.namespaces['xlink']] = ODDA_IMAGE_PREFIX + image_id
            return lxml.etree.tostring(content_tree)

        self.manifest = _(self.manifest)
        self.content = _(self.content)
        self.styles = _(self.styles)

    def __add_images_to_manifest(self):
        """Add entries for odda images into the manifest file."""

        if not len(self.images):
            return

        def _(s):
            xpath_expr = "//manifest:manifest[1]"
            content_tree = lxml.etree.parse(StringIO(s))

            # Find manifest:manifest tags.
            manifest_e = content_tree.xpath(
                xpath_expr, namespaces=self.namespaces)
            if not manifest_e:
                return None   # TODO

            for identifier in self.images.keys():
                # Add a manifest:file-entry tag.
                lxml.etree.SubElement(
                    manifest_e[0],
                    '{%s}file-entry' % self.namespaces['manifest'],
                    attrib={
                        '{%s}full-path' % self.namespaces['manifest']: (
                            ODDA_IMAGE_PREFIX + identifier
                        ),
                        '{%s}media-type' % self.namespaces['manifest']: '',
                    }
                )
            return lxml.etree.tostring(content_tree)
        self.manifest = _(self.manifest)

    def set_image_path(self, identifier, path):
        """Set data for an image mentioned in the template.

        @param identifier: Identifier of the image; refer to the image in the
        template by setting "odda.[identifier]" as the name of that image.
        @type identifier: string

        @param path: Image path.
        @type path: string
        """

        # Read the file as bytes and store it under the given identifier.
        with open(path, 'rb') as f:
            self.set_image_data(identifier, f.read())

    def set_image_data(self, identifier, data):
        """Set data for an image mentioned in the template.

        @param identifier: Identifier of the image; refer to the image in the
        template by setting "odda.[identifier]" as the name of that image.
        @type identifier: string

        @param data: Contents of the image.
        @type data: binary
        """

        self.images[identifier] = data

    def markdown_filter(self, markdown_text):
        """
            Convert a markdown text into an ODT formatted text
        """

        if not isinstance(markdown_text, basestring):
            return ''

        from xml.dom import Node
        from markdown_map import transform_map

        try:
            from markdown2 import markdown
        except ImportError:
            raise SecretaryError('Could not import markdown2 library. Install it using "pip install markdown2"')

        styles_cache = {}   # cache styles searching
        html_text = markdown(markdown_text)
        xml_object = parseString('<html>%s</html>' % html_text.encode('ascii', 'xmlcharrefreplace'))

        # Transform HTML tags as specified in transform_map
        # Some tags may require extra attributes in ODT.
        # Additional attributes are indicated in the 'attributes' property

        for tag in transform_map:
            html_nodes = xml_object.getElementsByTagName(tag)
            for html_node in html_nodes:
                odt_node = xml_object.createElement(transform_map[tag]['replace_with'])

                # Transfer child nodes
                if html_node.hasChildNodes():
                    for child_node in html_node.childNodes:
                        odt_node.appendChild(child_node.cloneNode(True))

                # Add style-attributes defined in transform_map
                if 'style_attributes' in transform_map[tag]:
                    for k, v in transform_map[tag]['style_attributes'].items():
                        odt_node.setAttribute('text:%s' % k, v)

                # Add defined attributes
                if 'attributes' in transform_map[tag]:
                    for k, v in transform_map[tag]['attributes'].items():
                        odt_node.setAttribute(k, v)

                    # copy original href attribute in <a> tag
                    if tag == 'a':
                        if html_node.hasAttribute('href'):
                            odt_node.setAttribute('xlink:href',
                                                  html_node.getAttribute('href'))

                # Does the node need to create an style?
                if 'style' in transform_map[tag]:
                    name = transform_map[tag]['style']['name']
                    if not name in styles_cache:
                        style_node = self.get_style_by_name(name)

                        if style_node is None:
                            # Create and cache the style node
                            style_node = self.insert_style_in_content(
                                name, transform_map[tag]['style'].get('attributes', None),
                                transform_map[tag]['style'].get('properties', None))
                            styles_cache[name] = style_node

                html_node.parentNode.replaceChild(odt_node, html_node)

        def node_to_string(node):
            result = node.toxml()

            # linebreaks in preformated nodes should be converted to <text:line-break/>
            if (node.__class__.__name__ != 'Text') and \
                (node.getAttribute('text:style-name') == 'Preformatted_20_Text'):
                result = result.replace('\n', '<text:line-break/>')

            # All double linebreak should be replaced with an empty paragraph
            return result.replace('\n\n', '<text:p text:style-name="Standard"/>')

        return ''.join(node_as_str for node_as_str in map(node_to_string,
                xml_object.getElementsByTagName('html')[0].childNodes))

def render_template(template, **kwargs):
    """
        Render an ODF template file
    """

    engine = Renderer()
    return engine.render(template, **kwargs)

if __name__ == "__main__":
    import os
    from datetime import datetime

    def read(fname):
        return open(os.path.join(os.path.dirname(__file__), fname)).read()

    document = {
        'md_sample': read('')
    }

    countries = [
        {'country': 'United States', 'capital': 'Washington',
            'cities': ['miami', 'new york', 'california', 'texas', 'atlanta']},
        {'country': 'England', 'capital': 'London',
            'cities': ['gales']},
        {'country': 'Japan', 'capital': 'Tokio',
            'cities': ['hiroshima', 'nagazaki']},
        {'country': 'Nicaragua', 'capital': 'Managua',
            'cities': ['leon', 'granada', 'masaya']},
        {'country': 'Argentina',
            'capital': 'Buenos aires'},
        {'country': 'Chile', 'capital': 'Santiago'},
        {'country': 'Mexico', 'capital': 'MExico City',
            'cities': ['puebla', 'cancun']},
    ]

    render = Renderer()
    render.set_image_path('logo', 'images/new_logo.png')
    result = render.render('simple_template.odt', countries=countries, document=document)

    output = open('rendered.odt', 'wb')
    output.write(result)
    output.close()

    print("Template rendering finished! Check rendered.odt file.")
christopher-ramirez commented 10 years ago


Thanks for your proposal, but I don't quite understand how it works. Does the template need a placeholder image? How are the images loaded?

In the last lines I see render.set_image_path('logo', 'images/new_logo.png'). That ends up mixing view and controller, which is something we have to avoid.

Please see PR #9.

Last but not least, thanks for your contribution to Secretary.

ghiewa commented 10 years ago

Yes, there is an image placeholder in the ODT template. The aim is to give the template designer the ability to set the image size. Say: add an image to the ODT template and give it a name, then make sure xxx.png exists in the path that you configure in your code with render.set_image_path('logo', 'images/new_logo.png').

christopher-ramirez commented 10 years ago

A preview of image support is now on development branch.

I will be glad to know your comments and suggestions about the intended functionality and API.

ghiewa commented 10 years ago

Frankly, it is great! Only one concern: without a set_image_data function, I cannot store images that are generated on the fly. I have to save them as files first and then import them using your 'image_filter'.

christopher-ramirez commented 10 years ago

@ghiewa actually the image filter does not do any loading. It just marks a Picture node to be replaced later by the replace_images method. That method gets the actual image through a media loader. It is in the media loader where you can generate images on the fly.


    from secretary import Renderer

    engine = Renderer()

    def img_generator(value, *args, **kwargs):
        # Generate an image
        image = my_internal_function_to_generate_imgs(value)

        # Return to replace_images a tuple whose first element  is the recently
        # generated image as a file object (must at least implement read method)
        # and as second element the mimetype of image
        return (image_as_file_object, img_mimetype)

As you can see, the only requirement is that the image object returned by a media loader implements a read method. It is not really necessary to store it on a file. It could be a memory stream.
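As a minimal sketch of that point, the loader below builds its result entirely in memory. The name `in_memory_loader` and the placeholder bytes are hypothetical, and registering the loader with the Renderer is left out; the only thing illustrated is the `(file-like object, mimetype)` return value.

```python
from io import BytesIO

# Hypothetical placeholder bytes; a real loader would generate actual image data.
FAKE_PNG_BYTES = b'\x89PNG\r\n\x1a\n' + b'\x00' * 16


def in_memory_loader(value, *args, **kwargs):
    """Return (file-like object, mimetype) without touching the disk."""
    stream = BytesIO(FAKE_PNG_BYTES)  # BytesIO implements read()
    return (stream, 'image/png')


stream, mimetype = in_memory_loader('any-value')
data = stream.read()
```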

ghiewa commented 10 years ago

Noted, but it does not work with {{barcode('887766666876', 'UPCA')|image}}, where barcode('887766666876', 'UPCA') generates barcodes on the fly.

christopher-ramirez commented 10 years ago

How are you generating the bar codes?

ghiewa commented 10 years ago

Here is my code,

    import io
    import uuid
    from elaphe import barcode

    def barcode(self, codetype=None, codestring=None, options=dict(includetext=True), **kwargs):
        id = uuid.uuid1().hex
        if not codetype and not codestring:
            return id

        b = barcode(codetype, codestring, options, **kwargs)  # scale=2, margin=1,
        # Save the generated image into an in-memory buffer as PNG
        output = io.BytesIO()
        b.save(output, 'PNG')
        data = output.getvalue()
        self.set_image_data(id, data)
        return id
christopher-ramirez commented 10 years ago

I wrote the following code, but I really do not know how the elaphe.barcode method works, especially what it returns.

    from secretary import Renderer

    engine = Renderer()

    def barcode_generator(value, codetype=None, codestring=None,
                          options=dict(includetext=True), **kwargs):
        # Generate a barcode
        id = uuid.uuid1().hex
        if not codetype and not codestring:
            return id

        # Supposing `barcode` returns an SVG image as a file object,
        # or at least an object having a `read()` method.
        bc = barcode(codetype, codestring, options, **kwargs)
        return (bc, 'image/svg+xml')

Have you tried implementing your barcode method as a media loader?

christopher-ramirez commented 10 years ago


It was only after posting the above comment that I tried playing with the elaphe library. Now your code is clear to me. The elaphe.barcode method returns an EPS image, and it is not a file-like object. You have to save the image into a file-like object before passing it to Secretary.

So after some changes the above code would look like this:

    import uuid
    from io import BytesIO
    from secretary import Renderer
    from elaphe import barcode

    engine = Renderer()

    def barcode_generator(value, codetype=None, codestring=None,
                          options=dict(includetext=True), **kwargs):
        # Generate a barcode
        id = uuid.uuid1().hex
        if not codetype and not codestring:
            return None

        # Generate barcode image
        bc = barcode(codetype, codestring, options, **kwargs)

        # Save image in a memory stream of file type (BytesIO)
        stream = BytesIO()
        bc.save(stream, 'eps')

        # Set the current position of the stream at the beginning of the file
        stream.seek(0)

        # Return the generated barcode.
        return (stream, 'application/postscript')

DISCLAIMER: I tried elaphe.barcode using qrcode codes. When trying to use UPCA codes, PIL gave me strange errors.

ghiewa commented 10 years ago

Can you help by uploading your template file here? Sorry, I do not know how you set up the image placeholder, or what you set in Picture > Options.

christopher-ramirez commented 10 years ago

Please see template at samples/images in development branch.

ghiewa commented 10 years ago

Hi Christopher,

Yes, indeed it worked, and I hit the same issue when using 'UPCA'.

Also, can you consider the possibility of multiple media loaders in one template?

additon, do you success on 'application/postscript'? than I do with 'application/image'

return (stream, 'application/postscript')
christopher-ramirez commented 10 years ago

Hello @ghiewa,

First, why do you believe a multiple media loader solution would be useful? What use cases would it cover that are not possible, or are very hard to accomplish, with a single loader?

Lastly, I could not understand the idea in your last paragraph.

ghiewa commented 9 years ago

Hi Christopher,

Say I am using your library to generate a price ticket layout that contains a shoe logo and a barcode. Then I have to use the code you suggested for barcode generation and, at the same time, the file loader (the default media loader) to load the shoe logo.

ghiewa commented 9 years ago

ok, I think I get the solution already, thank you.

ghiewa commented 9 years ago

There is no need for multiple media loaders.
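One way a single loader can cover both the logo and the barcode is to dispatch on the value it receives. The sketch below is only an illustration: the `kind:payload` value convention, `load_logo`, `make_barcode` and `dispatching_loader` are all hypothetical names, and the two handlers are stand-ins that return dummy bytes instead of real images.

```python
from io import BytesIO


def load_logo(name):
    # Hypothetical stand-in for reading a logo file from disk.
    return BytesIO(b'logo:' + name.encode('utf-8')), 'image/png'


def make_barcode(code):
    # Hypothetical stand-in for an on-the-fly barcode generator.
    return BytesIO(b'barcode:' + code.encode('utf-8')), 'image/png'


# Map a value prefix to the function that produces the image.
HANDLERS = {'logo': load_logo, 'barcode': make_barcode}


def dispatching_loader(value, *args, **kwargs):
    """Route 'kind:payload' values to the matching handler."""
    kind, _, payload = value.partition(':')
    handler = HANDLERS.get(kind)
    if handler is None:
        raise ValueError('No handler for image value %r' % value)
    return handler(payload)
```

With this shape, the template would pass values such as 'logo:shoes' or 'barcode:887766666876' to the image filter, and one loader serves both.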

and-semakin commented 6 years ago

Hey, how can I set the size of the replaced image? Currently it gets the same size as the placeholder image and ends up out of proportion.

christopher-ramirez commented 6 years ago


If you have a media loader function, it should receive frame_attrs and image_attrs as keyword arguments [1]. These variables are passed by reference to the media loader, and they contain the dimensions of the image frame and the actual image respectively. Update the attributes in these variables and Secretary should update the respective elements in the document.
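A rough sketch of how a loader might keep the proportions, assuming `frame_attrs` behaves like a dict whose keys are `svg:width` and `svg:height` with values such as `'6cm'`. Those key names and the value format are assumptions, not confirmed here, and `scale_frame_height` is a hypothetical helper.

```python
import re


def scale_frame_height(frame_attrs, image_width_px, image_height_px):
    """Keep the frame width and set its height to match the image's aspect ratio."""
    # Assumed key names and value format, e.g. '6cm' or '2.5in'.
    match = re.match(r'^(\d+(?:\.\d+)?)([a-z%]*)$', frame_attrs['svg:width'])
    if not match:
        return frame_attrs  # Unknown format: leave the frame untouched.
    width, unit = float(match.group(1)), match.group(2)
    height = width * image_height_px / float(image_width_px)
    frame_attrs['svg:height'] = '%.3f%s' % (height, unit)
    return frame_attrs


# Example: a square 6cm placeholder receiving a 200x100 pixel image.
attrs = {'svg:width': '6cm', 'svg:height': '6cm'}
scale_frame_height(attrs, 200, 100)
```

Inside a media loader, the helper would be called on the `frame_attrs` keyword argument once the real image dimensions are known.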
