christopher-ramirez / secretary

Take the power of Jinja2 templates to OpenOffice and LibreOffice.

new lines in tags with attributes #39

Closed FlorianLudwig closed 7 years ago

FlorianLudwig commented 7 years ago

Issue #8 was resolved only for line breaks in variables that occur within tags without attributes. Line breaks are still not replaced inside tags that do contain attributes, like <text:span text:style-name="T5">.

The cause is the regex in _encode_escape_chars.

FlorianLudwig commented 7 years ago

I am not sure about the-right-way-tm to fix this.

I guess the reason for not doing:

xml_text = xml_text.replace('\n', '<text:line-break/>')

is to avoid replacing line breaks that are in tags other than text tags.

However, the current implementation fails to do this: a single <text:span>\n</text:span> in an odt would result in all line breaks being replaced, not just the one inside the span.

It could be solved by properly splitting and reassembling the code during regex-pattern matching.
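To illustrate the "split and reassemble" idea, here is a minimal sketch that uses re.sub with a callback so that only the matched element is rewritten. It is not the project's implementation: the function name escape_scoped and the sample document are made up for this example, and the pattern is the one quoted in this thread, so it still only covers tags without attributes.

```python
import re

# Pattern quoted in this thread (only matches tags without attributes).
FIND_PATTERN = r'(?is)<text:([\S]+?)>([^>]*?([\n|\t])[^<]*?)</text:\1>'


def escape_scoped(xml_text):
    """Escape newlines and tabs only inside the elements the pattern matches.

    Hypothetical helper for illustration; not part of secretary.
    """
    def _rebuild(match):
        tag, body = match.group(1), match.group(2)
        body = body.replace('\n', '<text:line-break/>')
        body = body.replace('\t', '<text:tab/>')
        # Reassemble the element around the escaped body.
        return '<text:{0}>{1}</text:{0}>'.format(tag, body)

    return re.sub(FIND_PATTERN, _rebuild, xml_text)


# Made-up sample: one newline inside a span, two outside of it.
sample = '<office:text>\n<text:span>a\nb</text:span>\n</office:text>'

naive = sample.replace('\n', '<text:line-break/>')  # replaces all three
scoped = escape_scoped(sample)                      # replaces only the inner one

print(naive.count('<text:line-break/>'))
print(scoped.count('<text:line-break/>'))
```

Because the replacement is computed per match, line breaks outside the matched element are left alone, which is what a document-wide replace cannot guarantee.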

Back to actual problem at hand:

find_pattern = r'(?is)<text:([\S]+?)>([^>]*?([\n|\t])[^<]*?)</text:\1>'

needs to be extended to allow for attributes, but it all makes me feel like... [image: Yaktocat]
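The gap can be reproduced with the pattern quoted above. The second pattern, WITH_ATTRS, is only a hypothetical extension written for this example (it is not the change made in the project): it allows an optional, whitespace-led attribute list before the closing ">".

```python
import re

# Pattern quoted in this thread.
CURRENT = r'(?is)<text:([\S]+?)>([^>]*?([\n|\t])[^<]*?)</text:\1>'

# Hypothetical extension: optional attribute list between tag name and '>'.
WITH_ATTRS = r'(?is)<text:([\S]+?)(?:\s[^>]*)?>([^>]*?([\n|\t])[^<]*?)</text:\1>'

plain = '<text:span>\n</text:span>'
styled = '<text:span text:style-name="T5">\n</text:span>'

# The current pattern matches the tag without attributes only.
print(re.search(CURRENT, plain) is not None)
print(re.search(CURRENT, styled) is not None)

# The extended pattern matches both.
print(re.search(WITH_ATTRS, plain) is not None)
print(re.search(WITH_ATTRS, styled) is not None)
```

Note that the extension still breaks on attribute values that contain ">", which is one more reason to question parsing XML with a regex here.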

So I was wondering:

  1. Why exactly do we need to make sure we don't replace any "wrong" line-breaks?
  2. Or couldn't we replace them in the xml tree to avoid parsing xml with regex?
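As a rough sketch of what the second option could look like, the following walks a parsed tree with xml.dom.minidom and replaces newlines only in text nodes whose parent element is in the ODF text namespace. The function name break_lines_in_tree and the sample document are assumptions made for this example, and tabs are left out for brevity; this is not how secretary currently does it.

```python
from xml.dom.minidom import parseString

# ODF namespace URIs for the text: and office: prefixes.
TEXT_NS = 'urn:oasis:names:tc:opendocument:xmlns:text:1.0'
OFFICE_NS = 'urn:oasis:names:tc:opendocument:xmlns:office:1.0'


def break_lines_in_tree(xml_text):
    """Replace '\\n' with <text:line-break/> inside text-namespace elements.

    Hypothetical helper for illustration; not part of secretary.
    """
    doc = parseString(xml_text)
    pending = []

    def walk(node):
        for child in node.childNodes:
            if child.nodeType == child.TEXT_NODE:
                if '\n' in child.data and node.namespaceURI == TEXT_NS:
                    pending.append(child)
            else:
                walk(child)

    # Collect first, then modify, so the tree is not changed while walking it.
    walk(doc)
    for text_node in pending:
        parent = text_node.parentNode
        for index, part in enumerate(text_node.data.split('\n')):
            if index:
                line_break = doc.createElementNS(TEXT_NS, 'text:line-break')
                parent.insertBefore(line_break, text_node)
            if part:
                parent.insertBefore(doc.createTextNode(part), text_node)
        parent.removeChild(text_node)
    return doc.documentElement.toxml()


# Made-up sample: newlines inside and outside a span that has an attribute.
sample = (
    '<office:text xmlns:office="%s" xmlns:text="%s">\n'
    '<text:span text:style-name="T5">a\nb</text:span>\n'
    '</office:text>' % (OFFICE_NS, TEXT_NS)
)
result = break_lines_in_tree(sample)
print(result)
```

Attributes need no special handling in this approach, since the parser has already separated them from the element content.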

christopher-ramirez commented 7 years ago

Hello!

I'm sorry, but I'm not sure I understand the problem you are reporting. Do you want _encode_escape_chars to escape \n inside entities' attribute values, like <text:span attribute="Some\nValue">...? If so, #40 doesn't reflect that...

Or, do you want _encode_escape_chars to handle entities outside of the text: namespace? If so, what specific problems are you solving doing this?

FlorianLudwig commented 7 years ago

Hey Christopher,

Sorry for the misunderstanding. I have one "real" issue and one "maybe" issue.

Do you want _encode_escape_chars to escape \n inside entities' attribute values, like <text:span attribute="Some\nValue">...? If so, #40 doesn't reflect that...

I want to escape \n inside text:span tags that do have an attribute, for example:

<text:span text:style-name="T5">\n</text:span>

See test test_newline. That is my "real" problem.


Or, do you want _encode_escape_chars to handle entities outside of the text: namespace? If so, what specific problems are you solving doing this?

No, I don't want that. (Actually I do not care / don't know)

But while reading the code I noticed that it does escape line breaks outside the text namespace, even though it seems to be written to avoid that. So I would guess that is a bug. See test: test_evil_newline