DOCX broken placeholders

yuryhorinovich commented 4 years ago

Hello. I'm trying to use XTT. I've found that when you save a Word document in DOCX format, a single placeholder is sometimes saved as several separate XML blocks (`<w:r>` runs). This happens when the placeholder has been partially edited.

  <w:p w:rsidR="00C6425F" w:rsidRPr="004E4DE5" w:rsidRDefault="00C6425F" w:rsidP="00C6425F">
      <w:pPr>
          <w:rPr>
              <w:lang w:val="en-US"/>
          </w:rPr>
      </w:pPr>
      <w:r>
          <w:rPr>
              <w:lang w:val="en-US"/>
          </w:rPr>
          <w:t>{R-</w:t>
      </w:r>
      <w:r w:rsidRPr="004E4DE5">
          <w:rPr>
              <w:lang w:val="en-US"/>
          </w:rPr>
          <w:t>DIVISIONFULLNAME</w:t>
      </w:r>
      <w:r>
          <w:rPr>
              <w:lang w:val="en-US"/>
          </w:rPr>
          <w:t>}</w:t>
      </w:r>
  </w:p>
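To show why a search for the literal `{R-DIVISIONFULLNAME}` fails on this paragraph, here is a small Python sketch. It is not part of XTT. It reads a trimmed copy of the paragraph above with `xml.etree.ElementTree` and lists the text of each run. The placeholder only appears once the runs are joined. The namespace URI is the standard WordprocessingML one. The helper name `run_texts` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace bound to the w: prefix in WordprocessingML (word/document.xml)
NS = {"w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main"}

# The paragraph above with <w:pPr>/<w:rPr> omitted; the xmlns:w declaration
# is added so the fragment parses on its own.
PARAGRAPH = """<w:p xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:r><w:t>{R-</w:t></w:r>
  <w:r w:rsidRPr="004E4DE5"><w:t>DIVISIONFULLNAME</w:t></w:r>
  <w:r><w:t>}</w:t></w:r>
</w:p>"""


def run_texts(paragraph_xml: str) -> list:
    """Return the visible text of each <w:r> run, in document order."""
    p = ET.fromstring(paragraph_xml)
    return ["".join(t.text or "" for t in r.findall("w:t", NS))
            for r in p.findall("w:r", NS)]


runs = run_texts(PARAGRAPH)
print(runs)           # ['{R-', 'DIVISIONFULLNAME', '}']
print("".join(runs))  # {R-DIVISIONFULLNAME}
```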

This behavior is mentioned in Example N01:

In Word, text that looks like a single block can consist of several runs with the same formatting.

~~To avoid this, copy the {...} block into Notepad, then copy it from there and paste it back~~

Starting from the new version, the ZCL_XTT_WORD_DOCX & ZCL_XTT_WORD_XML classes use the style of the first part in such cases.

If a placeholder is copied to Notepad and then pasted back into Word, the issue is gone and the placeholder is correctly replaced from ABAP.

It looks like this issue was already addressed for placeholders with different formatting. Our placeholder has no special formatting, so the issue with split XML parts apparently still exists.

bizhuka commented 4 years ago

Hello, could you send me the template so I can analyze the issue?

In the code, I wrote a REGEX that deletes the XML tags inside a placeholder:

@see ZCL_XTT_REPLACE_BLOCK=>FIND_MATCH()

    " Delete all rubbish between
    IF iv_skip_tags = abap_true.
      REPLACE ALL OCCURRENCES OF REGEX '<[^\>]+>' IN l_whole_field WITH ''.
    ENDIF.
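To illustrate what this REGEX does, here is a rough Python equivalent. It is a sketch, not the XTT implementation, and `strip_tags` is a hypothetical name. It is applied to the slice between `{` and `}` of compact, single-line XML, which is how Word itself writes `document.xml`. It removes every tag and leaves only the placeholder text:

```python
import re

# Same pattern as the ABAP '<[^\>]+>': '<', one or more non-'>' chars, '>'
TAG = re.compile(r"<[^\>]+>")


def strip_tags(whole_field: str) -> str:
    """Delete every XML tag between the placeholder's braces."""
    return TAG.sub("", whole_field)


# Slice from '{' to '}' of the split placeholder, written on one line as Word does
compact = ('{R-</w:t></w:r><w:r w:rsidRPr="004E4DE5"><w:rPr><w:lang w:val="en-US"/>'
           '</w:rPr><w:t>DIVISIONFULLNAME</w:t></w:r><w:r><w:rPr>'
           '<w:lang w:val="en-US"/></w:rPr><w:t>}')

print(strip_tags(compact))  # {R-DIVISIONFULLNAME}
```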

Probably it doesn't match your case.

I also have a question: do you edit the XML externally (in Notepad++, for example)? The REGEX above cannot delete newline symbols (or some other special characters). I think Word saves the whole document on a single line, which is why the REGEX usually works fine.
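Here is a quick Python check of this explanation. It is a sketch, not XTT code. On pretty-printed XML, like the fragment quoted at the top of this issue, stripping the tags leaves the indentation behind, so the result no longer equals the placeholder name. The `GAP` pattern is a hypothetical workaround that first collapses whitespace-only text between two tags. Note that it would also collapse a `<w:t xml:space="preserve">` element that contains only spaces.

```python
import re

TAG = re.compile(r"<[^\>]+>")   # same tag pattern as the ABAP code above
GAP = re.compile(r">\s+<")      # hypothetical: whitespace-only text between tags


def strip_tags(s: str) -> str:
    """Delete tags only, as the original REGEX does."""
    return TAG.sub("", s)


def strip_tags_and_gaps(s: str) -> str:
    """Collapse indentation between tags, then delete the tags."""
    return TAG.sub("", GAP.sub("><", s))


# The same slice, but indented the way an external XML editor might save it
pretty = """{R-</w:t>
      </w:r>
      <w:r w:rsidRPr="004E4DE5">
          <w:t>DIVISIONFULLNAME</w:t>
      </w:r>
      <w:r>
          <w:t>}"""

print("\n" in strip_tags(pretty))   # True: newlines survive, so no match
print(strip_tags_and_gaps(pretty))  # {R-DIVISIONFULLNAME}
```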

bizhuka commented 3 years ago

Since commit https://github.com/bizhuka/xtt/commit/4445c39cc01ad1eca4c82dedc13eb14cec945ae4, such cases are replaced using the formatting of the '{' character.
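Reusing the formatting of the run that holds `{` (the first part of the placeholder) can be sketched in Python with `xml.etree.ElementTree`. This is an illustration only. The real change lives in the ABAP classes and may work differently. The hypothetical `merge_split_placeholders` walks the runs of one paragraph. When a run's text opens a `{` that is not yet closed, it appends the text of the following runs until a `}` appears, removes those runs, and leaves the combined text in the first run, whose `<w:rPr>` therefore decides the formatting. It ignores details such as `xml:space="preserve"`.

```python
import xml.etree.ElementTree as ET

NS = {"w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main"}

# Simplified split paragraph; the first run is bold so the kept style is visible.
PARAGRAPH = """<w:p xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:r><w:rPr><w:b/></w:rPr><w:t>{R-</w:t></w:r>
  <w:r><w:t>DIVISIONFULLNAME</w:t></w:r>
  <w:r><w:t>}</w:t></w:r>
</w:p>"""


def run_text(r):
    """Concatenated text of all <w:t> children of one run."""
    return "".join(t.text or "" for t in r.findall("w:t", NS))


def is_unclosed(text):
    """True if the last '{' in text has no '}' after it."""
    start = text.rfind("{")
    return start != -1 and "}" not in text[start:]


def merge_split_placeholders(p):
    """Hypothetical helper: fold the runs of a split {...} placeholder into the
    first run, so the merged text keeps that run's <w:rPr> formatting.
    An unclosed '{' absorbs every remaining run of the paragraph."""
    runs = p.findall("w:r", NS)
    i = 0
    while i < len(runs):
        first, text, j = runs[i], run_text(runs[i]), i + 1
        while is_unclosed(text) and j < len(runs):
            text += run_text(runs[j])
            p.remove(runs[j])  # drop the continuation run and its own style
            j += 1
        if j > i + 1:
            texts = first.findall("w:t", NS)
            texts[0].text = text
            for extra in texts[1:]:
                first.remove(extra)
        i = j


p = ET.fromstring(PARAGRAPH)
merge_split_placeholders(p)
runs = p.findall("w:r", NS)
print(len(runs), run_text(runs[0]))               # 1 {R-DIVISIONFULLNAME}
print(runs[0].find("w:rPr/w:b", NS) is not None)  # True
```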

bizhuka / xtt

DOCX broken placeholders #6