Closed amastis closed 8 months ago
I think you're on the right track. If you open the file as a DocxReader instance, then find the rels file through the footnote instances in the officeDocument file, then you can continue from there with the lxml interface. Lxml will allow you to create and insert new elements (perhaps you'd want to create a copy of an existing element then edit it).
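A minimal sketch of the "copy an existing element, then edit it" idea. It uses the standard library's ElementTree on a made-up paragraph fragment so it runs without a real .docx; the fragment, the `w` helper, and the replacement text are illustrative assumptions. The lxml elements you get from the reader also support `copy.deepcopy` and `insert`, and lxml additionally offers `parent.index(child)` and `elem.addnext(...)`.

```python
import copy
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)


def w(tag: str) -> str:
    """Return a namespace-qualified WordprocessingML tag name (hypothetical helper)."""
    return f"{{{W_NS}}}{tag}"


# A tiny stand-in for one footnote paragraph: a single bold run.
paragraph = ET.fromstring(
    f'<w:p xmlns:w="{W_NS}">'
    "<w:r><w:rPr><w:b/></w:rPr><w:t>See note 23.</w:t></w:r>"
    "</w:p>"
)

original_run = paragraph.find(w("r"))

# Deep-copy so the clone keeps the run properties (formatting) of the original.
new_run = copy.deepcopy(original_run)
new_run.find(w("t")).text = " Compare note 7."

# Insert the clone directly after the run it was copied from.
position = list(paragraph).index(original_run)
paragraph.insert(position + 1, new_run)

print(ET.tostring(paragraph, encoding="unicode"))
```

Because the clone carries its own copy of `w:rPr`, editing it leaves the formatting of the original run untouched.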
Using a modified version of .replace_root_text, all of the formatting of my document goes crazy (going into superscript and nullifying some previous formatting changes). See below for the modified function.
@ShayHill What am I doing that causes this, or is it something known to occur when the XML is updated?
def split_text(root: EtreeElement, split_text: str, position: int) -> EtreeElement:
    text = root.text.split(split_text)[position]
    new_elems = [_copy_new_text(root, line) for line in text.splitlines()]
    # insert breakpoints where line breaks were
    breaks = [etree.Element(Tags.BR) for _ in new_elems]
    return [x for pair in zip(new_elems, breaks) for x in pair][:-1]


def replace_root_text(root: EtreeElement, old: str, new: str) -> None:
    """Replace :old: with :new: in all descendants of :root:

    :param root: an etree element presumably containing descendant text elements
    :param old: text to be replaced
    :param new: replacement text

    Will use softbreaks <br> to preserve line breaks in replacement text.
    """

    def recursive_text_replace(branch: EtreeElement):
        """Replace any text element containing old with one or more elements.

        :param branch: an etree element
        """
        for elem in tuple(branch):
            if not elem.text or old not in elem.text:
                recursive_text_replace(elem)
                continue
            # split the text into two elements (based on the position of the old text)
            left_side = split_text(elem, old, 0)
            right_side = split_text(elem, old, -1)
            # replace the original element with the new elements
            parent = elem.getparent()
            assert parent is not None
            index = parent.index(elem)
            parent[index : index + 1] = [left_side[0], right_side[0]]

    recursive_text_replace(root)
Investigating further, it looks like there are two reference spots: one in the individual footnote (the one I am trying to create above), and a second in the document body itself. In the body, usually located before the footnote reference, there is a newly created <w:bookmarkStart w:id="21" w:name="_Ref162052430"/>, which pairs with the "_Ref162052430" reference in the footnote.
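A rough sketch of building that body-side bookmark, again with the standard library's ElementTree so it runs on its own. Only the `w:bookmarkStart` element and the `_Ref162052430` name come from the XML quoted above; the matching `w:bookmarkEnd`, the `make_bookmark` and `wrap_in_bookmark` helpers, and the sample paragraph are assumptions based on how bookmarks are paired in WordprocessingML, not something confirmed for this particular document.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)


def w(tag: str) -> str:
    """Return a namespace-qualified WordprocessingML name (hypothetical helper)."""
    return f"{{{W_NS}}}{tag}"


def make_bookmark(bookmark_id: str, name: str) -> tuple[ET.Element, ET.Element]:
    """Build a bookmarkStart/bookmarkEnd pair that share one id."""
    start = ET.Element(w("bookmarkStart"), {w("id"): bookmark_id, w("name"): name})
    end = ET.Element(w("bookmarkEnd"), {w("id"): bookmark_id})
    return start, end


def wrap_in_bookmark(parent: ET.Element, target: ET.Element,
                     bookmark_id: str, name: str) -> None:
    """Place the bookmark pair immediately around `target` inside `parent`."""
    start, end = make_bookmark(bookmark_id, name)
    index = list(parent).index(target)
    parent.insert(index, start)      # target shifts to index + 1
    parent.insert(index + 2, end)    # directly after target


# Made-up body paragraph containing a run with a footnote reference.
body_paragraph = ET.fromstring(
    f'<w:p xmlns:w="{W_NS}">'
    '<w:r><w:t>Some text</w:t></w:r>'
    '<w:r><w:footnoteReference w:id="23"/></w:r>'
    "</w:p>"
)
reference_run = body_paragraph.findall(w("r"))[1]
wrap_in_bookmark(body_paragraph, reference_run, "21", "_Ref162052430")

print(ET.tostring(body_paragraph, encoding="unicode"))
```

The bookmark id must be unique within the document, so in real use it would need to be chosen after scanning the existing `w:bookmarkStart` ids.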
I have a way to create the footnote reference, and am now looking to find and place the corresponding document reference. (I will update when I have a working version, but I am slightly limited by the issue posed at the top of this comment.)
It looks like you're throwing out a lot of text if split_text appears more than one time in root.text. That would definitely garble something. From looking at the code, it seems you're assuming text = root.text.split(split_text)[position] will always split root text into two pieces, because you only look at 0 and -1 in replace_root_text.
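To make the point concrete, here is a small plain-string sketch (no XML involved). The function names and the sample sentence are made up for illustration; `str.partition` is shown as one possible alternative, not necessarily the fix the library would use.

```python
def split_first_and_last(text: str, old: str) -> tuple:
    """Mimic the posted approach: keep only pieces 0 and -1 of str.split."""
    pieces = text.split(old)
    return pieces[0], pieces[-1]


def split_once(text: str, old: str) -> tuple:
    """Split around the first occurrence only, so no text is discarded."""
    left, _, right = text.partition(old)
    return left, right


text = "See note 23 and, again, note 23 for details."
old = "23"

lossy_left, lossy_right = split_first_and_last(text, old)
# The middle piece (" and, again, note ") is silently dropped.
print(repr(lossy_left + old + lossy_right))  # 'See note 23 for details.'

safe_left, safe_right = split_once(text, old)
# Rejoining around the separator reproduces the original text.
print(safe_left + old + safe_right == text)  # True
```

With `partition`, any later occurrences stay in the right-hand piece and can be handled by repeating the split on it.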
General Issue:
I would like to replace text in a footnote (currently the plain number of another footnote, not an XML reference) with a cross reference to another footnote.
Problem Steps
.replace_root_text), but that doesn't have either:
What I Tried
Method: searching for 2323 in a footnote to replace that text with the reference to footnote 23 on my document (pulled this information from the XML file that was created when I manually made a cross reference)
Partial idea from https://github.com/python-openxml/python-docx/issues/359
Do you have any advice on what to do differently?