Closed rosinality closed 7 months ago
Appending children should be safe, but anything that destroys nodes is not. Do not use a reference to an existing node if any of its ancestors has been deleted (either explicitly or implicitly by setting inner/outerHTML).
Thank you! If I only need to avoid accessing a child element after its parent has been removed, then I think it may not be very problematic for these kinds of DOM manipulations.
Be aware that deallocating a node affects not only its immediate descendants but the whole subtree. remove_node() is fine as long as you keep the reference around and insert it back into the tree, but once you lose that reference, the entire subtree is gone. The same goes for explicit deletion or for setting innerHTML, innerText, outerHTML, or outerText, as mentioned before.
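To make the difference between detaching and destroying concrete, here is a toy model in plain Python. It is only a sketch: the Node class, its valid flag, and the detach() and destroy() methods are hypothetical names made up for this illustration, not the library's API or implementation. It only shows the rule that a detached subtree survives while you hold a reference to it, whereas destroying a node invalidates everything below it.

```python
class Node:
    """Toy tree node with an explicit validity flag (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []
        self.valid = True

    def append_child(self, child):
        # Appending moves the child: unlink it from its old parent first.
        if child.parent is not None:
            child.parent.children.remove(child)
        child.parent = self
        self.children.append(child)
        return child

    def detach(self):
        # Unlink from the parent; the subtree itself is left untouched.
        if self.parent is not None:
            self.parent.children.remove(self)
            self.parent = None
        return self

    def destroy(self):
        # Delete this node and, recursively, every node below it.
        self.detach()
        self._invalidate()

    def _invalidate(self):
        self.valid = False
        for child in self.children:
            child._invalidate()


root = Node("body")
outer = root.append_child(Node("div"))
inner = outer.append_child(Node("p"))
leaf = inner.append_child(Node("b"))

# Detach and re-insert: every reference into the subtree stays usable.
outer.detach()
root.append_child(outer)
still_valid_after_reinsert = leaf.valid

# Destroy the top of the subtree: all descendants become invalid too.
outer.destroy()
```

After the last call, inner and leaf are both marked invalid even though only outer was destroyed, which is the situation where holding on to old references becomes unsafe.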
Thank you for the additional information! Regarding this problem, I commonly replace tags (for example, 'font' tags with 'span' tags) like this:
<font>
  A
  <font>
    B
    <font>
      C
    </font>
  </font>
  D
</font>
<font>
  1
  <font>
    2
  </font>
  3
</font>
def replace_node_tags(doc, nodes, new_tag):
    for node in nodes:
        new_node = doc.create_element(new_tag)
        for child in node.child_nodes:
            new_node.append_child(child)
        node.parent.replace_child(new_node, node)

replace_node_tags(doc, doc.document.get_elements_by_tag_name('font'), 'span')
In this case, on each iteration of the loop over nodes, the children of node are appended to the new node, and the newly created node replaces the old one. Because the tags are nested, during the loop either child nodes are moved into newly created nodes (the parent font node of a child font node has already been replaced by a span node), or child nodes are themselves replaced by newly created nodes (a child font node is replaced by a newly created span node). Would this be okay? Thank you!
Looks OK. You cannot use node or any of the elements in nodes afterwards, though in this case that shouldn't even be a problem. replace_child() should properly invalidate the reference. If you try accessing it, you should just get an error that the node is invalid. You would only run into trouble if you had obtained another independent reference to the same node beforehand, which wouldn't be invalidated automatically.
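For comparison, the same tag replacement can be sketched with Python's standard-library xml.dom.minidom, used here purely as a stand-in so the example runs without extra dependencies. The function name replace_tags and the sample markup are assumptions for illustration. minidom does not invalidate references the way described above, so this only demonstrates the traversal pattern: take a snapshot of the matches and of each node's children before moving anything, and do not touch the old node once it has been replaced. In minidom, childNodes is a live list, so iterating it directly while moving children would skip nodes. Whether child_nodes behaves the same way in the library discussed here is not established in this thread; snapshotting is harmless either way.

```python
from xml.dom import minidom


def replace_tags(doc, old_tag, new_tag):
    """Replace every <old_tag> element with <new_tag>, keeping its children.

    Attributes are not copied in this sketch.
    """
    # Snapshot the matches so the loop does not depend on a changing tree.
    for node in list(doc.getElementsByTagName(old_tag)):
        new_node = doc.createElement(new_tag)
        # Snapshot the children: appendChild() moves each child out of
        # node.childNodes, which is a live list in minidom.
        for child in list(node.childNodes):
            new_node.appendChild(child)
        node.parentNode.replaceChild(new_node, node)
        # From here on, `node` is detached and is not used again.


html = (
    "<root><font>A<font>B<font>C</font></font>D</font>"
    "<font>1<font>2</font>3</font></root>"
)
doc = minidom.parseString(html)
replace_tags(doc, "font", "span")
result = doc.documentElement.toxml()
print(result)
```

Nested elements work because each inner font element has already been moved under its new span parent by the time the loop reaches it, so node.parentNode always points at a node that is still in the tree.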
Thank you very much!
Hello, thank you for the wonderful project!
I have a question about DOM manipulation and DOM nodes. The documentation warns against using DOMNode instances after DOM tree manipulation.
I am currently working on an HTML extractor, which involves many DOM manipulations and DOMNode accesses, for example, like this:
I think that if I need to look up DOMNodes again after every DOM manipulation, some kinds of work will become hard. Is there a concrete example of safe manipulations and accesses, or of specific cases where accessing a node after manipulation causes an error or segfault? Thank you!