Open geohci opened 2 years ago
This also causes issues when, for example, a template is moved within text and the tree differ ascribes the change to the text rather than to the template. When the text post-processing then runs, no changes are detected.
Example test that fails because no changes are detected by the differ:
def test_move_template():
    curr_wikitext = prev_wikitext.replace('\n{{Use dmy dates|date=April 2017}}',
                                          '{{Use dmy dates|date=April 2017}}\n',
                                          1)
    expected_changes = {'Template': {'move': 1}}
    diff = get_diff(prev_wikitext, curr_wikitext, lang='en')
    assert expected_changes == nd.get_diff_count(diff)
Or this diff, which incorrectly reports that the References section was removed: https://wiki-topic.toolforge.org/diff-tagging?lang=simple&revid=7992487
I probably need a node attribute, something like 'removable', that I set to false for sections that exist unchanged in both revisions.
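A rough sketch of what that flag could look like, just to make the idea concrete. Everything here is hypothetical: the `Node` class, its fields, and `mark_unchanged_sections` are illustrative stand-ins, not the differ's actual data structures, and "unchanged" is assumed to mean the section's raw wikitext is identical in both revisions.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a tree-differ node."""
    ntype: str              # e.g. "Section", "Template", "Text"
    text: str               # raw wikitext of the node
    removable: bool = True  # False => the differ should never emit a "remove" for it


def mark_unchanged_sections(prev_nodes, curr_nodes):
    """Set removable=False on previous-revision sections that also appear,
    with identical wikitext, in the current revision."""
    # Count identical sections so duplicated sections are matched one-to-one.
    available = Counter(n.text for n in curr_nodes if n.ntype == "Section")
    for node in prev_nodes:
        if node.ntype == "Section" and available[node.text] > 0:
            node.removable = False
            available[node.text] -= 1
    return prev_nodes


prev_nodes = [Node("Section", "==References=="), Node("Section", "==History==")]
curr_nodes = [Node("Section", "==References==")]
mark_unchanged_sections(prev_nodes, curr_nodes)
```

In this toy case the References section is pinned as non-removable while History, which really is gone from the current revision, stays removable.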
In edits where there are numerous possibilities for how to account for the changes, the tree differ doesn't always land on the solution that makes the most sense to me.
Example: https://wiki-topic.toolforge.org/diff-tagging?lang=en&revid=1051321835 The counts actually seem right, but the tree differ has in fact decided that the {{For...}} template moved, as can be seen in the intermediate diff: https://edit-types.wmcloud.org/api/v1/diff?lang=en&revid=1051321835
The MediaWiki differ got this one correct (I would say; there is actually no right answer here). One possibility is that we could add an additional cost to changes for nodes that have an exact match between revisions, to discourage "changes" or "moves" in nodes that were not actually edited. I'll have to explore what impact that would have. I think it would be relatively cheap to implement at least.
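To illustrate the extra-cost idea, here is a minimal sketch under assumptions: unit base cost per operation, a flat penalty, and edit scripts represented as `(op, node_text)` pairs. None of this reflects how the tree differ actually computes costs; `script_cost` and the penalty value are made up for the example.

```python
def script_cost(script, prev_texts, curr_texts, penalty=1.0):
    """Cost of a candidate edit script (hypothetical representation).

    script: list of (op, node_text) pairs, op in {"insert", "remove", "move", "change"}.
    Any operation touching a node whose text exists verbatim in both
    revisions pays an extra `penalty` on top of the unit base cost.
    """
    exact_matches = set(prev_texts) & set(curr_texts)
    total = 0.0
    for _op, text in script:
        total += 1.0  # assumed unit base cost for every operation
        if text in exact_matches:
            total += penalty  # discourage "editing" a node nobody edited
    return total


prev_texts = ["{{For|x}}", "Old intro."]
curr_texts = ["{{About|y}}", "{{For|x}}", "New intro."]

# Two explanations of the same edit, with equal base cost:
moves_template = [("move", "{{For|x}}"), ("insert", "{{About|y}}")]
edits_text = [("insert", "{{About|y}}"), ("change", "Old intro.")]

cost_a = script_cost(moves_template, prev_texts, curr_texts)
cost_b = script_cost(edits_text, prev_texts, curr_texts)
```

Without the penalty both scripts cost the same and the differ could pick either; with it, the script that leaves the exact-match template alone comes out cheaper. Whether a flat penalty is the right shape, and how large it should be, is exactly the impact that would need exploring.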