Major refactoring of the StructuredEditTypes differ (formerly just EditTypes), plus some other small bug fixes.
Changes:
StructuredEditTypes:
Renamed from EditTypes; the output format has changed substantially.
This is because I gave the StructuredEditTypes differ the ability to identify the specific changes made to each node. Preserving the structure of the article while computing the diff adds significant overhead, but it means that when a node changes, we know both versions of it and can identify exactly what differs between them. So now when an image is changed, the library also indicates whether the title, formatting options, or caption changed, and what specifically changed between the revisions. Details are also preserved for textual changes, though the differ does not know how they align.
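To illustrate the idea of field-level changes for a media node, here is a minimal sketch. It is not the library's output format: `MediaNode`, `diff_media`, `unaligned_text_diff`, and the dictionary layout are all hypothetical. The caption change is recorded as a bag of added and removed words with no alignment, mirroring the note above that textual details are kept without knowing their alignment.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MediaNode:
    # Hypothetical representation of a parsed image/media node.
    title: str
    options: List[str] = field(default_factory=list)
    caption: str = ""


def unaligned_text_diff(prev: str, curr: str) -> Dict[str, Dict[str, int]]:
    """Word-count differences only; says nothing about *where* words moved."""
    before, after = Counter(prev.split()), Counter(curr.split())
    # Counter subtraction keeps only positive counts.
    return {"removed": dict(before - after), "added": dict(after - before)}


def diff_media(prev: MediaNode, curr: MediaNode) -> dict:
    """Report which parts of a media node changed (hypothetical format)."""
    changes: dict = {}
    if prev.title != curr.title:
        changes["title"] = (prev.title, curr.title)
    removed = sorted(set(prev.options) - set(curr.options))
    added = sorted(set(curr.options) - set(prev.options))
    if removed or added:
        changes["options"] = {"removed": removed, "added": added}
    if prev.caption != curr.caption:
        changes["caption"] = unaligned_text_diff(prev.caption, curr.caption)
    return changes


prev = MediaNode("File:Cat.jpg", ["thumb", "left"], "A black cat")
curr = MediaNode("File:Cat.jpg", ["thumb", "right"], "A small black cat")
print(diff_media(prev, curr))
# {'options': {'removed': ['left'], 'added': ['right']},
#  'caption': {'removed': {}, 'added': {'small': 1}}}
```

Because both versions of the node are available, each field can be compared directly; an unstructured diff would only see a changed line of wikitext.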
Text formatting is no longer considered for moves. Tracking it properly is complicated because formatted text is really two nodes in one: the formatting and the text.
Fixed a bug where a blank current revision did not count the Lede as removed.
Fixed a bug with moved nodes that was throwing an uncaught IndexError.
Switched the timeout option from a time limit to a flag indicating whether to expand very large node trees. The former time-based option was largely ineffective because the check came too late in the process to meaningfully cap processing.
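The difference between the two approaches is when the decision is made: a flag can be checked before the expensive expansion starts, whereas a clock check only fires once much of the work is already done. A minimal sketch of that ordering, assuming a hypothetical `Node` tree, a hypothetical `nodes_to_diff` helper, and hypothetical `expand_large`/`max_nodes` parameters (these are not the library's actual names or thresholds):

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # Hypothetical node in a parsed article tree (sections, paragraphs, ...).
    name: str
    children: List["Node"] = field(default_factory=list)


def count_nodes(node: Node) -> int:
    # A single linear pass; cheap compared with pairwise diffing of nodes.
    return 1 + sum(count_nodes(c) for c in node.children)


def nodes_to_diff(root: Node, expand_large: bool = False, max_nodes: int = 1000) -> List[Node]:
    """Decide up front how much of the tree to hand to the expensive diff step."""
    if not expand_large and count_nodes(root) > max_nodes:
        # Too large: stop at the top level instead of expanding every node.
        return [root] + list(root.children)
    # Otherwise expand the full tree (depth-first, pre-order).
    out, stack = [], [root]
    while stack:
        node = stack.pop()
        out.append(node)
        stack.extend(reversed(node.children))
    return out


article = Node("article", [
    Node("s1", [Node("p1"), Node("p2"), Node("p3")]),
    Node("s2", [Node("p4"), Node("p5"), Node("p6")]),
])
print([n.name for n in nodes_to_diff(article, max_nodes=5)])  # ['article', 's1', 's2']
```

Since the size check runs before any diffing, the cap actually bounds the work; passing `expand_large=True` opts back into the full expansion.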
SimpleEditTypes: the simpler differ remains largely the same and is the recommended approach for simple summaries. The timeout option was dropped as largely ineffective and unnecessary.
Tokenizer: added Bengali and Devanagari support to the word tokenizer (words in these scripts were incorrectly being split at spacing characters).
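In Devanagari and Bengali, vowel signs and the virama are combining marks, and Python's `\w` does not match combining marks, so a naive `\w+` tokenizer breaks a word like हिन्दी into fragments. A minimal sketch of one way to avoid that, assuming a regex-based tokenizer; `WORD_RE` and `tokenize_words` are hypothetical names, not the library's implementation:

```python
import re

# Hypothetical word pattern: \w plus the whole Devanagari (U+0900-U+097F) and
# Bengali (U+0980-U+09FF) blocks, excluding the danda punctuation marks
# U+0964 and U+0965 so sentence-ending dandas are not glued onto words.
WORD_RE = re.compile(r"[\w\u0900-\u0963\u0966-\u097F\u0980-\u09FF]+")


def tokenize_words(text: str) -> list:
    """Return word tokens, keeping combining marks inside Indic words."""
    return WORD_RE.findall(text)


# A naive \w+ splits words at vowel signs and viramas:
print(re.findall(r"\w+", "हिन्दी"))
print(tokenize_words("हिन्दी भाषा।"))  # ['हिन्दी', 'भाषा']
print(tokenize_words("বাংলা ভাষা"))    # ['বাংলা', 'ভাষা']
```

Listing the script blocks explicitly keeps the change scoped to the two scripts named above; a more general alternative would treat any Unicode combining mark (category `M*`) as part of a word.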
Media: improved handling of filenames in galleries so that options and captions are no longer cut off, and added support for parsing media links to identify formatting options and captions for the full EditTypes library.
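One way captions get cut off is splitting a media link on every `|`, which truncates a caption that itself contains a piped link. A minimal sketch of parsing both `[[File:...]]` links and gallery lines by splitting only at top-level pipes, then sorting parameters into options and a caption. The option list is a partial, hypothetical subset of MediaWiki image options, and `split_top_level`, `is_option`, and `parse_media` are illustrative names rather than the library's API:

```python
import re
from typing import List, Tuple

# Hypothetical, non-exhaustive subset of MediaWiki image options.
OPTION_KEYWORDS = {
    "thumb", "thumbnail", "frame", "framed", "frameless", "border",
    "left", "right", "center", "centre", "none", "upright",
    "baseline", "middle", "sub", "super", "text-top", "text-bottom", "top", "bottom",
}
OPTION_KEYS = {"alt", "link", "page", "upright", "lang", "class"}
SIZE_RE = re.compile(r"\d+px|x\d+px|\d+x\d+px")


def split_top_level(text: str, sep: str = "|") -> List[str]:
    """Split on `sep` only when not nested inside [[...]] or {{...}}."""
    parts, buf, depth, i = [], [], 0, 0
    while i < len(text):
        two = text[i:i + 2]
        if two in ("[[", "{{"):
            depth += 1
            buf.append(two)
            i += 2
            continue
        if two in ("]]", "}}") and depth > 0:
            depth -= 1
            buf.append(two)
            i += 2
            continue
        if text[i] == sep and depth == 0:
            parts.append("".join(buf))
            buf = []
        else:
            buf.append(text[i])
        i += 1
    parts.append("".join(buf))
    return parts


def is_option(param: str) -> bool:
    p = param.strip().lower()
    if p in OPTION_KEYWORDS or SIZE_RE.fullmatch(p):
        return True
    key, eq, _ = p.partition("=")
    return bool(eq) and key.strip() in OPTION_KEYS


def parse_media(text: str) -> Tuple[str, List[str], str]:
    """Parse '[[File:X|opts|caption]]' or a gallery line 'File:X|caption'."""
    inner = text.strip()
    if inner.startswith("[[") and inner.endswith("]]"):
        inner = inner[2:-2]
    filename, *params = split_top_level(inner)
    options = [p.strip() for p in params if is_option(p)]
    others = [p.strip() for p in params if not is_option(p)]
    # As in MediaWiki, the last unrecognized parameter is taken as the caption.
    caption = others[-1] if others else ""
    return filename.strip(), options, caption


print(parse_media("[[File:Example.jpg|thumb|200px|A [[cat|feline]] photo]]"))
# ('File:Example.jpg', ['thumb', '200px'], 'A [[cat|feline]] photo')
```

The same function handles gallery lines, since those are just the link contents without the surrounding brackets.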
Tests: split up the tests so that test_edittypes_full.py tests the fine details of the full EditTypes differ and test_edittypes_summary.py tests both approaches for the overall count summary.
Documentation: updated the README to reflect these changes and to list some additional known limitations.