AngleSharp / AngleSharp.Diffing

A library that makes it possible to compare two AngleSharp node lists and get a list of differences between them.
MIT License

Allow diffing documents which are already parsed #19

Closed jamerst closed 2 years ago

jamerst commented 2 years ago

Instead of passing a string to Compare() and WithTest(), allow a document which has already been parsed with AngleSharp to be passed (i.e. an IDocument/INodeList).

Background

For my application, I need to modify the nodes referenced by the diff (NodeDiff.Control.Node and NodeDiff.Test.Node).

Since the document used in diffing is not exposed externally, my current workaround is to parse the documents, add a diff:guid attribute with a unique value to every node, convert the documents back to strings and pass them to the DiffBuilder. I can then read the attribute value from a node in the diff and use it to find the corresponding node in my own document.
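The workaround described above might look roughly like the sketch below. It is only an illustration: the TagAndSerialize helper, the sample markup and the lookup by attribute value are all assumptions made up for the example, not code from the application in question.

```csharp
using System;
using System.Linq;
using AngleSharp.Diffing;
using AngleSharp.Diffing.Core;
using AngleSharp.Dom;
using AngleSharp.Html.Parser;

var parser = new HtmlParser();
var controlDoc = parser.ParseDocument("<p>Hello</p>");
var testDoc = parser.ParseDocument("<p>World</p>");

// Round-trip both documents through strings, because DiffBuilder only accepts markup.
var diffs = DiffBuilder
    .Compare(TagAndSerialize(controlDoc))
    .WithTest(TagAndSerialize(testDoc))
    .Build();

foreach (var diff in diffs.OfType<NodeDiff>())
{
    // The diff points at nodes of DiffBuilder's internal document, so read the
    // tag from the node itself, or from its parent when the node is e.g. a text node.
    var node = diff.Test.Node;
    var tagged = node as IElement ?? node.ParentElement;
    var guid = tagged?.GetAttribute("diff:guid");
    if (guid is null) continue;

    // Map the tag back to the element in the document we actually own.
    var original = testDoc.All.FirstOrDefault(e => e.GetAttribute("diff:guid") == guid);
    Console.WriteLine(original?.OuterHtml);
}

// Hypothetical helper: gives every element below <body> a unique diff:guid
// attribute and returns the body markup as a string.
static string TagAndSerialize(IDocument document)
{
    var body = document.Body ?? throw new InvalidOperationException("Document has no body.");
    foreach (var element in body.QuerySelectorAll("*"))
        element.SetAttribute("diff:guid", Guid.NewGuid().ToString());
    return body.InnerHtml;
}
```

Each document is parsed twice and serialized once here, which is the overhead the request is trying to avoid.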

This is neither efficient nor clean, and it would be much simpler if the nodes in the diff could be modified directly.

This doesn't look difficult or problematic to implement - it should be possible to add two new methods to DiffBuilder: Compare(IDocument document) and WithTest(IDocument document), then store the INodeList instead of the HTML string.

For the original methods, the HTML string can be parsed straight away in the Compare and WithTest methods, then stored as INodeList.

Even if modifying nodes is not possible, this would at least eliminate two unnecessary ToHtml() calls.

egil commented 2 years ago

Hi James,

I am not opposed to creating additional methods on DiffBuilder for this purpose. However, you can also do what the DiffBuilder does yourself quite easily by new'ing up an HtmlDiffer:

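using AngleSharp.Diffing;
using AngleSharp.Diffing.Strategies;
using AngleSharp.Dom;

// Build the default strategy pipeline and hand it to the differ.
var diffStrategy = new DiffingStrategyPipeline();
diffStrategy.AddDefaultOptions();
var comparer = new HtmlDiffer(diffStrategy);

// Supply nodes from documents you have already parsed.
IEnumerable<INode> controlNodes = ...
IEnumerable<INode> testNodes = ...

var diffs = comparer.Compare(controlNodes, testNodes);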
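
For completeness, here is one way the two placeholders could be filled in from documents that are already parsed. This is a minimal sketch, not an official recipe: the sample markup and the choice to compare the children of each document's body are assumptions made for illustration.

```csharp
using System;
using AngleSharp.Diffing;
using AngleSharp.Diffing.Strategies;
using AngleSharp.Html.Parser;

// Assumption: the documents come from AngleSharp's HtmlParser; any IDocument
// obtained elsewhere should work the same way.
var parser = new HtmlParser();
var controlDoc = parser.ParseDocument("<p>Hello</p>");
var testDoc = parser.ParseDocument("<p>World</p>");

var diffStrategy = new DiffingStrategyPipeline();
diffStrategy.AddDefaultOptions();
var comparer = new HtmlDiffer(diffStrategy);

// INodeList is an IEnumerable<INode>, so the child nodes can be passed directly.
var diffs = comparer.Compare(controlDoc.Body!.ChildNodes, testDoc.Body!.ChildNodes);

foreach (var diff in diffs)
{
    // Result and Target describe what kind of difference was found and on what.
    Console.WriteLine($"{diff.Result} {diff.Target}");
}
```

Since no string round trip is involved, the nodes reported in the diffs should be the ones from controlDoc and testDoc, which is what makes modifying them in place possible.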