Wilfred / difftastic

a structural diff that understands syntax 🟥🟩
https://difftastic.wilfred.me.uk/
MIT License
19.95k stars 318 forks source link

Explore other graph algorithms #303

Open Wilfred opened 2 years ago

Wilfred commented 2 years ago

Bidirectional dijkstra might help, but it's really awkward to implement.

It'd be good to explore an explicit visited set for Dijkstra, alongside a priority queue that supports changing priorities. Currently I'm just inserting duplicates in the heap.

It'd also be good to revisit A* search to see if we can get a speedup there.

leeoniya commented 2 years ago

maybe some interesting reading/ideas: https://github.com/anvaka/ngraph.path

Wilfred commented 2 years ago

Tested with https://docs.rs/priority-queue/latest/priority_queue/priority_queue/struct.PriorityQueue.html but it was a net perf loss, even with FxHash.

Wilfred commented 2 years ago

An explicit visited set is a perf loss too, even with FxHash and the same size hint.

Wilfred commented 1 year ago

Another option would be an 'iterative Dijkstra'. Search N nodes, and if we've not reached the end, take the closest node (according to an A* heuristic) and restart the search from there. This would let us keep a bound on the size of predecessors.

Wilfred commented 1 year ago

HashMap::get() is really hot too. I wonder if there's a way of tracking vertex frontier such that we can tell that the vertex has never been visited before (e.g. highest LHS/RHS node ID visited so far).

zachallaun commented 2 months ago

I'm curious whether you've investigated RWS-Diff and whether any of the ideas may apply.

While the paper describes a complete algorithm for generating an edit script from two trees, the fundamental insight (in my opinion) is the use of K-means clustering and nearest neighbor search to approximate subtree similarity. This could be used, for instance, to chunk LHS/RHS into smaller subtree pairs that can be searched separately and then combined.

Wilfred commented 2 months ago

Thanks for the reference! I've not seen that paper, but it looks very relevant.

balta2ar commented 1 month ago

I'm curious whether you've investigated RWS-Diff and whether any of the ideas may apply

really awesome paper, thank you for sharing! before, I've only seen shingling applied to strings and a variation of LSH. It didn't occur to me that one could "shingle" trees to find similarities.

zachallaun commented 1 month ago

before, I've only seen shingling applied to strings and a variation of LSH. It didn't occur to me that one could "shingle" trees to find similarities.

Same here! I found another paper titled Metric Learning for Ordered Labeled Trees with pq-grams that goes a bit deeper into how pq-grams are constructed.