Binary diffing: users can take two (or more) binaries and get a diff view, for the purposes of tracking changes across versions, identifying individual function changes, and overall binary understanding.
Function Similarity
We don't necessarily need to produce our own fuzzy function matching, as pre-existing solutions already exist (see BinDiff). We would, however, need to integrate and expose a similarity metric (0-1) so that binary tracking can occur, as that feature requires the ability to rank functions by similarity.
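As a rough sketch of what "rank functions by similarity" could look like once a 0-1 score is exposed: the `Match` type, the `rank_candidates` helper and the function names below are all hypothetical and are not part of any existing API.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Match:
    """Hypothetical pairing of a function in the old binary with one in the new."""
    old_name: str
    new_name: str
    similarity: float  # assumed range: 0.0 (unrelated) to 1.0 (identical)


def rank_candidates(matches: Iterable[Match], old_name: str,
                    threshold: float = 0.0) -> List[Match]:
    """Return the candidates for old_name at or above threshold, best first."""
    candidates = [m for m in matches
                  if m.old_name == old_name and m.similarity >= threshold]
    return sorted(candidates, key=lambda m: m.similarity, reverse=True)


# Made-up scores, purely for illustration.
matches = [
    Match("sub_401000", "sub_402000", 0.62),
    Match("sub_401000", "sub_402100", 0.97),
    Match("sub_401000", "sub_402200", 0.15),
]
ranked = rank_candidates(matches, "sub_401000", threshold=0.5)
```

Here `ranked` holds the two candidates above the threshold, with the 0.97 match first.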
Function Fuzzy Matching
A further goal would be to use IL diffing to identify similar functions, i.e. fuzzy matching. This is not required for an MVP, as we can rely on pre-generated sources of this through the likes of BinDiff. By approaching the MVP with extensibility in mind (i.e. more than one source of function similarity), we will be able to integrate with third-party sources in the future without having to round-trip them through any internal fuzzy matching system, which might not even be feasible.
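One possible shape for "more than one source of function similarity" is a small interface that every source implements, with the results merged afterwards. This is only a sketch: `SimilaritySource`, `PrecomputedSource` and `merge_sources` are invented names, and keeping the highest score per pair is an assumed policy, not a decided one. How a third-party tool's results would actually be parsed is left out.

```python
from typing import Dict, Iterable, Protocol, Tuple

Pair = Tuple[str, str]  # (old function name, new function name)


class SimilaritySource(Protocol):
    """Hypothetical interface: anything that can report 0-1 scores per pair."""
    name: str

    def scores(self) -> Dict[Pair, float]: ...


class PrecomputedSource:
    """Wraps scores that were generated elsewhere (e.g. by an external tool)."""

    def __init__(self, name: str, scores: Dict[Pair, float]) -> None:
        self.name = name
        self._scores = dict(scores)

    def scores(self) -> Dict[Pair, float]:
        return dict(self._scores)


def merge_sources(sources: Iterable[SimilaritySource]) -> Dict[Pair, Tuple[float, str]]:
    """For each pair, keep the highest score and the name of the source that gave it."""
    best: Dict[Pair, Tuple[float, str]] = {}
    for source in sources:
        for pair, score in source.scores().items():
            if pair not in best or score > best[pair][0]:
                best[pair] = (score, source.name)
    return best


# Made-up scores from two hypothetical sources.
external = PrecomputedSource("external", {("f", "f2"): 0.9, ("g", "g2"): 0.4})
internal = PrecomputedSource("il-fuzzy", {("g", "g2"): 0.8})
merged = merge_sources([external, internal])
```

With this layout, an internal IL-based matcher would just be one more source, and external results never have to pass through it.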
IL Diffing
Integral to this feature is the ability to take a Linear LLIL, MLIL, AST HLIL or Pseudo-C representation and produce a diff against another of the same type (i.e. we only want LLIL -> LLIL diffs). This will naturally also be used for fuzzy matching, and the UI portion would be responsible for displaying the diff in a sane way. The internal API for this should be callable from Python.
Given two functions: old_func and new_func, where:
That is to say, generating just a textual diff would be ineffective at actually identifying differences, as the second example would be the same as the first example.
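To illustrate the point with a hypothetical case (the listings below are invented pseudo-IL text, not real Binary Ninja output): a line-based textual diff reports the same amount of change for a function whose variables and call targets merely moved as for one whose comparison constant actually changed. A crude normalization pass, standing in here for a proper IL-aware diff, tells the two apart.

```python
import difflib
import re
from typing import Callable, List


def normalize(line: str) -> str:
    """Crude stand-in for IL-aware comparison: hide variable names and call targets."""
    line = re.sub(r"\bvar_[0-9a-f]+\b", "VAR", line)
    line = re.sub(r"call\(0x[0-9a-f]+", "call(ADDR", line)
    return line


def changed_lines(old: List[str], new: List[str],
                  key: Callable[[str], str] = lambda s: s) -> int:
    """Count lines on either side that are not part of a matching block."""
    matcher = difflib.SequenceMatcher(
        a=[key(l) for l in old], b=[key(l) for l in new], autojunk=False)
    same = sum(block.size for block in matcher.get_matching_blocks())
    return (len(old) - same) + (len(new) - same)


old_func = [
    "var_10 = arg_1 + 4",
    "if (var_10 > 0x20) goto 5",
    "call(0x401000, var_10)",
    "return var_10",
]
# Same logic; only the stack slot and the callee address moved.
new_func_moved = [
    "var_18 = arg_1 + 4",
    "if (var_18 > 0x20) goto 5",
    "call(0x402040, var_18)",
    "return var_18",
]
# Same as above, but the comparison constant really changed.
new_func_changed = [
    "var_18 = arg_1 + 4",
    "if (var_18 > 0x40) goto 5",
    "call(0x402040, var_18)",
    "return var_18",
]

textual = (changed_lines(old_func, new_func_moved),
           changed_lines(old_func, new_func_changed))
structural = (changed_lines(old_func, new_func_moved, key=normalize),
              changed_lines(old_func, new_func_changed, key=normalize))
```

In this sketch `textual` is the same for both cases (every line differs), while `structural` only flags the function whose constant changed. Collapsing every variable to one placeholder is far too lossy for real use; it is only here to make the contrast visible.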
Tracking / Transfer
Another use of the features laid out above, particularly function similarity, is to transfer analysis information from one binary to another. This is separate from passive analysis data transfers such as WARP, where the matching is expected to take place after an analysis session has ended. Because the expectation is that both the former and current binary are open in Binary Ninja, we can provide a tighter transfer loop with more control over what is and isn't applied.
Ex. Binary A and Binary B share functions H, J and K; transfer the type, name and other associated info from A to B.
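A minimal sketch of that transfer step, with control over which fields are applied. Everything here is an assumption for illustration: `FunctionInfo`, `transfer`, the address-keyed dictionaries and the field names do not correspond to an existing API.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional, Sequence


@dataclass(frozen=True)
class FunctionInfo:
    """Hypothetical per-function analysis data."""
    name: str
    type_signature: Optional[str] = None
    comment: Optional[str] = None


def transfer(source: Dict[int, FunctionInfo],
             target: Dict[int, FunctionInfo],
             matches: Dict[int, int],
             fields: Sequence[str] = ("name", "type_signature")) -> Dict[int, FunctionInfo]:
    """Copy the selected fields from matched source functions onto target functions.

    matches maps a source address to a target address. Inputs are not modified;
    a new target dictionary is returned.
    """
    result = dict(target)
    for src_addr, dst_addr in matches.items():
        if src_addr not in source or dst_addr not in result:
            continue  # skip matches that refer to unknown functions
        updates = {f: getattr(source[src_addr], f) for f in fields
                   if getattr(source[src_addr], f) is not None}
        result[dst_addr] = replace(result[dst_addr], **updates)
    return result


# Made-up data: one matched function, one unmatched.
binary_a = {
    0x1000: FunctionInfo("parse_header", "int (char*)", "checked"),
    0x2000: FunctionInfo("sub_2000"),
}
binary_b = {0x5000: FunctionInfo("sub_5000"), 0x6000: FunctionInfo("sub_6000")}
updated = transfer(binary_a, binary_b, {0x1000: 0x5000})
```

Only the name and type are copied by default; passing `fields=("name", "type_signature", "comment")` would also carry the comment across, which is the kind of control over what is applied that an interactive transfer would want.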
UI Integration
For the UI portion of this we run into the issue of "1 tab 1 view" which must be at least partially broken for this to have a decent UX.
One idea for the diff view: the user selects "Diff View" from the view carousel, which takes them to what is effectively a graph view of the selected function. They then create a new pane and repeat this process for another function, this time also selecting the previous function's sync group while in the "Diff View", which would then create the visual diff overlay on both panes' graphs.
Are any alternative solutions acceptable?
Better (or any) support for already existing solutions like BinDiff.
This is really high level and is probably missing a lot of information; I just figure we should make an issue for this sooner rather than later.