GrammaTech / gtirb

Intermediate Representation for Binary analysis and transformation
https://grammatech.github.io/gtirb/
Other
305 stars 36 forks source link

Scope for Node Identifiers #36

Closed SapientGuardian closed 4 years ago

SapientGuardian commented 4 years ago

Hello! I am writing a C# library for manipulating GTIRB (https://github.com/SapientGuardian/GtirbSharp). Initially it began as a port of the Java implementation, but I found that to be rather incomplete and have had to piece together some details from the other languages.

One thing that is not clear from the documentation or the existing implementations is the intended scope for Node identifiers (UUIDs).

eschulte commented 4 years ago

Hi, thanks for reaching out, I'm excited to hear that you're working on a C# library for manipulating GTIRB! If this goes well maybe we could talk about folding your work into the core repository at some point.

With respect to your question, the scope and location of UUID stores has been a recent point of discussion--and the Python functionality was recently changed explicitly to ensure that UUIDs were scoped within an IR instance. Currently the behavior is as follows across APIs:

So I would suggest that the C# API should similarly store the node lookup under the IR. This ensures that one can load two IRs with the same UUIDs. This became an issue for us when we wanted to compare two GTIRB files from before and after a rewriting transformation and the files had many of the same UUIDs. This also has the nice property that the UUIDs are no longer global data, which is typically a good thing to avoid in a library.

In part we haven't updated the language agnostic documentation because there has still been some discussion on this point internally. We will likely update the language agnostic documentation to explicitly state that the IR is the scope under which UUIDs are to be unique.

W.r.t. IR being a node and having a UUID, I agree that we are left in a weird situation with no way to lookup an IR by UUID aside from under the IR itself--which is circular. However, the only source of UUIDs are sub-IR data structures and AuxData tables, which themselves have to live under an IR, so it isn't likely that anyone will ever want to lookup an IR by UUID. Probably there is no reason for us to give IR's UUIDs and we shouldn't have made them nodes to begin with--but I think this is a minor issue and likely not worth changing at this point.

SapientGuardian commented 4 years ago

Thanks for the quick reply. In my implementation I've removed the Node cache/lookup functionality from the Node class, created an INodeContext interface very similar to the context of the C++ implementation, and implemented it with IR. That way, it's easy to move around if the thinking around this concept changes.

Feel free to close this issue, unless you want to keep it open for further discussion or as a work item to add clarification to the documentation and/or fix the Java implementation.

eschulte commented 4 years ago

Your approach sounds good. I'll close this issue. Thanks