Scope for Node Identifiers

SapientGuardian commented 4 years ago

Hello! I am writing a C# library for manipulating GTIRB (https://github.com/SapientGuardian/GtirbSharp). Initially it began as a port of the Java implementation, but I found that to be rather incomplete and have had to piece together some details from the other languages.

One thing that is not clear from the documentation or the existing implementations is the intended scope for Node identifiers (UUIDs).

In the Java implementation https://github.com/GrammaTech/gtirb/blob/57c7e9dc6c3c34abd4fdd6e4c9f196d603e150d8/java/com/grammatech/gtirb/Node.java#L23 the UUID cache is maintained as a static variable reachable through the Node class, meaning that it is shared between any and all instances of IR. Here, the scope for nodes is global.
In the Python implementation https://github.com/GrammaTech/gtirb/blob/68b763e91d96f608aa4bab64a9b61c9c099e0a74/python/gtirb/ir.py#L424 the cache is maintained as an instance variable on the IR and there are tests https://github.com/GrammaTech/gtirb/blob/6b4bc8f6bf80cda89157ed4b5517bcbba57c94ef/python/tests/test_node_from_uuid.py#L17 to explicitly ensure that the context for nodes is scoped to an IR.
In the language agnostic documentation https://github.com/GrammaTech/gtirb/blob/20f1c05859da2808c3a76e105be7ea5c40b71ae3/doc/general/IR.md no mention is made of the IR being the scope of Nodes, nor is any mention of scope made in the documentation for Node https://github.com/GrammaTech/gtirb/blob/4afe637de31bd78e7c21abc270c841ec008d0714/doc/general/Node.md. The documentation does indicate that IR is a Node, which makes it a little awkward for it to be both the container of scope as well as a participant in it if that is the case, and mandates that there exist "A facility to look up Node objects by UUID", without explicitly stating where that facility should exist.

eschulte commented 4 years ago

Hi, thanks for reaching out, I'm excited to hear that you're working on a C# library for manipulating GTIRB! If this goes well maybe we could talk about folding your work into the core repository at some point.

With respect to your question, the scope and location of UUID stores has been a recent point of discussion--and the Python functionality was recently changed explicitly to ensure that UUIDs were scoped within an IR instance. Currently the behavior is as follows across APIs:

C++: Nodes are scoped by context. A single context may include one or more IRs. At some point we'll likely move the UUID lookup to live under an IR.
Python: Nodes scoped by IR, node UUID lookup lives under an IR.
Common Lisp: Nodes scoped by IR, node UUID lookup lives under an IR.

So I would suggest that the C# API should similarly store the node lookup under the IR. This ensures that one can load two IRs with the same UUIDs. This became an issue for us when we wanted to compare two GTIRB files from before and after a rewriting transformation and the files had many of the same UUIDs. This also has the nice property that the UUIDs are no longer global data, which is typically a good thing to avoid in a library.

In part we haven't updated the language agnostic documentation because there has still been some discussion on this point internally. We will likely update the language agnostic documentation to explicitly state that the IR is the scope under which UUIDs are to be unique.

W.r.t. IR being a node and having a UUID, I agree that we are left in a weird situation with no way to lookup an IR by UUID aside from under the IR itself--which is circular. However, the only source of UUIDs are sub-IR data structures and AuxData tables, which themselves have to live under an IR, so it isn't likely that anyone will ever want to lookup an IR by UUID. Probably there is no reason for us to give IR's UUIDs and we shouldn't have made them nodes to begin with--but I think this is a minor issue and likely not worth changing at this point.

SapientGuardian commented 4 years ago

Thanks for the quick reply. In my implementation I've removed the Node cache/lookup functionality from the Node class, created an INodeContext interface very similar to the context of the C++ implementation, and implemented it with IR. That way, it's easy to move around if the thinking around this concept changes.

Feel free to close this issue, unless you want to keep it open for further discussion or as a work item to add clarification to the documentation and/or fix the Java implementation.

eschulte commented 4 years ago

Your approach sounds good. I'll close this issue. Thanks

GrammaTech / gtirb

Scope for Node Identifiers #36