Closed bbrehm closed 1 year ago
~Hmm, this doesn't seem to fix the underlying issue yet. I tested with https://github.com/ShiftLeftSecurity/overflowdb-codegen/pull/205 and still got twice as many Refs compared to the Db nodes...~
I take that back, this works, I didn't upgrade the joern version in the odbv2 branch. 🎉
As we saw from https://github.com/ShiftLeftSecurity/overflowdbv2/pull/11 , we have a problem when deserializing nodes: There are too many and too heavy (https://github.com/ShiftLeftSecurity/overflowdb-codegen/pull/205) NodeRefs.
This addresses the "too many" part. The issue is the following: When we deserialize a node, then we ask NodeFactory to please give us a new NodeDb object with the data. But the API helpfully also allocates a NodeRef object that is kept alive by the NodeDb!
So the pointer structure when deserializing looks like this:
This is of course nonsense. We fix this by passing pre-existing NodeRef pointers into the
createNode
function and only allocate a newNodeRef
when that isnull
.This should save one NodeRef per node, which weighs 32 bytes (with https://github.com/ShiftLeftSecurity/overflowdb-codegen/pull/205; before it was 40 bytes) .
PS. can you add some more eyes that I got the referenceManager queue / registerRef right? We never want to add a nodeRef twice, nor do we want to forget it.