ShiftLeftSecurity / overflowdb

ShiftLeft OverflowDB
Apache License 2.0
112 stars 21 forks source link

prevent duplication of nodeRef on deserialization #355

Closed bbrehm closed 1 year ago

bbrehm commented 1 year ago

As we saw from https://github.com/ShiftLeftSecurity/overflowdbv2/pull/11 , we have a problem when deserializing nodes: There are too many and too heavy (https://github.com/ShiftLeftSecurity/overflowdb-codegen/pull/205) NodeRefs.

This addresses the "too many" part. The issue is the following: When we deserialize a node, then we ask NodeFactory to please give us a new NodeDb object with the data. But the API helpfully also allocates a NodeRef object that is kept alive by the NodeDb!

So the pointer structure when deserializing looks like this:

NodeRef1 -(`private Node node`)-> NodeDb1 -(`private NodeRef ref`)-> NodeRef2 -(`private Node node`)-> NodeDb1

This is of course nonsense. We fix this by passing pre-existing NodeRef pointers into the createNode function and only allocate a new NodeRef when that is null.

This should save one NodeRef per node, which weighs 32 bytes (with https://github.com/ShiftLeftSecurity/overflowdb-codegen/pull/205; before it was 40 bytes) .

PS. can you add some more eyes that I got the referenceManager queue / registerRef right? We never want to add a nodeRef twice, nor do we want to forget it.

mpollmeier commented 1 year ago

~Hmm, this doesn't seem to fix the underlying issue yet. I tested with https://github.com/ShiftLeftSecurity/overflowdb-codegen/pull/205 and still got twice as many Refs compared to the Db nodes...~

I take that back, this works, I didn't upgrade the joern version in the odbv2 branch. 🎉