ShiftLeftSecurity / overflowdb

ShiftLeft OverflowDB
Apache License 2.0
112 stars 21 forks source link

fix counter-intuitive union choose coalesce and repeat path tracking #362

Closed bbrehm closed 1 year ago

bbrehm commented 1 year ago

edit: Starts from https://github.com/ShiftLeftSecurity/overflowdb/pull/361 but should not be squashed (too many too big changes) scroll down the comments, this PR changed its nature due to the discussion and now also fixes the behavior it critiqued.

This is more of a start for a discussion with code attached: The union, choose and coalesce steps take in a bunch of transform: Traversal[A] => Traversal[B] functions that may be composed of multiple traversal sub-steps. Should the sub-steps partake in path-tracking?

Since there is a bunch of commits stacked into this PR, the relevant question is

      "union" in {
        //fixme: Is this really what we want? Shouldn't substeps be path-tracked?
         centerTrav.enablePathTracking.union(_.out.out).path.toSetMutable shouldBe Set(
          Seq(center, l2),
          Seq(center, r2)
         )
      }

Here, one could argue that the result should rather be

Set(
          Seq(center, l1, l2),
          Seq(center, r1, r2)
         )

i.e. that all the traversal-style substeps inside the union should be path-tracked.

If this was a green-field decision then I would emphatically say yes! They absolutely should partake in path tracking!

Could we implement that? Yes we could. Would be slightly annoying for odbv2, but definitely possible.

Do we need to "fix" / change / improve that?

My personal opinion is that this entire DSL-gremlin "Traversal instead of nice clean loops / functions" stuff is a legacy boondoggle. We need to fix hard bugs and attempt to mostly maintain compat. Otherwise we should place path-tracking, repeat, union, coalesce, choose, where, and, or, etc all into maintenance mode, attempt to avoid that in our code, and even consider adding the @deprecate label to warn users away.

That is a strategy decision that is not for me alone to make, though -- cc @fabsx00 @DavidBakerEffendi @itsacoderepo @ml86 @mpollmeier whether you think all these Traversal features are legacy / maintenance-eventually-deprecate or important / continue-to-improve.

mpollmeier commented 1 year ago

Traversal stuff is a legacy boondoggle

You won't be surprised that I disagree. A few thoughts:

mpollmeier commented 1 year ago

Re path tracking of subqueries: I was gonna say "there's use cases for both" etc. but after giving this some thought I fully agree with the first part of your observation: subqueries should be path-tracked. Since path tracking is rarely used in our code, and never in conjunction with union etc, you should consider this green field and make that change. () to my knowledge, at least in our codebase

maltek commented 1 year ago

Maybe I'm missing something but I can only find a single instance where path tracking is used across our code bases (the CallChainPass in CS). So if you want to drop support for something because path tracking doesn't compose well with all these other steps, I'd leave the rest alone and just remove/deprecate path tracking...

I write much of my code on the REPL first and I find that chaining a bunch of "semicolon-less" steps is much easier there than writing some loops. But even if I agreed that good old loops were preferable, we have code bases littered with all these steps. That code isn't going away and needs to be supported so in the end we'd just be lying to ourselves by slapping a @Deprecated on these methods.

bbrehm commented 1 year ago

Thanks for the replies!

So, it appears that multiple people care about the semicolon-less DSL (no deprecation in sight), and at least michael cares about path-tracking. So I went ahead and fixed it. Woe is me for the odbv2 implementation...

An extra bug that popped up is that repeat forgot the old path preceding the repeat step.

As an additional improvement, choose now evaluates the conditions only once. This is important for people like me who like to write match statements / partial functions where the conditions have side-effects, like

case item if visitedSet.add(item) => // new item! Also not new anymore if we check the condition again...

(yes, this is stupidly fragile and I should stop writing code like that)