flowbased / flowtrace

Traces for retroactive debugging of FBP programs

Diffing traces #17

Open jonnor opened 8 years ago

jonnor commented 8 years ago

For many bugs, it is often possible to narrow things down to one (or more) test runs that succeed and one (or more) that fail. Sometimes the cause of the difference lies in the input data, sometimes in the (version of the) code used, and in some cases in the environment the code is executing in.

But then one has to figure out why exactly this causes the difference in behavior. In dataflow/FBP that usually means following the data until it starts diverging. A tricky/tedious part is usually distinguishing the insignificant differences from the significant ones. There might be timestamps, payload sizes etc. that continuously vary but are not of interest. In other cases there are slight differences which are functionally equivalent: for instance, the existence or value of some keys in an object might not matter at all, or the addition/removal of some elements of an array (but not others). In general we'd need tooling that brings the differences down to something the developer can analyze and understand easily and quickly. It should allow progressively tightening what's shown as one narrows down the possibilities.

If one does not know whether the difference is in the input or in the code/graphs, tools to help figure that out may be useful. For diffing the graphs, some integration with the (yet to be developed) fbp-diff may be useful. Alternatively, one could require the user to combine the [proposed](https://github.com/flowbased/flowtrace/issue/13) flowtrace-extract --graph with fbp-spec for that.

jonnor commented 8 years ago

Some interesting possibilities open up when thinking not just about pairs of traces (1 failing, 1 passing) but about larger sets of results. Given a classification of whether each run failed or not (which one probably wants to express as an executable test à la fbp-spec), it may then become possible to analyze the differences and cluster them according to similarity. This may assist in reasoning about, or in some cases even identifying, the cause of the issue.

For some problems one might want a fitness score instead of a binary fail/pass classification.
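A rough sketch of the clustering idea above, assuming traces are lists of events and `results` is a parallel list of pass/fail booleans (all names hypothetical; similarity here is a simple greedy single-link grouping over `difflib` string ratios, just to illustrate the shape of the tooling):

```python
from difflib import SequenceMatcher


def trace_similarity(a: list, b: list) -> float:
    """Similarity ratio in [0, 1] between two traces, compared as serialized strings."""
    return SequenceMatcher(None, repr(a), repr(b)).ratio()


def cluster_failures(traces: list, results: list, threshold: float = 0.9) -> list:
    """Group failing traces whose pairwise similarity exceeds the threshold.

    traces:  list of traces (each a list of events)
    results: parallel list of booleans, True meaning the run passed
    Returns a list of clusters, each a list of failing traces.
    """
    failing = [t for t, passed in zip(traces, results) if not passed]
    clusters = []
    for trace in failing:
        for cluster in clusters:
            # Greedy: join the first cluster whose representative is similar enough
            if trace_similarity(trace, cluster[0]) >= threshold:
                cluster.append(trace)
                break
        else:
            clusters.append([trace])
    return clusters
```

Replacing the boolean `results` with a numeric fitness score would be a natural extension: instead of only grouping failures, one could correlate which trace differences move the score.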