WICG / js-self-profiling

Proposal for a programmable JS profiling API for collecting JS profiles from real end-user environments

Deobfuscation of JS-Self-Profiling API is challenging due to API output #76

Open magenish opened 2 years ago

magenish commented 2 years ago

In the frames array of the API output, each frame points at the function declaration in the bundle. For example, for the following `bar` function, the API would point to the position between `bar` and the `(`:

```
bar(){blabla.....}
   ^
```

The described behavior differs from the well-known error call stack approach, where the error stack points to the line that was executing in a caller function when the callee function in the stack was called. So, for example, with the following code the error call stack would point to the call of `bar` inside `foo`:

```
bar(){blabla.....}

foo(){bar()}
      ^
```
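The error-stack convention is easy to observe directly in V8/Node. A minimal sketch (the file name and exact line/column values will vary with where the code lives):

```javascript
// Error stacks in V8: the frame for foo records the line/column of the
// *callsite* of bar inside foo, not foo's declaration.
function bar() { return new Error('probe'); }
function foo() { return bar(); }

const frames = foo().stack.split('\n');
console.log(frames[1]); // "    at bar (<file>:line:col)" -- where the Error was constructed
console.log(frames[2]); // "    at foo (<file>:line:col)" -- the column points at the bar() call
```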

This causes a lot of issues in the deobfuscation/unminification work we have been tackling over the last year or so: most open-source deobfuscators expect the error call stack pattern, so they fail to handle the current API output.

After discussions, there were two reasons the current output approach was chosen:

  1. Space: save space in the frames array by representing each function canonically, despite it having multiple potential child stacks (i.e. with different callee locations). For example, in the following code: `foo(){bar(); bar();}` if the profiler sampled both calls to `bar`, we would have two entries if we followed the error stack approach.

    Counter argument: while it's a valid concern, a few ideas for compressing the output size have been raised in this GitHub repo, e.g.: https://github.com/WICG/js-self-profiling/issues/74

    We could also introduce a new flag to enable the error stack approach, so it would be the consumer's decision.

    Furthermore, while I understand the concern about output size, I believe that in the common case consumers would prefer something that can be easily deobfuscated, even at the expense of output size.

  2. Ambiguity: the output would point to different locations for calls to the same function from different places. For example, in the following code: `foo(){bar(); bar();}` if the profiler sampled both calls to `bar`, we would have two entries with different line/column offsets under the error stack approach.

    Counter argument: once we follow the error call stack approach, deobfuscation becomes trivial and the ambiguity concern vanishes, as after deobfuscation both frames would resolve to `bar`.
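To make the space trade-off in point 1 concrete, here is a hedged sketch (the frame shapes and offsets are made up for illustration, not the spec's actual format) counting frame-table entries under each convention for `foo(){bar(); bar();}` when both calls are sampled:

```javascript
// Hypothetical sampled stacks for foo(){bar(); bar();}, each sample
// catching a different call of bar (callsite offsets are invented).
const samples = [
  [{ fn: 'foo', callsite: null }, { fn: 'bar', callsite: 'L1:7' }],
  [{ fn: 'foo', callsite: null }, { fn: 'bar', callsite: 'L1:14' }],
];

// Deduplicate frames into a table using the given identity key.
function frameTable(samples, keyOf) {
  const table = new Map();
  for (const stack of samples)
    for (const frame of stack) table.set(keyOf(frame), frame);
  return table;
}

// Current API: one canonical entry per function.
const byFunction = frameTable(samples, (f) => f.fn);
// Error-stack style: one entry per (function, callsite) pair.
const byCallsite = frameTable(samples, (f) => `${f.fn}@${f.callsite}`);

console.log(byFunction.size, byCallsite.size); // 2 vs. 3
```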

acomminos commented 1 year ago

This indeed seems valuable to interoperate with existing tooling that can symbolize JS errors today.

I'd be curious how much this bloats both compressed and uncompressed traces. This format (as you mention) would require specifying distinct (line, column) pairs for each unique callsite recorded in the trace, whereas today we bucket at function granularity.

JonasBa commented 1 year ago

Correct me if I am wrong, but isn't this essentially the `kCallerLineNumbers` option defined on the V8 `CpuProfiler` here?

```cpp
enum CpuProfilingMode {
  // In the resulting CpuProfile tree, intermediate nodes in a stack trace
  // (from the root to a leaf) will have line numbers that point to the start
  // line of the function, rather than the line of the callsite of the child.
  kLeafNodeLineNumbers,
  // In the resulting CpuProfile tree, nodes are separated based on the line
  // number of their callsite in their parent.
  kCallerLineNumbers,
};
```

(from https://v8docs.nodesource.com/node-18.2/d2/dc3/namespacev8.html)

If that is the case, we could use it to benchmark some common use cases and get a rough estimate of such a change's impact on the final format size. I worry that regardless of the benchmark's outcome, opinions may still vary, and reaching consensus on what is an acceptable format size impact may be difficult.