dotnet / roslyn

The Roslyn .NET compiler provides C# and Visual Basic languages with rich code analysis APIs.
https://docs.microsoft.com/dotnet/csharp/roslyn-sdk/
MIT License

LSIF generator should implement 'standard error protocol' and report exceptions, health, and performance metrics #69610

Open gundermanc opened 1 year ago

gundermanc commented 1 year ago

As a consumer of the Roslyn LSIF generator tool in an automated environment, I have little visibility into when or why it may fail in aggregate. Currently, when a failure is encountered, we must open and dig into a specific job, repro the issue, and then sift through the logs or debug the tool. This process is time-consuming and makes it hard to triage and act on issues.

I propose that the Roslyn LSIF generator tool be updated to support the following 'standard error protocol' (note that the protocol could be written to a pipe or other output stream).

Protocol Spec

{
  "command": "log",
  "parameters": {
    // One or more command specific parameters.
  },
  "telemetry": {
    // Any arbitrary measurements or health metrics that should be reported.
  }
}

Log command

Initially, log is the only supported command. Here is an example:

{
  "command": "log",
  "parameters": {
    "severity": "Error",
    "exception": "System.InvalidOperationException",
    "callstack": "at Foo.Bar() line 350...at Program.Main() line 15",
    "code": "CS1501"
  },
  "telemetry": {
    "Roslyn.LsifGenerator.IsDone": false,
    "Roslyn.LsifGenerator.PercentageDone": 50
  }
}
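As a rough illustration of the producer side, here is a minimal Python sketch that builds a log message shaped like the example above and writes it to an output stream. The function names `build_log_message` and `write_log_message` are hypothetical, and writing one JSON object per line is an assumption: the proposal does not specify how messages are framed. The real generator is a .NET tool, so this only shows the message shape, not its implementation.

```python
import json
import sys


def build_log_message(severity, exception=None, callstack=None, code=None, telemetry=None):
    """Build one protocol message using the 'log' command."""
    parameters = {"severity": severity}
    # Only include optional parameters that were actually supplied.
    for key, value in (("exception", exception), ("callstack", callstack), ("code", code)):
        if value is not None:
            parameters[key] = value
    return {"command": "log", "parameters": parameters, "telemetry": telemetry or {}}


def write_log_message(message, stream=sys.stderr):
    """Write the message as a single line of JSON.

    One object per line is an assumption; the proposal leaves framing open.
    """
    stream.write(json.dumps(message) + "\n")
    stream.flush()


if __name__ == "__main__":
    write_log_message(
        build_log_message(
            "Error",
            exception="System.InvalidOperationException",
            callstack="at Foo.Bar() line 350...at Program.Main() line 15",
            code="CS1501",
            telemetry={
                "Roslyn.LsifGenerator.IsDone": False,
                "Roslyn.LsifGenerator.PercentageDone": 50,
            },
        )
    )
```

Passing a different `stream` (a pipe or file object) covers the case where the protocol is written somewhere other than STDERR.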

Log supports the following parameters:

Note that no explicit limits are placed on the length of each of these parameters, but the consumer may truncate them if they exceed a few thousand characters.
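A consumer-side truncation step might look like the following sketch. The 4000-character limit and the name `truncate_parameters` are assumptions made for illustration; the proposal only says "a few thousand characters" and does not say how truncation is done.

```python
# Hypothetical limit; the proposal only says "a few thousand characters".
MAX_PARAMETER_LENGTH = 4000


def truncate_parameters(parameters, limit=MAX_PARAMETER_LENGTH):
    """Return a copy of the parameters with overlong string values cut to `limit` characters."""
    truncated = {}
    for key, value in parameters.items():
        if isinstance(value, str) and len(value) > limit:
            truncated[key] = value[:limit]
        else:
            # Short strings and non-string values pass through unchanged.
            truncated[key] = value
    return truncated
```

The input dictionary is left untouched, so the full values remain available to anything that runs before transmission.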

Other commands

Currently none, though the protocol is left open-ended in case we need a way to facilitate future LSIF tool => consumer communication for diagnostic purposes, like:

Roslyn specific implementation

The overarching goal of this work item is to enable automated aggregation of:

Cost Considerations for Logging

My goal is for this logging to be fairly verbose. Anything that might be of value in diagnosing and triaging failures should ideally be written using this logging mechanism, with aggregation happening in the consumer prior to transmission. For example, it should be OK to invoke log a few dozen or even a hundred times, so long as writing to STDERR is not itself a bottleneck.
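To make the "aggregation in the consumer" idea concrete, here is a minimal Python sketch of a consumer that reads the tool's output and summarizes it before transmission. It assumes one JSON message per line, which the proposal does not specify, and the name `aggregate_log_lines` and the shape of the summary are hypothetical.

```python
import json
from collections import Counter


def aggregate_log_lines(lines):
    """Summarize protocol messages read from the tool's output, one JSON object per line."""
    severity_counts = Counter()
    code_counts = Counter()
    latest_telemetry = {}
    unparsed = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            unparsed += 1  # ordinary output that is not a protocol message
            continue
        if not isinstance(message, dict) or message.get("command") != "log":
            unparsed += 1  # valid JSON, but not a 'log' command
            continue
        parameters = message.get("parameters") or {}
        severity_counts[parameters.get("severity", "Unknown")] += 1
        if "code" in parameters:
            code_counts[parameters["code"]] += 1
        # Later messages overwrite earlier values for the same metric name.
        latest_telemetry.update(message.get("telemetry") or {})
    return {
        "severity_counts": dict(severity_counts),
        "code_counts": dict(code_counts),
        "telemetry": latest_telemetry,
        "unparsed_lines": unparsed,
    }
```

With this shape, a hundred log invocations collapse into a single summary per job, which keeps verbose logging cheap to transmit.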

Sample Metrics

I'm looking for 'guard rails' that may indicate something went wrong at this stage. Here are some examples:

dotnet-issue-labeler[bot] commented 1 year ago

I couldn't figure out the best area label to add to this issue. If you have write permissions, please help me learn by adding exactly one area label.

CyrusNajmabadi commented 1 year ago

Can you add more info on what the "Standard error protocol" is?

gundermanc commented 1 year ago

Can you add more info on what the "Standard error protocol" is?

Done.

FYI @jasonmalinowski