KirillOsenkov / MSBuildStructuredLog

A logger for MSBuild that records a structured representation of executed targets, tasks, property and item values.
MIT License
1.41k stars 188 forks source link

Sensitive data redacting PoC #711

Closed JanKrivanek closed 7 months ago

JanKrivanek commented 8 months ago

Context

Based on offline talks with @KirillOsenkov I'm spinning some experimentation with sensitive data redacting. Currently the main effort was on:

The actual redacting is a very PoC thing - just to demonstrate the idea. In next iteration I'd ideally ref the https://www.nuget.org/packages/MSBuild.BinlogRedactor/ and just inject the replay functionality into it. The package would take care about the redaction of data - as it already has some auto-detection of sensitive data (and more patterns are going to be added soon)

If all the stream utilities in Postprocessing folder feels as too much code to drag - I can reference https://www.nuget.org/packages/DotUtils.StreamUtils/, but viewer seeems to be very minimalistic in package refs - so I rather just used the code.

UX

Updated:

image

image


image


image
JanKrivanek commented 8 months ago

The CI is failing due the new test getting

System.IO.FileNotFoundException : Could not find file 'C:\Users\appveyor\AppData\Local\Temp\1\47b8ec7b-5245-49cf-8842-ef31ceaa4965\47b8ec7b-5245-49cf-8842-ef31ceaa4965\assembly\dl3\39c84884\6d30ae17_e408da01\1.binlog'.

While for obtaining that test binlog (1.binlog) I use same GetTestFile("1.binlog") call: https://github.com/KirillOsenkov/MSBuildStructuredLog/pull/711/files#diff-1ab0219f9bea23d8aa114270d3b3010a73c59dbca476befba614f2e9cb6f89d3R132

as is already used in other tests - e.g.:

https://github.com/KirillOsenkov/MSBuildStructuredLog/blob/ff5383d7849450a80306eda846b07d419c63c6f7/src/StructuredLogger.Tests/BinaryLoggerTests.cs#L93-L95

This works just fine locally. @KirillOsenkov - any quick idea from top of your head, why 1.binlog might be missing?

KirillOsenkov commented 8 months ago

No quick idea, just noticing that in the path the GUID is mentioned twice: 47b8ec7b-5245-49cf-8842-ef31ceaa4965\47b8ec7b-5245-49cf-8842-ef31ceaa4965

probably eyeball GetTestFile() and see where it could go wrong?

KirillOsenkov commented 8 months ago

Some first notes as I read:

  1. better call it "secrets" everywhere, not "passwords"
  2. avoid abbreviations, so "password" instead of "pwd" for example
  3. I'd avoid using Path.GetTempFileName(), it can only hold 65,536 files, can overflow easily (and it's hard to debug), and people don't clean it up. Use the viewer directory (%localappdata%\Microsoft\MSBuildStructuredLog\Temp\), I think we already have code somewhere to write into it.
  4. I think there was a problem with making BuildEventArgsReader public, but I don't quite remember what it is. It could be that it would conflict with the one in MSBuild?
  5. when you update to latest main you'll notice Roman's recent change to support ExtendedCriticalBuildMessageEventArgs
  6. in terms of UI, you could probably show an empty editor in the right pane (there's an easy API for that), and then use each line from that text as a secret, and have a button to "commit". This way you could maybe specify an existing file with secrets?
  7. A design decision that needs to be made is whether to redact in-place or into a different file.
  8. Are you sure we need several new interfaces here, all fine-granular? It looks good, but just check that every interface is justified, or can we make do without?

Otherwise all seems reasonable from a quick read.

JanKrivanek commented 8 months ago

Some first notes as I read:

  1. better call it "secrets" everywhere, not "passwords"
  2. avoid abbreviations, so "password" instead of "pwd" for example
  3. I'd avoid using Path.GetTempFileName(), it can only hold 65,536 files, can overflow easily (and it's hard to debug), and people don't clean it up. Use the viewer directory (%localappdata%\Microsoft\MSBuildStructuredLog\Temp\), I think we already have code somewhere to write into it.
  4. I think there was a problem with making BuildEventArgsReader public, but I don't quite remember what it is. It could be that it would conflict with the one in MSBuild?
  5. when you update to latest main you'll notice Roman's recent change to support ExtendedCriticalBuildMessageEventArgs
  6. in terms of UI, you could probably show an empty editor in the right pane (there's an easy API for that), and then use each line from that text as a secret, and have a button to "commit". This way you could maybe specify an existing file with secrets?
  7. A design decision that needs to be made is whether to redact in-place or into a different file.
  8. Are you sure we need several new interfaces here, all fine-granular? It looks good, but just check that every interface is justified, or can we make do without?

Otherwise all seems reasonable from a quick read.

I'l work on these.

As far as UI - if you have anything specific in mind, a quick non-functional markup mockup would be super helpful as I'm bit rusty with UI. But I can definitely find a way otherwise. I'd probably add the 'Replace In Place' checkbox (or something similar that would flip behavior similar to Save / Save As) - that way we wouldn't need to decide whether it's a in-place redact or separate file - it'd be up to user, with sensible default.

Interfaces - I'd want to keep this as similar to MSBuild as possible - it'll make it easier to port changes both ways. But I'm definitely open to suggetions here

JanKrivanek commented 8 months ago

I hope I addressed all the preliminary comments + I created a proposal for the GUI.

So the current GUI proposal:

image


image


image
JanKrivanek commented 8 months ago

Finalized v1:

It feels ready for review ;-)

KirillOsenkov commented 8 months ago

Thank you for this gargantuan effort! Your hard work and patience are much appreciated! I think it looks really good already.

I really didn't want to cause any unnecessary stop energy, but I'm paranoid about breaking people, so just wanted to iron out a couple of things before this goes in.

Primarily:

  1. do you think it's doable to move the dependency on redaction .dll and the redactor logic from the StructuredLogger.dll and into the "UI" and BinlogTool layer? Technically I'd like to keep StructuredLogger minimal, just read/write binlogs, no UI, no extra stuff. People should be able to build on top of that to add extra functionality. Perhaps StructuredLogViewer.Core is a better place? It's "in between" the core and the UI layers. Or maybe add a stub empty static Redact() delegate in the core, and have the UI plug the real implementation.
  2. similarly, if we could avoid the dependency on the streams .dll... I know it's ugly to inline the source, but historically people have been requesting that it's possible to just download StructuredLogger.dll from the Release and be able to use it, even bypassing the NuGet package entirely. This is why I've been publishing StructuredLogger.dll as part of every release: https://github.com/KirillOsenkov/MSBuildStructuredLog/releases/tag/v2.1.858
  3. is the redaction workflow separate from the regular binlog reading logic? If so, I have no concerns. I'm just worried to not regress the mainline binlog loading scenarios in terms of performance.

The rest (including the UI) seems fine. Once the above is clarified, we could merge and continue iterating on it in separate PRs if needed.

For the binlogtool, we should also later think about supporting a ".rsp" file, so that someone has a file with secrets somewhere (one per line), and we could just pass it to the tool instead of mentioning the secrets in the command line. Also clarify the in-place vs. a new file for command-line.

KirillOsenkov commented 8 months ago

Hmm, I just thought, for Stream utils, do you think you could publish a source-only package? DotUtils.StreamUtils.Source? You could build it from the same source using for example a custom .nuspec file: https://github.com/KirillOsenkov/MSBuildTools/blob/d8afc187fa7be3771743e223a31765a68248f4e5/src/PackNuspec/PackNuspec.csproj#L9-L37

or coerce the built-in MSBuild properties to produce a source-only package in addition to the default one.

I think this would be ideal: doesn't add a runtime dependency, but also avoids forking the source. Wish we could do the same for Microsoft.NET.StringTools.dll, but that ship has sailed I'm afraid.

JanKrivanek commented 8 months ago

Thank you for this gargantuan effort! Your hard work and patience are much appreciated! I think it looks really good already.

I really didn't want to cause any unnecessary stop energy, but I'm paranoid about breaking people, so just wanted to iron out a couple of things before this goes in.

Primarily:

  1. do you think it's doable to move the dependency on redaction .dll and the redactor logic from the StructuredLogger.dll and into the "UI" and BinlogTool layer? Technically I'd like to keep StructuredLogger minimal, just read/write binlogs, no UI, no extra stuff. People should be able to build on top of that to add extra functionality. Perhaps StructuredLogViewer.Core is a better place? It's "in between" the core and the UI layers. Or maybe add a stub empty static Redact() delegate in the core, and have the UI plug the real implementation.
  2. similarly, if we could avoid the dependency on the streams .dll... I know it's ugly to inline the source, but historically people have been requesting that it's possible to just download StructuredLogger.dll from the Release and be able to use it, even bypassing the NuGet package entirely. This is why I've been publishing StructuredLogger.dll as part of every release: https://github.com/KirillOsenkov/MSBuildStructuredLog/releases/tag/v2.1.858
  3. is the redaction workflow separate from the regular binlog reading logic? If so, I have no concerns. I'm just worried to not regress the mainline binlog loading scenarios in terms of performance.

The rest (including the UI) seems fine. Once the above is clarified, we could merge and continue iterating on it in separate PRs if needed.

For the binlogtool, we should also later think about supporting a ".rsp" file, so that someone has a file with secrets somewhere (one per line), and we could just pass it to the tool instead of mentioning the secrets in the command line. Also clarify the in-place vs. a new file for command-line.

I'm grateful for the detailed look and all the feedback! I definitely perceive it as helpful - no worries about interrupting the flow. I'm happy to iterate over things in this PR and not leaving them "for the future" (with few small exceptions - as my plan is to start integrating the 'forward compatibility reading support' after this one is merged and it has some of your concerns addressed - e.g. lazy ArchiveFiles. I'll call those explicitly out and point to the MSBuild PR what's the plan).

As to the individual concerns:

Rest of comments are clear - I'll eaither right away address them or comment inline if any needs more context.

Thank you!! :blush:

KirillOsenkov commented 8 months ago

Sounds great!

StructuredLogger only references these:

image

And Microsoft.Build.Utilities.Core.dll is only used for the Logger type: image

I only mentioned Microsoft.NET.StringTools.dll as an example of a dependency MSBuild didn't add (or added as source-only), because it caused a lot of grief for a lot of tools out there, that use MSBuild libraries, they now needed to ship an extra .dll, update installers, etc.

JanKrivanek commented 7 months ago

@KirillOsenkov - it now feels ready for next round of review.

Main changes:

Looking forward for the next round of feedback :-)

JanKrivanek commented 7 months ago

Oh - nice Please ping me once you'll publish new version of the binlogtool - I'll then work on incorporating into arcade

KirillOsenkov commented 7 months ago

Published:

KirillOsenkov commented 7 months ago

image

image