[API Proposal]: Mitigate the sources of privacy and compliance issues by providing and using safe-to-log URIs and safe-to-log exceptions

klauco commented 1 year ago

Background and motivation

At the moment, .NET users need to ensure that PII data is handled accordingly. When it comes to logging, service owners remove PII data to ensure that GDPR or other privacy standards are followed. There are two main sources of PII data in services running on .NET:

URIs. .NET does not natively support the concept of URIs which are safe to log. URIs are used to identify resources on the web, and they often contain sensitive data, such as query parameters, authentication tokens, or personal information. Logging such data can pose security and privacy risks, as well as violate compliance regulations. This lack of URI route concept also affects OpenTelemetry tracing. The default implementation of OpenTelemetry uses the full URI when emitting spans, which can expose sensitive data. This is a problem for services, which need to adhere to strict security and privacy policies.
Exception messages, which can contain sensitive information. There is no guarantee that the exception being thrown does not contain sensitive data. Even when service owners use 1P dependencies only, the remaining exceptions thrown by .NET functionality can contain sensitive information.

To overcome these issues, service owners need to find ways to ensure that their services don’t leak sensitive data, in particular:

When any Uri is logged, they have to keep a separate construct of Uri template with placeholders instead of real parameter values and ensure that the template is logged instead of full Uri. At the same time, service owners disable the default instrumentation from .NET, because it works with full Uris.
When exception message is logged, it must be stripped from the potentially sensitive data in a rather CPU expensive way.

The following requirements should be met:

User should be able to create Safe-to-log Uri which does not contain sensitive information when logged.
Safe-to-log Uri should not contain any sensitive information when logged via ILogger
Safe-to-log Uri should not have any dependencies, which would have to be injected via DI, including redaction.
User should be able to opt-in for unsafe Uri logging, applied to all Uris, including Safe-to-log Uri. In such case, full Uri would be logged, including the sensitive parameter values.
User should be able to define which of the parameters from the route template are safe or unsafe to log. By default, all parameters should be treated as unsafe. Example: “this/is/my/template/with/{tenantId:safe}/and/{userId}”. In this case, the parameters would be “tenantId” and “userId”, and the final Uri being logged would be different with each parameter value: “this/is/my/template/with/myTenant1/and/{userId}”. “this/is/my/template/with/myTenant2/and/{userId}”.
Safe-to-log Uri should be recognized as an alternative to any API which accepts Uri as a parameter. Ideally, Safe-to-log Uri is type derived from Uri.
Safe-to-log Uri should be exposed to downstream dependencies for further consumption. Example: diagnostic listener, OpenTelemetry.NET (when generating spans)
Safe-to-log Uri must be thread-safe.
For URLs, the template string should allow defining all paths of URL. Any part or subpart which is defined as parameter, should be settable at runtime, the hardcoded parts must be readonly. Examples of template string: “https://{host}/this/is/my/template/with/myTenant1/and/{userId}”. “https://myservice.somecompany.com/this/is/my/template/with/myTenant1/and/{userId}”. “https://{host}/this/is/my/template/with/myTenant1/and/{userId}?search={queryParam}”. “https://{host}/this/is/my/template/with/myTenant1/and/{userId}?{query}#fragment”. “https://{host}/this/is/my/template/with/myTenant1/and/{userId}?{query}#{fragment}”.
Safe-to-log Uri must not allow changing the path (or query) directly, it must always match the template.
Safe-to-log Uri must be serialialized the same way as Uri, including sensitive parameter values
Any exception which is thrown from within .NET ecosystem is defined in a way which allows safe logging.
Safe-to-log exception can still be processed (serialized / logged) in the current, unsafe-to-log way.
Users should be able to opt in (via global flag?) for logging the exceptions without sensitive data.
User should be able to adjust the configuration of exception logging at runtime without an app restart, and to turn on and off individual log messages without an app restart, and without having to write a custom filtering class.

API Proposal

[SafeUri("this/is/path/with/{tenantId}/and/{userId}")]
private UrlForSomeService MyUrl {get; set;}

// The two lines above would turn into class via code-gen: 
class UrlForSomeService : Uri {
  public UrlForSomeService(string tenantId, string userId) {...}

  // Route template defined in the parameter of SafeUri attribute
  public string RouteTemplate => "this/is/path/with/{tenantId}/and/{userId}";

  // TenantId generated based on the template placeholders
  public  string TenantId { get; set; }

  // UserId generated based on the template placeholders
  public string UserId { get; set; }

  // String representation to be logged contains the route template on the “right” place
  public override string ToString() => { return nonTemplateParts + RouteTemplate + otherNonTemplateParts; }
}

Define a SafeToLogString type, which together with code-gen, will allow users to define a string template + parameters of the template which can be set during runtime. Stringified version of the instance of the type would return only the template:

public abstract class SafeToLogString 
{
    public SafeToLogString(string template) {…}

    public string Template { get; }
    public abstract IDictionary<string, string> TemplateArguments { get; }

    // Returns the template with placeholders replaced with current values of template arguments
    public string FinalString { 
        get { ... } 
    }

    public override string ToString() {return Template;}
}

[StringTemplate("This is my template with {sensitiveData} and {otherSensitiveData}")]
public MySafeToLogString MySensitiveExceptionMessage { get; set; }

// this could generate something like:

public class MySafeToLogString : SafeToLogString {
    public MySafeToLogString(string sensitiveData, string otherSensitiveData) 
        : base("This is my template with {sensitiveData} and {otherSensitiveData}") {
            TemplateArguments = new Dictionary<string, string>();
            TemplateArguments.Add("sensitiveData", sensitiveData);
            TemplateArguments.Add("otherSensitiveData", otherSensitiveData);
        }

    public string SensitiveData { 
       get { return TemplateArguments["sensitiveData"]; } 
       set { ...} 
    }
    public string OtherSensitiveData { 
      get { return TemplateArguments["otherSensitiveData"]; } 
      set { ...}
    }
} 

public class Exception {
    public Exception (SafeToLogString message) { 
        this._message = message.Template;
        this._data = message.TemplateArguments;
        ... 
    }

    public Exception (SafeToLogString message, Exception innerException) { ... }
}

API Usage

var myUrl1 = new UrlForSomeService("myTenantId", "myUserId");
var myUrl2 = new UrlForSomeService("myTenantId2", "myUserId2");

MySensitiveExceptionMessage = new MySafeToLogString("someSensitiveData", "someOtherSensitiveData");
throw new Exception(MySensitiveExceptionMessage);

Alternative Designs

As an alternative to a generic-purpose (and still very useful) SafeToLogString type, Exception constructor could accept a new ExceptionMessage type


public class ExceptionMessage {
    // the message would contain the safe-to-log message without any sensitive data
    public string MessageTemplate { get; }

    // data which should not be logged are stored here
    public IDictionary AdditionalData { get; }
    // this can be overridable to provide “sanitized” message instead of message template. Some of the template arguments could be potentially safe-to-log
    public override string ToString() { return MessageTemplate; }
}

Exception class would accept the ExceptionMessage object:


public class Exception {
    public Exception (ExceptionMessage messageObject) { 
        this._message = messageObject.ToString();
        this._data = messageObject.AdditionalData;
        ... 
    }

    public Exception (ExceptionMessage message, Exception innerException) { ... }
}

Users could either instantiate the ExceptionMessage directly, or have code-gen based helpers which would enable filling in Message and AdditionalData properties, i.e.:


[ExceptionMessageTemplate("this is exception message with {MyProperty} and {OtherProperty}")]
public partial class MyExceptionMessage {
    public string MyProperty { get; set; }

    public string OtherProperty { get; set; }
}

In the sample above, Message would be filled from the attribute argument and AdditionalData property would be filled from MyProperty and OtherProperty.

Risks

No response

ghost commented 1 year ago

Tagging subscribers to this area: @dotnet/ncl See info in area-owners.md if you want to be subscribed.

Issue Details

### Background and motivation At the moment, .NET users need to ensure that PII data is handled accordingly. When it comes to logging, service owners remove PII data to ensure that GDPR or other privacy standards are followed. There are two main sources of PII data in services running on .NET: 1. URIs. .NET does not natively support the concept of URIs which are safe to log. URIs are used to identify resources on the web, and they often contain sensitive data, such as query parameters, authentication tokens, or personal information. Logging such data can pose security and privacy risks, as well as violate compliance regulations. This lack of URI route concept also affects OpenTelemetry tracing. The default implementation of OpenTelemetry uses the full URI when emitting spans, which can expose sensitive data. This is a problem for services, which need to adhere to strict security and privacy policies. 2. Exception messages, which can contain sensitive information. There is no guarantee that the exception being thrown does not contain sensitive data. Even when service owners use 1P dependencies only, the remaining exceptions thrown by .NET functionality can contain sensitive information. To overcome these issues, service owners need to find ways to ensure that their services don’t leak sensitive data, in particular: 1. When any Uri is logged, they have to keep a separate construct of Uri template with placeholders instead of real parameter values and ensure that the template is logged instead of full Uri. At the same time, service owners disable the default instrumentation from .NET, because it works with full Uris. 2. When exception message is logged, it must be stripped from the potentially sensitive data in a rather CPU expensive way. The following requirements should be met: 1. User should be able to create Safe-to-log Uri which does not contain sensitive information when logged. 2. Safe-to-log Uri should not contain any sensitive information when logged via ILogger 3. Safe-to-log Uri should not have any dependencies, which would have to be injected via DI, including redaction. 4. User should be able to opt-in for unsafe Uri logging, applied to all Uris, including Safe-to-log Uri. In such case, full Uri would be logged, including the sensitive parameter values. 5. User should be able to define which of the parameters from the route template are safe or unsafe to log. By default, all parameters should be treated as unsafe. Example: “this/is/my/template/with/{tenantId:safe}/and/{userId}”. In this case, the parameters would be “tenantId” and “userId”, and the final Uri being logged would be different with each parameter value: “this/is/my/template/with/myTenant1/and/{userId}”. “this/is/my/template/with/myTenant2/and/{userId}”. 6. Safe-to-log Uri should be recognized as an alternative to any API which accepts Uri as a parameter. Ideally, Safe-to-log Uri is type derived from Uri. 7. Safe-to-log Uri should be exposed to downstream dependencies for further consumption. Example: diagnostic listener, OpenTelemetry.NET (when generating spans) 8. Safe-to-log Uri must be thread-safe. 9. For URLs, the template string should allow defining all paths of URL. Any part or subpart which is defined as parameter, should be settable at runtime, the hardcoded parts must be readonly. Examples of template string: “https://{host}/this/is/my/template/with/myTenant1/and/{userId}”. “https://myservice.somecompany.com/this/is/my/template/with/myTenant1/and/{userId}”. “https://{host}/this/is/my/template/with/myTenant1/and/{userId}?search={queryParam}”. “https://{host}/this/is/my/template/with/myTenant1/and/{userId}?{query}#fragment”. “https://{host}/this/is/my/template/with/myTenant1/and/{userId}?{query}#{fragment}”. 10. Safe-to-log Uri must not allow changing the path (or query) directly, it must always match the template. 11. Safe-to-log Uri must be serialialized the same way as Uri, including sensitive parameter values 12. Any exception which is thrown from within .NET ecosystem is defined in a way which allows safe logging. 13. Safe-to-log exception can still be processed (serialized / logged) in the current, unsafe-to-log way. 14. Users should be able to opt in (via global flag?) for logging the exceptions without sensitive data. ### API Proposal ```csharp [SafeUri("this/is/path/with/{tenantId}/and/{userId}")] private UrlForSomeService MyUrl {get; set;} // The two lines above would turn into class via code-gen: class UrlForSomeService : Uri { public UrlForSomeService(string tenantId, string userId) {...} // Route template defined in the parameter of SafeUri attribute public string RouteTemplate => "this/is/path/with/{tenantId}/and/{userId}"; // TenantId generated based on the template placeholders public string TenantId { get; set; } // UserId generated based on the template placeholders public string UserId { get; set; } // String representation to be logged contains the route template on the “right” place public override string ToString() => { return nonTemplateParts + RouteTemplate + otherNonTemplateParts; } } ``` Define a SafeToLogString type, which together with code-gen, will allow users to define a string template + parameters of the template which can be set during runtime. Stringified version of the instance of the type would return only the template: ```csharp public abstract class SafeToLogString { public SafeToLogString(string template) {…} public string Template { get; } public abstract IDictionary TemplateArguments { get; } // Returns the template with placeholders replaced with current values of template arguments public string FinalString { get { ... } } public override string ToString() {return Template;} } [StringTemplate("This is my template with {sensitiveData} and {otherSensitiveData}")] public MySafeToLogString MySensitiveExceptionMessage { get; set; } // this could generate something like: public class MySafeToLogString : SafeToLogString { public MySafeToLogString(string sensitiveData, string otherSensitiveData) : base("This is my template with {sensitiveData} and {otherSensitiveData}") { TemplateArguments = new Dictionary(); TemplateArguments.Add("sensitiveData", sensitiveData); TemplateArguments.Add("otherSensitiveData", otherSensitiveData); } public string SensitiveData { get { return TemplateArguments["sensitiveData"]; } set { ...} } public string OtherSensitiveData { get { return TemplateArguments["otherSensitiveData"]; } set { ...} } } public class Exception { public Exception (SafeToLogString message) { this._message = message.Template; this._data = message.TemplateArguments; ... } public Exception (SafeToLogString message, Exception innerException) { ... } } ``` ### API Usage ```csharp var myUrl1 = new UrlForSomeService("myTenantId", "myUserId"); var myUrl2 = new UrlForSomeService("myTenantId2", "myUserId2"); MySensitiveExceptionMessage = new MySafeToLogString("someSensitiveData", "someOtherSensitiveData"); throw new Exception(MySensitiveExceptionMessage); ``` ### Alternative Designs As an alternative to a generic-purpose (and still very useful) SafeToLogString type, Exception constructor could accept a new ExceptionMessage type ```csharp public class ExceptionMessage { // the message would contain the safe-to-log message without any sensitive data public string MessageTemplate { get; } // data which should not be logged are stored here public IDictionary AdditionalData { get; } // this can be overridable to provide “sanitized” message instead of message template. Some of the template arguments could be potentially safe-to-log public override string ToString() { return MessageTemplate; } } ``` Exception class would accept the ExceptionMessage object: ```csharp public class Exception { public Exception (ExceptionMessage messageObject) { this._message = messageObject.ToString(); this._data = messageObject.AdditionalData; ... } public Exception (ExceptionMessage message, Exception innerException) { ... } } ``` Users could either instantiate the ExceptionMessage directly, or have code-gen based helpers which would enable filling in Message and AdditionalData properties, i.e.: ```csharp [ExceptionMessageTemplate("this is exception message with {MyProperty} and {OtherProperty}")] public partial class MyExceptionMessage { public string MyProperty { get; set; } public string OtherProperty { get; set; } } ``` In the sample above, Message would be filled from the attribute argument and AdditionalData property would be filled from MyProperty and OtherProperty. ### Risks _No response_

Author:	klauco
Assignees:	-
Labels:	`api-suggestion`, `area-System.Net`
Milestone:	-

klauco commented 1 year ago

cc @geeknoid @xakep139

danmoseley commented 1 year ago

This has been discussed before (at least for PII in exceptions) -- it is good to restart the discussion. What would a safe to log exception look like? Is callstack+exception type safe to log? In the past we have discussed having the runtime do this stripping at the point of throw, based on some global flag. But given arbitrary logging may contain PII, there presumably should still be redaction at the logger level as well.

Separately we are missing guidance for what is OK to have in an exception message. A secret like a password clearly is not. A user name may not be. What about host name? We have had requests to add that when a host cannot be found, to say what the host is. What we do in this repo right now is not really consistent. If there was a way to strip messages globally, it would be easier to add hostname to an exception message.

We know from talking to customers that those that are concerned with PII in logs (eg., in a business like healthcare), often just disable logging in production. If work like this is successful, would it be possible for them to enable logging, or would they still be concerned about PII in other messages, in which case it would not be an improvement? Some customers do filter, eg., StackExchange https://github.com/NickCraver/StackExchange.Exceptional

danmoseley commented 1 year ago

also -- often developers can't easily debug in production (eg., they need approval to deploy tools) or don't want to - in that case it's helpful to be able to disable stripping and enable detailed logging temporarily, by just changing config or environment.

searching email, this has come up several times in the last few years.

cc @blowdart

blowdart commented 1 year ago

Let's also add the ability to adjust at runtime without an app restart, and to turn on and off individual log messages without an app restart, and without having to write a custom filtering class.

klauco commented 1 year ago

One thing which I did not add to the proposal is the fail-safe. Same as when you are trying to strip sensitive info after it was generated, there could be a mechanism which would kick off right before the logging of the exception is happening. Let's call it ExceptionSanitizer:

public class ExceptionSanitizer {
  public Exception Sanitize(Exception ex) {
     // returns sanitized exception based on the registered sanitizers for a given exception type or assembly the exception was fired from or other information
    return ex;
  }
}

klauco commented 1 year ago

Alternatively, when it comes to figuring out what is safe to include in the exception message and what not, I could imagine fully typed ExceptionMessage object which would define template + typed arguments of the template. @danmoseley - in your example above, host could be a type which you could then register as a type which is safe or unsafe-to-log. So similarly to how data classification works based on attributes, then the definition could be externalized and exposed to end user.

MichalPetryka commented 1 year ago

URIs are used to identify resources on the web, and they often contain sensitive data, such as query parameters, authentication tokens, or personal information. Logging such data can pose security and privacy risks, as well as violate compliance regulations.

One of the basic rules of handling data securely is using POST requests for confidential data instead of GET requests since those leak it in the addresses. Adding APIs that encourage unsecure behaviour shouldn't be a goal for the runtime.

klauco commented 1 year ago

@MichalPetryka This request does not come out of the blue, but from the real experience of service owners. The simplest example where GET exposes sensitive information is Microsoft Graph to GET a user:

GET https://graph.microsoft.com/v1.0/users/<some user id>

The proposed APIs are not to encourage unsecure behaviour, they are there to enable real-world service owners use .NET in a real-world scenarios, where libraries/SDKs/clients are designed in a way which is not 100% in line with the security recommendations.

Additionally, route templates will reduce cardinality of the route dimensions in outgoing HTTP metrics. Even OpenTelemetry specification recommends using low-cardinality values whenever possible, so using route templates or URL templates instead of real high-cardinality URLs.

Joe4evr commented 1 year ago

A user ID is usually not considered sensitive information, though.

klauco commented 1 year ago

@Joe4evr UserId is in the category of End User Pseudonymous identifiers and is considered as personal data. Consider a different example:

/users?$filter=mail eq 'mail@contoso.onmicrosoft.com'

frankshearar commented 5 months ago

I'll just add that "End User Pseudonymous identifier" is a Microsoft term (an M365 term, to be more precise), and logging EUPI results in a Privacy incident. That would be painful enough, but some logging some kinds of EUPI, in combination with other data, can yield GDPR violations.

julealgon commented 5 months ago

Not sure if this is just me, but the concern of knowing whether some value is safe or not to log should not be on the value itself. So, the proposal to create specialized, "safe-to-log" versions of Uri and string is a terrible idea to me.

If we need extra metadata about some type (like route template information for a Uri) that should not be part of a new Uri type but of some container type that has the Uri in it alongside the other data. Composition instead of inheritance.

In other words, I agree with the idea behind this proposal, but not with the solution it proposes.

antonfirsov commented 4 months ago

We will not implement this in the proposed form. The proposal should be broken down to multiple issues.

The primary requests are:

Support for user-defined URL redaction
Support for low-cardinality url-s url.template

We should also think about the following requests, although I'm sceptical they will result in BCL features:

Support for exception message redaction in logs
Support for general templating and parameterization when working with HTTP requests.

For URL redaction we are implementing #103769 to ensure basic privacy, the rest of the work will happen in .NET 10.

dotnet / runtime