elastic / elastic-transport-js

Transport classes and utilities shared among Node.js Elastic client libraries
Apache License 2.0
4 stars 24 forks source link

Redact potentially sensitive information from error objects #76

Closed JoshMock closed 7 months ago

JoshMock commented 7 months ago

Exposes new Transport options to improve the JS client's security position on potentially sensitive data.

Right now, many of the Error classes defined in this library include a DiagnosticResult object full of request and connection metadata that is useful for debugging. This can include HTTP headers and URLs. Common sources of sensitive data are removed while serializing to JSON or logging to the console, so common use cases are already covered. However, since that data is still stored on these objects in memory, sensitive data may not be scrubbed when custom serialization methods are used.

To solve for these cases, Error objects now fully redact and/or remove potentially sensitive data at instantiation time rather than at serialization time, so they do not exist in memory at all and cannot be exposed by any means.

These redaction methods are all exposed as options on the Transport class:

See https://github.com/elastic/elasticsearch-js/issues/2026.

JoshMock commented 7 months ago

I'm always a bit on the fence about adding new options, users may overlook the documentation and adds more code to maintain in general.

I agree that adding new options increases maintenance cost and should be avoided when possible, but it didn't seem reasonable in this case.

The goal of the options is to allow multiple redaction "strengths," and to be able to disable it when the old behavior is needed for e.g. local debugging. When discussing with @rudolf, @pgayvallet and @ezimuel, adding these options—with secure but non-breaking-change defaults—seemed like the most flexible approach: current users get better protection by default, users who want more protection can add it, and it can be disabled when necessary.

@delvedor Did you have an idea of how to achieve the same goals with getters? It wasn't clear by your example.

JoshMock commented 7 months ago

I think we should provide some documentation on this feature and how a user can customize the list of keyword to be filtered. Moreover, do we provide a way to switch off this features? I'm assuming we'll remove the sensitive information by default but if someone wants to have it we should provide (e.g. for debugging).

@ezimuel The three new options in the PR description are how all of these things can be customized. The optional list of redaction keys (additionalRedactionKeys) will only append new keys, so the defaults (authorization etc.) cannot be accidentally disabled by adding extra keys. To disable the default keys one would need to set redactDiagnostics to false. I'm willing to revisit that approach if it seems too strict.

As for docs, when I upgrade elasticsearch-js to use this new transport version, I'll add documentation there, as well as some notes in the changelog.

rudolf commented 7 months ago

Common sources of sensitive data are removed while serializing to JSON or logging to the console, so common use cases are already covered. However, since that data is still stored on these objects in memory, sensitive data may not be scrubbed when custom serialization methods are used.

Just to clarify, these existing protections only work on connection not on other sensitive parts of ElasticsearchClientError contained under meta: DiagnosticResult. So while JSON.stringify(connection) is safe, JSON.stringify(Error) is not (same for console.log).

I think both layers are usefulh bb

JoshMock commented 7 months ago

@rudolf I assume you closed on accident :laughing:

JoshMock commented 7 months ago

I chatted with @delvedor earlier today and we talked through his suggestion in detail. Rather than adding a redactDiagnostics option that has to be passed all the way down from the client, we can reduce the number of new options to 2 and just redact potentially sensitive data all the time.

Rather than replacing values with [redacted], the values are still there, just hidden from all serialization and iteration methods by wrapping it in a getter function set on the object using Object.defineProperty. By this method, the only way redacted values can be accessed is by directly hitting the property (e.g. console.log(myObject.authorization)). Someone who is doing that explicitly should be aware that this data is sensitive and proceed accordingly. All serialization methods run on myObject will hide the value:

See commit b486c7910c8f0314e1ce53da76a39c6a68780210 for this change.

redactConnection and additionalRedactionKeys options are still exposed as described above.

Planning to merge this tomorrow, Dec 8.

rudolf commented 7 months ago

Rather than replacing values with [redacted], the values are still there, just hidden from all serialization and iteration methods by wrapping it in a getter function set on the object using Object.defineProperty. By this method, the only way redacted values can be accessed is by directly hitting the property (e.g. console.log(myObject.authorization)). Someone who is doing that explicitly should be aware that this data is sensitive and proceed accordingly. All serialization methods run on myObject will hide the value

I'm not sure how I feel about this...

I 100% agree that JSON.stringify(ErrorWithSecrets) should never serialise those secrets. And here it makes a lot of sense to me to define getters on the connection, and all the *.headers keys so that those headers never escape. Although it's a slightly different implementation we have this protection in place on connection objects already. If this is easier to adopt in a non-breaking way this could be great layer to have for other users.

BUT *rant incoming 😉 * I feel like Elasticsearch-js has no business throwing around secrets all over the place. "I asked you to call Elasticsearch for me and tell me what went wrong, the authorization headers you used to make the request are irrelevant!" If I e.g. compare mongodb's nodejs client errors just contain the code, message, labels and a connection id string https://github.com/mongodb/node-mongodb-native/blob/main/src/error.ts

I know there's some backwards compatibility challenges, but if I could be completely selfish I just want an error that contains the bare minimum for me to understand what went wrong. Anything more I would rather use the diagnostics event emitter to debug. Things to include would be (probably missing something):

{
  attempts: number;
  aborted: boolean;
  body: TResponse;
  statusCode: number;
  warnings?: string[]; // not sure what this is?
  type: string;
  reason: string;
}

But definitely I don't want the request/response headers or the connection they're not useful and they're dangerous so why add it to the error object in the first place. Removing these completely makes me feel much better than trying to use a scalpel to find just the things we know are secrets.

JoshMock commented 7 months ago

Thanks for the feedback, @rudolf. I agree full removal is ideal. A diagnostic looking more like your proposal is definitely a consideration we can take for the next major version. But since the DiagnosticResult has been attached to Error objects for at least a few minor releases now, it would be a breaking change.

I'm leaning toward reverting the last commit, and changing the redaction options to something like:

interface RedactionOptions {
    type: 'basic' | 'advanced' | 'off'
    additionalKeys?: string[]
}
{
  body?: TResponse
  statusCode?: number
  warnings: string[] | null
  meta: {
    context: TContext
    name: string | symbol
    request: {
      params: {
        method: string
        path: string
      },
      options: {}
      id: any
    }
    connection: null
    attempts: number
    aborted: boolean
  }
}

By nesting the three options into their own interface, the top-level option would just be redaction: RedactionOptions, and effectively behave like one or two options instead of 3. Having type be a union of strings also allows us to add more choices later.

JoshMock commented 7 months ago

Another set of options for type could be:

JoshMock commented 7 months ago

I settled for off, replace and remove, with replace as default. I decided to skip hide, which would use the Object.defineProperty method. I don't think it's necessary, but if we see a need for it later, we can add it because the structure of the redaction options are more future-proof now.

Thanks everyone for the feedback. This solution feels a lot more refined. :pray:

JoshMock commented 7 months ago

I tested this branch against all minor release branches of elasticsearch-js back to 8.6. Glad I did. Quick fix in 5b1324df2ff3c234d2f111bbd9760207883326d5, and now all tests on those branches are passing.

Merging soon.

JoshMock commented 7 months ago

Not going to publish to npm until Monday, just to be safe.