
[Meta] Audit Logging #52125

Closed: jportner closed this issue 2 years ago

jportner commented 4 years ago

Overview

The current state of audit logging in Kibana is not sufficient for many users' needs. Kibana outputs only a few types of events, without much detail, in the same transport as regular log messages. This can be improved in many ways.

Enhancements in scope:

Current state vs. desired state

### Current state

Audit records in Kibana are displayed in plaintext like so:

```
log   [23:26:50.059] [info][audit][saved_objects_authorization_success][security] jdoe authorized to get config
log   [23:26:50.067] [info][audit][saved_objects_authorization_success][security] jdoe authorized to find index-pattern
```

If JSON output is enabled:

```json
{
  "type": "log",
  "@timestamp": "2020-02-18T14:58:44-05:00",
  "tags": ["info", "audit", "security", "saved_objects_authorization_success"],
  "pid": 38933,
  "username": "jojo",
  "action": "get",
  "types": ["config"],
  "args": { "type": "config", "id": "8.0.0", "options": {} },
  "eventType": "saved_objects_authorization_success",
  "message": "jojo authorized to get config"
}
{
  "type": "log",
  "@timestamp": "2020-02-18T14:58:44-05:00",
  "tags": ["info", "audit", "security", "saved_objects_authorization_success"],
  "pid": 38933,
  "username": "jojo",
  "action": "find",
  "types": ["index-pattern"],
  "args": {
    "options": {
      "perPage": 1,
      "page": 1,
      "type": ["index-pattern"],
      "search": "*",
      "defaultSearchOperator": "OR",
      "searchFields": ["title"],
      "fields": ["title"]
    }
  },
  "eventType": "saved_objects_authorization_success",
  "message": "jojo authorized to find index-pattern"
}
```

### Future state

Audit records should be written in a standard format ([ECS](https://www.elastic.co/guide/en/ecs/current/index.html)), should contain more information about the event that occurred and who originated the action, and fields should be configurable to include more or less information. Such an audit record would look something like this:

```json
{
  "@timestamp": "2019-12-05T00:00:02.000Z",
  "event": {
    "action": "get config",
    "category": "saved_objects_authorization",
    "duration": 453,
    "end": "2019-12-05T00:00:02.453Z",
    "module": "security",
    "outcome": "success",
    "start": "2019-12-05T00:00:02.000Z"
  },
  "host": {
    "id": "5b2de169-2785-441b-ae8c-186a1936b17d",
    "ip": "34.56.78.90",
    "hostname": "hostname"
  },
  "http": {
    "request": {
      "body": { "bytes": 887, "content": "Hello world" },
      "bytes": 1437,
      "method": "get",
      "referrer": "https://blog.example.com/"
    }
  },
  "labels": { "spaceId": "default" },
  "source": { "address": "12.34.56.78", "ip": "12.34.56.78" },
  "url": {
    "domain": "www.elastic.co",
    "full": "https://www.elastic.co:443/search?q=elasticsearch",
    "path": "/search",
    "port": "443",
    "query": "q=elasticsearch",
    "scheme": "https"
  },
  "user": {
    "email": "john.doe@company.com",
    "full_name": "John Doe",
    "hash": "D30A5F57532A603697CCBB51558FA02CCADD74A0C499FCF9D45B...",
    "sid": "2FBAF28F6427B1832F2924E4C22C66E85FE96AFBDC3541C659B67...",
    "name": "jdoe",
    "roles": ["kibana_user"]
  },
  "trace": { "id": "8a4f500d" }
}
```

Note: in the example above, the `user.hash` (a hash of the `user.name` field) would not be included by default; it would be an optional field that could be included if the `user.name` needed to be excluded for privacy reasons.

First Phase

Prerequisites (in progress):

Phase 1 implementation: #54836

Future Phase

elasticmachine commented 4 years ago

Pinging @elastic/kibana-security (Team:Security)

jportner commented 4 years ago

@arisonl FYI

joshdover commented 4 years ago

I see the output format is going to be in ECS which is great. Will we support ingesting this data into Elasticsearch and using it in the product for inspection by admins? We should be able to leverage Core's logging appenders to accomplish the ingestion piece.

jportner commented 4 years ago

> I see the output format is going to be in ECS which is great. Will we support ingesting this data into Elasticsearch and using it in the product for inspection by admins? We should be able to leverage Core's logging appenders to accomplish the ingestion piece.

My take on it is that the ingestion itself is out of scope for this feature. As long as we can output JSON to the file system (which we were intending to use Core's logging appenders to do), Filebeat can be used for ingestion. Is that what you meant? Or are the logging appenders going to support ingestion directly?

joshdover commented 4 years ago

Filebeat would definitely work. It'd be interesting if we could actually ship Filebeat with Kibana, configured to do this automatically. Of course, there's some complexity with that as well (process monitoring, licensing, etc.).

My broader question is about whether or not there are plans to use this data in the product. For example, it'd be great if there was a menu item on a visualization that opened a UI with a history of edits to that visualization.

jportner commented 4 years ago

> My broader question is about whether or not there are plans to use this data in the product. For example, it'd be great if there was a menu item on a visualization that opened a UI with a history of edits to that visualization.

In short: no. There is overlap between the information we need / the conclusions we can draw for audit logging and for what we're calling "usage data". However, there is a strong separation of concerns there. We ultimately decided to keep this at a smaller scope, just for the auditing use case.

I do think that once we have all of the new audit logging in place, we'll have all of the hooks/plumbing necessary to track and provide robust usage data. But we don't want to conflate audit records and usage data.

kobelb commented 4 years ago

During a Zoom meeting today, there was some discussion about which events and attributes should be in the "normal logs" vs what should be in the "audit logs". @jportner and I discussed this further and I've summarized the consensus that we reached.

The normal logs should not include user-specific information. User information is particularly sensitive, and augmenting normal log events with this information is potentially problematic. However, it's perfectly fine for these to include opaque identifiers for the session and the HTTP request. The normal logs should include all events which are logged using the standard logging infrastructure, and they can be filtered however the user chooses.

The audit logs should include user-specific information, and controls will be put in place to log entries only for specific users or only specific user information. The audit logs will include only audit-specific events. There is potentially some overlap between the events which appear in the normal logs and in the audit logs, but they're generally completely separate. The audit logs will include all authorization- and authentication-based events, in addition to events for specific operations of interest, including but not limited to saved-object CRUD and Elasticsearch queries. The mechanism for creating the audit events for operations which aren't auth-related needs to be explored further. The sketch below illustrates the intended split.
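For illustration, a hypothetical pair of records for the same operation might look like this (record shapes and field names are assumptions, not an agreed schema):

```ts
// Normal log record: opaque identifiers only, no user-specific information.
const normalLogRecord = {
  message: 'updated saved object [dashboard:123]',
  request: { id: 'opaque-request-id' },
  session: { id: 'opaque-session-id' },
};

// Audit log record: user-specific information included, subject to controls
// that restrict logging to specific users or specific user fields.
const auditLogRecord = {
  event: { action: 'saved_object_update', outcome: 'success' },
  user: { name: 'jdoe', roles: ['kibana_user'] },
  trace: { id: 'opaque-request-id' },
};
```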

joshdover commented 4 years ago

Components needed:

Open questions:

mshustov commented 4 years ago

I see the Audit service as a separate top-level service (the outer circle in the onion architecture)

No plugins depend on the AuditTrail. The AuditTrail service may depend on any plugin. The platform and plugins emit auditable events; the AuditTrail service listens to them and calls plugin APIs to collect the necessary data.

```ts
security.on('authenticationSuccess', (message: string, request: KibanaRequest) => {
  const auditData = {
    message,
    action: 'authenticationSuccess',
    user: security.getUser(request),
    spaces: spaces.getSpace(request),
    server: core.http.getServerInfo(),
    // ...
  };
  // has a well-known prefix
  log.logger(auditData);
});
```

As an alternative, Platform provides Auditable hook and AuditTrail service registers itself via this hook.

```ts
registerAuditable(handler: (event: { action: string; message: string; request: KibanaRequest }) => void): void;
```
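A minimal sketch of how the platform side of such a hook could work (the dispatch function and its name are assumptions for illustration, not an existing Kibana API):

```ts
type KibanaRequest = unknown; // placeholder for Kibana's request type

interface AuditableEvent {
  action: string;
  message: string;
  request: KibanaRequest;
}

type AuditableHandler = (event: AuditableEvent) => void;

const auditables: AuditableHandler[] = [];

// Called by the AuditTrail service during setup to register itself.
function registerAuditable(handler: AuditableHandler): void {
  auditables.push(handler);
}

// Called by the platform wherever an auditable event occurs; registered
// handlers decide how to enrich and persist it.
function notifyAuditables(event: AuditableEvent): void {
  for (const handler of auditables) {
    handler(event);
  }
}
```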

To define the logging layout, we can use the same approach as Elasticsearch does for its security audit logs: add an explicit config in x-pack that enhances the OSS kibana.yml config. https://github.com/elastic/elasticsearch/blob/fb86e8d6d67d95a8f2e99a175e3a6d7bbb4b196e/distribution/docker/src/docker/config/log4j2.properties#L47-L82 That would allow users to configure the layout and destination as required.

The open question for me: what unique data does each auditable event carry? I suspect the datasets for an Elasticsearch query event and an authentication-denied event can be different. If the dataset for every auditable event is the same, we can use a common interface for the Audit service. Otherwise, we might want to separate common fields from event-specific fields. AuditTrail implementation in Elasticsearch: https://github.com/elastic/elasticsearch/blob/5775ca83dbee90d3988faa611024bfaf42b13073/x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/audit/logfile/LoggingAuditTrail.java
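If the datasets do differ per event type, one way to separate common fields from event-specific fields is a discriminated union; a sketch (the event names and payloads are illustrative assumptions):

```ts
// Fields shared by every auditable event.
interface BaseAuditEvent {
  message: string;
  outcome: 'success' | 'failure';
  trace: { id: string };
}

// Event-specific payloads layered on top of the common fields.
interface EsQueryAuditEvent extends BaseAuditEvent {
  action: 'elasticsearch_query';
  query: { index: string; body: unknown };
}

interface AuthenticationDeniedAuditEvent extends BaseAuditEvent {
  action: 'authentication_denied';
  realm: string;
}

// The Audit service can accept the union and narrow on `action`.
type AuditEvent = EsQueryAuditEvent | AuthenticationDeniedAuditEvent;
```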

> ECS (OSS, default JSON layout?)

Elasticsearch doesn't use an ECS type. Instead, its JSON layout follows the ECS format by default.

> Does not include data about current user - we don't want this in OSS, security should add it itself (maybe we add an addScopeProvider API to the logger API?)

We already have RequestHandlerContext. It might expose addMetaData() to extend a request with additional data. If we consider some data sensitive, we shouldn't provide read access to it. The main problem with this approach is that the AuditTrail plugin has no control over the shape of the data, yet it needs to filter and format it into the necessary layout (which differentiates it from the telemetry plugin's approach).

joshdover commented 4 years ago

I think we are largely on the same page here. I'd like to lay out this plan with a distinction between some of the concerns. Namely, I'd like to separate what is necessary to support general observability and tracing within Kibana logs (OSS and otherwise) from what is necessary to support audit logs (X-Pack).

General observability requirements:

Audit logging requirements:


For the general observability case, we need a couple new components:

  1. Contextual data on log records that includes information about the request that initiated the log
  2. An ECS-compatible JSON log layout

I think we're both in agreement on how to accomplish these two requirements.

(1) can be solved by introducing a formal `LogContext` struct that is used by both the Logger and the Elasticsearch and SavedObjects clients. This struct would be created by Core's request context provider and injected into the ES and SO clients exposed by RequestHandlerContext. This enables every log message in those clients to include data about the current request (it would not include user data).

(2) is solved by changing our JSON log layout to be ECS-compatible.
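A rough sketch of what (1) could look like; the `LogContext` name comes from the comment above, while the specific fields are assumptions:

```ts
// Opaque request identifiers only; deliberately no user data (OSS-safe).
interface LogContext {
  requestId: string;   // unique identifier created for each HTTP request
  requestPath: string;
  sessionId?: string;  // opaque session identifier, if one exists
}

// Hypothetical factory run by Core's request context provider; the resulting
// struct would be injected into the ES and SO clients exposed by
// RequestHandlerContext so every log message can reference the request.
function createLogContext(request: { id: string; url: { pathname: string } }): LogContext {
  return {
    requestId: request.id,
    requestPath: request.url.pathname,
  };
}
```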


For the audit logging case, we need:

  1. A way to produce domain action events, a few options:
    • Specific emit points in the OSS code;
    • Leveraging the existing logs and translating them to domain events; or
    • A registerAuditable interface as above
  2. A way in which domain actions can be mapped back to additional context information not included in OSS
    • For most cases this is being able to call security.authc.getCurrentUser with the KibanaRequest object.

(1) is where I think we need some discussion.

My only concern about adding domain-specific events is that they may be abused by other plugins for different purposes. For example, we've gotten requests to add hooks like onDelete to the SavedObjectsClient. Having generic hooks like this can lead to a complex web of business logic that relies on these hooks executing in order to keep the system in a valid state.

I think we just need to take care in how we implement such events so that the timing of when they are executed is not depended on by business logic. In other words, I want to avoid a situation where an app is dependent on these hooks in order to function correctly (other than audit logging itself).

This makes me lean slightly towards the registerAuditable interface or something similar. I think it's much more likely that these types of events are consumed responsibly if they are exposed this way, rather than each sub-system emitting these events.

legrego commented 4 years ago

Sorry for being dense; I'm a bit confused about the proposed use of registerAuditable. @restrry's example makes me think that it would be used by the service responsible for decorating and writing the already-generated audit log events to disk, but @joshdover's comment leads me to believe that it would be used to produce the domain action events that eventually get decorated and logged downstream.


Can we outline what a couple of domain action events might look like? Let's say that both Security and Spaces are enabled:

View diagram markup:

```
title Create Dashboard

User->Kibana: Create Dashboard Request
Kibana->Kibana: Unique Request identifier created
Kibana->Saved Objects Service: Create Dashboard Request
Saved Objects Service->Security SOC Wrapper: AuthN Check
Security SOC Wrapper->ES: _has_privileges request
ES->Security SOC Wrapper: _has_privileges response
Security SOC Wrapper->Saved Objects Service: OK
Saved Objects Service->ES: index { bigBlobOfJSON }
ES->Saved Objects Service: { bigBlobOfJSON }
Saved Objects Service->Kibana: { bigBlobOfJSON }
Kibana->User: Create Dashboard Response
```

In this example, we have two requests made to ES: one for the privileges check, and another to actually index the saved object. I'd expect a single "Create dashboard" audit record here, as the privileges check is a simple implementation detail, which would still be captured by the ES audit logs.

What about a more complex example though? Consider the "Copy to space" feature. This works by first performing a server-side export, followed by a server-side import:

View diagram markup:

```
title Copy to Space

User->Kibana: Copy to space Request
Kibana->Kibana: Unique Request identifier created
Kibana->Saved Objects Service: bulk_get objects to be copied
Saved Objects Service->Security SOC Wrapper: AuthN Check
Security SOC Wrapper->ES: _has_privileges request
ES->Security SOC Wrapper: _has_privileges response
Security SOC Wrapper->Saved Objects Service: OK
Saved Objects Service->ES: bulk_get [{type: 'search', id: 'foo'}, ...]
ES->Saved Objects Service: bulk_get response [{ bigBlobOfJSON }]
Saved Objects Service->Kibana: [{ bigBlobOfJSON }]
Kibana->Saved Objects Service: bulk_create objects to be copied
Saved Objects Service->Security SOC Wrapper: AuthN Check
Security SOC Wrapper->ES: _has_privileges request
ES->Security SOC Wrapper: _has_privileges response
Security SOC Wrapper->Saved Objects Service: OK
Saved Objects Service->ES: bulk_create [{type: 'search', id: 'foo'}, ...]
ES->Saved Objects Service: bulk_create response [{ bigBlobOfJSON }]
Saved Objects Service->Kibana: [{ bigBlobOfJSON }]
Kibana->User: Copy to space Response
```

How many audit records would we expect to see here? Somewhere between 1 and 3?

1. "Copy to space" record
2. "Export / bulk_get saved objects" record
3. "Import / bulk_create saved objects" record

My initial reaction is that 2 and 3 are implementation details of 1, and therefore might not make sense in the audit log. They should show up in the general log, however. Someone trying to understand the audit logs might be confused to see bulk_get and bulk_create requests when they in fact "only" performed a Copy to space action.

To make a comparison to the ES audit logs, I don't think they record shard read/writes that occur as part of a user's request. They log that the request happened, and the "implementation details" are kept out of the audit logs.

I only bring this up because it's not immediately clear to me where we'll choose to generate/emit these audit events. Doing so at the saved objects client would cause these "implementation details" to be logged for various domain action events. Emitting from the HTTP routes (the public API) would probably get us most of the way there, but that doesn't handle actions like background jobs.

mshustov commented 4 years ago

> In this example, we have two requests made to ES: one for the privileges check, and another to actually index the saved object. I'd expect a single "Create dashboard" audit record here, as the privileges check is a simple implementation detail, which would still be captured by the ES audit logs.

I'd expect to see `Dashboard created` and `Dashboard creation failed` audit records in this example. Both should provide additional info: who performed the action, in which space, etc.

> How many audit records would we expect to see here? Somewhere between 1 and 3?

The same logic applies here. I expect only one event: `Copied to space`. Users do not think in terms of "Export / bulk_get saved objects" or "Import / bulk_create saved objects"; as you said, those are implementation details. However, users can find correlated low-level events in the Kibana logs via a request identifier / background task identifier.

> I only bring this up because it's not immediately clear to me where we'll choose to generate/emit these audit events.

The infrastructure level (ES / SO clients) cannot emit domain events; plugin code emits them. Depending on the plugin workflow, this can be done:

I proposed using an AuditTrail service that receives those domain events and calculates the data needed to build an audit logging record:

```ts
// in plugin code
auditTrail.add({ event, message, request });
// in an http request handler, context can be bound to a request
auditTrail.add({ event, message });
// in a background task we haven't got a context pattern and might have to introduce one
auditTrail.add({ event, message });
```

```ts
// in audit trail plugin code
class AuditTrail {
  on(event, message, request) {
    const auditData = {
      message,
      action: 'authenticationSuccess',
      user: security.getUser(request),
      spaces: spaces.getSpace(request),
      server: core.http.getServerInfo(),
      // ...
    };
    // has a well-known prefix
    log.logger(auditData);
  }
}
```

Audit Logger doesn't deal with any observability concerns (ES query performance, for example).

Let me know if it makes sense to you or if I missed something.

legrego commented 4 years ago

That all makes sense, thanks. My primary question was how we would allow plugin code to emit events. Something like auditTrail.add({event, message, request}) makes perfect sense to me.

My initial confusion was around registerAuditable, and then I got distracted with those two examples I put up. So registerAuditable would be a hook provided by core, which the security plugin (for example) could call in order to be notified about all emitted audit events? Similar to how core provides a hook for security to register the auth provider?

mshustov commented 4 years ago

> So registerAuditable would be a hook provided by core, which the security plugin (for example) could call in order to be notified about all emitted audit events?

I'd expect it to be used by the AuditTrail plugin to extend the platform. There are several benefits of using it in this manner:

The AuditTrail plugin can depend on any plugin and use plugins' public APIs to calculate audit data:

```ts
// package.json
requiredPlugins: ['security', 'spaces'],

// plugin.ts
class AuditTrail {
  on(event, message, request) {
    const auditData = {
      message,
      action: 'authenticationSuccess',
      user: security.getUser(request),
      spaces: spaces.getSpace(request),
      server: core.http.getServerInfo(),
      // ...
    };
    // has a well-known prefix
    log.logger(auditData);
  }
}

platform.registerAuditable(auditTrail.on);
```

Probably `registerAuditable` is not the best name. Is `registerAuditor` clearer?

Also, I'd like to hear from Josh. He might have a different vision.

jportner commented 4 years ago

Good idea making a diagram @legrego --

OK, so Approach #1 as described above is to generate a single audit event for each user request.

> How many audit records would we expect to see here? Somewhere between 1 and 3?
>
> 1. "Copy to space" record
> 2. "Export / bulk_get saved objects" record
> 3. "Import / bulk_create saved objects" record

In Approach #2 that I've been thinking of, we would see five audit records:


In my mind it would look something like this.

Click to see JSON:

```json
{
  "event": {
    "action": "read sourcespace dashboard",
    "category": "saved_objects_authorization",
    "module": "plugin:security",
    "outcome": "success"
  },
  "trace": { "id": "some-uuid" }
}
{
  "event": {
    "action": "bulk_get [sourcespace:dashboard:foo]",
    "category": "saved_objects_client",
    "module": "core",
    "outcome": "success"
  },
  "trace": { "id": "some-uuid" }
}
{
  "event": {
    "action": "write destspace dashboard",
    "category": "saved_objects_authorization",
    "module": "plugin:security",
    "outcome": "success"
  },
  "trace": { "id": "some-uuid" }
}
{
  "event": {
    "action": "bulk_create [destspace:dashboard:foo]",
    "category": "saved_objects_client",
    "module": "core",
    "outcome": "success"
  },
  "trace": { "id": "some-uuid" }
}
{
  "event": {
    "action": "POST",
    "category": "http",
    "module": "core",
    "outcome": "success"
  },
  "http": {
    "request": {
      "body": {
        "content": "{\"objects\":[{\"type\":\"dashboard\",\"id\":\"foo\"}],\"spaces\":[\"destspace\"],\"includeReferences\":true,\"overwrite\":true}"
      },
      "method": "POST"
    }
  },
  "source": { "address": "12.34.56.78", "ip": "12.34.56.78" },
  "url": {
    "domain": "www.somekibanahost.com",
    "full": "https://www.somekibanahost.com/api/spaces/_copy_saved_objects",
    "path": "/api/spaces/_copy_saved_objects",
    "port": "443",
    "query": "",
    "scheme": "https"
  },
  "user": {
    "email": "john.doe@company.com",
    "full_name": "John Doe",
    "hash": "D30A5F57532A603697CCBB51558FA02CCADD74A0C499FCF9D45B...",
    "sid": "2FBAF28F6427B1832F2924E4C22C66E85FE96AFBDC3541C659B67...",
    "name": "jdoe",
    "roles": ["kibana_user"]
  },
  "trace": { "id": "some-uuid" }
}
```

_Note 1: I omitted some attributes in the interest of brevity._
_Note 2: the records can be correlated with each other by `trace.id` (which should also be sent to Elasticsearch as `X-Opaque-Id`)._
_Note 3: the four records in the audit trail with the `saved_objects_client` and `saved_objects_authorization` categories wouldn't need to contain **all** of the attributes (`http`, `source`, `url`, `user`); however, the "events" that these records are generated from would still need to have this info. This is because we want to be able to add a filter to avoid writing records based on certain attributes, such as user or IP address._

So, this approach would generically audit all API routes and SOC calls, showing what's happening "under the hood" for the SOC and its wrappers. Of course, this is more verbose than the alternative of writing a single audit event for each request.
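To make Approach #2 concrete, here is a minimal sketch of a saved-objects client wrapper that emits one audit event per call (the wrapper shape and the emitter are illustrative assumptions, not the actual SOC interfaces):

```ts
interface AuditEvent {
  action: string;
  category: string;
  module: string;
  outcome: 'success' | 'failure';
  trace: { id: string };
}

interface SavedObjectsClientLike {
  get(type: string, id: string): Promise<unknown>;
}

// Each SOC call yields one audit event, correlated with the originating
// HTTP request via trace.id (also sent to Elasticsearch as X-Opaque-Id).
function auditSavedObjectsClient(
  client: SavedObjectsClientLike,
  emit: (event: AuditEvent) => void,
  traceId: string
): SavedObjectsClientLike {
  return {
    async get(type, id) {
      const base = {
        action: `get [${type}:${id}]`,
        category: 'saved_objects_client',
        module: 'core',
        trace: { id: traceId },
      };
      try {
        const result = await client.get(type, id);
        emit({ ...base, outcome: 'success' });
        return result;
      } catch (e) {
        emit({ ...base, outcome: 'failure' });
        throw e;
      }
    },
  };
}
```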

Potential advantages of Approach #2:

Disadvantages:

Thoughts?

joshdover commented 4 years ago

> Probably `registerAuditable` is not the best name. Is `registerAuditor` clearer?
>
> Also, I'd like to hear from Josh. He might have a different vision.

I think we're on the same page here. The only part I'm confused about in your example is the `auditTrail.add` API. This is meant to be a Core API, right? Not an API on the audit trail plugin.

If we're on the same page there, then the final result is that Platform would need to expose two APIs (see the sketch below):

- `registerAuditor` for receiving audit events. This is the API that the audit log plugin would use to get all events, enrich them with additional data, and forward them to a logger.
- `auditTrail.add` / `addAuditEvent` / someOtherName for adding audit events. This is the API that Core, OSS plugins, and commercial plugins would use to add domain events for user actions (e.g. Copy to Space). These events are forwarded to any auditors registered with `registerAuditor`.

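A rough sketch of those two APIs as TypeScript interfaces (a sketch of the proposal above, not a finalized contract):

```ts
type KibanaRequest = unknown; // placeholder for Kibana's request type

interface AuditEvent {
  event: string;
  message: string;
  request?: KibanaRequest; // bound automatically in request handler contexts
}

type Auditor = (event: AuditEvent) => void;

interface AuditTrailService {
  // Used by the audit log plugin to receive all events, enrich them with
  // additional data, and forward them to a logger.
  registerAuditor(auditor: Auditor): void;
  // Used by Core, OSS plugins, and commercial plugins to record domain
  // events for user actions; forwarded to every registered auditor.
  add(event: AuditEvent): void;
}
```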
In terms of what produces the audit events themselves (@jportner's discussion above), I think I do favor Approach #2 for its completeness. It seems less likely that we would miss a critical event that should be included in the audit log if we log the lower-level details. That said, I'm not very familiar with how audit logs are used by customers. If the low-level logs are too opaque to understand, that could make these logs much less useful.

So really it seems the question is: do we favor completeness or clearer semantics?

Could we do both? Could the semantic, high-level action be provided as a "scope" for the lower-level audit events?

For example, what if we had an API that allows an HTTP endpoint to start an auditable event scope, so that all audit events produced while that scope is open are associated with the high-level semantic action:

```ts
router.post(
  { path: '/api/do_action' },
  async (context, req, res) => {
    const auditScope = context.audit.openScope('copy_to_space');
    try {
      // Any audit events produced by the SO client while the scope is open
      // would be associated with the `copy_to_space` scope.
      const body = await copyToSpace(context.savedObjects.client);
      return res.ok({ body });
    } finally {
      auditScope.close();
    }
  }
);
```

Or we could change the API a bit to:

```ts
router.post(
  { path: '/api/do_action' },
  async (context, req, res) => context.audit.openScope(
    'copy_to_space',
    async () => {
      // Any audit events produced by the SO client while the scope is open
      // would be associated with the `copy_to_space` scope.
      const body = await copyToSpace(context.savedObjects.client);
      return res.ok({ body });
    }
  )
);
```

The tricky part about this in Node.js is that these async actions are running in the same memory space, which makes associating the scope with any asynchronous code difficult. A couple of options for solving this:

mshustov commented 4 years ago

> If we're on the same page there, then the final result is Platform would need to expose two APIs:
>
> - `registerAuditor` for receiving audit events. This is the API that audit log plugin would use to get all events, enrich with additional data, and forward to a logger.
> - `auditTrail.add` / `addAuditEvent` / someOtherName for adding audit events. This is the API that Core, OSS plugins, and commercial plugins would use to add domain-events for user actions (eg. Copy to Space). These events are forwarded to any auditors registered with `registerAuditor`.

Correct 👍

The tricky part about this in Node.js is that these async actions are running in the same memory space, which makes associating the scope with any asynchronous code difficult.

AFAIK Node.js provides built-in primitives that we can try to use for this case: https://nodejs.org/api/async_hooks.html. It's time to finally watch https://www.youtube.com/watch?v=omOtwqffhck from @watson 😄

joshdover commented 4 years ago

> AFAIK Node.js provides built-in primitives that we can try to use for this case: nodejs.org/api/async_hooks.html

I agree async_hooks could be a solution. My concern is just that it's still experimental, even in the latest Node version. It does look like the working group is discussing stabilization. If it does go stable in v14 LTS, it could be a viable option for us.
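For illustration, a minimal sketch of scope propagation using `AsyncLocalStorage` from the async_hooks module (assuming a Node version where it is available; the helper names are hypothetical):

```ts
import { AsyncLocalStorage } from 'async_hooks';

interface AuditScope {
  action: string;
}

const auditScopeStorage = new AsyncLocalStorage<AuditScope>();

// Runs a handler with an open audit scope; any async work started inside
// `fn` observes the same scope without explicit plumbing.
function withAuditScope<T>(action: string, fn: () => Promise<T>): Promise<T> {
  return auditScopeStorage.run({ action }, fn);
}

// Audit events produced while a scope is open are associated with the
// high-level semantic action automatically.
function addAuditEvent(event: { category: string; message: string }): void {
  const scope = auditScopeStorage.getStore();
  console.log({ ...event, action: scope?.action });
}

// Usage sketch:
withAuditScope('copy_to_space', async () => {
  addAuditEvent({ category: 'saved_objects_client', message: 'bulk_get [...]' });
});
```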

thomheymann commented 4 years ago

Hi team, I'm new to the project and am starting to get up to speed with the audit log feature.

From speaking to different people there still seem to be a few outstanding questions and different ideas as to what the audit log should provide, to what level of detail and how it differs from existing logging.

In order to help us define a clear approach I wanted to define some guiding principles that we can agree on and then refer back to when making a decision about whether something should be included in the audit log or not and what the implementation should look like.

I have written these as statements but they are all open questions / up for debate.

I might have gotten this completely wrong so would be great to get your thoughts!

Guiding Principles

What’s the difference between our audit log and system log?

What events need to be captured?

When are events logged?

Can an action trigger multiple events (log lines)?

How does Kibana audit logging tie into Elasticsearch audit logging?

Examples

thomheymann commented 4 years ago

ECS Audit Log Proposal

Field Reference: https://www.elastic.co/guide/en/ecs/current/ecs-field-reference.html

Approach

Authorisation / privilege checks are logged as an outcome of an action rather than as a separate log line, since they are implementation details. This is the same approach as error/success results in the ECS standard.

Bulk operations are logged as separate events. It would be less verbose to combine a bulk operation into a single log line, but that would mean we can't record successes/failures individually using the ECS standard. Saved object details are extracted into a non-standard `document` field for each audit event.

`category`, `type` and `outcome` fields are categorisation fields in ECS with specific allowed keywords. I tried to map these as well as I could, but some of them do sound slightly clunky for our use case.

Events

User Authentication

```json
{
  "message": "User 'jdoe' logged in successfully using realm 'native'|Failed login attempt using realm 'native'|User re-authentication failed",
  "event": {
    "action": "user_login|user_logout|user_reauth",
    "category": ["authentication"],
    "type": ["user"],
    "outcome": "success|failure",
    "module": "kibana",
    "dataset": "kibana.audit"
  },
  "error": {
    "code": "spaces_authorization_failure",
    "message": "jdoe unauthorized to getAll spaces"
  },
  "trace": {
    "id": "opaque-id"
  }
}
```

Saved Object CRUD

```json
{
  "message": "User 'jdoe' created dashboard 'new-saved-object' in space 'default'",
  "event": {
    "action": "saved_object_create",
    "category": ["database"],
    "type": ["creation|access|change|deletion", "allowed|denied"],
    "outcome": "success|failure"
  },
  "document": {
    "space": "default",
    "type": "dashboard",
    "id": "new-saved-object"
  },
  "error": {
    "code": "spaces_authorization_failure",
    "message": "jdoe unauthorized to getAll spaces"
  },
  "trace": {
    "id": "opaque-id"
  }
}
```

HTTP Response

```json
{
  "message": "HTTP request 'login' by user 'jdoe' succeeded",
  "event": {
    "action": "http_request",
    "category": ["web"],
    "outcome": "success|failure"
  },
  "http": {
    "request": {
      "method": "POST",
      "body": {
        "content": "{\"objects\":[{\"type\":\"dashboard\",\"id\":\"foo\"}],\"spaces\":[\"destspace\"],\"includeReferences\":true,\"overwrite\":true}"
      }
    },
    "response": {
      "status_code": 200
    }
  },
  "source": {
    "address": "12.34.56.78",
    "ip": "12.34.56.78"
  },
  "url": {
    "domain": "kibana",
    "full": "https://kibana/api/spaces/_copy_saved_objects",
    "path": "/api/spaces/_copy_saved_objects",
    "port": "443",
    "query": "",
    "scheme": "https"
  },
  "user": {
    "email": "john.doe@company.com",
    "full_name": "John Doe",
    "hash": "D30A5F57532A603697CCBB51558FA02CCADD74A0C499FCF9D45B...",
    "sid": "2FBAF28F6427B1832F2924E4C22C66E85FE96AFBDC3541C659B67...",
    "name": "jdoe",
    "roles": ["kibana_user"]
  },
  "trace": {
    "id": "opaque-id"
  }
}
```

Scenarios

Copy to space

```json
{
  "message": "User 'jdoe' accessed dashboard 'first-object' in space 'default'",
  "event": { "action": "saved_object_read", "category": ["database"], "type": ["access"], "outcome": "success" },
  "document": { "id": "first-object", "type": "dashboard", "space": "default" }
}
{
  "message": "User 'jdoe' accessed dashboard 'second-object' in space 'default'",
  "event": { "action": "saved_object_read", "category": ["database"], "type": ["access"], "outcome": "success" },
  "document": { "id": "second-object", "type": "dashboard", "space": "default" }
}
{
  "message": "User 'jdoe' created dashboard 'first-object' in space 'copy'",
  "event": { "action": "saved_object_create", "category": ["database"], "type": ["creation"], "outcome": "success" },
  "document": { "id": "first-object", "type": "dashboard", "space": "copy" }
}
{
  "message": "User 'jdoe' created dashboard 'second-object' in space 'copy'",
  "event": { "action": "saved_object_create", "category": ["database"], "type": ["creation"], "outcome": "success" },
  "document": { "id": "second-object", "type": "dashboard", "space": "copy" }
}
{
  "message": "HTTP request 'copy-to-space' by user 'jdoe' succeeded",
  "event": { "action": "http_request", "category": ["web"], "outcome": "success" }
}
```

Error: User not authorised to access dashboard (Kibana authZ):

```json
{
  "message": "User 'jdoe' not authorised to access dashboard 'first-object' in space 'default'",
  "event": { "action": "saved_object_read", "category": ["database"], "type": ["access"], "outcome": "failure" },
  "error": { "code": "spaces_authorization_failure", "message": "jdoe unauthorized to getAll spaces" },
  "document": { "id": "first-object", "type": "dashboard", "space": "default" }
}
{
  "message": "HTTP request 'copy-to-space' by user 'jdoe' failed",
  "event": { "action": "http_request", "category": ["web"], "outcome": "failure" },
  "error": { "code": "spaces_authorization_failure", "message": "jdoe unauthorized to getAll spaces" }
}
```

Error: Session expired (Kibana authN):

```json
{
  "message": "Unknown user not authenticated to request 'copy-to-space'",
  "event": { "action": "http_request", "category": ["web", "authentication"], "type": ["denied"], "outcome": "failure" }
}
```

Error: User not authorised to access data index (Elasticsearch authZ):

```json
{
  "message": "User 'jdoe' not authorised to access index 'products'"
}
{
  "message": "HTTP request 'copy-to-space' by user 'jdoe' failed",
  "event": { "action": "http_request", "category": ["web", "authentication"], "type": ["allowed"], "outcome": "failure" }
}
```

User login

```json
{
  "message": "User 'jdoe' logged in successfully using realm 'native'",
  "event": { "action": "user_login", "category": ["authentication"], "type": ["user"], "outcome": "success" }
}
{
  "message": "HTTP request 'login' by user 'jdoe' succeeded",
  "event": { "action": "http_request", "category": ["web"], "outcome": "success" }
}
```

Open question

mshustov commented 4 years ago

@thomheymann thank you for the logging format proposal. I have a couple of questions about the Events section.

thomheymann commented 4 years ago

Thanks for the feedback, Mikhail!

> Is it the complete list of events for the first stage of audit logging? Or is it just the list for the first phase / an example subset of events?

These are only example events; there are a lot more events we would audit, but I wanted to establish some kind of pattern first, since most of the other events would follow a similar approach. I've added a list of the possible other events below. (Again, not complete/reviewed.)

> What support for HTTP events is required from the platform team side? I suspect we don't have to track all responses, only those for selected routes using their context-specific information.

The way I understood HTTP-based audit logging is that it's a way of very quickly and easily ticking off most of our auditing requirements without forcing plugin authors to manually create audit-specific events. It feeds into one of my open questions, though, around the overlap between these (i.e. do we need an http_request event for the login route in our audit log if we already log user logins as a separate event?)

> Is audit for SO actions performed by the Security plugin? The SO client from core doesn't know about authz/authc restrictions.

I have no view on this at this point; I'm purely looking at it from a requirements perspective. It would be great to get a steer in terms of what is actually feasible based on the implementation.

legrego commented 4 years ago

Thanks for the writeup @thomheymann! A quick note on your guiding principles:

> Do log what indices / records were accessed

When discussing how this ties into ES audit logs, you mention:

> Maybe record-level audit logging could be left to Elasticsearch?

I agree with this. I wouldn't expect Kibana to log responses returned by ES that result from queries against users' data indices.

The full list of events might be easier to curate and discuss in a Google doc. Entries under user and role management should be left to the ES audit logs, as they are the authoritative source of this information. I expect Logstash pipelines fall into this category as well.


> What support for HTTP events is required from the platform team side? I suspect we don't have to track all responses, only those for selected routes using their context-specific information.
>
> The way I understood HTTP-based audit logging is that it's a way of very quickly and easily ticking off most of our auditing requirements without forcing plugin authors to manually create audit-specific events. It feeds into one of my open questions, though, around the overlap between these (i.e. do we need an http_request event for the login route in our audit log if we already log user logins as a separate event?)

At the most verbose level, we may want to include everything, or almost everything, here. The ability to filter this out will be critical, though, and it'll probably make sense to come up with a sensible configuration so that we don't log everything by default, but instead allow administrators to opt in to more granularity.

Perhaps the platform could add a route option to the interface to allow a route to exclude itself from auditing, if we find that we need this flexibility.
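If we went this route, registering an excluded route might look like the sketch below (the `excludeFromAudit` option is a hypothetical name, not an existing route option):

```ts
router.get(
  {
    path: '/api/internal/healthcheck',
    validate: false,
    // Hypothetical flag read by the audit interceptor: requests to this
    // route would not produce http_request audit events.
    options: { excludeFromAudit: true },
  },
  async (context, req, res) => res.ok({ body: { status: 'ok' } })
);
```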


> Is audit for SO actions performed by the Security plugin? The SO client from core doesn't know about authz/authc restrictions.
>
> I have no view on this at this point; I'm purely looking at it from a requirements perspective. It would be great to get a steer in terms of what is actually feasible based on the implementation.

I'm leaning towards having the security plugin log these events (it's what we do today). It's technically possible to create a SOC without the security wrapper applied, but in those cases, we'd expect consumers to audit their own SO events. Alerting is one such example: https://github.com/gmmorris/kibana/blob/alerting/consumer-based-rbac/x-pack/plugins/alerts/server/authorization/alerts_authorization.ts#L158

legrego commented 4 years ago

> Bulk operations are logged as separate events. It would be less verbose to combine a bulk operation into a single log line, but that would mean we can't record successes/failures individually using the ECS standard. Saved object details are extracted into a non-standard `document` field for each audit event.

There might be an exception to this that I'm overlooking, but I believe all bulk operations are all-or-nothing today, so we don't have a need to log successes/failures individually. Our current approach (which isn't necessarily the right one) is to log a bulk operation as a single entry, but that entry identifies the objects in question, as sketched below. Verbosity aside, I worry about the performance of logging bulk operations as separate events: an export of 10,000 saved objects would require approximately 10,000 audit log entries, which could take a non-trivial amount of time.
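For illustration, a single-entry shape for a bulk operation might look like this (pluralizing the proposal's `document` field into a `documents` array is an assumption):

```ts
// Hypothetical single audit entry for a bulk_create of two objects.
const bulkCreateAuditEntry = {
  message: "User 'jdoe' created 2 saved objects in space 'default'",
  event: {
    action: 'saved_object_bulk_create',
    category: ['database'],
    type: ['creation'],
    outcome: 'success',
  },
  documents: [
    { id: 'first-object', type: 'dashboard', space: 'default' },
    { id: 'second-object', type: 'dashboard', space: 'default' },
  ],
};
```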

> How does generic API request logging (http_request) tie into the other audit events? For bulk operations these make sense, as it groups the other events together. For single-operation requests it feels like unnecessary duplication. (See user_login example.)

It might be unnecessary duplication, but I think it's hard to definitively say that a certain API endpoint will only ever do a single operation. We could attempt to tag routes as such, but that requires manual effort on the engineering side which could be easily overlooked during a seemingly unrelated refactor. At the moment, I'm thinking we'll accept the duplication since we'll have the ability to filter events, but we can always revisit this if we find a clear pattern to these events.


I'm interested in hearing other thoughts though! My opinions here are just that.

legrego commented 2 years ago

Closing this meta issue, as we have sub-issues open to track the remaining individual tasks that we care about at this time.

mbudge commented 8 months ago

Please can you add the saved object name/description so we can provide reports to IT controls?

Reports with only the saved object ID aren't user-friendly.

legrego commented 8 months ago

@mbudge your request is being tracked here: https://github.com/elastic/kibana/issues/100523. Edit: I see you discovered this already.