[Okta] Add an `okta.debug_context.debug_data` keyword field

terrancedejesus commented 2 months ago

Summary

Problem

At this time, the Okta integration that ingests system logs has a field labeled okta.debug_context.debug_data.flattened. Flattened field types are currently unsupported in ES|QL, so detection rule authors cannot use the context in this field for rules or hunting.

Example JSON

``` { "_index": ".ds-logs-okta.system-default-2024.05.27-000003", "_id": "3a0ee229-1f16-11ef-ad0c-1ddcb8547577", "_version": 1, "_score": 0, "_source": { "agent": { "name": "dejesus-okta-research", "id": "c7c536d0-4b32-40b3-92e7-c7cbf0944339", "type": "filebeat", "ephemeral_id": "a1610e64-46ba-451b-a302-b63c1091b2ca", "version": "8.12.2" }, "elastic_agent": { "id": "c7c536d0-4b32-40b3-92e7-c7cbf0944339", "version": "8.12.2", "snapshot": false }, "source": { "geo": { "continent_name": "Asia", "region_iso_code": "IN-KA", "city_name": "Bengaluru", "country_iso_code": "IN", "country_name": "India", "location": { "lon": 1, "lat": 1 }, "region_name": "Karnataka" }, "as": { "number": 55836, "organization": { "name": "Reliance Jio Infocomm Limited" } }, "ip": "redacted", "domain": ".", "user": { "full_name": "redacted", "name": "redacted", "id": "redacted" } }, "tags": [ "forwarded", "okta-system" ], "cloud": { "availability_zone": "us-east1-b", "instance": { "name": "dejesus-okta-research", "id": "4161709401838773778" }, "provider": "gcp", "service": { "name": "GCE" }, "machine": { "type": "e2-medium" }, "project": { "id": "elastic-security-dev" }, "region": "us-east1", "account": { "id": "elastic-security-dev" } }, "input": { "type": "httpjson" }, "@timestamp": "2024-05-31T06:22:59.557Z", "ecs": { "version": "8.11.0" }, "related": { "ip": [ "redacted" ], "user": [ "redacted", "redacted" ] }, "data_stream": { "namespace": "default", "type": "logs", "dataset": "okta.system" }, "client": { "geo": { "city_name": "Bengaluru", "country_name": "India", "location": { "lon": 1, "lat": 1 }, "region_name": "Karnataka" }, "as": { "organization": { "name": "reliance jio infocomm limited" } }, "ip": "redacted", "domain": ".", "user": { "full_name": "redacted", "name": "redacted", "id": "redacted" } }, "event": { "agent_id_status": "verified", "ingested": "2024-05-31T06:24:37Z", "created": "2024-05-31T06:24:27.401Z", "kind": "event", "action": "app.oauth2.token.grant.access_token", 
"id": "3a0ee229-1f16-11ef-ad0c-1ddcb8547577", "dataset": "okta.system", "outcome": "success" }, "okta": { "actor": { "id": "redacted", "display_name": "redacted", "type": "PublicClientApp", "alternate_id": "redacted" }, "request": { "ip_chain": [ { "geographical_context": { "country": "India", "city": "Bengaluru", "state": "Karnataka", "postal_code": "redacted", "geolocation": { "lon": 1, "lat": 1 } }, "ip": "redacted", "version": "V4" } ] }, "debug_context": { "debug_data": { "flattened": { "clientAuthType": "client_secret_post", "grantedScopes": "okta.logs.read", "requestId": "76094a4ec67ae862a88c9d274b2353c9", "responseTime": "269", "dtHash": "redacted", "clientSecret": "E5NMtFDu1xVWq6Stx_AlRA", "requestUri": "/oauth2/v1/token", "requestedScopes": "okta.logs.read", "threatSuspected": "false", "grantType": "client_credentials", "url": "/oauth2/v1/token?" }, "dt_hash": "redacted", "threat_suspected": "false", "request_id": "76094a4ec67ae862a88c9d274b2353c9", "request_uri": "/oauth2/v1/token", "url": "/oauth2/v1/token?" 
} }, "event_type": "app.oauth2.token.grant.access_token", "authentication_context": { "authentication_step": 0, "external_session_id": "unknown" }, "display_message": "OIDC access token is granted", "client": { "zone": "null", "ip": "redacted", "id": "redacted", "device": "Unknown", "user_agent": { "raw_user_agent": "PostmanRuntime/7.39.0", "os": "Unknown", "browser": "UNKNOWN" } }, "uuid": "3a0ee229-1f16-11ef-ad0c-1ddcb8547577", "outcome": { "result": "SUCCESS" }, "transaction": { "id": "76094a4ec67ae862a88c9d274b2353c9", "type": "WEB" }, "security_context": { "as": { "number": 55836, "organization": { "name": "reliance jio infocomm limited" } }, "domain": ".", "isp": "reliance jio infocomm limited", "is_proxy": false }, "target": [ { "id": "redacted", "type": "access_token", "display_name": "Access Token", "alternate_id": null } ] }, "user": { "full_name": "redacted", "name": "redacted" }, "user_agent": { "original": "PostmanRuntime/7.39.0", "name": "Other", "device": { "name": "Other" } } }, "fields": { "okta.client.device": [ "Unknown" ], "elastic_agent.version": [ "8.12.2" ], "user_agent.original.text": [ "PostmanRuntime/7.39.0" ], "okta.client.ip": [ "redacted" ], "okta.client.user_agent.os": [ "Unknown" ], "okta.security_context.as.number": [ 55836 ], "source.user.name.text": [ "redacted" ], "source.geo.region_name": [ "Karnataka" ], "user.full_name.text": [ "redacted" ], "source.ip": [ "redacted" ], "agent.name": [ "dejesus-okta-research" ], "event.agent_id_status": [ "verified" ], "event.outcome": [ "success" ], "source.geo.city_name": [ "Bengaluru" ], "user_agent.original": [ "PostmanRuntime/7.39.0" ], "okta.uuid": [ "3a0ee229-1f16-11ef-ad0c-1ddcb8547577" ], "cloud.region": [ "us-east1" ], "source.user.full_name.text": [ "redacted" ], "input.type": [ "httpjson" ], "okta.authentication_context.authentication_step": [ 0 ], "related.user": [ "redacted", "redacted" ], "tags": [ "forwarded", "okta-system" ], "okta.client.zone": [ "null" ], 
"cloud.machine.type": [ "e2-medium" ], "cloud.provider": [ "gcp" ], "agent.id": [ "c7c536d0-4b32-40b3-92e7-c7cbf0944339" ], "client.user.name": [ "redacted" ], "source.as.number": [ 55836 ], "okta.authentication_context.external_session_id": [ "unknown" ], "client.user.name.text": [ "redacted" ], "user.name": [ "redacted" ], "source.domain": [ "." ], "cloud.instance.id": [ "4161709401838773778" ], "okta.security_context.is_proxy": [ false ], "agent.type": [ "filebeat" ], "client.geo.region_name": [ "Karnataka" ], "okta.actor.type": [ "PublicClientApp" ], "related.ip": [ "redacted" ], "elastic_agent.snapshot": [ false ], "okta.client.user_agent.raw_user_agent": [ "PostmanRuntime/7.39.0" ], "client.domain": [ "." ], "elastic_agent.id": [ "c7c536d0-4b32-40b3-92e7-c7cbf0944339" ], "okta.debug_context.debug_data.url": [ "/oauth2/v1/token?" ], "okta.actor.display_name": [ "redacted" ], "okta.client.id": [ "redacted" ], "event.action": [ "app.oauth2.token.grant.access_token" ], "event.ingested": [ "2024-05-31T06:24:37.000Z" ], "@timestamp": [ "2024-05-31T06:22:59.557Z" ], "cloud.account.id": [ "elastic-security-dev" ], "data_stream.dataset": [ "okta.system" ], "agent.ephemeral_id": [ "a1610e64-46ba-451b-a302-b63c1091b2ca" ], "event.id": [ "3a0ee229-1f16-11ef-ad0c-1ddcb8547577" ], "user_agent.device.name": [ "Other" ], "cloud.instance.name": [ "dejesus-okta-research" ], "cloud.project.id": [ "elastic-security-dev" ], "user.name.text": [ "redacted" ], "okta.outcome.result": [ "SUCCESS" ], "okta.security_context.isp": [ "reliance jio infocomm limited" ], "cloud.availability_zone": [ "us-east1-b" ], "okta.debug_context.debug_data.request_uri": [ "/oauth2/v1/token" ], "okta.display_message": [ "OIDC access token is granted" ], "client.user.full_name": [ "redacted" ], "client.as.organization.name": [ "reliance jio infocomm limited" ], "okta.actor.alternate_id": [ "redacted" ], "client.geo.country_name": [ "India" ], "source.geo.region_iso_code": [ "IN-KA" ], 
"client.as.organization.name.text": [ "reliance jio infocomm limited" ], "event.kind": [ "event" ], "okta.debug_context.debug_data.flattened": [ { "clientAuthType": "client_secret_post", "grantedScopes": "okta.logs.read", "requestId": "76094a4ec67ae862a88c9d274b2353c9", "responseTime": "269", "dtHash": "redacted", "clientSecret": "E5NMtFDu1xVWq6Stx_AlRA", "requestUri": "/oauth2/v1/token", "requestedScopes": "okta.logs.read", "threatSuspected": "false", "grantType": "client_credentials", "url": "/oauth2/v1/token?" } ], "client.user.id": [ "redacted" ], "okta.security_context.domain": [ "." ], "client.ip": [ "redacted" ], "user_agent.name": [ "Other" ], "okta.client.user_agent.browser": [ "UNKNOWN" ], "data_stream.type": [ "logs" ], "okta.request.ip_chain": [ { "geographical_context": { "country": "India", "city": "Bengaluru", "state": "Karnataka", "postal_code": "redacted", "geolocation": { "lon": 1, "lat": 1 } }, "ip": "redacted", "version": "V4" } ], "okta.debug_context.debug_data.dt_hash": [ "redacted" ], "okta.transaction.id": [ "76094a4ec67ae862a88c9d274b2353c9" ], "cloud.service.name": [ "GCE" ], "ecs.version": [ "8.11.0" ], "event.created": [ "2024-05-31T06:24:27.401Z" ], "user.full_name": [ "redacted" ], "agent.version": [ "8.12.2" ], "source.user.name": [ "redacted" ], "okta.debug_context.debug_data.request_id": [ "76094a4ec67ae862a88c9d274b2353c9" ], "source.user.full_name": [ "redacted" ], "source.geo.location": [ { "coordinates": [ 1, 1 ], "type": "Point" } ], "okta.event_type": [ "app.oauth2.token.grant.access_token" ], "okta.debug_context.debug_data.threat_suspected": [ "false" ], "okta.transaction.type": [ "WEB" ], "client.geo.location": [ { "coordinates": [ 1, 1 ], "type": "Point" } ], "event.module": [ "okta" ], "okta.actor.id": [ "redacted" ], "source.geo.country_iso_code": [ "IN" ], "okta.target": [ { "id": "redacted", "type": "access_token", "display_name": "Access Token", "alternate_id": null } ], "source.user.id": [ "redacted" ], 
"client.geo.city_name": [ "Bengaluru" ], "source.as.organization.name.text": [ "Reliance Jio Infocomm Limited" ], "data_stream.namespace": [ "default" ], "source.as.organization.name": [ "Reliance Jio Infocomm Limited" ], "source.geo.continent_name": [ "Asia" ], "client.user.full_name.text": [ "redacted" ], "okta.security_context.as.organization.name": [ "reliance jio infocomm limited" ], "source.geo.country_name": [ "India" ], "event.dataset": [ "okta.system" ] } } ```

In the example JSON, we would ideally either dissect the string to get grantType or do a regex search in ES|QL, as shown below. However, this is not achievable because flattened field types are not supported in ES|QL, nor is support on the roadmap from what I've found.

from logs-okta*
| where
    event.action == "app.oauth2.token.grant.access_token" and event.outcome == "success"
    and okta.client.user_agent.raw_user_agent != "Okta-Integrations"
    and okta.actor.type == "PublicClientApp"
    and okta.actor.display_name != "Okta Dashboard"
    and okta.debug_context.debug_data RLIKE ".*client_credentials.*"

Solution Options

On ingest, create an okta.debug_context.debug_data field of type keyword. This is straightforward and allows us to use pre-processing commands like DISSECT or GROK to wrangle the data ourselves.
Whenever okta.debug_context.debug_data is observed, explode it to create a new keyword field for each key. This may add some complexity because the keys and values in this field are not predictable.
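
To make the two options concrete, here is a minimal Python sketch of what each would produce for the debug data in the example document. It is an illustration only, not the integration's pipeline: the helper names `as_keyword_string` and `explode` are hypothetical, and the sample is abbreviated from the JSON above.

```python
import json

# Abbreviated sample taken from the example document above.
debug_data = {
    "clientAuthType": "client_secret_post",
    "grantedScopes": "okta.logs.read",
    "grantType": "client_credentials",
    "requestUri": "/oauth2/v1/token",
}


def as_keyword_string(obj: dict) -> str:
    """Option 1 (hypothetical helper): render the whole object as one string."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))


def explode(obj: dict, prefix: str) -> dict:
    """Option 2 (hypothetical helper): one field per key found in the object.

    The resulting field set grows with every new key Okta emits.
    """
    return {f"{prefix}.{key}": str(value) for key, value in obj.items()}


single = as_keyword_string(debug_data)
exploded = explode(debug_data, "okta.debug_context.debug_data")

print(single)
for name, value in exploded.items():
    print(name, "=", value)
```

With option 1 the rule author parses the string at query time; with option 2 the parsing cost moves to ingest and the number of mapped fields depends on whatever keys appear.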

I'd also vote to do this with okta.target if possible, as it has important details about the affected user or app in Okta.

For debug_data, we should add the following at least:

grantedScopes
clientSecret
requestedScopes
grantType

elasticmachine commented 2 months ago

Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)

efd6 commented 2 months ago

debugContext.debugData is an object, so rendering this as a string is likely to result in complex strings that will be largely meaningless to regular languages like dissect and grok. The current implementation of the integration allows the user to configure whether to keep or remove the flattened representation of the original debugContext.debugData data in okta.debug_context.debug_data.flattened. This allows users to add specific processors to retain fields that are needed. Note that Okta specifically states that the contents of debugContext.debugData are not stable and should not be relied on; this is partly why we don't map all the fields in this group.

Addressing the proposed options:

This would require re-marshalling the object that was unmarshalled by the original json processor. This is costly and would result in a text that is less easily interpreted than the existing flattened object.
We could do this by mapping as an object with object fields dynamically mapped as keywords, but given that we have no guarantee of the content of the group, we open ourselves up to a mapping explosion.

Is there a reason that an @custom pipeline cannot be used to pre-condition the fields that the user needs? In the case that there are fields that are needed for published rules, is there a reason that specific fields cannot be extracted in the package's published pipeline?

terrancedejesus commented 2 months ago

@efd6 - Thanks for the reply!

debugContext.debugData is an object, so rendering this as a string is likely to result in complex strings that will be largely meaningless to regular languages like dissect and grok.

AWS CloudTrail logs from the AWS integration are a great example where a keyword field and a flattened field each serve their purpose. For example, in the image below we have AWS CloudTrail logs, where aws.cloudtrail.request_parameters is a string and aws.cloudtrail.flattened.request_parameters is a JSON object. In this example, if I wanted to aggregate and count or filter on documentName with ES|QL and only aws.cloudtrail.flattened.request_parameters was available, I would not be able to, as flattened fields are not supported.

However, because aws.cloudtrail.request_parameters exists as keyword, I can do the following with dissect.

from logs-aws.cloudtrail-*
| where @timestamp > now() - 7 day
| where event.provider == "ssm.amazonaws.com" and event.action == "SendCommand"
| dissect aws.cloudtrail.request_parameters "%{}documentName=%{document_name},%{}"
| dissect aws.cloudtrail.response_elements "%{}instanceIds=[%{instance_id}],%{}"
| where document_name in ("AWS-RunPowerShellScript","AWS-RunShellScript") and instance_id != "*"
| stats user_command_counts = count(*) by instance_id
| where user_command_counts == 1

Therefore, I would argue that having a string or keyword representation of debugContext.debugData is not meaningless as we would do something similar.

The current implementation of the integration allows the user to configure whether to keep or remove the flattened representation of the original debugContext.debugData data in okta.debug_context.debug_data.flattened. This allows users to add specific processors to retain fields that are needed.

This seems like a great option for those that wouldn't mind making some adjustments, however, our rules are an OOTB integration and we typically do not recommend users make such changes for rules. Typically, the extent we may suggest in a note within the rules would be any integration configuration options within the UI itself that are available when enabling or editing.

Note that Okta specifically state that the contents of debugContext.debugData are not stable and should not be relied on; this is why (partly) we don't map all the fields in this group.

This is the case for many CSP and SaaS solutions mainly because their operability is based on requests and responses between RESTful APIs. Thus the objects captured in fields like aws.cloudtrail.request_parameters or okta.debug_context.debug_data are bound to be different depending on the API call being made. Our approach in ES|QL for rules or hunting is to filter upfront on the API calls and specific key:value pairs before we use dissect and grok to wrangle the data.

I identified that we break down debug_data here in the fields.yml, along with other fields in that object. Is it possible to 1) do this with okta.target and 2) add the following as well:

grantedScopes
clientSecret
requestedScopes
grantType

I'm not familiar with the ingest pipeline configurations but it seems that we are capable of ignoring if it is missing as shown here.

Is there a reason that an @custom cannot be used to pre-condition the fields that the user needs?

I'm not familiar with @custom regarding pre-conditioned fields but we would like to steer away from customization as much as possible to ensure the OOTB rules are compatible with the integration OOTB.

In the case that there are fields that are needed for published rules, is there a reason that specific fields cannot be extracted in the package's published pipeline?

If I understand correctly, this solution is similar to what I mentioned above with the ingest pipeline configuration? If so, that seems like a decent start. Long-term, I believe it would require ongoing maintenance because of the dynamic nature of debug_data. Instead, it seemed more straightforward to keep the flattened field, but also add a string field so we can do processing on the fly in ES|QL with grok and dissect.

efd6 commented 1 month ago

@terrancedejesus

AWS cloudtrail logs from the AWS integration are a great example where having a keyword type and flattened field both serve their purpose. …

It looks to me that what is being done there is an ad hoc format; the data is being rendered as a Java output, not JSON. I would be very reluctant to do that.

from logs-aws.cloudtrail-*
| where @timestamp > now() - 7 day
| where event.provider == "ssm.amazonaws.com" and event.action == "SendCommand"
| dissect aws.cloudtrail.request_parameters "%{}documentName=%{document_name},%{}"
| dissect aws.cloudtrail.response_elements "%{}instanceIds=[%{instance_id}],%{}"
| where document_name in ("AWS-RunPowerShellScript","AWS-RunShellScript") and instance_id != "*"
| stats user_command_counts = count(*) by instance_id
| where user_command_counts == 1

I'm sorry.

It seems to me that the query language should include a way of dealing with flattened objects (and possibly JSON strings), which are guaranteed to be valid JSON by the time that it has been ingested. I would certainly be pushing for this on the basis that numerous users would also be wanting to do something like this as evidenced by the existence of JSON queries in both PostgreSQL and SQLite (and presumably many other stores).

If we do store the flattened as a string, I would want to do this only conditionally on the basis of user configuration, defaulting to off; formatting a JSON string is not a cheap operation, so I would not want to have this always done if it is only going to be used by a subset of users.

This is the case for many CSP and SaaS solutions mainly because their operability is based on requests and responses between RESTful APIs.

I do not believe that this is the case here. The fields described here should be interpreted as internal with no guarantee of stability, not varying on the basis of requests. This is described here:

[!IMPORTANT] The information contained in debugContext.debugData is intended to add context when troubleshooting customer platform issues. Both key names and values may change from release to release and aren't guaranteed to be stable. Therefore, they shouldn't be viewed as a data contract but as a debugging aid instead.

This in itself is, to me, a significant caveat when the data is load bearing in a security context.

add the following as well:

grantedScopes

clientSecret

requestedScopes

grantType

That seems reasonable.

If I understand correctly, this solution is similar to what I mentioned above with the ingest pipeline configuration? If so that is definitely a decent start it would seem.

That is correct.

Rather, it seemed it would be more straightforward to keep the flattened field, but also add a string field so we can do processing on the fly in ES|QL with grok and dissect.

Modulo the performance cost, I agree. I would like to understand the features that do exist for interacting with flattened fields. Are there any?

In the first instance, I'll send a change to add support for the four fields listed above. We will discuss the broader issue of a string representation of the data and get back.
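
As a rough illustration of what extracting only the four agreed fields could look like, here is a Python sketch. It is not the actual change: the snake_case target names are an assumption modelled on existing fields such as dt_hash and threat_suspected, and `extract_wanted` is a hypothetical helper. Keys that are absent are skipped, so documents without them are unaffected.

```python
import re

# The four keys requested in this thread.
WANTED = ("grantedScopes", "clientSecret", "requestedScopes", "grantType")


def snake_case(name: str) -> str:
    """Convert camelCase to snake_case (assumed naming, e.g. dtHash -> dt_hash)."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


def extract_wanted(flattened: dict) -> dict:
    """Copy only allow-listed keys; ignore any that are missing."""
    return {snake_case(k): flattened[k] for k in WANTED if k in flattened}


sample = {
    "grantType": "client_credentials",
    "grantedScopes": "okta.logs.read",
    "url": "/oauth2/v1/token?",
}
print(extract_wanted(sample))
```

An allow-list keeps the mapping bounded regardless of what other keys Okta adds to debugData, which is the concern raised about dynamic mapping.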

efd6 commented 1 month ago

@terrancedejesus To short circuit the discussion, please also let me know how you would like to be using the okta.target objects. This is currently a flattened type field since it is an array of objects. The likely approaches are to convert this to a nested field (which would preserve associations, but be a breaking change), to convert to an object (which would not preserve association, and would still be a breaking change), or to extract all of the fields to arrays with the exception of the detailEntry and changeDetails since we'd just end up in the same situation where we have either a flattened or nested set of fields or objects.
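
The difference between preserving and not preserving associations can be sketched in Python. This is a simplified model, not Elasticsearch itself: the two targets are made-up data, and `to_field_arrays`, `object_match` and `nested_match` are hypothetical helpers that mimic how an array of objects behaves when its fields are stored as independent arrays versus as separate nested objects.

```python
# Hypothetical event with two targets.
targets = [
    {"id": "u1", "type": "User", "display_name": "Alice"},
    {"id": "a1", "type": "AppInstance", "display_name": "Payroll"},
]


def to_field_arrays(items):
    """Object-style storage: each leaf field becomes an independent array."""
    out = {}
    for item in items:
        for key, value in item.items():
            out.setdefault(key, []).append(value)
    return out


def object_match(arrays, **conditions):
    """True if each value appears somewhere in its field's array."""
    return all(v in arrays.get(k, []) for k, v in conditions.items())


def nested_match(items, **conditions):
    """True only if a single object satisfies every condition."""
    return any(
        all(item.get(k) == v for k, v in conditions.items()) for item in items
    )


arrays = to_field_arrays(targets)
# No target is a User named Payroll, yet the array form still matches.
print(object_match(arrays, type="User", display_name="Payroll"))
print(nested_match(targets, type="User", display_name="Payroll"))
```

The array form reports a match for a combination that no single target has, which is the association loss mentioned for the object and extracted-array approaches.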

terrancedejesus commented 1 month ago

@efd6 Thanks for the review and feedback.

It looks to me that what is being done there is an ad hoc format; the data is being rendered as a Java output, not JSON. I would be very reluctant to do that.

What is the reluctance here for not mirroring what a much larger integration is doing? Also, is there a best practice for handling this situation for an integration? I assumed that if it was used in such a popular and large integration such as AWS, it could be done with Okta?

It seems to me that the query language should include a way of dealing with flattened objects (and possibly JSON strings), which are guaranteed to be valid JSON by the time that it has been ingested.

An issue exists in ES specifically for this, where I have commented on its importance as well. Where this lies on the roadmap is unknown at the moment, which is why this issue was created the same day as my comment in that issue - to explore all potential solutions.

Ref: https://github.com/elastic/elasticsearch/issues/105637

If we do store the flattened as a string, I would want to do this only conditionally on the basis of user configuration, defaulting to off

This seems viable, but the same issue may recur with other integrations and fields. We have setup guides in our detection rules, so we could instruct the user who is enabling the rule to turn this on.

I do not believe that this is the case here. The fields described here should be interpreted as internal with no guarantee of stability, not varying on the basis of requests.

Thank you for pointing this out specifically. Unfortunately, much of the context needed to create medium-high fidelity rules in Okta relies on this field and will be used for rules. If we identify changes to data here from the Okta side, we will adjust the rules accordingly. While I understand this makes it challenging to create a static map of what to pull out from these objects, a keyword or string representation at least would allow us to check for pre-existing data via wildcards and then process those fields, then add our filters.

Modulo the performance cost, I agree. I would like to understand the features that do exist for interacting with flattened fields. Are there any?

At the moment, there is no support for flattened fields at all in ES|QL. Here is a list of all unsupported fields -> https://www.elastic.co/guide/en/elasticsearch/reference/8.12/esql-limitations.html#_unsupported_types

With KQL, flattened fields are supported for searching. Thus, a query such as event.dataset: "okta.system" and okta.target.display_name: "unknown" works: display_name lives inside the okta.target flattened field but can still be filtered on.

In the first instance, I'll send a change to add support for the four fields listed above. We will discuss the broader issue of a string representation of the data and get back.

Thank you for completing this!

@efd6

The likely approaches are to convert this to a nested field (which would preserve associations, but be a breaking change), to convert to an object (which would not preserve association, and would still be a breaking change), or to extract all of the fields to arrays with the exception of the detailEntry and changeDetails since we'd just end up in the same situation where we have either a flattened or nested set of fields or objects.

Thanks for laying out the options. I may be misunderstanding what these changes indicate, but what seems to be missing is mimicking what AWS integration has. My example is aws.cloudtrail.request_parameters in the AWS Cloudtrail integration.

okta.flattened.target -> flattened field type
okta.target -> keyword
okta.target.text -> text

This way, the flattened field could be accessed via KQL as usual, and the keyword field would be compatible with languages like ES|QL and EQL, where functions would allow users and us to pre-process the objects inside as we see fit. I am not aware of why the text field would be valuable; I just saw it was also included in the AWS integration.

efd6 commented 1 month ago

What is the reluctance here for not mirroring what a much larger integration is doing?

I've spelunked the origin of the approach and I don't see any design that led to the implementation there. The reluctance is that the format that is being used is not one that is intended to be roundtrippable; this means that there is a requirement to use dirty regexp approaches as exist above, and even when those are used we leave ourselves open to ambiguous interpretation of the strings that will lead to future support cases and poor user experience. I would rather come to a good design that minimises the risk of those happening.
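
The roundtrip concern can be shown with a small Python sketch. The `render_kv` function is a hypothetical stand-in for an ad hoc key=value rendering, loosely modelled on the shape the dissect patterns above expect; it is not the AWS integration's actual formatter, and the two input objects are made up.

```python
import json


def render_kv(obj: dict) -> str:
    """Hypothetical ad hoc rendering: {key=value, key=value}."""
    return "{" + ", ".join(f"{k}={v}" for k, v in obj.items()) + "}"


# Two different objects...
a = {"documentName": "AWS-RunShellScript", "comment": "x"}
b = {"documentName": "AWS-RunShellScript, comment=x"}

# ...that render to the same ad hoc string, so the original cannot be recovered.
print(render_kv(a) == render_kv(b))

# JSON keeps them distinct and parses back to the original object.
print(json.dumps(a) == json.dumps(b))
print(json.loads(json.dumps(b)) == b)
```

Because delimiters are not escaped in the ad hoc form, a pattern written against it cannot tell a second key from a value that happens to contain ", key=".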

Also, is there a best practice for handling this situation for an integration? I assumed that if it was used in such a popular and large integration such as AWS, it could be done with Okta?

Popularity is not necessarily a good indicator of quality unfortunately.

An issue exists in ES specifically for this, where I have commented on its importance as well. Where this lies on the roadmap is unknown at the moment, which is why this issue was created the same day as my comment in that issue - to explore all potential solutions.

I would prefer to come to a correct solution later than be burdened with an albatross that has the kinds of issues I raise above and that we can't remove because it has become depended on by users, even though flaky.

At the moment, there is no support for flattened fields at all in ES|QL. Here is a list of all unsupported fields -> https://www.elastic.co/guide/en/elasticsearch/reference/8.12/esql-limitations.html#_unsupported_types

That is unfortunate.

Thanks for laying out the options. I may be misunderstanding what these changes indicate, but what seems to be missing is mimicking what AWS integration has. My example is aws.cloudtrail.request_parameters in the AWS Cloudtrail integration.

That is partly because I do not believe that that integration has a good design for this, but also because aws.cloudtrail.request_parameters is an object, not an array of objects as okta.target is. This has implications for the approaches that we can and should use. Note also that just changing the mapping of okta.target to keyword (and doing the required work to make that actually work) would break all users who have rules that are based on okta.target being a flattened field.

I will raise the possibility of rendering the relevant fields as a JSON message that could be treated as text and (possibly in the future) be treated sensibly as an actual object (when that capacity is implemented).

terrancedejesus commented 3 weeks ago

The reluctance is that the format that is being used is not one that is intended to be roundtrippable

Roger that and thank you for the clarification.

That is partly because I do not believe that that integration has a good design for this, but also because the aws.cloudtrail.request_parameters is an object, not an array of objects as okta.target is. This has implications on the approaches that we can and should use.

Completely understood on the potential complications with this. Thank you for the clarification.

I will raise the possibility of rendering the relevant fields as a JSON message that could be treated as text and (possibly in the future) be treated sensibly as an actual object (when that capacity is implemented).

Does this allow us to search on it with ES|QL and EQL or not?

@efd6 I've shared several options from my limited perspective of how to handle this; however, they do not seem ideal from your perspective. As it stands, we are blocked from writing specific detection query logic for emerging threats while we wait for either the ES|QL dev team to add support for flattened fields or a plausible solution within the integration data itself. Therefore, what are your recommended steps or solution moving forward?

efd6 commented 3 weeks ago

@terrancedejesus If I render the okta.target field into JSON as okta.target.json would that work for you? You should be able to use dissect and similar tools on that text to be able to detect the patterns that you are looking for.
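
A rough Python sketch of the idea, using the okta.target value from the example document: render the array as a JSON string, then pull a value out with a pattern, much as dissect or grok would on the string field. The compact separators and the `target_type` capture name are assumptions for illustration; the real rendering may differ in whitespace and key order, which any pattern would need to account for.

```python
import json
import re

# okta.target from the example document above.
target = [
    {
        "id": "redacted",
        "type": "access_token",
        "display_name": "Access Token",
        "alternate_id": None,
    }
]

# Assumed rendering: compact JSON with no spaces after separators.
target_json = json.dumps(target, separators=(",", ":"))
print(target_json)

# Pattern-based extraction on the string, similar in spirit to dissect/grok.
match = re.search(r'"type":"(?P<target_type>[^"]*)"', target_json)
if match:
    print(match.group("target_type"))
```

Since the string is valid JSON, it could later be parsed as an object if the query language gains that capability, without changing what is stored.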

terrancedejesus commented 3 weeks ago

@efd6 - I am not familiar with what .json would be in terms of field type. Looking at the documentation would the field type be one of these then?

object: A JSON object.
flattened: An entire JSON object as a single field value.
nested: A JSON object that preserves the relationship between its subfields.

As long as it is supported by ES|QL, then it should be fine to use. Might we be able to create an example document in a test stack and use grok or dissect on it from ES|QL to test?

efd6 commented 3 weeks ago

It would be a string.

terrancedejesus commented 3 weeks ago

It would be a string.

If I understand correctly, that should be fine. Can we test this before applying the change just to confirm?

If this works for okta.target, can we do the same for okta.debug_context.debug_data as well, and ignore it if that field does not exist in the document, to account for it not always being available?
