Source/target and diff modeling

andrewstucki commented 4 years ago

This is a more generic application of modeling either "source/target" style events or "diff" events that I'm spinning off from https://github.com/elastic/ecs/issues/589

As I initially mentioned (https://github.com/elastic/ecs/issues/589#issuecomment-555203355), there are a number of things that ECS should support modeling, including:

setuid/setgid operations
file modification events (i.e. renames, permissions)
IPC calls
network requests/flows (current use of source and destination)
user modifications
process execs
registry modifications
windows source/target audit log info

Overall these more or less fall into two categories:

Modeling communication between two like things (network connections, IPCs, windows audit log)
Modification events

Currently, ECS has started to approach this by making fields that are specific to each of these domains, i.e. source/destination are currently for network modeling only, and then there's also client/server.

I'm advocating for adopting a more generic field set that allows you to do generic source/target or diff modeling which would essentially allow you to embed any other field set in it.

For example--something like origin and target (sad that source/destination is already taken):

For file modifications:

origin.file.name = "foo"
target.file.name = "bar"

For process execs:

origin.process.path = "foo"
target.process.path = "bar"

For network requests (slightly difficult because of the lack of port info):

origin.host.ip = "foo"
target.host.ip = "bar"

For user modification (maybe by another user baz who did the modification?):

user.name = "baz"
origin.user.name = "foo"
target.user.name = "bar"

Thoughts?
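The dotted examples above can be read as nested documents. Here is a minimal sketch of how the proposed origin/target namespaces would embed an existing field set. The helper name `expand_dotted` is hypothetical and is not part of ECS or any Elastic tooling.

```python
def expand_dotted(flat):
    """Expand {"a.b.c": value} style keys into nested dictionaries."""
    nested = {}
    for dotted_key, value in flat.items():
        node = nested
        *parents, leaf = dotted_key.split(".")
        for part in parents:
            # Create intermediate objects (e.g. "origin", then "user") on demand.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


# The user-modification example from above: "baz" is the acting user,
# "foo" and "bar" are the origin and target values.
event = expand_dotted({
    "user.name": "baz",
    "origin.user.name": "foo",
    "target.user.name": "bar",
})
```

The same `user` field set appears three times here: once at the top level and once under each of the two generic namespaces.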

janniten commented 4 years ago

For user modification (maybe by another user baz who did the modification?):
user.name = "baz"
origin.user.name = "foo"
target.user.name = "bar"

Hi @andrewstucki, baz is the user performing the modification? Thank you

andrewstucki commented 4 years ago

@janniten that was the idea in that particular example, but the desire was just to highlight the need to model any sort of diffing or source/destination style events in a sizeable portion of the existing field sets out there.

I'm fairly open to ideas on naming/implementation (would this be at the top-level as I suggested or embedded inside each top-level field set, etc.), but just calling out the need to solve this issue for a number of use-cases so that we don't just solve the same problem a myriad of different ways as they arise. So, any suggestions/opinions on how we should go about this are more than welcome!

rw-access commented 4 years ago

Cross-process activity could also be modeled this way. This could include injection, credential access/handle opening, etc.

source.process.name = "totallynotmalware.exe"
target.process.name = "lsass.exe"

I'd prefer that versus nesting source/target under process. Then you have two mega namespaces source and target without needing to pollute everything with a source and target.

You could also argue that source is implicit for the existing namespaces. The above example could be equivalently written as:

process.name = "totallynotmalware.exe"
target.process.name = "lsass.exe"

The big advantage is that you can now build aggregations on those better. For instance, "what are the counts of the categories of events for cmd.exe?" You don't want to search both source.process.name and process.name, and that would also make it tricky to assign them into buckets.

process.name == "cmd.exe" // instead of : process.name == "cmd.exe" or source.process.name == "cmd.exe"
| count event.category
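To illustrate the aggregation argument, here is a rough Python sketch of the same count over in-memory events. The sample events and the function name `count_categories` are made up for illustration; the only assumption carried over from the comment is that the acting process always lives in process.name.

```python
from collections import Counter

# Hypothetical events. The actor is always in process.name, and
# target.process is present only when there is a second process involved.
events = [
    {"process": {"name": "cmd.exe"}, "event": {"category": "process"}},
    {
        "process": {"name": "cmd.exe"},
        "target": {"process": {"name": "lsass.exe"}},
        "event": {"category": "process"},
    },
    {"process": {"name": "cmd.exe"}, "event": {"category": "network"}},
    {"process": {"name": "powershell.exe"}, "event": {"category": "file"}},
]


def count_categories(events, process_name):
    """Count event.category for events whose acting process matches."""
    return Counter(
        e["event"]["category"]
        for e in events
        if e.get("process", {}).get("name") == process_name
    )


counts = count_categories(events, "cmd.exe")
```

Because the actor is in one place, a single equality check is enough and each event lands in exactly one bucket.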

andrewstucki commented 4 years ago

I totally agree about the implicit nature of some things, and am slightly less concerned with the proposed origin field set than the target field set. The thing that this would preclude us from doing though is using something like the origin field set to model the original state of a diff operation from a third entity (i.e. the example of the user who modifies a different user's name).

At least in our particular use case for security related stuff, I think that there's a ton of utility introducing the target field set even if we weren't able to introduce origin. But I'd like people with other use-cases to chime in on the utility of these constructs as well.

rw-access commented 4 years ago

@andrewstucki just to make sure I'm tracking your example well, does this reflect your example? User Alice renamed Bob to Robert.

{
  "user": {"name": "alice"},
  "origin": {"user": {"name": "bob"}},
  "target": {"user": {"name": "robert"}}
}

This maps most closely to an Event ID of 4781 for Windows Event Logs (Security).

Subject:

   Security ID:  ACME\Alice
   Account Name:  Alice
   Account Domain:  ACME
   Logon ID:  0x1f40f

Target Account:

   Security ID:  ACME\robert
   Account Domain:  ACME
   Old Account Name: bob
   New Account Name: robert

Additional Information:

   Privileges:  -
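As a sketch of that mapping, assume the event above has already been parsed into a flat record. The record key names and the function name `rename_event_to_doc` are hypothetical, and lowercasing the subject name is only done to match the JSON example.

```python
def rename_event_to_doc(record):
    """Map a parsed 4781-style record onto the origin/target layout."""
    return {
        # Subject: the account that performed the rename.
        "user": {"name": record["subject_account_name"].lower()},
        # Old and new state of the account that was renamed.
        "origin": {"user": {"name": record["old_account_name"]}},
        "target": {"user": {"name": record["new_account_name"]}},
    }


# Hypothetical parsed form of the event shown above.
record = {
    "subject_account_name": "Alice",
    "old_account_name": "bob",
    "new_account_name": "robert",
}
doc = rename_event_to_doc(record)
```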

marshallmain commented 4 years ago

Using origin and target to represent the old vs new state of a single user (or object in general) seems like a big semantic change from the process semantics where the origin process is generally taking action on the target process.

If I started out using origin.process and target.process and tried to apply that logic to origin.user and target.user I would expect origin.user to be taking an action on target.user, whereas the suggestion here is origin.user is transforming into target.user.

I definitely want to add the capability to describe an original state that has transformed to a new state but conflating it with an object taking action on a different object seems confusing to me.

rw-access commented 3 years ago

I'm thinking that this issue is overdue to make it into ECS. We're also running into this with Endpoint, as part of the development for process injection telemetry and detection. The interim approach is Target.process for now. Capitalizing the first letter is generally what Elastic Endpoint does to avoid collisions with future ECS, like process.Ext for example.

I'm most interested in source/target for process events. I think the most common use cases involve an acting process that "does the thing" and the target process that "has the thing done to it." Do you think it's fair to initially limit scope to that, @andrewstucki? We can leave this issue open for the generic source/target problem if you like.

@ebeahan Do you think this would be good to make into an RFC, or straight to a PR? I'm thinking that process.* should capture the source information and process.target.* is just the process fieldset reused to capture information about the target process. It's possible process.target.parent would be useful or populated (@gabriellandau do you know?), but I don't think we would nest any further than that.

@andrewstucki @gabriellandau either of you interested in helping drive this forward?

ebeahan commented 3 years ago

@rw-access Yes, I do think this topic makes a good RFC candidate. Since this is something that's already been discussed extensively, the initial RFC draft should probably target stage 1.

++ for using process for sources and process.target.* for the destinations/targets. The approach has symmetry with what's already been adopted for multiple users in events.

andrewstucki commented 3 years ago

@rw-access I think it's definitely good to limit this at first, and 👍 on the idea of nesting under the process field. As @ebeahan pointed out, there's new precedent for preferring field subsets like target, etc. nested under the entity vs. the other way around.

gabriellandau commented 3 years ago

Interesting. In the past, we would put something under something else to indicate that it is a property of that other thing. Would Target.process.thread go into process.thread.target or process.target.thread?

Some other questions that come to mind while we're discussing schema layout.

A Windows token is a security credential containing your username, groups, and related information. This is where we get the username from. Threads can have [impersonation] tokens that differ from their containing process's [primary] token. An event/action occurs in the security context of the impersonation token, but it can still be useful to know the primary token. These tokens can have different users and IDs. Target threads can be impersonating as well, so we can have:

acting process token
OPTIONAL acting thread token. If present, this contains the effective user.
target process token
OPTIONAL target thread token

We're not currently returning impersonation token information, but we may want to in the future.
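The token precedence described above could be sketched roughly as follows. The `Token` class, its fields, and `effective_token` are hypothetical names for illustration only and do not reflect how Endpoint represents tokens.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    user: str
    integrity_level_name: str


def effective_token(process_token: Token, thread_token: Optional[Token] = None) -> Token:
    """Pick the token whose security context an action runs in.

    A thread's impersonation token, when present, takes precedence over
    the primary token of its containing process.
    """
    return thread_token if thread_token is not None else process_token


primary = Token(user="user", integrity_level_name="high")
impersonation = Token(user="SYSTEM", integrity_level_name="system")

acting_user = effective_token(primary, impersonation).user
fallback_user = effective_token(primary).user
```

The same resolution would apply independently to the acting side and the target side, which is why up to four tokens can be relevant to one event.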

Each of these tokens can have additional relevant attributes, some of which we like to know, such as Integrity Level. How do you think this should all lay out? Here's some pared-down sample data from a 7.12.0 diagnostic (not user-facing yet) alert. Note that this alert describes an action between two threads in the same process, so everything is the same except process.thread.id != Target.process.thread.id, but that's often not the case.

"Target": {
    "process": {
        "Ext": {
            "token": {
                "domain": "DESKTOP-4S6F4KN",
                "elevation": true,
                "elevation_type": "full",
                "integrity_level_name": "high",
                "sid": "S-1-5-21-2862132742-1403383571-1346394525-1001",
                "user": "user"
            }
        },
        "executable": "C:\\Windows\\System32\\cmd.exe",
        "parent": {
            "Ext": {
                "token": {
                    "domain": "DESKTOP-4S6F4KN",
                    "elevation": true,
                    "elevation_type": "full",
                    "integrity_level_name": "high",
                    "sid": "S-1-5-21-2862132742-1403383571-1346394525-1001",
                    "user": "user"
                }
            },
            "executable": "C:\\Program Files\\Python38\\python.exe",
            "pid": 4880
        },
        "thread": {
            "id": 7712
        }
    }
},
"process": {
    "Ext": {
        "token": {
            "domain": "DESKTOP-4S6F4KN",
            "elevation": true,
            "elevation_type": "full",
            "integrity_level_name": "high",
            "sid": "S-1-5-21-2862132742-1403383571-1346394525-1001",
            "user": "user"
        }
    },
    "executable": "C:\\Windows\\System32\\cmd.exe",
    "parent": {
        "Ext": {
            "token": {
                "domain": "DESKTOP-4S6F4KN",
                "elevation": true,
                "elevation_type": "full",
                "integrity_level_name": "high",
                "sid": "S-1-5-21-2862132742-1403383571-1346394525-1001",
                "user": "user"
            }
        },
        "executable": "C:\\Program Files\\Python38\\python.exe",
        "pid": 4880
    },
    "thread": {
        "id": 7680
    }
}

rw-access commented 3 years ago

There is another issue for tokens specifically, #810.

Do you think we would scope those separately or are they terribly intertwined and impossible to decouple? (Honest question)

The meaning of .target, .effective, .new originated in RFC-0007, AFAICT.

Good question about target threads. I don't know the right answer, but that does sound like exactly the right thing to hash out on the RFC for target processes: https://github.com/elastic/ecs/pull/1297

Wanna be a SME for that?

Edit: I think it would be process.target.thread.*
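To make the layout concrete, here is a minimal sketch of moving the interim Target.process object under process.target, so that the target thread ends up at process.target.thread. The function name `relocate_target` is hypothetical, and the event is a trimmed version of the sample data earlier in this thread.

```python
import copy


def relocate_target(event):
    """Return a copy of the event with Target.process moved to process.target.

    Any other keys under Target are dropped in this sketch.
    """
    doc = copy.deepcopy(event)
    target = doc.pop("Target", None)
    if target and "process" in target:
        doc.setdefault("process", {})["target"] = target["process"]
    return doc


event = {
    "Target": {
        "process": {
            "executable": "C:\\Windows\\System32\\cmd.exe",
            "thread": {"id": 7712},
        }
    },
    "process": {
        "executable": "C:\\Windows\\System32\\cmd.exe",
        "thread": {"id": 7680},
    },
}

doc = relocate_target(event)
target_thread_id = doc["process"]["target"]["thread"]["id"]
```

The acting thread stays at process.thread.id, and the input event is left unmodified.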

ebeahan commented 3 years ago

Good question about target threads. I don't know the right answer, but that does sound like exactly the right thing to hash out on the RFC for target processes: #1297

Yes absolutely feel free to continue the discussion over in #1297. 😄

++ to process.target.thread.*

elastic / ecs

Source/target and diff modeling #678