elastic / detection-rules

https://www.elastic.co/guide/en/security/current/detection-engine-overview.html

[FR] [DAC] Add Support for Known Types to Auto-generated Schemas #3985

Closed eric-forte-elastic closed 1 month ago

eric-forte-elastic commented 1 month ago

Pull Request

Issue link(s):

Summary - What I changed

This PR adds additional logic to the auto-generation of custom schemas. The new logic checks whether the desired new field exists in any other known ECS, non-ECS, or integration index. If it does, the new field is given the same type; previously, all new fields defaulted to the type keyword.

Particular attention should be given to the following function.

@cached
def get_all_flattened_schema() -> dict:
    """Load all schemas into a flattened dictionary."""
    all_flattened_schema = {}
    for _, schema in get_non_ecs_schema().items():
        all_flattened_schema.update(flatten(schema))

    ecs_schemas = get_schemas()
    for version in ecs_schemas:
        for index, info in ecs_schemas[version]["ecs_flat"].items():
            all_flattened_schema.update({index: info["type"]})

    for _, integration_schema in load_integrations_schemas().items():
        for _, index_schema in integration_schema.items():
            all_flattened_schema.update(flatten(index_schema))

    return all_flattened_schema

In the event of an overlap in field -> type mappings across the schemas, a decision needs to be made as to which set of schemas is the most and least authoritative. In the function above, the schemas are loaded in order from least to most authoritative: 1) non-ECS schemas, 2) ECS schemas, 3) integration schemas. Because each load is applied with .update, any overlapping field is overwritten by the later call. So in this case the integration schemas are the most authoritative, followed by ECS, with non-ECS the least authoritative; the first schema loaded is always the least authoritative.
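To make the overwrite behavior concrete, here is a minimal sketch of how load order decides which type wins. The helper name `merge_by_authority` and the toy field names and types are assumptions made up for illustration; they are not part of the PR or of the detection-rules code base.

```python
from typing import Dict, Iterable, Mapping


def merge_by_authority(sources: Iterable[Mapping[str, str]]) -> Dict[str, str]:
    """Merge field -> type mappings; later sources overwrite earlier ones."""
    merged: Dict[str, str] = {}
    for source in sources:  # ordered from least to most authoritative
        merged.update(source)
    return merged


# Toy data, invented for illustration only.
non_ecs = {"shared.field": "keyword", "only.non_ecs": "long"}
ecs = {"shared.field": "text", "only.ecs": "ip"}
integrations = {"shared.field": "double"}

# Same order as the function above: integrations win on overlap.
merged = merge_by_authority([non_ecs, ecs, integrations])

# Switching the order changes the winner: here ECS is loaded last, so ECS wins.
reordered = merge_by_authority([integrations, non_ecs, ecs])
```

Fields that appear in only one source are unaffected by the order; only overlapping fields depend on which source is loaded last.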

This can be easily changed by switching the order. In review, it would be great if we could have feedback or other ideas as to the order of authority in case of overlap.

How To Test

To test this, first add the appropriate config variable. Next, run view-rule on a rule that uses fields not currently present in any schema. The rule should validate successfully and your schema file should be updated. Also make sure you use a rule (like the example below) that includes a field already known in a different index, to confirm that its type is carried over into the auto-generated schema file.

Example: python -m detection_rules view-rule custom_rules/rules/dac_demo_dev_rule_1.toml

Example Rule

```toml
[metadata]
creation_date = "2024/07/29"
maturity = "production"
updated_date = "2024/07/29"

[rule]
actions = []
author = ["DAC User"]
description = "Test Rule"
enabled = true
exceptions_list = []
false_positives = []
from = "now-6m"
index = ["the-best-integration-ever*"]
interval = "5m"
language = "eql"
max_signals = 100
name = "DAC Demo Dev Rule 2"
references = []
risk_score = 47
risk_score_mapping = []
rule_id = "af9c4114-a7d5-43de-869b-105735c278e8"
setup = "Test Setup"
severity = "medium"
severity_mapping = []
tags = []
threat = []
to = "now"
type = "eql"

query = '''
process where host.os.type.fakeData == "linux" and process.name.okta.thread == "updated" and dll.Ext.relative_file_creation_time > 2
'''

[[rule.required_fields]]
ecs = true
name = "host.os.type"
type = "keyword"

[[rule.required_fields]]
ecs = true
name = "process.name"
type = "keyword"
```

Expected Auto Gen Schema file after running view rule

```json
{
  "the-best-integration-ever*": {
    "dll.Ext.relative_file_creation_time": "double",
    "host.os.type.fakeData": "keyword",
    "process.name.okta.thread": "keyword"
  }
}
```
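The expected file follows from the lookup described in the summary: a field found in a known schema keeps that type, and anything else falls back to keyword. A rough sketch of that resolution step, assuming a hypothetical helper `resolve_field_types` and a hand-written `known_types` mapping standing in for the flattened schema (neither is taken from the PR):

```python
from typing import Dict, Iterable, Mapping

DEFAULT_TYPE = "keyword"  # fallback for fields not found in any known schema


def resolve_field_types(fields: Iterable[str], known_types: Mapping[str, str]) -> Dict[str, str]:
    """Map each new field to its known type, or to DEFAULT_TYPE if unknown."""
    return {field: known_types.get(field, DEFAULT_TYPE) for field in sorted(fields)}


# Assumed stand-in for the flattened schema; only this one field is "known".
known_types = {"dll.Ext.relative_file_creation_time": "double"}

# Fields used by the example rule's query that are missing from the index schema.
query_fields = [
    "host.os.type.fakeData",
    "process.name.okta.thread",
    "dll.Ext.relative_file_creation_time",
]

index_schema = {"the-best-integration-ever*": resolve_field_types(query_fields, known_types)}
```

With these inputs, `index_schema` matches the expected JSON: the known field is typed double and the two unknown fields default to keyword.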

Checklist

Contributor checklist

protectionsmachine commented 1 month ago

Enhancement - Guidelines

These guidelines serve as a reminder of considerations to address when adding a feature to the code.

Documentation and Context

Code Standards and Practices

Testing

Additional Checks

Mikaayenson commented 1 month ago

++ also dont forget to support ml integrations.

eric-forte-elastic commented 1 month ago

> ++ also dont forget to support ml integrations.

++ Added support in most recent commit.