elastic / detection-rules

https://www.elastic.co/guide/en/security/current/detection-engine-overview.html

[FR] Add `source_updated_at` to Rule Schema as a Build Time Field #2826

Open terrancedejesus opened 1 year ago

terrancedejesus commented 1 year ago

Is your feature request related to a problem? Please describe. No.

Describe the solution you'd like Add creation_date and updated_date to rule objects when a release package is created.

Additional context When we build a rules release package, all rule objects should have a creation_date and updated_date field in them. This will be used by Kibana for the updates review workflow.

@jpdjere @approksiu

Dev branch: https://github.com/elastic/detection-rules/tree/fr-add-dates-to-rule-data

terrancedejesus commented 1 year ago

Update

We have some considerations with adding this. These fields are currently in the rule meta since they do not matter for the SHA256 hash calculation. As a result, typically anything we add to the hash calculation should be moved into the rule data itself and have validation done on the values.

Options:

  1. Add creation_date and updated_date to the API formatted rule object, after it is built and hash has been calculated.
  2. Add creation_date and updated_date to strip_additional_fields and remove them during hash calculation.
  3. Move creation_date and updated_date into rule.contents.data and remove from rule.contents.meta.

Either way we need to add validation to these field value pairs to keep the date values consistent.
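To make option 2 concrete, here is a minimal sketch of hashing a rule while ignoring the date fields. It assumes rules are plain dictionaries; `rule_sha256` and `DATE_FIELDS` are hypothetical names for illustration and are not the repository's actual hashing code or its strip_additional_fields logic.

```python
import hashlib
import json

# Hypothetical constant: fields that should never influence the rule hash.
DATE_FIELDS = ("creation_date", "updated_date")


def rule_sha256(rule: dict, exclude: tuple = DATE_FIELDS) -> str:
    """Hash a rule dict after dropping the fields listed in `exclude`."""
    hashable = {key: value for key, value in rule.items() if key not in exclude}
    # sort_keys gives a stable serialization, so key order cannot change the hash.
    serialized = json.dumps(hashable, sort_keys=True)
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


rule_v1 = {"name": "Example rule", "query": "process.name : \"cmd.exe\"", "updated_date": "2023/09/26"}
rule_v2 = {**rule_v1, "updated_date": "2023/10/03"}  # only the date changed
rule_v3 = {**rule_v1, "query": "process.name : \"powershell.exe\""}  # real content change
```

With this approach `rule_v1` and `rule_v2` hash identically, so a date-only edit would not trigger a version bump, while `rule_v3` produces a different hash.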

@Mikaayenson @eric-forte-elastic @brokensound77 - Any additional thoughts?

eric-forte-elastic commented 1 year ago

I do not see an issue with this approach/solution :+1:

Just as a note, we will need to update unit tests to also have creation_date and updated_date in the rule.contents.data and update the following line from packaging.py:

    def _package_kibana_index_file(self, save_dir):
        """Convert and save index file with package."""
        sorted_rules = sorted(self.rules, key=lambda k: (k.contents.metadata.creation_date, os.path.basename(k.path)))

None of these should be an issue as the functions/tests have access to the contents object of a rule allowing them access to both metadata and data.

Mikaayenson commented 1 year ago

@terrancedejesus Can you explain why we want to make the change to move these fields at all (or even add to the build)? Was it requested upstream?

terrancedejesus commented 1 year ago

@terrancedejesus Can you explain why we want to make the change to move these fields at all (or even add to the build)? Was it requested upstream?

Requested upstream from @jpdjere for UI regarding rule update workflow.

botelastic[bot] commented 11 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

terrancedejesus commented 11 months ago

@jpdjere is this still a request from your team? If so, I'd like to get it correctly scoped for one of our upcoming sprints. Thank you!

jpdjere commented 11 months ago

Hi @terrancedejesus , yes this is still a valid request. We won't be working on anything that needs this data for 8.11 but probably 8.12

botelastic[bot] commented 9 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

jpdjere commented 9 months ago

Hi @terrancedejesus . Do you have bandwidth for this in any upcoming releases?

terrancedejesus commented 9 months ago

@jpdjere - Thanks for the follow up! I added this to our team's next sprint cycle, which starts Nov 27. With recent adjustments to our current sprint cycle, I will attempt to get started with this to determine if it is relatively straightforward and, if so, will have it in earlier.

brokensound77 commented 9 months ago

Hello @jpdjere 👋 ,

Can you provide more context for this request? We are just trying to understand the reasoning and whether this is the best representation of this information.

Right now, since we do not push this with the rules, the dates are pulled from kibana:

This is reflective of the rules as they apply to a user's stack, which seems accurate and informative.

Our dev cycle creates situations where rules may not be released for a few days or weeks after modification, so there is inconsistency that may cause confusion. More so, I think it may be more valuable understanding when a rule was created and modified within a stack vs when it was developed.

These fields under our metadata are currently used as a means to inform us on changes from a maintenance perspective.

Thoughts?

jpdjere commented 9 months ago

Hi @brokensound77 . Thanks for the follow up.

The idea behind this request is to give the users an idea of how "recent" the updates to a specific rule are, in order to know how long those specific updates for the rule have been pending. Here's a screenshot of the UI as proposed by our designers:

[image]

By sorting by the "Last updated" column, the user could have a quick understanding of which rules have been recently updated in the Fleet package and know which rule updates have been pending for a long time, i.e. should take their most immediate attention.

This becomes especially important when the user has neglected addressing updates from one (or many) package releases, and after some considerable amount of time sees in our Rule Update table a list of rule updates corresponding to more than one release.

For example, a user seeing this table in October could see listed 4 rules that have updates coming from a package release made in March (so their updated_by timestamps would maybe be around January, February, March), and 5 more rules that have updates coming from a package released in August (and updated_by timestamps would maybe be around May, June, July). If a rule had updates in both releases, the latest would be displayed, since we always compare to the latest version.

(Sorry if these timelines don't make sense; I don't have in mind right now what your release cadence is.)

Having said that:

Our dev cycle creates situations where rules may not be released for a few days or weeks after modification, so there is inconsistency that may cause confusion.

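I think that's a valid concern that can cause confusion to users, given the false impression that updates have been pending for a long time, when the rule updates have just been released. A couple of questions:

  1. How usual is this discrepancy between the actual update date of a rule on your side and the release of a package?
  2. Do you have an approximate idea of how large can the date range be for rule updates within one package release? What I mean is: if a package release includes 10 updates, what is the earliest and the latest updated_at timestamp that we would see?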

(This second question, I think, is not that important considering that the user might accumulate updates from many subsequent releases, but it's good to know).

terrancedejesus commented 8 months ago

@jpdjere - Apologies for the questions going unanswered.

How usual is this discrepancy between the actual update date of a rule on your side and the release of a package?

We release OOB updates bi-weekly. Therefore, the updated date discrepancy could be 1-14 days, as it depends on when the pull request is merged and when the package reaches EPR. This could be longer if the release takes longer than expected, but that is a rare occurrence.

Do you have an approximate idea of how large can the date range be for rule updates within one package release? What I mean is: if a package release includes 10 updates, what is the earliest and the latest updated_at timestamp that we would see?

Any time a rule is updated, the updated date value is updated. Therefore it could be any date between when the last package was available in EPR to the date when the next package is available in EPR. Again, we release bi-weekly so there is an approximate range of ~14 different dates that could apply.

terrancedejesus commented 7 months ago

Update 01/08/2023

We are moving forward with this as it is a requirement upstream for customizing prebuilt detection rules, milestone 3. Below are considerations:

@jpdjere or @banderror - Can you provide insight to the following for us. Thank you in advance!

terrancedejesus commented 7 months ago

Option 1 - In this option, we rely on post_dict_conversion to add a new method _convert_add_date_fields(). This method takes the rule metadata and assigns it to the same keys, except inside of the obj dictionary which is already converted to Kibana API format from to_api_format(). We also add a validate_date_format() method with validates_schema marshmallow decorator. This will ensure that the date formats are ISO 8601, if we choose to follow this standard. Notes below:

Commit Reference: https://github.com/elastic/detection-rules/commit/3bc8df6e8db0da0eab483ab26633e391fab18219
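As an illustration of the kind of check validate_date_format() could perform, here is a minimal standard-library sketch. It is not the code from the referenced commit and does not use marshmallow; `is_iso8601_utc` and the exact format string are assumptions chosen for this example.

```python
from datetime import datetime

# Assumed target format: an ISO 8601 date-time in UTC with fractional seconds.
ISO_8601_UTC = "%Y-%m-%dT%H:%M:%S.%fZ"


def is_iso8601_utc(value: str) -> bool:
    """Return True if `value` parses with the assumed ISO 8601 UTC format."""
    try:
        datetime.strptime(value, ISO_8601_UTC)
    except ValueError:
        return False
    return True
```

A schema-level validator could call a helper like this for each date field and raise a validation error when it returns False.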

terrancedejesus commented 7 months ago

Option 2 - We only add meta to the package release files. Prebuilt rule packaging and artifact building rely mainly on the to_api_format() method in rule.py. We already have an option include_metadata that is False by default. If True, it will add the RuleMeta as a dictionary to the rule object that will become the rule asset in the prebuilt rules package. Therefore we can avoid altering any data schemas, backport concerns, etc. Instead, upstream on the Kibana side, they would handle accessing whatever metadata is shipped with the rule. This also allows us to avoid version bumps, since we are not altering the rule contents but adding the metadata to the dropped files instead.

Commit Reference: https://github.com/elastic/detection-rules/commit/0402dc2ea99f187b3a842a08d222d7a30f4a164a
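The behavior described for include_metadata can be sketched roughly as below. This is a simplified stand-in, not the real to_api_format() from rule.py: `to_api_format_sketch` and its dict-based arguments are hypothetical, and only the `meta` key name is taken from the example rule object in this thread.

```python
def to_api_format_sketch(data: dict, meta: dict, include_metadata: bool = False) -> dict:
    """Return an API-style rule dict, optionally carrying the rule metadata under `meta`."""
    api_rule = dict(data)  # shallow copy so the source rule contents stay untouched
    if include_metadata:
        # The metadata rides along with the rule but is not part of the rule data itself.
        api_rule["meta"] = dict(meta)
    return api_rule


rule_data = {"name": "Example rule", "version": 1}
rule_meta = {"creation_date": "2023/09/26", "updated_date": "2023/09/26"}

without_meta = to_api_format_sketch(rule_data, rule_meta)
with_meta = to_api_format_sketch(rule_data, rule_meta, include_metadata=True)
```

Because the rule data is copied rather than modified, anything computed from `rule_data` alone (such as a hash) is unaffected by whether the metadata is shipped.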

terrancedejesus commented 7 months ago

Option 3 - We move the date fields from RuleMeta and include them in BaseRuleData as requirements. We then need to adjust every rule back to the 8.3 branch, automatically through backporting and manually. The outcome is that date fields will now be in the rule contents of the API formatted rule and be available explicitly upstream by Kibana. Dates would then affect the rule version as well, since any change to rule contents marks the rule as dirty. This option requires the most changes and corrections across branches.

Mikaayenson commented 7 months ago

Option 4 - Similar to option 2, only instead of using metadata, is there any reason why we can't use the date of the release (almost like a build time field)? From the description, it doesn't seem necessary to have the exact date the rule was modified by our rule authors.

terrancedejesus commented 7 months ago

Option 4 - Similar to option 2, only instead of using metadata, is there any reason why we can't use the date of the release (almost like a build time field)? From the description, it doesn't seem necessary to have the exact date the rule was modified by our rule authors.

@Mikaayenson Great alternative with a few caveats. Our source-of-truth is typically the repository since that is where we lock versions. Let's say we lock versions and a rule has changes that cause the SHA256 to change. This "state" of the rule is only noticed during the lock versions, which we also release our commits from. Technically, up until this version lock, our rule could go through several changes and updates, but it is only when we lock versions do we track the current state of the rule. The last updated_date would be inline with this SHA256 change as the exact date when the version change was noticed.

We also have to take into consideration release timing. Releases could take 1-2 days, thus the potential for a divergence of dynamic dates based on building the package could occur not only from the version lock, but also between each package. All packages would have to be released GA on the same day for them to accurately reflect the same updated date.

eric-forte-elastic commented 7 months ago

Option 4 - Similar to option 2, only instead of using metadata, is there any reason why we can't use the date of the release (almost like a build time field)? From the description, it doesn't seem necessary to have the exact date the rule was modified by our rule authors.

I concur. If approach 4 is not a heavy lift, I would prefer this as well. While I do not think this is an immediate concern, I think there may be a case where we would not want to include all of the metadata in the release.

Option 4 - Similar to option 2, only instead of using metadata, is there any reason why we can't use the date of the release (almost like a build time field)? From the description, it doesn't seem necessary to have the exact date the rule was modified by our rule authors.

@Mikaayenson Great alternative with a few caveats. Our source-of-truth is typically the repository since that is where we lock versions. Let's say we lock versions and a rule has changes that cause the SHA256 to change. This "state" of the rule is only noticed during the lock versions, which we also release our commits from. Technically, up until this version lock, our rule could go through several changes and updates, but it is only when we lock versions do we track the current state of the rule. The last updated_date would be inline with this SHA256 change as the exact date when the version change was noticed.

We also have to take into consideration release timing. Releases could take 1-2 days, thus the potential for a divergence of dynamic dates based on building the package could occur not only from the version lock, but also between each package. All packages would have to be released GA on the same day for them to accurately reflect the same updated date.

Not sure if it would be adding too much overhead, but could we add a release tag when we lock versions and then pull the tagged SHAs of the rules and compare that way?

That being said, I also like option 2 and I do not see any immediate issues with it :+1:

Mikaayenson commented 7 months ago

We also have to take into consideration release timing.

IINM the date is to provide people with a general timeline:

By sorting by the "Last updated" column, the user could have a quick understanding of which rules have been recently updated in the Fleet package and know which rule updates have been pending for a long time, i.e. should take their most immediate attention.

So I'm not sure if we need it to align with rule updates or locked versions etc. It sounded like they just need to know when the package was last updated.

@eric-forte-elastic brought up another good idea in a Slack thread: using release tags for the date information. Recording the idea here for posterity.

banderror commented 7 months ago

@jpdjere or @banderror - Can you provide insight to the following for us. Thank you in advance!

@terrancedejesus, thanks for checking this with us. @jpdjere can keep me honest, but:

  • What is the target minor release for this and will it backport to previous versions?

No concrete target minor release is defined at the moment. The Milestone 3 for customizing prebuilt rules is currently at the stage of technical design, the development hasn't started yet. I think you guys can safely expect 2 release cycles from now until we can release anything. It could be more.

When rule customization is ready for release in Kibana, we will not backport this to prior minor versions. We don't backport new features in general.

The earliest version to which we could backport these new fields in the package would be 8.12, because only in 8.12.0 we made prebuilt rule schema forward compatible with new package updates on our side (https://github.com/elastic/security-team/issues/6888).

  • Is there any specific date format that is required?

The standard one: ISO 8601 date-time string in UTC. Example:

    2023-01-29T14:48:00.000Z

  • Is this only for created_date and updated_date, or are the keys named differently?

Not sure I understand the question.

Naming of the keys hasn't been defined yet on our side and they are not in the schema yet. @jpdjere Could you please create a ticket for us, describe the requirements, and prioritize work on it?

terrancedejesus commented 7 months ago

@banderror thank you for taking the time to provide details.

The standard one: ISO 8601 date-time string in UTC. Example:

Our dates are typically YYYY/MM/DD. We can update all rules to use YYYY-MM-DD as it shouldn't affect their version or SHA256 since it is metadata.
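For reference, converting a repository-style date such as "2023/09/26" into the ISO 8601 UTC form requested upstream could look like the sketch below. `to_iso8601_utc` is a hypothetical helper, and treating a date-only value as midnight UTC is an assumption, since the metadata carries no time of day.

```python
from datetime import datetime, timezone


def to_iso8601_utc(repo_date: str) -> str:
    """Convert a YYYY/MM/DD date string to an ISO 8601 date-time string in UTC."""
    # Assumption: a date with no time component is interpreted as midnight UTC.
    parsed = datetime.strptime(repo_date, "%Y/%m/%d").replace(tzinfo=timezone.utc)
    return parsed.strftime("%Y-%m-%dT%H:%M:%S.000Z")


print(to_iso8601_utc("2023/09/26"))  # 2023-09-26T00:00:00.000Z
```

Doing the conversion at build time would leave the rule files themselves unchanged, which avoids touching every rule across branches.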

Not sure I understand the question.

Right now we have creation_date and updated_date as the key names in each rule metadata. Do these need to be adjusted to match your key names upstream? We would prefer they stay the same downstream in our repository to reduce mass changes across different branches when we backport.

As you may have seen by TRaDe's discussion, the simplest approach for this would be to include the metadata of the rule as a root key in the JSON rule object. Thus any metadata from the rules can be accessed and used as your team pleases. Are there any objections to this or other preferences?

creation_date - Located in metadata of rule. Describes when the rule was first merged into main of our repository.

updated_date - Located in metadata of rule. Describes the date of the latest update based on changes merged into main of our repository.

availability_date - This is not captured anywhere but would track when the rule update was made available (EPR GA) via typical Elastic rule update workflow. This could be deterministic on Fleet pulls from EPR. Not sure if we want to pursue this but it has come up in previous discussions and amongst TRaDE as we determine what we are attempting to convey to customers with the rule dates.

This is an example of the would-be rule object:

```json
{
  "author": [ "Elastic" ],
  "building_block_type": "default",
  "description": "Identifies the execution of a command via Microsoft Visual Studio Pre or Post build events. Adversaries may backdoor a trusted visual studio project to execute a malicious command during the project build process.",
  "from": "now-119m",
  "index": [ "logs-endpoint.events.*" ],
  "interval": "60m",
  "language": "eql",
  "license": "Elastic License v2",
  "meta": {
    "creation_date": "2023/09/26",
    "integration": [ "endpoint" ],
    "maturity": "production",
    "min_stack_comments": "New fields added: required_fields, related_integrations, setup",
    "min_stack_version": "8.3.0",
    "updated_date": "2023/09/26"
  },
  "name": "Execution via MS VisualStudio Pre/Post Build Events",
  "query": "sequence with maxspan=1m\n [process where host.os.type == \"windows\" and event.action == \"start\" and\n process.name : \"cmd.exe\" and process.parent.name : \"MSBuild.exe\" and\n process.args : \"?:\\\\Users\\\\*\\\\AppData\\\\Local\\\\Temp\\\\tmp*.exec.cmd\"] by process.entity_id\n [process where host.os.type == \"windows\" and event.action == \"start\" and\n process.name : (\n \"cmd.exe\", \"powershell.exe\",\n \"MSHTA.EXE\", \"CertUtil.exe\",\n \"CertReq.exe\", \"rundll32.exe\",\n \"regsvr32.exe\", \"MSbuild.exe\",\n \"cscript.exe\", \"wscript.exe\",\n \"installutil.exe\"\n ) and\n not \n (\n process.name : (\"cmd.exe\", \"powershell.exe\") and\n process.args : (\n \"*\\\\vcpkg\\\\scripts\\\\buildsystems\\\\msbuild\\\\applocal.ps1\",\n \"HKLM\\\\SOFTWARE\\\\Microsoft\\\\VisualStudio\\\\SxS\\\\VS?\",\n \"process.versions.node*\",\n \"?:\\\\Program Files\\\\nodejs\\\\node.exe\",\n \"HKEY_LOCAL_MACHINE\\\\SOFTWARE\\\\Microsoft\\\\MSBuild\\\\ToolsVersions\\\\*\",\n \"*Get-ChildItem*Tipasplus.css*\",\n \"Build\\\\GenerateResourceScripts.ps1\",\n \"Shared\\\\Common\\\\..\\\\..\\\\BuildTools\\\\ConfigBuilder.ps1\\\"\",\n \"?:\\\\Projets\\\\*\\\\PostBuild\\\\MediaCache.ps1\"\n )\n ) and\n not process.executable : \"?:\\\\Program Files*\\\\Microsoft Visual Studio\\\\*\\\\MSBuild.exe\" and\n not (process.name : \"cmd.exe\" and\n process.command_line :\n (\"*vswhere.exe -property catalog_productSemanticVersion*\",\n \"*git log --pretty=format*\", \"*\\\\.nuget\\\\packages\\\\vswhere\\\\*\",\n \"*Common\\\\..\\\\..\\\\BuildTools\\\\*\"))\n ] by process.parent.entity_id\n",
  "references": [
    "https://docs.microsoft.com/en-us/visualstudio/ide/reference/pre-build-event-post-build-event-command-line-dialog-box?view=vs-2022",
    "https://www.pwc.com/gx/en/issues/cybersecurity/cyber-threat-intelligence/threat-actor-of-in-tur-est.html",
    "https://blog.google/threat-analysis-group/new-campaign-targeting-security-researchers/",
    "https://github.com/sbousseaden/EVTX-ATTACK-SAMPLES/blob/master/Execution/execution_evasion_visual_studio_prebuild_event.evtx"
  ],
  "related_integrations": [ { "package": "endpoint", "version": "^8.2.0" } ],
  "required_fields": [
    { "ecs": true, "name": "event.action", "type": "keyword" },
    { "ecs": true, "name": "host.os.type", "type": "keyword" },
    { "ecs": true, "name": "process.args", "type": "keyword" },
    { "ecs": true, "name": "process.command_line", "type": "wildcard" },
    { "ecs": true, "name": "process.entity_id", "type": "keyword" },
    { "ecs": true, "name": "process.executable", "type": "keyword" },
    { "ecs": true, "name": "process.name", "type": "keyword" },
    { "ecs": true, "name": "process.parent.entity_id", "type": "keyword" },
    { "ecs": true, "name": "process.parent.name", "type": "keyword" }
  ],
  "risk_score": 21,
  "rule_id": "fec7ccb7-6ed9-4f98-93ab-d6b366b063a0",
  "severity": "low",
  "tags": [ "Domain: Endpoint", "OS: Windows", "Use Case: Threat Detection", "Tactic: Defense Evasion", "Tactic: Execution", "Rule Type: BBR", "Data Source: Elastic Defend" ],
  "threat": [
    {
      "framework": "MITRE ATT&CK",
      "tactic": { "id": "TA0005", "name": "Defense Evasion", "reference": "https://attack.mitre.org/tactics/TA0005/" },
      "technique": [
        {
          "id": "T1127",
          "name": "Trusted Developer Utilities Proxy Execution",
          "reference": "https://attack.mitre.org/techniques/T1127/",
          "subtechnique": [ { "id": "T1127.001", "name": "MSBuild", "reference": "https://attack.mitre.org/techniques/T1127/001/" } ]
        }
      ]
    },
    {
      "framework": "MITRE ATT&CK",
      "tactic": { "id": "TA0002", "name": "Execution", "reference": "https://attack.mitre.org/tactics/TA0002/" },
      "technique": []
    }
  ],
  "type": "eql",
  "version": 1
}
```

jpdjere commented 7 months ago

@terrancedejesus Thanks for following up on this.

As you may have seen by TRaDe's discussion, the simplest approach for this would be to include the metadata of the rule as a root key in the JSON rule object. Thus any metadata from the rules can be accessed and used as your team pleases. Are there any objections to this or other preferences?

We assessed your proposal of adding the meta property to the rule object and, since we only need the updated_date right now, we would strongly prefer not to "pollute" the object with all the other data included in the meta property. We currently have no use for any other metadata and probably won't in the future. Is adding this meta property something that you need for internal tooling/processes on your side? Otherwise, we would prefer to simply add an update_date field as a top-level field.

Right now we have creation_date and updated_date as the key names in each rule metadata. Do these need to be adjusted to match your key names upstream?

updated_date is OK for our use. We also have similar updated_at keys in our internal rule object schemas but that has a different semantic meaning and want to avoid collisions in key naming.

creation_date - Located in metadata of rule. Describes when the rule was first merged into main of our repository. updated_date - Located in metadata of rule. Describes the date of the latest update based on changes merged into main of our repository. availability_date - This is not captured anywhere [...]

Based on your explanation of the biweekly releases, we decided that we don't need a precise date for the update, but a "ballpark" date. If the difference between updated_date and availability_date is at most 14 days, we have no issues in using whatever value is easier to calculate or pass down to the rule object.

Our dates are typically YYYY/MM/DD. We can update all rules to use YYYY-MM-DD as it shouldn't affect their version or SHA256 since it is metadata.

We would strongly prefer if you could format the date of the top-level update_date field as an ISO 8601 date-time string in UTC, such as: 2023-01-29T14:48:00.000Z. Since Kibana parses dates based on the server's locale, we cannot guarantee that 2021/01/02 will always be parsed as January 2nd, 2021 and not February 1st, 2021 in some other locale. Is this possible on your side?

Also, I'll create a ticket on the Kibana repo for adding this task and link it here.

Mikaayenson commented 7 months ago

During the simplified protections sync, Juan mentioned they only need a general availability date, ideally called elastic_last_updated (@jpdjere please let us know if I recalled this name incorrectly). This would be similar to a build time field and placed at the root of the object. The two-week window, if that is how often we release, is general enough for them.

jpdjere commented 7 months ago

Thanks for the follow-up here @Mikaayenson

Yes, I was originally going for update_date but I think with elastic_last_updated we can have a clearer meaning that this field only refers to updates done by the Elastic team and not by the user, and is thus only valid for Prebuilt Elastic rules. Also, it avoids collisions and confusion with other similar fields that we have in our internal rule objects.

So 👍 from my side for naming the property like that, placing it at the root level of the rule object, and using an ISO 8601 format date.

terrancedejesus commented 7 months ago

@Mikaayenson @jpdjere - Thanks for the insight and update.

Still some lingering questions that are unclear to me:

  1. Are we no longer wanting a date that represents when the rule was created?
  2. If we are targeting an "availability" date, is this logic not possible by Kibana when the rule update is first identified when Fleet pulls the package from EPR? The "availability" date of a rule update or new rule, if not represented by the metadata, is only determined based on when package is released.
  3. If we go down the route of "availability" date, then this date will be redundant across any rule that is new or updated. Is this what we are attempting to achieve?

jpdjere commented 7 months ago

@terrancedejesus

Are we no longer wanting a date that represents when the rule was created?

No. As long as the update date matches the creation date when the rule is created, we don't need a separate field for the creation date, as we never need to show the creation date specifically to the user. What I mean is:

// Rule is created
creation_date: 2024-05-21
update_date: 2024-05-21

// Rule is updated the first time
creation_date: 2024-05-21
update_date: 2024-08-12

// Rule is updated the second time
creation_date: 2024-05-21
update_date: 2024-12-26

... and so on

If that's the behaviour of the update_date, then we only need that update_date.

If we are targeting an "availability" date, is this logic not possible by Kibana when the rule update is first identified when Fleet pulls the package from EPR? The "availability" date of a rule update or new rule, if not represented by the metadata, is only determined based on when package is released.

Prebuilt rule assets are installed to the Kibana kibana_security_solution index by Fleet's API; we don't have any additional logic for this installation that we could modify to track when a new rule is first identified. The prebuilt rule asset documents that are indexed into that index by Fleet do have an updated_at and a created_at field, but both of them are always the date on which the latest installation or update happened, for all versions of a rule. This is why we need the elastic_update_date to be part of the rule's data.

If we go down the route of "availability" date, then this date will be redundant across any rule that is new or updated. Is this what we are attempting to achieve.

Not sure I completely understand this point, but I think my previous answer addresses this: we still need that update_date or availability_date as part of the prebuilt rule asset data.

Mikaayenson commented 7 months ago

If we go down the route of "availability" date, then this date will be redundant across any rule that is new or updated. Is this what we are attempting to achieve.

Not sure I completely understand this point, but I think my previous answer addresses this: we still need that update_date or availability_date as part of the prebuilt rule asset data.

The main point is that every rule object updated will have the same elastic_update_date since we will be using the date the package was built. If the package has 10 rules updated, all ten rules will have the same date.

It's not necessarily a problem, just a note of redundant-looking information across all updated rules. Now if we have another package later with another different 10 rules updates, of course the date will be different from the first 10.
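To illustrate the build-time behavior described here, the sketch below stamps one shared date onto every updated rule in a package. It assumes rule objects are plain dictionaries; `stamp_build_date` is a hypothetical helper, and the timestamp format is an assumption rather than the format the packaging code actually emits.

```python
from datetime import datetime, timezone


def stamp_build_date(updated_rules: list, build_time: datetime) -> list:
    """Return copies of the updated rules, all carrying the same package build date."""
    stamp = build_time.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Every rule in this package gets the identical value; the originals are not mutated.
    return [{**rule, "elastic_update_date": stamp} for rule in updated_rules]


package_a = stamp_build_date(
    [{"rule_id": "rule-1"}, {"rule_id": "rule-2"}],
    datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc),
)
package_b = stamp_build_date(
    [{"rule_id": "rule-3"}],
    datetime(2024, 1, 29, 12, 0, tzinfo=timezone.utc),
)
```

Rules within `package_a` share one date, while `package_b` carries a later one, which is enough to tell updates from different package releases apart.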

jpdjere commented 7 months ago

Thanks for the explanation.

Now if we have another package later with another different 10 rules updates, of course the date will be different from the first 10.

This is good enough for us. We don't strictly need different dates within one package release - within that 14 day window. As long as we can distinguish between updates coming from different packages releases we are fine.

terrancedejesus commented 7 months ago

@jpdjere - Thank you for the deets.

I have begun adding this into the rule asset creation. One thing I noticed is that when we add historical rules to the package, these will not include an elastic_update_date as these assets are pulled from EPR and are before this addition. Thoughts on this?

Also, below is an example of the rule asset with elastic_update_date, does this work? FYI, we are pulling this date from the rule metadata updated_date - just want to confirm this is fine.

Also we added elastic_updated_date to the root of the rule asset. The other keys here are id and type. - Want to confirm this is fine as well.

Example rule asset

```
{
  "attributes": {
    "author": [ "Elastic" ],
    "description": "Elastic Endgame detected Malware. Click the Elastic Endgame icon in the event.module column or the link in the rule.reference column for additional information.",
    "from": "now-15m",
    "index": [ "endgame-*" ],
    "interval": "10m",
    "language": "kuery",
    "license": "Elastic License v2",
    "max_signals": 10000,
    "name": "Malware - Detected - Elastic Endgame",
    "query": "event.kind:alert and event.module:endgame and endgame.metadata.type:detection and (event.action:file_classification_event or endgame.event_subtype_full:file_classification_event)\n",
    "required_fields": [
      { "ecs": false, "name": "endgame.event_subtype_full", "type": "unknown" },
      { "ecs": false, "name": "endgame.metadata.type", "type": "unknown" },
      { "ecs": true, "name": "event.action", "type": "keyword" },
      { "ecs": true, "name": "event.kind", "type": "keyword" },
      { "ecs": true, "name": "event.module", "type": "keyword" }
    ],
    "risk_score": 99,
    "rule_id": "0a97b20f-4144-49ea-be32-b540ecc445de",
    "severity": "critical",
    "tags": [ "Data Source: Elastic Endgame" ],
    "type": "query",
    "version": 101
  },
  "elastic_update_date": "2023-06-22T00:00:00",
  "id": "0a97b20f-4144-49ea-be32-b540ecc445de_101",
  "type": "security-rule"
}
```

jpdjere commented 7 months ago

@terrancedejesus Sorry for the delay in replying.

I have begun adding this into the rule asset creation. One thing I noticed is that when we add historical rules to the package, these will not include an elastic_update_date as these assets are pulled from EPR and are before this addition. Thoughts on this?

Yes, we understand this will be the case. We will add elastic_update_date as an optional field within our Prebuilt rule asset schema and our internal rule schema to accommodate the fact that some rules will have this info and others won't.

Also, below is an example of the rule asset with elastic_update_date, does this work? FYI, we are pulling this date from the rule metadata updated_date - just want to confirm this is fine.

elastic_update_date pulled from updated_date works for us ๐Ÿ‘

Also we added elastic_updated_date to the root of the rule asset. The other keys here are id and type. - Want to confirm this is fine as well.

We strongly prefer to have the elastic_updated_date within the attributes field. Our Prebuilt Rule Assets Client pulls only data living within this subfield (renamed security-rule when the prebuilt rule is installed as a prebuilt rule asset in Elasticsearch), and discards the id and type - and all other data in the "root" level.

Would this be fine by you as well? Or would it have side-effects in hashing, etc? Sorry for the confusion in the discussion above, where we talked about "root-level".

terrancedejesus commented 7 months ago

@jpdjere - Thanks for the reply!

> We strongly prefer to have the elastic_update_date within the attributes field. Our Prebuilt Rule Assets Client pulls only data living within this subfield (renamed security-rule when the prebuilt rule is installed as a prebuilt rule asset in Elasticsearch), and discards the id and type - and all other data in the "root" level.

No problem on our side; it is easy to adjust. Thank you for the clarification.

Here is an updated rule asset. Does this work?

Example

```
{
  "attributes": {
    "author": ["Elastic"],
    "description": "Elastic Endgame detected Malware. Click the Elastic Endgame icon in the event.module column or the link in the rule.reference column for additional information.",
    "elastic_update_date": "2023-06-22T00:00:00",
    "from": "now-15m",
    "index": ["endgame-*"],
    "interval": "10m",
    "language": "kuery",
    "license": "Elastic License v2",
    "max_signals": 10000,
    "name": "Malware - Detected - Elastic Endgame",
    "query": "event.kind:alert and event.module:endgame and endgame.metadata.type:detection and (event.action:file_classification_event or endgame.event_subtype_full:file_classification_event)\n",
    "required_fields": [
      { "ecs": false, "name": "endgame.event_subtype_full", "type": "unknown" },
      { "ecs": false, "name": "endgame.metadata.type", "type": "unknown" },
      { "ecs": true, "name": "event.action", "type": "keyword" },
      { "ecs": true, "name": "event.kind", "type": "keyword" },
      { "ecs": true, "name": "event.module", "type": "keyword" }
    ],
    "risk_score": 99,
    "rule_id": "0a97b20f-4144-49ea-be32-b540ecc445de",
    "severity": "critical",
    "tags": ["Data Source: Elastic Endgame"],
    "type": "query",
    "version": 101
  },
  "id": "0a97b20f-4144-49ea-be32-b540ecc445de_101",
  "type": "security-rule"
}
```

jpdjere commented 7 months ago

@terrancedejesus

Great, thanks a lot! 👍

Yes, that looks good. Just a nit - wanted to make sure that the date format is ISO 8601 in UTC; the example above is missing the milliseconds and the Z at the end: 2019-11-14T00:55:31.820Z. That's how the dates are currently formatted for the created_at and updated_at properties in the security-rule assets:

image

terrancedejesus commented 7 months ago

@jpdjere - Thanks for responding.

> the example above is missing the milliseconds and the Z at the end

I can add the milliseconds and the Z. Note that the updated_date in our rule metadata is not ISO 8601 formatted: no time of day is captured, so the time portion will remain 00 for these.

Since we are adding elastic_update_date to the attributes, is this now a required rule field that was added to the rule schema upstream? We ask because, at the moment, our PoC loads the TOML file as JSON and then loads it as an object through our rule schema, which is how we validate the rule. Only after this do we add the elastic_update_date field, when the rule is converted to a rule asset, to avoid issues with version control, backports, breaking changes, etc. Do we know whether this field will be included when a rule is exported?

From your image, it looks like the dates are separate from the actual security rule, therefore we are simply providing a way for you to retrieve this date when assets are shipped?

@brokensound77 - Am I missing the point here or questions we discussed?

jpdjere commented 7 months ago

> I can add the milliseconds and the Z. Note that the updated_date in our rule metadata is not ISO 8601 formatted: no time of day is captured, so the time portion will remain 00 for these.

That's OK, the information about year, month and day is enough for us. So dates that look like 2019-11-14T00:00:00.000Z are OK.
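
A minimal sketch of the conversion discussed here, for illustration only. It assumes the metadata `updated_date` is a date-only string in `YYYY/MM/DD` form; the helper name `to_iso8601_utc` is hypothetical and not part of detection-rules.

```python
from datetime import datetime, timezone


def to_iso8601_utc(updated_date: str) -> str:
    """Turn a date-only value such as '2019/11/14' into '2019-11-14T00:00:00.000Z'."""
    # Assumption: the metadata stores YYYY/MM/DD with no time component,
    # so the time of day is always midnight UTC.
    day = datetime.strptime(updated_date, "%Y/%m/%d").replace(tzinfo=timezone.utc)
    # isoformat() renders the UTC offset as '+00:00'; swap it for the 'Z' suffix.
    return day.isoformat(timespec="milliseconds").replace("+00:00", "Z")


print(to_iso8601_utc("2019/11/14"))
```

Only the year, month and day carry information in the result, which matches what was said to be sufficient above.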

> Since we are adding elastic_update_date to the attributes, is this now a required rule field that was added to the rule schema upstream?

We will be adding the elastic_update_date as an optional field within the optional prebuilt field for our rule schema. This will be part of the Prebuilt Rule Customization Epic - Milestone 3 we discussed in yesterday's meeting.

> Do we know whether this field will be included when a rule is exported?

Yes, it will, as part of the prebuilt object field.

> From your image, it looks like the dates are separate from the actual security rule, therefore we are simply providing a way for you to retrieve this date when assets are shipped?

Those dates you see in the image above are not the update_at and created_at dates from the metadata in the detection-rules package. They are dates added to the Elasticsearch saved objects when the Fleet API is called to install the security_detection_engine package with the prebuilt rules. They are always set to the current date at the time of the API call, which is not useful information for us.

That's why the elastic_update_date should be part of the rule attributes themselves.

terrancedejesus commented 7 months ago

@jpdjere - Thank you for providing additional insights.

> We will be adding the elastic_update_date as an optional field within the optional prebuilt field for our rule schema. This will be part of the Prebuilt Rule Customization Epic - Milestone 3 we discussed in yesterday's meeting.

With this being said, if it is part of the rule schema, required or not, it is a breaking change for us because of backporting. We will need to change our approach and add this to our rule schema, rather than dynamically populate and push into the rule asset.

@brokensound77 - With this field being optional, I think it would be best to be a build time field, determined from rule metadata that we can only build for the compatible semantic version of the stack the feature is being added to. Regarding backporting, this will cause ALL of our rules to receive version bumps, for each release package. We have done this before, so I can get started on our strategy to implement this. Before I do, any additional thoughts?

banderror commented 6 months ago

Hey @jpdjere @terrancedejesus @Mikaayenson 👋

So there were lots of comments in this thread, and I'd like to double-check that after all these comments we're on the same page. Let me try to restate our agreements; please correct me or add anything.

We're going to add a new optional field elastic_update_date to security-rule assets we ship via the package. Here's an example of this field for the Linux Restricted Shell Breakout via Linux Binary(s) prebuilt rule:

The latest 111 version of this rule looks like this in the package:

{
  "type": "security-rule",
  "id": "52376a86-ee86-4967-97ae-1a05f55816f0",
  "attributes": {
    "rule_id": "52376a86-ee86-4967-97ae-1a05f55816f0",
    "name": "Linux Restricted Shell Breakout via Linux Binary(s)",
    "description": "Identifies the abuse of a Linux binary to break out of a restricted shell or environment by spawning an interactive system shell. The activity of spawning a shell from a binary is not common behavior for a user or system administrator, and may indicate an attempt to evade detection, increase capabilities or enhance the stability of an adversary.",
    "type": "eql",
    "language": "eql",
    "index": ["logs-endpoint.events.*"],
    // other rule fields...
    "version": 111
  }
}

The next 112 version should look like that:

{
  "type": "security-rule",
  "id": "52376a86-ee86-4967-97ae-1a05f55816f0",
  "attributes": {
    "rule_id": "52376a86-ee86-4967-97ae-1a05f55816f0",
    "name": "Linux Restricted Shell Breakout via Linux Binary(s)",
    "description": "Identifies the abuse of a Linux binary to break out of a restricted shell or environment by spawning an interactive system shell. The activity of spawning a shell from a binary is not common behavior for a user or system administrator, and may indicate an attempt to evade detection, increase capabilities or enhance the stability of an adversary.",
    "type": "eql",
    "language": "eql",
    "index": ["logs-endpoint.events.*"],
    // other rule fields...
    "version": 112,
    "elastic_update_date": "2024-01-29T00:00:00.000Z"
  }
}

This field will be optional in our rule asset schema in Kibana. The field should be specified for the latest version of every rule in the next version of the package for Kibana 8.13. The field can be omitted for all historical (previous) rule versions that exist as of today, but should be specified for every rule version created from now on. For example, for the Linux Restricted Shell Breakout via Linux Binary(s) rule above, all rule versions >= 112 should include the elastic_update_date field.

The field's value must be formatted as an ISO 8601 UTC timestamp. Time of day is not required and can be set to T00:00:00.000Z. We don't have strong requirements for the accuracy of the date itself. It can be the date of file modification by a rule author, the date of PR merge, or the date of building the package. The only requirement is that the values must be monotonically increasing across a rule's versions and give the user a rough understanding of when Elastic shipped an update to the rule. +/- a few days would be sufficient accuracy for us.
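
The two requirements above (timestamp layout and ordering across versions) could be checked with something like the following. This is a rough sketch, not the validation either team uses: `check_update_dates` is a hypothetical helper, and it assumes equal dates on consecutive versions are acceptable.

```python
import re
from datetime import datetime

# Layout from the thread: date, time, three-digit milliseconds, trailing Z.
ISO_UTC_MS = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z")


def check_update_dates(versions, field="elastic_update_date"):
    """Return a list of problems found across one rule's versions (empty if none)."""
    problems = []
    previous = None
    for attrs in sorted(versions, key=lambda a: a["version"]):
        value = attrs.get(field)
        if value is None:
            continue  # the field is optional, so older versions may omit it
        try:
            if not ISO_UTC_MS.fullmatch(value):
                raise ValueError("unexpected layout")
            parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
        except ValueError:
            problems.append(f"version {attrs['version']}: bad date format {value!r}")
            continue
        # Assumption: a later version may repeat the previous date but not go back.
        if previous is not None and parsed < previous:
            problems.append(f"version {attrs['version']}: date goes backwards")
        previous = parsed
    return problems


history = [
    {"version": 111},
    {"version": 112, "elastic_update_date": "2024-01-29T00:00:00.000Z"},
]
print(check_update_dates(history))
```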

The new field must not be backported to any packages compatible with Kibana 8.11.x and below. It can be backported to packages that are only compatible with Kibana 8.12.0 and above because starting from 8.12.0 we have forward compatibility of the rule asset schema in Kibana: https://github.com/elastic/security-team/issues/6888. This means that in 8.12.x Kibana versions the elastic_update_date, if specified in the package, will be ignored/omitted until we add support for it. In Kibana versions 8.11.x and below the elastic_update_date, if specified in the package, will lead to an error during prebuilt rule installation or upgrade.
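
As an illustration of the backport rule above, a build step could gate the field on the target stack version. This is a sketch under assumptions: `attributes_for_stack` and `FIELD_MIN_STACK` are hypothetical names, and the 8.12 threshold is taken from the paragraph above.

```python
# First Kibana minor version that tolerates unknown rule asset fields (per the thread).
FIELD_MIN_STACK = (8, 12)


def parse_version(version: str) -> tuple:
    """Parse '8.12' or '8.12.0' into a tuple of ints so 8.3 sorts before 8.12."""
    return tuple(int(part) for part in version.split("."))


def attributes_for_stack(attributes: dict, stack_version: str, update_date: str) -> dict:
    """Return a copy of the rule attributes, adding the date only for 8.12+ packages."""
    result = dict(attributes)
    if parse_version(stack_version)[:2] >= FIELD_MIN_STACK:
        result["elastic_update_date"] = update_date
    return result


rule = {"rule_id": "52376a86-ee86-4967-97ae-1a05f55816f0", "version": 112}
print(attributes_for_stack(rule, "8.11", "2024-01-29T00:00:00.000Z"))
print(attributes_for_stack(rule, "8.12", "2024-01-29T00:00:00.000Z"))
```

Comparing integer tuples rather than strings matters here: as strings, "8.3" would sort after "8.12".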

banderror commented 6 months ago

Hey @terrancedejesus @Mikaayenson, last ask from our side: let's please change the name of the field to source_updated_at to make it a little bit more future-proof.

After chatting with @jpdjere we figured we want the name to be resilient to hypothetical future capabilities in Kibana, such as user- or community-created packages with security-rule assets distributed via private/user EPRs or the centralized EPR of Elastic if we ever have support for community-created content.

banderror commented 6 months ago

Tickets for the Rule Management team:

terrancedejesus commented 6 months ago

Alright so I did a bit of digging.

This gets me to the real problem, which is how we backport and version lock. As I attempt to show in the image below, whenever a new field is applied to all rules, optional or not, our versioning strategy does not support it well, because the version is checked per backport branch, where the SHA256 hashes are calculated. If the hashes differ, the version is bumped by 1. The important part to understand is that, in this example, a rule on 8.11 will not have elastic_update_date dynamically generated at version X. The version lock workflow will then check out 8.12 and run the same workflow, but now the rule will have elastic_update_date, so the SHA256 changes and the rule version is bumped. The next time we lock versions it will bump twice, as the state of the rule will always differ between (8.3-8.11) and (8.12+).
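
The flip-flop described above can be reproduced with a toy model. This is a deliberately simplified sketch, not the actual version lock code: it assumes a single shared lock entry per rule and a hash over sorted JSON, and `rule_sha256` and `lock_pass` are hypothetical names.

```python
import hashlib
import json


def rule_sha256(contents: dict) -> str:
    """Hash the rule contents deterministically (sorted keys)."""
    serialized = json.dumps(contents, sort_keys=True)
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


def lock_pass(lock: dict, branch_contents: list) -> dict:
    """Walk the backport branches in order, bumping the version on any hash change."""
    for contents in branch_contents:
        digest = rule_sha256(contents)
        if digest != lock["sha256"]:
            lock = {"sha256": digest, "version": lock["version"] + 1}
    return lock


on_8_11 = {"name": "Example rule", "query": "process where true"}
on_8_12 = {**on_8_11, "elastic_update_date": "2024-01-29T00:00:00.000Z"}

lock = {"sha256": rule_sha256(on_8_11), "version": 5}
lock = lock_pass(lock, [on_8_11, on_8_12])
print(lock["version"])  # first pass: one bump, caused only by the new field
lock = lock_pass(lock, [on_8_11, on_8_12])
print(lock["version"])  # second pass: two more bumps, with no rule change at all
```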

version_lock_image

The only option at this time would be to min-stack ALL rules to 8.12, so that any updates, tunings, or new rules would only go back to 8.12 stacks. That is out of sync with our currently supported stacks (current-3), so this is a breaking change, as @brokensound77 has stated. While we have introduced breaking changes of this kind before, it seems like a lot of breakage for a timestamp that we could supply in metadata when shipping the rule asset, without breaking our backporting and versioning.

terrancedejesus commented 6 months ago

@Mikaayenson DED has an epic or meta somewhere for refactoring Detection Rules. May be worth exploring the schema for version lock file(s) in Detection Rules. I believe there is some resilience that can be added with a couple of options:

Remember that when we build a package per stack version, we build it from that branch specifically so we could align that with its own state of the rules for that branch somehow.

 "4d4c35f4-414e-4d0c-bb7e-6db7c80a6957": {
    "8.12" : {
      "min_stack_version": "8.3",
      "rule_name": "Kernel Load or Unload via Kexec Detected",
      "sha256": "53f533ffdd9d2d9f7c1a5cba374de00d7db74d814cde9706d3750390086f3c78",
      "type": "eql",
      "version": 5
    },
    "8.11" : {
      "min_stack_version": "8.3",
      "rule_name": "Kernel Load or Unload via Kexec Detected",
      "sha256": "53f533ffdd9d2d9f7c1a5cba374de00d7db74d814cde9706d3750390086f3c78",
      "type": "eql",
      "version": 5
    },
    "8.10" : {
      "min_stack_version": "8.3",
      "rule_name": "Kernel Load or Unload via Kexec Detected",
      "sha256": "53f533ffdd9d2d9f7c1a5cba374de00d7db74d814cde9706d3750390086f3c78",
      "type": "eql",
      "version": 5
    },
    "8.9" : {
      "min_stack_version": "8.3",
      "rule_name": "Kernel Load or Unload via Kexec Detected",
      "sha256": "53f533ffdd9d2d9f7c1a5cba374de00d7db74d814cde9706d3750390086f3c78",
      "type": "eql",
      "version": 5
    },
    "8.8" : {
      "min_stack_version": "8.3",
      "rule_name": "Kernel Load or Unload via Kexec Detected",
      "sha256": "53f533ffdd9d2d9f7c1a5cba374de00d7db74d814cde9706d3750390086f3c78",
      "type": "eql",
      "version": 5
    },
    "8.7" : {
      "min_stack_version": "8.3",
      "rule_name": "Kernel Load or Unload via Kexec Detected",
      "sha256": "53f533ffdd9d2d9f7c1a5cba374de00d7db74d814cde9706d3750390086f3c78",
      "type": "eql",
      "version": 5
    },
    "8.6" : {
      "min_stack_version": "8.3",
      "rule_name": "Kernel Load or Unload via Kexec Detected",
      "sha256": "53f533ffdd9d2d9f7c1a5cba374de00d7db74d814cde9706d3750390086f3c78",
      "type": "eql",
      "version": 5
Mikaayenson commented 6 months ago

@terrancedejesus I'm thinking about how to simplify this for you. Can we do this:

  1. Add the field to our schema just to support being able to import Kibana-exported ndjson that contains the new field
  2. DON'T create the field as a build time field. Just generate the field and date dynamically ONLY for publishing. In short call self._convert_add_elastic_last_update_date(obj) only for releasing.
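
A rough sketch of option 2, for illustration only: the hash is computed over the rule as authored, and the date is attached just when the release asset is produced. The names `rule_sha256` and `to_release_asset` are hypothetical, and the field name follows the later rename to source_updated_at.

```python
import hashlib
import json

FIELD = "source_updated_at"  # field name after the rename requested in this thread


def rule_sha256(rule: dict) -> str:
    """Hash used for version locking; it never sees the publish-only field."""
    return hashlib.sha256(json.dumps(rule, sort_keys=True).encode("utf-8")).hexdigest()


def to_release_asset(rule: dict, updated_at: str) -> dict:
    """Build a security-rule asset, adding the date inside attributes at publish time."""
    attributes = dict(rule)  # copy so the validated rule object stays untouched
    attributes[FIELD] = updated_at
    return {
        "id": f"{rule['rule_id']}_{rule['version']}",
        "type": "security-rule",
        "attributes": attributes,
    }


rule = {"rule_id": "0a97b20f-4144-49ea-be32-b540ecc445de", "version": 101}
before = rule_sha256(rule)
asset = to_release_asset(rule, "2023-06-22T00:00:00.000Z")
print(asset["id"], rule_sha256(rule) == before)
```

Because the field is never part of the hashed contents, the lock hash stays the same on every backport branch.
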
terrancedejesus commented 6 months ago

After discussion with @Mikaayenson, there were a couple of options we wanted to explore to hopefully get this in on our end, so that it is not a blocker for @banderror's team.

The final proposal, as shown in the pull request, is to do the following and address each concern:

We need to align our schemas with upstream

We do not want rule authors duplicating the update date that already exists in rule metadata

We need to consider versioning as this is likely a breaking change requiring min-stack updates to 8.12 across all rules

We need to remain consistent across our code and not introduce anything new that has to be managed

NOTE: I want to emphasize that we should not always revert to adding new build time fields here. For instance, related_integrations and required_fields have implications upstream that we cannot control, so we need to include them in versioning for potential breaking changes. Thus, while it is an option, it should not be treated as the go-to solution moving forward.

We need to ensure unit tests exist or are adjusted to accommodate our changes

banderror commented 6 months ago

@terrancedejesus @Mikaayenson Copying this from slack:

We haven't worked on adding support for source_updated_at on our side yet. Moreover, I guess it's still unclear if we want to have a top-level field source_updated_at or an object with a field source.updated_at to make it future-proof (potentially, for DaC). @jpdjere should include a proposal for this field's schema into the RFC, the goal is to complete it by the end of this week.

I guess it's not a big difference so this shouldn't block you from working on some implementation of this field, but please hold off merging anything until we approve the proposal on our side and get an approval from your side.