Open ghost opened 7 months ago
Pinging @elastic/security-solution (Team: SecuritySolution)
Reviewed & assigned to @MadameSheema
Pinging @elastic/security-detection-rule-management (Team:Detection Rule Management)
Pinging @elastic/security-detections-response (Team:Detections and Resp)
I was given access to the environment where this bug can be reproduced. There are 4 rules for which the upgrade fails:
Here's a response from the /internal/detection_engine/prebuilt_rules/upgrade/_review endpoint, which is used to populate the Rule Updates table:
When you try to upgrade the "Potential DLL Side-Loading via Microsoft Antimalware Service Executable" rule, you can see a few issues:
First, the upgrade fails, but the toast is shown in green as if the result were successful, even though its text says "failed":
Second, this is the error that the /internal/detection_engine/prebuilt_rules/upgrade/_perform endpoint returns when you click "Update". Something is wrong with the lastRun object, which is an internal rule field used by the Alerting Framework.
{
  "summary": {
    "total": 1,
    "skipped": 0,
    "succeeded": 0,
    "failed": 1
  },
  "results": {
    "updated": [],
    "skipped": []
  },
  "errors": [
    {
      "message": "[attributes.lastRun]: types that failed validation:\n- [attributes.lastRun.0.outcomeMsg]: types that failed validation:\n - [attributes.lastRun.outcomeMsg.0]: could not parse array value from json input\n - [attributes.lastRun.outcomeMsg.1]: expected value to equal [null]\n- [attributes.lastRun.1]: expected value to equal [null]: Bad Request",
      "rules": [
        {
          "rule_id": "053a0387-f3b5-4ba5-8245-8002cca2bd08",
          "name": "Potential DLL Side-Loading via Microsoft Antimalware Service Executable"
        }
      ]
    }
  ]
}
Third, we incorrectly pass the rule version and revision in the request body to this endpoint. We pass this:
{"mode":"SPECIFIC_RULES","rules":[{"rule_id":"053a0387-f3b5-4ba5-8245-8002cca2bd08","version":108,"revision":6}],"pick_version":"TARGET"}
but the versions and revisions of the current and target rule versions are these:
Current:
revision: 6
version: 6
Target:
revision: 7
version: 108
I think we should be passing this instead:
{"mode":"SPECIFIC_RULES","rules":[{"rule_id":"053a0387-f3b5-4ba5-8245-8002cca2bd08","version":6,"revision":6}],"pick_version":"TARGET"}
The same happens with the other 3 rules.
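For the first issue above (a green toast for a failed upgrade), here is a minimal sketch of deriving the toast kind from the `summary` object in the response. The `ToastKind` type and the `toastKindFor` function are hypothetical names used only for illustration; they are not the actual Kibana toast API.

```typescript
// Shape of the `summary` object returned by the _perform endpoint,
// as seen in the response quoted above.
interface UpgradeSummary {
  total: number;
  skipped: number;
  succeeded: number;
  failed: number;
}

// Hypothetical toast kinds (assumption, not the real Kibana toast API).
type ToastKind = 'success' | 'warning' | 'danger';

// Pick the toast kind from the summary counts rather than from the
// HTTP status alone, so that a response with failures is never green.
function toastKindFor(summary: UpgradeSummary): ToastKind {
  if (summary.failed === 0) {
    return 'success';
  }
  // Some rules failed: treat a partial failure as a warning
  // and a total failure as an error.
  return summary.succeeded > 0 ? 'warning' : 'danger';
}
```

With the summary from the response above (`succeeded: 0`, `failed: 1`), this sketch would select the error kind instead of the success kind.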
Just for the record @banderror:
> I think we should be passing this instead:
> {"mode":"SPECIFIC_RULES","rules":[{"rule_id":"053a0387-f3b5-4ba5-8245-8002cca2bd08","version":6,"revision":6}],"pick_version":"TARGET"}
The handler checks that the revision passed in equals the rule's current revision.
The version that should be passed is the next (target) version.
So the payload looks correct to me.
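The check described here can be sketched roughly as follows. This is an illustration under assumptions, not the actual handler code: the interfaces and the `checkUpgradeRequest` function are hypothetical, and the version comparison in particular is an assumption based on the statement that the next version is what should be passed.

```typescript
// Hypothetical shapes for illustration only.
interface UpgradeRequestRule {
  rule_id: string;
  version: number;
  revision: number;
}

interface KnownRuleVersions {
  currentRevision: number; // revision of the installed rule
  targetVersion: number; // version of the rule being upgraded to
}

// Returns null when the request entry is acceptable, or a reason string otherwise.
function checkUpgradeRequest(
  requested: UpgradeRequestRule,
  known: KnownRuleVersions
): string | null {
  // The revision in the payload must match the installed rule's revision.
  if (requested.revision !== known.currentRevision) {
    return 'revision does not match the current rule revision';
  }
  // Assumption: the version in the payload is compared with the target version.
  if (requested.version !== known.targetVersion) {
    return 'version does not match the target rule version';
  }
  return null;
}
```

Under these assumptions, the payload from the report (`version: 108`, `revision: 6`) passes for a rule whose current revision is 6 and whose target version is 108, while a payload with `version: 6` would be rejected.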
@vgomez-el @banderror @karanbirsingh-qasource and @elastic/response-ops team
I'm investigating the issue and have reached some conclusions:
The affected rules' last run failed with the "authenticationStart is not registered!" error message.
The error was saved in the lastRun.outcomeMsg property as a string:
"lastRun": {
  "outcome": "failed",
  "outcomeMsg": "authenticationStart is not registered!",
  "warning": "unknown",
  "alertsCount": {},
  "outcomeOrder": 20
},
However, lastRun.outcomeMsg is of type string[] | null.
The "authenticationStart is not registered!" error message is thrown during Kibana startup, due to some failure in authentication. This apparently happened, in this specific case, during the Kibana upgrade, and cannot be reproduced (at least not easily) in similar upgrade processes.
plugin.ts file of the Alerting Framework. This error bubbles up from the execution_log and ends up written in lastRun.outcomeMsg.
lastRun.outcomeMsg comes out from a decision taken when the Rule Result Service was created. The migration does not apply the change from string to string[]; instead, that type transformation takes place in the endpoint: https://github.com/elastic/kibana/blob/main/x-pack/plugins/alerting/server/task_runner/fixtures.ts#L118
string to lastRun.outcomeMsg instead of being converted to string[], since this only happens during API requests.
Therefore:
lastRun.outcomeMsg to be written with the wrong type, reaching out to the Response Ops team.
TL;DR for ResponseOps team:
lastRun.outcomeMsg property as a string, although the property is actually an array of strings. This causes rule validation to fail, breaking the affected rule's execution and management capabilities like upgrade (which is what this ticket reported).
lastRun.outcomeMsg (which was a string) would not be migrated using a normal migration to its new type of string[]; instead, that type transformation takes place in the endpoint: https://github.com/elastic/kibana/blob/main/x-pack/plugins/alerting/server/task_runner/fixtures.ts#L118
string to lastRun.outcomeMsg instead of being converted to string[], since this only happens during API requests.
I would like to know how we can move forward with fixing this issue.
lastRun.outcomeMsg that we didn't do back then?
lastRun.outcomeMsg?
My guess is that the string (vs string[]) version of the field is being populated here, but just a quick guess based on searching the code and finding this in migrations ...
Seems like we can just fix that, right?
Oh, that code is already in a migration. So, we'd need another migration to fix it?
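To make the type mismatch discussed in this thread concrete, here is a minimal sketch of a check for the expected `string[] | null` type, plus a normalization step that a data fix could apply to a stored string value. Both functions (`isValidOutcomeMsg`, `normalizeOutcomeMsg`) are hypothetical helpers written for illustration; they are not existing Alerting Framework code, and treating a missing value as `null` is an assumption.

```typescript
// Expected type of lastRun.outcomeMsg, as stated earlier in this thread.
type OutcomeMsg = string[] | null;

// Hypothetical type guard: true only for null or an array of strings.
function isValidOutcomeMsg(value: unknown): value is OutcomeMsg {
  return (
    value === null ||
    (Array.isArray(value) && value.every((item) => typeof item === 'string'))
  );
}

// Hypothetical normalization: wraps a legacy string value in an array.
function normalizeOutcomeMsg(value: unknown): OutcomeMsg {
  if (value === null || value === undefined) {
    return null; // assumption: a missing value is treated as null
  }
  if (typeof value === 'string') {
    return [value]; // the broken case reported in this issue
  }
  if (Array.isArray(value)) {
    // Keep only string entries so the result always matches the type.
    return value.filter((item): item is string => typeof item === 'string');
  }
  return null; // any other shape is dropped in this sketch
}
```

For example, the stored value `"authenticationStart is not registered!"` fails `isValidOutcomeMsg`, while the result of passing it through `normalizeOutcomeMsg` is a one-element array that passes.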
Hi @pmuellr thanks for taking a look at this.
Yes, doing another migration to fix this was my initial thought, but my understanding was that the migration mechanism that ResponseOps maintained was deprecated. Indeed, I see that the last migration file that was created was for 8.8.
How do you currently carry out migrations? Is there a replacement mechanism that I can use to fix this issue?
Our migration story got complicated with serverless, but I think it was mainly about additions / removals of fields, vs just whacking some data, like what I think we need to do here. Also, I'm not sure we've done a migration since 8.8, since we were asked not to make migration changes while they were working on changes to encrypted saved objects.
Let me ask the team about this ...
Since we don't have a working migration mechanism for rules that could be used to fix the broken data that causes this bug, and since there's a simple workaround, we will postpone fixing this bug until after 8.17 so that we can ship https://github.com/elastic/kibana/issues/174168 earlier.
cc @jpdjere @approksiu
Epics: https://github.com/elastic/security-team/issues/1974 (internal), https://github.com/elastic/kibana/issues/174168
Summary
Describe the bug: Rule Update failure on 8.13 from 7.17.18
Kibana/Elasticsearch Stack version: 7.17.18 to 8.13.0 BC2
Browser and Browser OS Version: Chrome for macOS Version 122.0.6261.94 (Official Build) (x86_64)
Functional Area: Rule Update
precondition
Steps to reproduce
Additional Result
Current Result
Expected Result
Screen-Cast:
https://github.com/elastic/kibana/assets/59917825/6eadb902-b20e-4fa0-b7c3-0e27ee521c8e