[Security Solution] `upgrade/_review` blocks main thread

xcrzx commented 6 hours ago

Summary

With certain rule upgrade payloads, the internal/detection_engine/prebuilt_rules/upgrade/_review endpoint can block the main Node.js thread for an extended period. Locally, I observed a single rule blocking for over 20 seconds:

Steps to Reproduce

It's not entirely clear what specific payload causes this, but I traced the issue to the note field in a particular rule. Specifically, the issue appears during merge version calculations here:

https://github.com/elastic/kibana/blob/12dc8c1bb84091d7c404b0e29654d56113fc5acb/x-pack/plugins/security_solution/server/lib/detection_engine/prebuilt_rules/logic/diff/calculation/algorithms/multi_line_string_diff_algorithm.ts#L104-L106

The minimal code needed to reproduce the problem is as follows, but the issue may also occur under other conditions and with different fields.

import { merge } from 'node-diff3';
const currentVersion =
  '## Triage and analysis\n\n### Investigating Potential Persistence Through init.d Detected\n\nThe `/etc/init.d` directory is used in Linux systems to store the initialization scripts for various services and daemons that are executed during system startup and shutdown.\n\nAttackers can abuse files within the `/etc/init.d/` directory to run scripts, commands or malicious software every time a system is rebooted by converting an executable file into a service file through the `systemd-sysv-generator`. After conversion, a unit file is created within the `/run/systemd/generator.late/` directory.\n\nThis rule looks for the creation of new files within the `/etc/init.d/` directory. Executable files in these directories will automatically run at boot with root privileges.\n\n> **Note**:\n> This investigation guide uses the [Osquery Markdown Plugin](https://www.elastic.co/guide/en/security/master/invest-guide-run-osquery.html) introduced in Elastic Stack version 8.5.0. Older Elastic Stack versions will display unrendered Markdown in this guide.\n> This investigation guide uses [placeholder fields](https://www.elastic.co/guide/en/security/current/osquery-placeholder-fields.html) to dynamically pass alert data into Osquery queries. Placeholder fields were introduced in Elastic Stack version 8.7.0. If you\'re using Elastic Stack version 8.6.0 or earlier, you\'ll need to manually adjust this investigation guide\'s queries to ensure they properly run.\n#### Possible Investigation Steps\n\n- Investigate the file that was created or modified.\n  - !{osquery{"label":"Osquery - Retrieve File Information","query":"SELECT * FROM file WHERE path = {{file.path}}"}}\n- Investigate whether any other files in the `/etc/init.d/` or `/run/systemd/generator.late/` directories have been altered.\n  - !{osquery{"label":"Osquery - Retrieve File Listing Information","query":"SELECT * FROM file WHERE (path LIKE \'/etc/init.d/%\' OR path LIKE \'/run/systemd/generator.late/%\')"}}\n  - !{osquery{"label":"Osquery - Retrieve Additional File Listing Information","query":"SELECT f.path, u.username AS file_owner, g.groupname AS group_owner, datetime(f.atime, \'unixepoch\') AS\\nfile_last_access_time, datetime(f.mtime, \'unixepoch\') AS file_last_modified_time, datetime(f.ctime, \'unixepoch\') AS\\nfile_last_status_change_time, datetime(f.btime, \'unixepoch\') AS file_created_time, f.size AS size_bytes FROM file f LEFT\\nJOIN users u ON f.uid = u.uid LEFT JOIN groups g ON f.gid = g.gid WHERE (path LIKE \'/etc/init.d/%\' OR path LIKE\\n\'/run/systemd/generator.late/%\')\\n"}}\n- Investigate the script execution chain (parent process tree) for unknown processes. Examine their executable files for prevalence and whether they are located in expected locations.\n  - !{osquery{"label":"Osquery - Retrieve Running Processes by User","query":"SELECT pid, username, name FROM processes p JOIN users u ON u.uid = p.uid ORDER BY username"}}\n- Investigate syslog through the `sudo cat /var/log/syslog | grep \'LSB\'` command to find traces of the LSB header of the script (if present). If syslog is being ingested into Elasticsearch, the same can be accomplished through Kibana.\n- Investigate other alerts associated with the user/host during the past 48 hours.\n- Validate whether this activity is related to planned patches, updates, network administrator activity, or legitimate software installations.\n- Investigate whether the altered scripts call other malicious scripts elsewhere on the file system. \n  - If scripts or executables were dropped, retrieve the files and determine if they are malicious:\n    - Use a private sandboxed malware analysis system to perform analysis.\n      - Observe and collect information about the following activities:\n        - Attempts to contact external domains and addresses.\n          - Check if the domain is newly registered or unexpected.\n          - Check the reputation of the domain or IP address.\n        - File access, modification, and creation activities.\n        - Cron jobs, services and other persistence mechanisms.\n            - !{osquery{"label":"Osquery - Retrieve Crontab Information","query":"SELECT * FROM crontab"}}\n\n### False Positive Analysis\n\n- If this activity is related to new benign software installation activity, consider adding exceptions — preferably with a combination of user and command line conditions.\n- If this activity is related to a system administrator who uses init.d for administrative purposes, consider adding exceptions for this specific administrator user account. \n- Try to understand the context of the execution by thinking about the user, machine, or business purpose. A small number of endpoints, such as servers with unique software, might appear unusual but satisfy a specific business need.\n\n### Related Rules\n\n- Suspicious File Creation in /etc for Persistence - 1c84dd64-7e6c-4bad-ac73-a5014ee37042\n\n### Response and remediation\n\n- Initiate the incident response process based on the outcome of the triage.\n- Isolate the involved host to prevent further post-compromise behavior.\n- If the triage identified malware, search the environment for additional compromised hosts.\n  - Implement temporary network rules, procedures, and segmentation to contain the malware.\n  - Stop suspicious processes.\n  - Immediately block the identified indicators of compromise (IoCs).\n  - Inspect the affected systems for additional malware backdoors like reverse shells, reverse proxies, or droppers that attackers could use to reinfect the system.\n- Remove and block malicious artifacts identified during triage.\n- Investigate credential exposure on systems compromised or used by the attacker to ensure all compromised accounts are identified. Reset passwords for these accounts and other potentially compromised credentials, such as email, business systems, and web services.\n- Delete the maliciously created service/init.d files or restore it to the original configuration.\n- Run a full antimalware scan. This may reveal additional artifacts left in the system, persistence mechanisms, and malware components.\n- Determine the initial vector abused by the attacker and take action to prevent reinfection through the same vector.\n- Leverage the incident response data and logging to improve the mean time to detect (MTTD) and the mean time to respond (MTTR).\n';
const baseVersion =
  '## Triage and analysis\n\n### Investigating Potential Persistence Through init.d Detected\n\nThe `/etc/init.d` directory is used in Linux systems to store the initialization scripts for various services and daemons that are executed during system startup and shutdown.\n\nAttackers can abuse files within the `/etc/init.d/` directory to run scripts, commands or malicious software every time a system is rebooted by converting an executable file into a service file through the `systemd-sysv-generator`. After conversion, a unit file is created within the `/run/systemd/generator.late/` directory.\n\nThis rule looks for the creation of new files within the `/etc/init.d/` directory. Executable files in these directories will automatically run at boot with root privileges.\n\n> **Note**:\n> This investigation guide uses the [Osquery Markdown Plugin](https://www.elastic.co/guide/en/security/master/invest-guide-run-osquery.html) introduced in Elastic Stack version 8.5.0. Older Elastic Stack versions will display unrendered Markdown in this guide.\n> This investigation guide uses [placeholder fields](https://www.elastic.co/guide/en/security/current/osquery-placeholder-fields.html) to dynamically pass alert data into Osquery queries. Placeholder fields were introduced in Elastic Stack version 8.7.0. If you\'re using Elastic Stack version 8.6.0 or earlier, you\'ll need to manually adjust this investigation guide\'s queries to ensure they properly run.\n#### Possible Investigation Steps\n\n- Investigate the file that was created or modified.\n  - !{osquery{"label":"Osquery - Retrieve File Information","query":"SELECT * FROM file WHERE path = {{file.path}}"}}\n- Investigate whether any other files in the `/etc/init.d/` or `/run/systemd/generator.late/` directories have been altered.\n  - !{osquery{"label":"Osquery - Retrieve File Listing Information","query":"SELECT * FROM file WHERE (path LIKE \'/etc/init.d/%\' OR path LIKE \'/run/systemd/generator.late/%\')"}}\n  - !{osquery{"label":"Osquery - Retrieve Additional File Listing Information","query":"SELECT\\n  f.path,\\n  u.username AS file_owner,\\n  g.groupname AS group_owner,\\n  datetime(f.atime, \'unixepoch\') AS file_last_access_time,\\n  datetime(f.mtime, \'unixepoch\') AS file_last_modified_time,\\n  datetime(f.ctime, \'unixepoch\') AS file_last_status_change_time,\\n  datetime(f.btime, \'unixepoch\') AS file_created_time,\\n  f.size AS size_bytes\\nFROM\\n  file f\\n  LEFT JOIN users u ON f.uid = u.uid\\n  LEFT JOIN groups g ON f.gid = g.gid\\nWHERE (path LIKE \'/etc/init.d/%\' OR path LIKE \'/run/systemd/generator.late/%\')\\n"}}\n- Investigate the script execution chain (parent process tree) for unknown processes. Examine their executable files for prevalence and whether they are located in expected locations.\n  - !{osquery{"label":"Osquery - Retrieve Running Processes by User","query":"SELECT pid, username, name FROM processes p JOIN users u ON u.uid = p.uid ORDER BY username"}}\n- Investigate syslog through the `sudo cat /var/log/syslog | grep \'LSB\'` command to find traces of the LSB header of the script (if present). If syslog is being ingested into Elasticsearch, the same can be accomplished through Kibana.\n- Investigate other alerts associated with the user/host during the past 48 hours.\n- Validate whether this activity is related to planned patches, updates, network administrator activity, or legitimate software installations.\n- Investigate whether the altered scripts call other malicious scripts elsewhere on the file system. \n  - If scripts or executables were dropped, retrieve the files and determine if they are malicious:\n    - Use a private sandboxed malware analysis system to perform analysis.\n      - Observe and collect information about the following activities:\n        - Attempts to contact external domains and addresses.\n          - Check if the domain is newly registered or unexpected.\n          - Check the reputation of the domain or IP address.\n        - File access, modification, and creation activities.\n        - Cron jobs, services and other persistence mechanisms.\n            - !{osquery{"label":"Osquery - Retrieve Crontab Information","query":"SELECT * FROM crontab"}}\n\n### False Positive Analysis\n\n- If this activity is related to new benign software installation activity, consider adding exceptions — preferably with a combination of user and command line conditions.\n- If this activity is related to a system administrator who uses init.d for administrative purposes, consider adding exceptions for this specific administrator user account. \n- Try to understand the context of the execution by thinking about the user, machine, or business purpose. A small number of endpoints, such as servers with unique software, might appear unusual but satisfy a specific business need.\n\n### Related Rules\n\n- Suspicious File Creation in /etc for Persistence - 1c84dd64-7e6c-4bad-ac73-a5014ee37042\n\n### Response and remediation\n\n- Initiate the incident response process based on the outcome of the triage.\n- Isolate the involved host to prevent further post-compromise behavior.\n- If the triage identified malware, search the environment for additional compromised hosts.\n  - Implement temporary network rules, procedures, and segmentation to contain the malware.\n  - Stop suspicious processes.\n  - Immediately block the identified indicators of compromise (IoCs).\n  - Inspect the affected systems for additional malware backdoors like reverse shells, reverse proxies, or droppers that attackers could use to reinfect the system.\n- Remove and block malicious artifacts identified during triage.\n- Investigate credential exposure on systems compromised or used by the attacker to ensure all compromised accounts are identified. Reset passwords for these accounts and other potentially compromised credentials, such as email, business systems, and web services.\n- Delete the maliciously created service/init.d files or restore it to the original configuration.\n- Run a full antimalware scan. This may reveal additional artifacts left in the system, persistence mechanisms, and malware components.\n- Determine the initial vector abused by the attacker and take action to prevent reinfection through the same vector.\n- Leverage the incident response data and logging to improve the mean time to detect (MTTD) and the mean time to respond (MTTR).\n';
const targetVersion =
  '## Triage and analysis\n\n### Investigating System V Init Script Created\n\nThe `/etc/init.d` directory is used in Linux systems to store the initialization scripts for various services and daemons that are executed during system startup and shutdown.\n\nAttackers can abuse files within the `/etc/init.d/` directory to run scripts, commands or malicious software every time a system is rebooted by converting an executable file into a service file through the `systemd-sysv-generator`. After conversion, a unit file is created within the `/run/systemd/generator.late/` directory.\n\nThis rule looks for the creation of new files within the `/etc/init.d/` directory. Executable files in these directories will automatically run at boot with root privileges.\n\n> **Note**:\n> This investigation guide uses the [Osquery Markdown Plugin](https://www.elastic.co/guide/en/security/master/invest-guide-run-osquery.html) introduced in Elastic Stack version 8.5.0. Older Elastic Stack versions will display unrendered Markdown in this guide.\n> This investigation guide uses [placeholder fields](https://www.elastic.co/guide/en/security/current/osquery-placeholder-fields.html) to dynamically pass alert data into Osquery queries. Placeholder fields were introduced in Elastic Stack version 8.7.0. If you\'re using Elastic Stack version 8.6.0 or earlier, you\'ll need to manually adjust this investigation guide\'s queries to ensure they properly run.\n#### Possible Investigation Steps\n\n- Investigate the file that was created or modified.\n  - !{osquery{"label":"Osquery - Retrieve File Information","query":"SELECT * FROM file WHERE path = {{file.path}}"}}\n- Investigate whether any other files in the `/etc/init.d/` or `/run/systemd/generator.late/` directories have been altered.\n  - !{osquery{"label":"Osquery - Retrieve File Listing Information","query":"SELECT * FROM file WHERE path LIKE \'/etc/init.d/%\'"}}\n  - !{osquery{"label":"Osquery - Retrieve Additional File Listing Information","query":"SELECT f.path, u.username AS file_owner, g.groupname AS group_owner, datetime(f.atime, \'unixepoch\') AS\\nfile_last_access_time, datetime(f.mtime, \'unixepoch\') AS file_last_modified_time, datetime(f.ctime, \'unixepoch\') AS\\nfile_last_status_change_time, datetime(f.btime, \'unixepoch\') AS file_created_time, f.size AS size_bytes FROM file f LEFT\\nJOIN users u ON f.uid = u.uid LEFT JOIN groups g ON f.gid = g.gid WHERE path LIKE \'/etc/init.d/%\'\\n"}}\n- Investigate the script execution chain (parent process tree) for unknown processes. Examine their executable files for prevalence and whether they are located in expected locations.\n  - !{osquery{"label":"Osquery - Retrieve Running Processes by User","query":"SELECT pid, username, name FROM processes p JOIN users u ON u.uid = p.uid ORDER BY username"}}\n- Investigate syslog through the `sudo cat /var/log/syslog | grep \'LSB\'` command to find traces of the LSB header of the script (if present). If syslog is being ingested into Elasticsearch, the same can be accomplished through Kibana.\n- Investigate other alerts associated with the user/host during the past 48 hours.\n- Validate whether this activity is related to planned patches, updates, network administrator activity, or legitimate software installations.\n- Investigate whether the altered scripts call other malicious scripts elsewhere on the file system. \n  - If scripts or executables were dropped, retrieve the files and determine if they are malicious:\n    - Use a private sandboxed malware analysis system to perform analysis.\n      - Observe and collect information about the following activities:\n        - Attempts to contact external domains and addresses.\n          - Check if the domain is newly registered or unexpected.\n          - Check the reputation of the domain or IP address.\n        - File access, modification, and creation activities.\n        - Cron jobs, services and other persistence mechanisms.\n            - !{osquery{"label":"Osquery - Retrieve Crontab Information","query":"SELECT * FROM crontab"}}\n\n### False Positive Analysis\n\n- If this activity is related to new benign software installation activity, consider adding exceptions — preferably with a combination of user and command line conditions.\n- If this activity is related to a system administrator who uses init.d for administrative purposes, consider adding exceptions for this specific administrator user account. \n- Try to understand the context of the execution by thinking about the user, machine, or business purpose. A small number of endpoints, such as servers with unique software, might appear unusual but satisfy a specific business need.\n\n### Related Rules\n\n- Suspicious File Creation in /etc for Persistence - 1c84dd64-7e6c-4bad-ac73-a5014ee37042\n\n### Response and remediation\n\n- Initiate the incident response process based on the outcome of the triage.\n- Isolate the involved host to prevent further post-compromise behavior.\n- If the triage identified malware, search the environment for additional compromised hosts.\n  - Implement temporary network rules, procedures, and segmentation to contain the malware.\n  - Stop suspicious processes.\n  - Immediately block the identified indicators of compromise (IoCs).\n  - Inspect the affected systems for additional malware backdoors like reverse shells, reverse proxies, or droppers that attackers could use to reinfect the system.\n- Remove and block malicious artifacts identified during triage.\n- Investigate credential exposure on systems compromised or used by the attacker to ensure all compromised accounts are identified. Reset passwords for these accounts and other potentially compromised credentials, such as email, business systems, and web services.\n- Delete the maliciously created service/init.d files or restore it to the original configuration.\n- Run a full antimalware scan. This may reveal additional artifacts left in the system, persistence mechanisms, and malware components.\n- Determine the initial vector abused by the attacker and take action to prevent reinfection through the same vector.\n- Leverage the incident response data and logging to improve the mean time to detect (MTTD) and the mean time to respond (MTTR).\n';

const mergedVersion = merge(currentVersion, baseVersion ?? '', targetVersion, {
  stringSeparator: /(\S+|\s+)/g, // Retains all whitespace, which we keep to preserve formatting
});

Impact

It's unclear how often rule upgrades may lead to main thread blocking, but since the upgrade/_review endpoint might be called relatively frequently, having even a few rules with updates that trigger the issue could result in Kibana being blocked for several minutes. For this reason, I consider this a critical issue for the Rule Customization release.

elasticmachine commented 6 hours ago

Pinging @elastic/security-solution (Team: SecuritySolution)

elasticmachine commented 6 hours ago

Pinging @elastic/security-detections-response (Team:Detections and Resp)

elasticmachine commented 6 hours ago

Pinging @elastic/security-detection-rule-management (Team:Detection Rule Management)

marshallmain commented 3 hours ago

It looks like the regex is not correct. I started looking at this because (\S+|\s+) initially looked like some regexes that CodeQL flagged recently as performance issues, but in this case the regex itself doesn't seem slow to execute.

/S+ matches any non-whitespace character, so this regex basically matches anything and I suspect the input strings are being split into tons of tiny chunks as a result. I tried replacing the regex with (\s+) and it's much faster and the tests still pass, is there some other reason we'd want \S+ included?

Notes on recent regex issues: https://github.com/elastic/security-team/issues/10699

banderror commented 1 hour ago

/S+ matches any non-whitespace character, so this regex basically matches anything and I suspect the input strings are being split into tons of tiny chunks as a result.

@dplumlee Yep, that's exactly what happens. The merge function outputs arrays containing tons of elements, each one being a single character (whitespace or non-whitespace) or an empty string.

Also, the merge function contains 3 nested loops, so the complexity of it seems to be at least cubic based on the regexp and the sizes of the input strings.

Our goal is to simplify the regexp as much as possible. It should break input strings into lines, not individual characters.

elastic / kibana