Hi @everping, is the software table still not showing up?
@RachelElysia the software tab shows up now, but the vulnerable software still doesn't appear, so I think vulnerability processing is not working.
@everping So sorry to hear that. Can you try running fleetctl get config --include-server-config to view the vulnerabilities settings?
By default, vulnerability processing runs every hour and requires the software inventory to be populated from hosts. You can shorten the interval by setting FLEET_VULNERABILITIES_PERIODICITY=5m. Also, there may be an issue with a lock not being released that is preventing vulnerability processing from running. You can check this by connecting to the db and running select * from locks where name = 'vulnerabilities'. The lock will eventually expire after expires_at.
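Duration settings such as the periodicity above are written in Go's duration format (5m, 1h0m0s). As a minimal sketch for comparing such values in a script, the helper below parses a subset of that format; the function name is hypothetical and only the h, m, s and ms units are handled.

```python
import re
from datetime import timedelta

# Seconds per supported unit (a subset of Go's duration units).
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 1e-3}
# "ms" is listed before "m" so that "500ms" is not read as 500 minutes.
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")


def parse_go_duration(text: str) -> timedelta:
    """Parse strings like '5m' or '1h0m0s' into a timedelta (hypothetical helper)."""
    pos = 0
    total = 0.0
    for match in _TOKEN.finditer(text):
        if match.start() != pos:
            raise ValueError(f"unexpected characters in duration: {text!r}")
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):
        raise ValueError(f"not a supported duration: {text!r}")
    return timedelta(seconds=total)


# Example: the shortened interval is well below the hourly default.
print(parse_go_duration("5m") < parse_go_duration("1h0m0s"))
```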
@RachelElysia I had to set the config address and log in to the Fleet API before fleetctl get config would work. The result is below:
---
apiVersion: v1
kind: config
spec:
  agent_options:
    config:
      decorators:
        load:
        - SELECT uuid AS host_uuid FROM system_info;
        - SELECT hostname AS hostname FROM system_info;
      options:
        disable_distributed: false
        distributed_interval: 10
        distributed_plugin: tls
        distributed_tls_max_attempts: 3
        logger_plugin: tls
        logger_tls_endpoint: /api/osquery/log
        logger_tls_period: 10
        pack_delimiter: /
    overrides: {}
  fleet_desktop:
    transparency_url: https://fleetdm.com/transparency
  host_expiry_settings:
    host_expiry_enabled: false
    host_expiry_window: 0
  host_settings:
    enable_host_users: true
    enable_software_inventory: true
  integrations:
    jira: null
    zendesk: null
  license:
    device_count: 1
    expiration: "2023-07-12T05:06:52Z"
    note: Created with Fleet License key dispenser
    organization: xxxxx
    tier: premium
  logging:
    debug: false
    json: false
    result:
      config:
        enable_log_compression: false
        enable_log_rotation: false
        result_log_file: /tmp/osquery_result
        status_log_file: /tmp/osquery_status
      plugin: filesystem
    status:
      config:
        enable_log_compression: false
        enable_log_rotation: false
        result_log_file: /tmp/osquery_result
        status_log_file: /tmp/osquery_status
      plugin: filesystem
  org_info:
    org_logo_url: xxxxx
    org_name: xxxxx
  server_settings:
    deferred_save_host: false
    enable_analytics: true
    live_query_disabled: false
    server_url: xxxxx
  smtp_settings:
    authentication_method: authmethod_plain
    authentication_type: authtype_username_password
    configured: false
    domain: ""
    enable_smtp: false
    enable_ssl_tls: true
    enable_start_tls: true
    password: ""
    port: 587
    sender_address: ""
    server: ""
    user_name: ""
    verify_ssl_certs: true
  sso_settings:
    enable_sso: false
    enable_sso_idp_login: false
    entity_id: ""
    idp_image_url: ""
    idp_name: ""
    issuer_uri: ""
    metadata: ""
    metadata_url: ""
  update_interval:
    osquery_detail: 1h0m0s
    osquery_policy: 1h0m0s
  vulnerabilities:
    cpe_database_url: ""
    current_instance_checks: auto
    cve_feed_prefix_url: ""
    databases_path: /home/fleet/vulndb/
    disable_data_sync: false
    periodicity: 1h0m0s
    recent_vulnerability_max_age: 720h0m0s
  vulnerability_settings:
    databases_path: ""
  webhook_settings:
    failing_policies_webhook:
      destination_url: ""
      enable_failing_policies_webhook: false
      host_batch_size: 0
      policy_ids: null
    host_status_webhook:
      days_count: 0
      destination_url: ""
      enable_host_status_webhook: false
      host_percentage: 0
    interval: 24h0m0s
    vulnerabilities_webhook:
      destination_url: ""
      enable_vulnerabilities_webhook: false
      host_batch_size: 0
@michalnicp There actually is a record in the locks table. How should we deal with this?
You shouldn't need to do anything with the lock. After it expires, Fleet should resume vulnerability processing. There can be issues releasing the lock if the pod dies, but these should resolve once the lock expires on its own.
Do you see any errors in the logs or unusual OOMKilled events in the output from kubectl describe pod [fleet-pod]?
@michalnicp I waited until the lock expired, but the owner and expiry time were automatically updated again, as you see below. That looks like a deadlock.
I have also checked the pod and no error events happened. But when checking the application logs, I got
level=error ts=2022-07-15T18:58:03.868518704Z component=http method=POST uri=/api/v1/osquery/config took=3.383427ms ip_addr=zzzzz x_for_ip_addr=xxxx err="internal error: fetch base config: load team agent options for host: select team: Error 1064: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '\"$.agent_options\" FROM teams WHERE id = ?' at line 1"
level=error ts=2022-07-15T18:58:07.951171239Z component=http method=POST uri=/api/v1/osquery/config took=2.790603ms ip_addr=zzzzz x_for_ip_addr=zzzzz err="internal error: fetch base config: load team agent options for host: select team: Error 1064: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '\"$.agent_options\" FROM teams WHERE id = ?' at line 1"
Does this cause the locking?
@everping what's your MySQL version? That syntax error about the JSON syntax might mean that your version is incompatible, and that could be causing the issues with vulnerability processing.
The "lock" that you see is indicating that one of the Fleet servers has declared its intent to do the vulnerability processing. That's normal and I don't see any indication of a deadlock.
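To illustrate why a row that keeps getting a new owner and expiry time is not a deadlock, here is a minimal in-memory sketch of a lease-style lock. It is a generic illustration and not Fleet's actual implementation; the class and method names are hypothetical, and only the name, owner and expires_at fields mirror what was discussed above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict


@dataclass
class Lease:
    owner: str
    expires_at: datetime


class LeaseTable:
    """Hypothetical stand-in for a locks table keyed by lock name."""

    def __init__(self) -> None:
        self._rows: Dict[str, Lease] = {}

    def acquire(self, name: str, owner: str, ttl: timedelta, now: datetime) -> bool:
        """Take or renew the lease if it is free, expired, or already ours."""
        row = self._rows.get(name)
        if row is None or row.expires_at <= now or row.owner == owner:
            self._rows[name] = Lease(owner, now + ttl)
            return True
        return False

    def owner_of(self, name: str) -> str:
        return self._rows[name].owner


table = LeaseTable()
start = datetime(2022, 7, 15, 18, 0, 0)
hour = timedelta(hours=1)

table.acquire("vulnerabilities", "server-a", hour, start)
# Another server cannot take the lease while it is still valid.
print(table.acquire("vulnerabilities", "server-b", hour, start + timedelta(minutes=30)))
# Once the lease has expired, the next server to ask becomes the owner.
print(table.acquire("vulnerabilities", "server-b", hour, start + timedelta(hours=2)))
```

Under this model the row never disappears: each time the lease expires, whichever server runs next writes a fresh owner and expiry time.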
I suspect the issue with the sql query may be that ANSI_QUOTES is enabled in MySQL. Can you confirm by running SELECT @@sql_mode;
I suspect the issue may be caused by not enough memory for vulnerability processing. According to https://fleetdm.com/docs/deploying/reference-architectures, we recommend 4 GB of memory. As noted in your deployment yaml above, 128 MB is not enough. I suspect that the pod is getting OOMKilled by k8s.
@michalnicp
> I suspect the issue with the sql query may be that ANSI_QUOTES is enabled in MySQL. Can you confirm by running SELECT @@sql_mode;
Yes, ANSI_QUOTES is enabled and I'm using MySQL 8. Should I disable it for fleetdm?
> I suspect the issue may be caused by not enough memory for vulnerability processing. According to https://fleetdm.com/docs/deploying/reference-architectures, we recommend 4 GB of memory. As noted in your deployment yaml above, 128 MB is not enough. I suspect that the pod is getting OOMKilled by k8s.
Yes, I have checked the pod status and got
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
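Exit code 137 is 128 + 9, meaning the container was stopped with SIGKILL, which matches the OOMKilled reason. As a rough sketch for checking this across several pods, the hypothetical helper below scans saved kubectl describe pod output for the reason that follows a "Last State: Terminated" line. It assumes the plain-text layout shown above and is not a substitute for querying the Kubernetes API.

```python
import re
from typing import Optional


def last_termination_reason(describe_output: str) -> Optional[str]:
    """Return the Reason following the first 'Last State: Terminated' line, if any."""
    lines = describe_output.splitlines()
    for index, line in enumerate(lines):
        if re.match(r"\s*Last State:\s+Terminated\s*$", line):
            # The Reason line normally follows within the next few lines.
            for follow in lines[index + 1 : index + 4]:
                found = re.match(r"\s*Reason:\s+(\S+)", follow)
                if found:
                    return found.group(1)
    return None


sample = """\
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
"""
print(last_termination_reason(sample) == "OOMKilled")
```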
> Yes, ANSI_QUOTES is enabled and I'm using MySQL 8. Should I disable it for fleetdm?
This issue has come up a few times. We have generally tried to make Fleet work with ANSI_QUOTES enabled, but sometimes we miss things. I have already opened a PR to fix this one: #6707. You should disable it for now as a workaround.
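With ANSI_QUOTES enabled, MySQL treats double-quoted text as an identifier rather than a string literal, which is consistent with the syntax error near "$.agent_options" in the logs above. A minimal sketch of the workaround, assuming you paste in the value returned by SELECT @@sql_mode: the hypothetical helpers below detect the flag and build a SET GLOBAL sql_mode statement that keeps every other flag.

```python
from typing import List


def sql_mode_flags(sql_mode: str) -> List[str]:
    """Split a comma-separated sql_mode value into upper-cased flags."""
    return [flag.strip().upper() for flag in sql_mode.split(",") if flag.strip()]


def has_ansi_quotes(sql_mode: str) -> bool:
    return "ANSI_QUOTES" in sql_mode_flags(sql_mode)


def statement_without_ansi_quotes(sql_mode: str) -> str:
    """Build a statement that re-sets sql_mode with ANSI_QUOTES removed."""
    kept = [flag for flag in sql_mode_flags(sql_mode) if flag != "ANSI_QUOTES"]
    return "SET GLOBAL sql_mode = '{}';".format(",".join(kept))


current = "ONLY_FULL_GROUP_BY,ANSI_QUOTES,STRICT_TRANS_TABLES"
if has_ansi_quotes(current):
    print(statement_without_ansi_quotes(current))
```

This only strips the explicit flag. If the mode list contains a combination mode such as ANSI, that mode implies ANSI_QUOTES and would need to be replaced as well. SET GLOBAL applies to new connections and does not survive a server restart, so the server configuration would also need the change.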
Can you increase the pod memory limit and see if vulnerability processing starts working?
@michalnicp I updated the deployment as below
resources:
  limits:
    cpu: 1024m
    memory: 4096Mi
  requests:
    cpu: 500m
    memory: 128Mi
No OOMKilled appears, but the vulnerable software list is still empty.
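In the resources block above, the memory limit is now 4096Mi while the request is still 128Mi. The limit is the ceiling at which the container can be OOM-killed; the request is what the scheduler reserves when placing the pod. As a small sketch for comparing the two, the hypothetical helper below converts common Kubernetes memory quantities to bytes. It handles only integer values with the suffixes listed in the code.

```python
# Binary (power-of-two) and decimal suffixes used by Kubernetes quantities.
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
_DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def memory_bytes(quantity: str) -> int:
    """Convert a quantity such as '4096Mi' or '128Mi' to a byte count."""
    for suffix, factor in {**_BINARY, **_DECIMAL}.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain number of bytes


limit = memory_bytes("4096Mi")
request = memory_bytes("128Mi")
print(limit == 4 * 2**30, limit // request)
```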
Do you see any errors in the logs from fleet? Have you waited at least 1 hour? Now that the pod is not getting killed by k8s, we should have a better chance of tracking down the issue.
@michalnicp Vulnerability processing still does not work. No vulnerable software appears, even though the pod is no longer being killed by k8s (its age is over 3 days). The only error I'm getting is the SQL syntax error caused by ANSI_QUOTES.
Hi @everping - Do you mind running this and posting back the results?
SELECT json_value FROM aggregated_stats WHERE type = 'os_versions';
Thanks
@juan-fdz-hawa I'm attaching the screenshot here
Thanks @everping, we found a bug with vulnerability processing for LTS versions of Ubuntu. This should be fixed in the next patch release.
Thanks @juan-fdz-hawa! Opening this issue because the patch has not been released yet.
I'm also adding this issue to the release board so that it's tracked.
Fleet version: 4.17.0
Operating system: Kubernetes 1.21.5
💥  Actual behavior
I deployed the Fleet instance by using K8S with the following deployment:
In theory, vulnerability processing should work and I should see vulnerabilities in my software. In practice, the software tab was empty, and the vulnerability DB folder and the CVE/CPE databases were empty as well, as shown below.
So my question is: am I missing something in the Fleet deployment that vulnerability processing needs in order to work?