Closed hkinjo2 closed 1 month ago
My best bet is that there are issues with downloading and/or loading the kernel driver.
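Incidentally, we just added more info to the following 2 repos, which may be useful as well:
https://github.com/falcosecurity/deploy-kubernetes/tree/main/kubernetes
https://github.com/falcosecurity/cncf-green-review-testing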
Could you override your Falco container entrypoint with something like this?
command: ["/bin/sh"]
args:
  - -c
  - >-
    sleep 10000000
and after execing into the pod, launch Falco manually and tell us the error message?
What kernel driver are you using? If you can, try --modern-bpf.
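For readers following along, here is a minimal sketch of the "exec into the pod" step once the entrypoint has been replaced with the sleep command. The helper name falco_debug_shell and the pod name in the usage comment are hypothetical. The container name falco is an assumption based on the chart's default naming, so adjust it if your DaemonSet differs.

```shell
# Open an interactive shell in the Falco container of a given pod.
# Assumption: the container is named "falco" (the Helm chart default).
falco_debug_shell() {
  ns="$1"
  pod="$2"
  kubectl exec -it -n "$ns" "$pod" -c falco -- /bin/sh
}

# Usage (pod name is a placeholder):
#   falco_debug_shell falco <falco_pod_name>
# Then, inside the container, start Falco by hand and note the error, e.g.:
#   falco --modern-bpf
```

Because the container is only sleeping, it no longer crashes on start, so the error printed by the manual launch is the one to report.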
Thank you for reporting. Could you provide the Falco logs of one of the pods? Could you also rewrite the issue in English, please?
English translation.
I upgraded my AKS cluster to version 1.26.1 and Falco stopped working properly with CrashLoopBackOff. I have rebooted, redeployed, deployed a new version of Falco, changed the assigned CPU, etc., but the problem has not been resolved. We would appreciate any information you can give us on this issue. The current version of Falco is 1.29.1.
To confirm: should I connect with the following command,
kubectl exec -it -n falco
then run service falco start or systemctl start falco, and then run journalctl -fu falco? Or is my understanding correct that I should provide the logs written to /var/messages?
I tried to connect to Falco based on that understanding, but could not because of the CrashLoopBackOff status, as shown in the operation log.
Am I correct in my understanding that the repository should be added before logging?
If so, I apologize. I do not know how to add a repository due to my limited knowledge.
I would appreciate it if you could tell me how to do this as well.
sankou.txt
https://github.com/falcosecurity/falco/issues/2982#issuecomment-1863860540
My best guess is that there is a problem downloading and/or loading the kernel drivers.
BTW, I have added more information to the following two repositories. Perhaps they will be equally helpful.
・https://github.com/falcosecurity/deploy-kubernetes/tree/main/kubernetes
・https://github.com/falcosecurity/cncf-green-review-testing
Could you please overwrite the Falco container entry point with something like
@incertum I am sorry, but could you please explain the above procedure again?
@incertum @Andreagit97
Hello.
I apologize for my lack of knowledge. Please let me know whatever information you need to solve the problem. We will try to get it.
However, I do not know how to perform many of these operations, and that is my difficulty.
Please help me. Please tell me what information you need and how to obtain it.
As a first attempt, I would try to install Falco with the modern-bpf probe, which is the easiest method we have: https://github.com/falcosecurity/charts/tree/master/charts/falco#daemonset. So you have to type:
# to update the helm chart to the latest version
helm repo update falcosecurity --fail-on-repo-update-fail
# to run Falco with modern ebpf
helm install falco falcosecurity/falco \
--set driver.kind=modern-bpf
@Andreagit97
Thank you for letting me know. I changed the driver parameters as instructed and deployed it, but the result was the same. We will share the work records and logs, so please check them.
We look forward to hearing your opinions.
20240115_Falco作業証跡.xlsx logs_69.zip
Looking at your logs:
2024-01-15T10:20:47.7261752Z REVISION UPDATED STATUS CHART APP VERSION DESCRIPTION
2024-01-15T10:20:47.7262638Z 13 Fri Jul 22 14:35:26 2022 superseded falco-1.15.2 0.29.0 Upgrade complete
2024-01-15T10:20:47.7266217Z 14 Thu Jul 27 13:26:31 2023 superseded falco-1.15.2 0.29.0 Upgrade complete
2024-01-15T10:20:47.7267097Z 15 Fri Aug 4 15:50:11 2023 superseded falco-1.15.2 0.29.0 Upgrade complete
2024-01-15T10:20:47.7268266Z 16 Fri Aug 4 15:57:54 2023 superseded falco-1.15.2 0.29.0 Upgrade complete
2024-01-15T10:20:47.7269001Z 17 Fri Aug 4 17:46:56 2023 superseded falco-3.4.1 0.35.1 Upgrade complete
2024-01-15T10:20:47.7270362Z 18 Tue Dec 12 18:27:52 2023 superseded falco-3.4.1 0.35.1 Upgrade complete
2024-01-15T10:20:47.7271007Z 19 Tue Dec 12 18:42:48 2023 superseded falco-3.4.1 0.35.1 Upgrade complete
2024-01-15T10:20:47.7271644Z 20 Thu Dec 14 16:40:31 2023 superseded falco-1.15.2 0.29.0 Upgrade complete
2024-01-15T10:20:47.7272280Z 21 Wed Dec 20 11:29:21 2023 superseded falco-1.15.2 0.29.0 Upgrade complete
2024-01-15T10:20:47.7272964Z 22 Mon Jan 15 19:11:44 2024 deployed falco-1.15.2 0.29.0 Upgrade complete
2024-01-15T10:20:51.7628288Z Release "falco" has been upgraded. Happy Helming!
2024-01-15T10:20:51.7629087Z NAME: falco
2024-01-15T10:20:51.7630661Z LAST DEPLOYED: Mon Jan 15 19:20:49 2024
2024-01-15T10:20:51.7631105Z NAMESPACE: falco
2024-01-15T10:20:51.7631467Z STATUS: deployed
2024-01-15T10:20:51.7631804Z REVISION: 23
2024-01-15T10:20:51.7632196Z TEST SUITE: None
2024-01-15T10:20:51.7632611Z NOTES:
2024-01-15T10:20:51.7633104Z Falco agents are spinning up on each node in your cluster. After a few
2024-01-15T10:20:51.7633820Z seconds, they are going to start monitoring your containers looking for
2024-01-15T10:20:51.7634370Z security issues.
2024-01-15T10:20:51.7634619Z
2024-01-15T10:20:51.7634633Z
2024-01-15T10:20:51.7634867Z No further action should be required.
2024-01-15T10:20:51.7635178Z
2024-01-15T10:20:51.7635189Z
2024-01-15T10:20:51.7635330Z Tip:
2024-01-15T10:20:51.7636551Z You can easily forward Falco events to Slack, Kafka, AWS Lambda and more with falcosidekick.
2024-01-15T10:20:51.7637337Z Full list of outputs: https://github.com/falcosecurity/charts/tree/master/falcosidekick.
2024-01-15T10:20:51.7638085Z You can enable its deployment with `--set falcosidekick.enabled=true` or in your values.yaml.
2024-01-15T10:20:51.7638735Z See: https://github.com/falcosecurity/charts/blob/master/falcosidekick/values.yaml for configuration values.
It seems like you are using the wrong Falco version
2024-01-15T10:20:47.7272964Z 22 Mon Jan 15 19:11:44 2024 deployed falco-1.15.2 0.29.0 Upgrade complete
Maybe you could try deleting the current Helm deployment with
helm uninstall falco
and then try again
# to update the helm chart to the latest version
helm repo update falcosecurity --fail-on-repo-update-fail
helm show chart falcosecurity/falco
The output should be something like
apiVersion: v2
appVersion: 0.36.2
dependencies:
- condition: falcosidekick.enabled
name: falcosidekick
repository: https://falcosecurity.github.io/charts
version: 0.7.11
...
and then:
# to run Falco with modern ebpf
helm install falco falcosecurity/falco \
--set driver.kind=modern-bpf
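The version check described above can also be done mechanically. This is a minimal sketch, not part of the chart tooling: the helper name chart_app_version is hypothetical, and it only assumes that the helm show chart output contains an appVersion: line like the one shown above.

```shell
# Read `helm show chart` output on stdin and print the appVersion value.
# Surrounding double quotes, if present, are stripped.
chart_app_version() {
  awk '$1 == "appVersion:" { gsub(/"/, "", $2); print $2; exit }'
}

# Usage (assumes the falcosecurity repo has already been added):
#   helm show chart falcosecurity/falco | chart_app_version
```

Comparing this value with the APP VERSION column of helm list -n falco shows whether the deployed release actually matches the chart you intended to install.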
@Andreagit97 Sorry for the delay in implementation. I tried to remove Falco using the command you provided, but the output was as follows.
Is it safe to delete the daemon set from the Azure portal, and then delete and redeploy the deployed pod?
What is the output after these 2 commands?
# to update the helm chart to the latest version
helm repo update falcosecurity --fail-on-repo-update-fail
helm show chart falcosecurity/falco
@Andreagit97
The result is as below.
By the way, the way to delete the daemon set is to delete the Falco daemon set from the portal screen below.
ok, I don't know why you have Falco deployed through Azure but I think you can remove it. At the end of the cleanup, you should check that there are no Falco instances in the cluster:
kubectl get pods -A | grep falco
This should return nothing...
If you are in this situation now you can simply deploy Falco with:
helm install falco falcosecurity/falco \
--set driver.kind=modern-bpf
@Andreagit97 Thank you for teaching. I immediately installed the latest version of Falco. The results are as follows. The problem persists despite the latest version.
20240126_開発環境_Falco最新Vserデプロイ.txt
I will also send you the entire log, so please check it.
P.S. The command you provided didn't work, so I used the upgrade command.
@Andreagit97 Do you have any idea what is causing this problem? We've been investigating this for the past two weeks, but we can't find the cause. Please let me know if you have any advice.
ei @hkinjo2, to help you we need the Falco startup logs; otherwise we cannot understand what is going on... Falco startup logs look like this:
Sep 11 08:46:08 localhost.localdomain falco[789]: Falco version: 0.37.1 (x86_64)
Sep 11 08:46:08 localhost.localdomain falco[789]: Falco initialized with configuration file: /etc/falco/falco.yaml
Sep 11 08:46:08 localhost.localdomain falco[789]: Loading rules from file /etc/falco/falco_rules.yaml
Sep 11 08:46:08 localhost.localdomain falco[789]: Loading rules from file /etc/falco/falco_rules.local.yaml
Sep 11 08:46:08 localhost.localdomain falco[789]: The chosen syscall buffer dimension is: 8388608 bytes (8 MBs)
Sep 11 08:46:08 localhost.localdomain falco[789]: Starting health webserver with threadiness 4, listening on port 8765
Sep 11 08:46:08 localhost.localdomain falco[789]: Loaded event sources: syscall
Sep 11 08:46:08 localhost.localdomain falco[789]: Enabled event sources: syscall
Sep 11 08:46:08 localhost.localdomain falco[789]: Opening 'syscall' source with modern BPF probe.
Sep 11 08:46:08 localhost.localdomain falco[789]: One ring buffer every '2' CPUs.
...
You can obtain them by running kubectl logs <falco_pod_name>
@Andreagit97
I mentioned this last time too. As you can see, Falco is not running, so no logs can be collected. What should we do?
Hi @hkinjo2, it seems that one of the initContainers is failing. Please provide the logs for those containers. To get the logs of a previous run for a given container, use the --previous flag:
kubectl logs --previous -n falco falco-7f5qx name-of-init-container-here
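If the init container names are not known, they can be read from the pod spec first. The following is a minimal sketch under that assumption: the helper name init_container_logs is hypothetical, and it falls back to the current logs when no previous run exists.

```shell
# Print the logs of every init container in a pod.
# Tries the previous (crashed) run first, then the current one.
init_container_logs() {
  ns="$1"
  pod="$2"
  # Space-separated list of init container names from the pod spec.
  names="$(kubectl get pod "$pod" -n "$ns" -o jsonpath='{.spec.initContainers[*].name}')"
  for name in $names; do
    echo "=== init container: $name ==="
    kubectl logs --previous -n "$ns" "$pod" -c "$name" ||
      kubectl logs -n "$ns" "$pod" -c "$name"
  done
}

# Usage (pod name taken from the command above):
#   init_container_logs falco falco-7f5qx
```

Passing the container with -c makes the target explicit, which helps when a pod has several init containers.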
@alacuku
Thank you for your continued support.
Sorry for my late reply.
I tried using the command you provided, but it still didn't work.
Please see the image.
@alacuku
Hello. Were you able to investigate using the information we previously provided? Please tell us the current status. It's a no-brainer at my place of work. We apologize for any inconvenience. Thank you very much.
@Andreagit97
We upgraded the AKS cluster to a supported version (1.27.9) yesterday, April 4, partly to isolate whether the problem was caused by AKS or by Falco. The upgrade completed successfully, but the Falco problem is still not resolved. We checked the events of the pods and found the following message:
"Preemption: 0/4 nodes are available: 4 No preemption victims found for incoming pod..."
Please see the attached file for the actual screen showing the above message. We would therefore appreciate it if you could share Falco's perspective on possible causes and solutions. Thank you in advance.
@Andreagit97 Thank you for your continued support. Could you tell us the status of the investigation?
After the version upgrade of the AKS cluster running Falco, we found a difference in the containers compared with other Falco instances that are operating normally, so we are sharing it. Compared with an environment that is running normally, an extra "falcoctl-artifact-follow" container has been added. Originally, we assumed that only a container called "Falco" was attached.
=== Excerpt ===
NameSpace  NAME         CONTAINERS
falco      falco-4t57h  falco,falcoctl-artifact-follow
falco      falco-f4h4s  falco,falcoctl-artifact-follow
falco      falco-xd9cz  falco,falcoctl-artifact-follow
falco      falco-z2s8l  falco,falcoctl-artifact-follow
Hi folks, as we told you, we can help you if Falco is not working, but we need logs. Please clean up all your Falco instances and try this:
helm repo update falcosecurity
Then
helm show chart falcosecurity/falco
the output should be something like
apiVersion: v2
appVersion: 0.37.1
dependencies:
...
Then install Falco
helm install falco \
--set driver.kind=modern_ebpf \
--set falcoctl.artifact.install.enabled=false \
--set falcoctl.artifact.follow.enabled=false \
falcosecurity/falco
If Falco doesn't work and is in CrashLoopBackOff, you should provide us with the logs:
kubectl logs <your-falco-instance-name>
@Andreagit97 Hello. Thank you for the instructions. However, the result was not what I expected. I am sending the execution log as well, so please check it.
Thank you for your continued help.
falco.zip
Uhm it seems you have an already running Falco instance, you can try to delete it with
helm uninstall falco -n falco
and then retry
helm install falco \
--set driver.kind=modern_ebpf \
--set falcoctl.artifact.install.enabled=false \
--set falcoctl.artifact.follow.enabled=false \
falcosecurity/falco
@Andreagit97 Hello,
Thank you for your guidance. We recently conducted a reinstall of Falco. Out of the four redeployed pods, three have successfully started up. However, one remains in a pending state without starting up. Attached is the event log for your review to assess if there are any functional implications. Additionally, if there are any impacts, could you please advise on the appropriate course of action? Thank you. falco-rg6n9.17d171d27601c9ec_Event.yaml.zip
Ei @akirtanabe it seems you reached some pod limits on the node.
reason: FailedScheduling
message: >-
  0/4 nodes are available: 1 Too many pods. preemption: 0/4 nodes are available:
  4 No preemption victims found for incoming pod..
This is not an issue related to Falco; your nodes are probably reaching their pod limits. I cannot help you here, but you can probably find this issue discussed online: https://learn.microsoft.com/en-us/answers/questions/761871/unable-to-schedule-pods-on-nodes-it-says-too-many
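One way to check the "Too many pods" condition is to count the pods scheduled on each node and compare that with the node's pod capacity. This is a minimal sketch that assumes kubectl access to the cluster; the helper name pods_per_node is hypothetical.

```shell
# Count pods per node across all namespaces.
# Pods that are not scheduled yet appear under "<none>".
pods_per_node() {
  kubectl get pods -A --no-headers -o custom-columns=NODE:.spec.nodeName |
    sort | uniq -c | awk '{ print $2 ": " $1 }'
}

# Usage:
#   pods_per_node
# Compare with each node's limit:
#   kubectl get nodes -o custom-columns=NODE:.metadata.name,MAX_PODS:.status.allocatable.pods
```

Since Falco runs as a DaemonSet with one pod per node, a node that is already at its pod limit will leave its Falco pod Pending until another pod is removed or the limit is raised.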
@Andreagit97 Thank you for your help.
I understand that this may have impacted the AKS environment. The pending pods recovered over time.
So I think reinstalling Falco resolved the issue.
Thank you for your support so far. I'm going to close this issue now.
Thank you.
great, I'm happy to hear that! I will close the issue
@Andreagit97 Hello.
The problem I asked about has also occurred in a different environment. I reinstalled Falco, but that also failed.
Could you please take a look at the log for details? I would like to solve this urgently.
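After upgrading the AKS cluster to version 1.26.1, Falco went into CrashLoopBackOff and stopped working properly. I tried rebooting, redeploying, deploying a new version of Falco, and changing the assigned CPU, but the problem was not resolved. I would appreciate any guidance on this issue. Note: the current Falco version is 1.29.1.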