daeuniverse / dae

eBPF-based Linux high-performance transparent proxy solution.
GNU Affero General Public License v3.0

level=fatal msg="load eBPF objects: field TproxyWanEgress: program tproxy_wan_egress: load program: invalid argument: ca> #506

Closed hiifeng closed 2 months ago

hiifeng commented 2 months ago

Checks

Support Request

dae fails to run on a OneCloud (玩客云) box. Requesting support.

Current Behavior

I compiled an Armbian kernel for the OneCloud so that it can run dae.

root@onecloud:~# uname -r

6.7.12-edge-meson

root@onecloud:~# (zcat /proc/config.gz || cat /boot/{config,config-$(uname -r)}) | grep -E 'CONFIG_(DEBUG_INFO|DEBUG_INFO_BTF|KPROBES|KPROBE_EVENTS|BPF|BPF_SYSCALL|BPF_JIT|BPF_STREAM_PARSER|NET_CLS_ACT|NET_SCH_INGRESS|NET_INGRESS|NET_EGRESS|NET_CLS_BPF|BPF_EVENTS|CGROUPS)=|# CONFIG_DEBUG_INFO_REDUCED is not set'

CONFIG_BPF=y
CONFIG_BPF_SYSCALL=y
CONFIG_BPF_JIT=y
CONFIG_CGROUPS=y
CONFIG_KPROBES=y
CONFIG_NET_INGRESS=y
CONFIG_NET_EGRESS=y
CONFIG_NET_SCH_INGRESS=m
CONFIG_NET_CLS_BPF=m
CONFIG_NET_CLS_ACT=y
CONFIG_BPF_STREAM_PARSER=y
CONFIG_DEBUG_INFO=y
# CONFIG_DEBUG_INFO_REDUCED is not set
CONFIG_DEBUG_INFO_BTF=y
CONFIG_KPROBE_EVENTS=y
CONFIG_BPF_EVENTS=y

The kernel version is currently 6.7.12-edge-meson, and the kernel configuration meets dae's requirements.
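
The manual check above can be scripted so that missing options stand out. This is a minimal sketch, assuming the option list from the grep pattern in this report is the full set to check; the function name `missing_kconfig` is hypothetical and not part of dae. It does not verify that CONFIG_DEBUG_INFO_REDUCED is unset.

```shell
#!/bin/sh
# Hypothetical helper: reads kernel config text on stdin and prints every
# option from the list below that is not set to =y or =m.
# The option list is copied from the grep pattern used earlier in this report.
missing_kconfig() {
    cfg=$(cat)
    for opt in DEBUG_INFO DEBUG_INFO_BTF KPROBES KPROBE_EVENTS BPF BPF_SYSCALL \
               BPF_JIT BPF_STREAM_PARSER NET_CLS_ACT NET_SCH_INGRESS \
               NET_INGRESS NET_EGRESS NET_CLS_BPF BPF_EVENTS CGROUPS; do
        # An option counts as present when it is built in (y) or a module (m).
        printf '%s\n' "$cfg" | grep -Eq "^CONFIG_${opt}=(y|m)\$" \
            || printf 'CONFIG_%s\n' "$opt"
    done
}
```

Possible usage on the target machine: `(zcat /proc/config.gz || cat /boot/config-$(uname -r)) | missing_kconfig`. Empty output means every listed option is enabled.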

Attempting to start dae produces the following error.

root@onecloud:~# systemctl start dae
Job for dae.service failed because the control process exited with error code.
See "systemctl status dae.service" and "journalctl -xeu dae.service" for details.

root@onecloud:~# systemctl status dae.service
× dae.service - dae Service
     Loaded: loaded (/etc/systemd/system/dae.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Wed 2024-04-24 19:38:42 CST; 20s ago
       Docs: https://github.com/daeuniverse/dae
    Process: 1458 ExecStartPre=/usr/bin/dae validate -c /etc/dae/config.dae (code=exited, status=0/SUCCESS)
    Process: 1464 ExecStart=/usr/bin/dae run --disable-timestamp -c /etc/dae/config.dae (code=exited, status=1/FAILURE)
   Main PID: 1464 (code=exited, status=1/FAILURE)
        CPU: 38.882s

Apr 24 19:38:03 onecloud dae[1464]: level=info msg="Include config files: [/etc/dae/config.dae]"
Apr 24 19:38:03 onecloud dae[1464]: level=warning msg="No node found."
Apr 24 19:38:03 onecloud dae[1464]: level=warning msg="No interface to bind."
Apr 24 19:38:03 onecloud dae[1464]: level=info msg="Loading eBPF programs and maps into the kernel..."
Apr 24 19:38:03 onecloud dae[1464]: level=info msg="The loading process takes about 120MB free memory, which will be released after loading. Insufficient me>
Apr 24 19:38:42 onecloud dae[1464]: level=fatal msg="load eBPF objects: field TproxyWanEgress: program tproxy_wan_egress: load program: invalid argument: ca>
Apr 24 19:38:42 onecloud systemd[1]: dae.service: Main process exited, code=exited, status=1/FAILURE
Apr 24 19:38:42 onecloud systemd[1]: dae.service: Failed with result 'exit-code'.
Apr 24 19:38:42 onecloud systemd[1]: Failed to start dae Service.
Apr 24 19:38:42 onecloud systemd[1]: dae.service: Consumed 38.882s CPU time.

Expected Behavior

No response

Steps to Reproduce

root@onecloud:~# cat /etc/dae/config.dae
global{}
routing{}

root@onecloud:~# free -h
               total        used        free      shared  buff/cache   available
Mem:           980Mi       100Mi       748Mi       4.0Mi       131Mi       852Mi
Swap:          490Mi          0B       490Mi

Environment

root@onecloud:~# dae --version
dae version v0.6.0rc2
go runtime go1.22.2 linux/arm
Copyright (c) 2022-2024 @daeuniverse
License GNU AGPLv3 https://github.com/daeuniverse/dae/blob/main/LICENSE

root@onecloud:~# cat /etc/os-release
PRETTY_NAME="Armbian-unofficial 24.5.0-trunk jammy"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.armbian.com"
SUPPORT_URL="https://forum.armbian.com"
BUG_REPORT_URL="https://www.armbian.com/bugs"
PRIVACY_POLICY_URL="https://www.armbian.com"
UBUNTU_CODENAME=jammy
ARMBIAN_PRETTY_NAME="Armbian-unofficial 24.5.0-trunk jammy"

root@onecloud:~# uname -a
Linux onecloud 6.7.12-edge-meson #1 SMP Wed Apr 3 13:11:59 UTC 2024 armv7l armv7l armv7l GNU/Linux

Anything else?

No response

dae-prow[bot] commented 2 months ago

Thanks for opening this issue!

mzz2017 commented 2 months ago

Try setting the log level to fatal and see what it shows.

hiifeng commented 2 months ago

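Quoting mzz2017: "Try setting the log level to fatal and see what it shows."

First of all, thank you for replying and helping me analyze the cause of the failure. Following your suggestion, I changed the log level.

root@onecloud:~# cat /etc/dae/config.dae
global{
  log_level: fatal
}
routing{}

I then tried to start dae again.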

root@onecloud:/etc/dae# systemctl start dae
Job for dae.service failed because the control process exited with error code.
See "systemctl status dae.service" and "journalctl -xeu dae.service" for details.

root@onecloud:/etc/dae# systemctl status dae.service
× dae.service - dae Service
     Loaded: loaded (/etc/systemd/system/dae.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Thu 2024-04-25 08:34:07 CST; 54s ago
       Docs: https://github.com/daeuniverse/dae
    Process: 1456 ExecStartPre=/usr/bin/dae validate -c /etc/dae/config.dae (code=exited, status=0/SUCCESS)
    Process: 1462 ExecStart=/usr/bin/dae run --disable-timestamp -c /etc/dae/config.dae (code=exited, status=1/FAILURE)
   Main PID: 1462 (code=exited, status=1/FAILURE)
        CPU: 39.131s

Apr 25 08:33:28 onecloud systemd[1]: Starting dae Service...
Apr 25 08:34:07 onecloud dae[1462]: level=fatal msg="callbacks are not allowed in non-JITed programs
Apr 25 08:34:07 onecloud dae[1462]: processed 229682 insns (limit 1000000) max_states_per_insn 68 total_states 10138 peak_states 1360 mark_read 86"
Apr 25 08:34:07 onecloud systemd[1]: dae.service: Main process exited, code=exited, status=1/FAILURE
Apr 25 08:34:07 onecloud systemd[1]: dae.service: Failed with result 'exit-code'.
Apr 25 08:34:07 onecloud systemd[1]: Failed to start dae Service.
Apr 25 08:34:07 onecloud systemd[1]: dae.service: Consumed 39.131s CPU time.

root@onecloud:~# tail -f /var/log/syslog
Apr 25 08:33:28 onecloud systemd[1]: Starting dae Service...
Apr 25 08:33:28 onecloud systemd-udevd[1470]: Using default interface naming scheme 'v249'.
Apr 25 08:33:28 onecloud systemd-udevd[1469]: Using default interface naming scheme 'v249'.
Apr 25 08:33:28 onecloud systemd[1]: run-netns-daens.mount: Deactivated successfully.
Apr 25 08:33:56 onecloud chronyd[1311]: Selected source **** (0.ubuntu.pool.ntp.org)
Apr 25 08:34:07 onecloud dae[1462]: level=fatal msg="callbacks are not allowed in non-JITed programs
Apr 25 08:34:07 onecloud dae[1462]: processed 229682 insns (limit 1000000) max_states_per_insn 68 total_states 10138 peak_states 1360 mark_read 86"
Apr 25 08:34:07 onecloud systemd[1]: dae.service: Main process exited, code=exited, status=1/FAILURE
Apr 25 08:34:07 onecloud systemd[1]: dae.service: Failed with result 'exit-code'.
Apr 25 08:34:07 onecloud systemd[1]: Failed to start dae Service.
Apr 25 08:34:07 onecloud systemd[1]: dae.service: Consumed 39.131s CPU time.
Apr 25 08:35:01 onecloud CRON[1519]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Apr 25 08:35:29 onecloud systemd[1]: Starting Daily man-db regeneration...
Apr 25 08:35:51 onecloud systemd[1]: man-db.service: Deactivated successfully.
Apr 25 08:35:51 onecloud systemd[1]: Finished Daily man-db regeneration.
Apr 25 08:35:51 onecloud systemd[1]: man-db.service: Consumed 1.093s CPU time.
Apr 25 08:40:29 onecloud systemd[1]: Starting system activity accounting tool...
Apr 25 08:40:29 onecloud systemd[1]: sysstat-collect.service: Deactivated successfully.
Apr 25 08:40:29 onecloud systemd[1]: Finished system activity accounting tool.

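I noticed that an NTP server was contacted while dae was starting ("Selected source **** (0.ubuntu.pool.ntp.org)"), even though my local time is correct. My guess is that dae supports vmess and therefore syncs the clock automatically.

There is also this line: level=fatal msg="callbacks are not allowed in non-JITed programs. I suspect this is the cause, but I don't know what "non-JITed" means. Requesting support, thanks.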

jschwinger233 commented 2 months ago

@hiifeng Could you try echo 1 > /proc/sys/net/core/bpf_jit_enable ?

hiifeng commented 2 months ago

Quoting jschwinger233: "@hiifeng Could you try echo 1 > /proc/sys/net/core/bpf_jit_enable ?"

root@onecloud:~# echo 1 > /proc/sys/net/core/bpf_jit_enable

root@onecloud:~# cat /proc/sys/net/core/bpf_jit_enable
1

root@onecloud:~# systemctl start dae
Job for dae.service failed because the control process exited with error code.
See "systemctl status dae.service" and "journalctl -xeu dae.service" for details.

root@onecloud:~# systemctl status dae.service
× dae.service - dae Service
     Loaded: loaded (/etc/systemd/system/dae.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Thu 2024-04-25 09:47:48 CST; 1min 27s ago
       Docs: https://github.com/daeuniverse/dae
    Process: 7282 ExecStartPre=/usr/bin/dae validate -c /etc/dae/config.dae (code=exited, status=0/SUCCESS)
    Process: 7288 ExecStart=/usr/bin/dae run --disable-timestamp -c /etc/dae/config.dae (code=exited, status=1/FAILURE)
   Main PID: 7288 (code=exited, status=1/FAILURE)
        CPU: 39.614s

Apr 25 09:47:08 onecloud systemd[1]: Starting dae Service...
Apr 25 09:47:48 onecloud dae[7288]: level=fatal msg="JIT doesn't support bpf-to-bpf calls
Apr 25 09:47:48 onecloud dae[7288]: callbacks are not allowed in non-JITed programs
Apr 25 09:47:48 onecloud dae[7288]: processed 229682 insns (limit 1000000) max_states_per_insn 68 total_states 10138 peak_states 1360 mark_read 86"
Apr 25 09:47:48 onecloud systemd[1]: dae.service: Main process exited, code=exited, status=1/FAILURE
Apr 25 09:47:48 onecloud systemd[1]: dae.service: Failed with result 'exit-code'.
Apr 25 09:47:48 onecloud systemd[1]: Failed to start dae Service.
Apr 25 09:47:48 onecloud systemd[1]: dae.service: Consumed 39.614s CPU time.

root@onecloud:~# tail -f /var/log/syslog
Apr 25 09:47:08 onecloud systemd[1]: Starting dae Service...
Apr 25 09:47:09 onecloud systemd-udevd[7295]: Using default interface naming scheme 'v249'.
Apr 25 09:47:09 onecloud systemd[1]: run-netns-daens.mount: Deactivated successfully.
Apr 25 09:47:09 onecloud systemd-udevd[7296]: Using default interface naming scheme 'v249'.
Apr 25 09:47:48 onecloud dae[7288]: level=fatal msg="JIT doesn't support bpf-to-bpf calls
Apr 25 09:47:48 onecloud dae[7288]: callbacks are not allowed in non-JITed programs
Apr 25 09:47:48 onecloud dae[7288]: processed 229682 insns (limit 1000000) max_states_per_insn 68 total_states 10138 peak_states 1360 mark_read 86"
Apr 25 09:47:48 onecloud systemd[1]: dae.service: Main process exited, code=exited, status=1/FAILURE
Apr 25 09:47:48 onecloud systemd[1]: dae.service: Failed with result 'exit-code'.
Apr 25 09:47:48 onecloud systemd[1]: Failed to start dae Service.
Apr 25 09:47:48 onecloud systemd[1]: dae.service: Consumed 39.614s CPU time.

This time the message is "JIT doesn't support bpf-to-bpf calls". That kernel option is also set to y: CONFIG_BPF_JIT=y
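
Writing to /proc/sys/net/core/bpf_jit_enable changes the setting only until the next reboot, so it can help to confirm the current state before each attempt. A minimal sketch, assuming the usual meaning of the values (0 off, 1 on, 2 on with debug output); the function name `bpf_jit_state` is hypothetical, and the optional path argument exists only so the helper can be exercised against a test file.

```shell
#!/bin/sh
# Hypothetical helper: describe the value of net.core.bpf_jit_enable.
# The path argument defaults to the real procfs file; passing another
# file is only meant for testing.
bpf_jit_state() {
    f=${1:-/proc/sys/net/core/bpf_jit_enable}
    # Report "unknown" when the file cannot be read (e.g. no BPF JIT support).
    [ -r "$f" ] || { echo "unknown"; return 0; }
    case "$(cat "$f")" in
        0) echo "disabled" ;;
        1) echo "enabled" ;;
        2) echo "enabled (debug)" ;;
        *) echo "unknown" ;;
    esac
}
```

To keep the setting across reboots, one option is a sysctl drop-in such as `net.core.bpf_jit_enable = 1` in a file under /etc/sysctl.d/ (the file name is up to you), applied with `sysctl --system`. As the log above shows, enabling JIT did not by itself make this dae version load on this device.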

jschwinger233 commented 2 months ago

This is an ARM bpf problem... It looks like the bpf kernel-test CI needs an ARM virtual machine added.

mzz2017 commented 2 months ago

You could perhaps use the 0.5.1 release, or 0.6.0rc1.

hiifeng commented 2 months ago

Quoting mzz2017: "You could perhaps use the 0.5.1 release, or 0.6.0rc1."

It started successfully with the 0.5.1 release.

root@onecloud:~/dae# systemctl status dae
● dae.service - dae Service
     Loaded: loaded (/etc/systemd/system/dae.service; enabled; vendor preset: enabled)
     Active: active (running) since Thu 2024-04-25 10:06:55 CST; 16s ago
       Docs: https://github.com/daeuniverse/dae
    Process: 7408 ExecStartPre=/usr/bin/dae validate -c /etc/dae/config.dae (code=exited, status=0/SUCCESS)
   Main PID: 7413 (dae)
      Tasks: 9 (limit: 2164)
     Memory: 30.9M
        CPU: 28.605s
     CGroup: /system.slice/dae.service
             └─7413 /usr/bin/dae run --disable-timestamp -c /etc/dae/config.dae

Apr 25 10:06:27 onecloud systemd[1]: Starting dae Service...
Apr 25 10:06:55 onecloud systemd[1]: Started dae Service.

Many thanks to the author for the replies. Next I will try to adapt a kernel that can run dae for the N1 and submit it to https://github.com/daeuniverse/armbian-btf-kernel/releases so that more people can use it.

mzz2017 commented 2 months ago

@hiifeng Thanks. Would you be willing to maintain that repository? It has fallen into disrepair.

mzz2017 commented 2 months ago

@jschwinger233 Is there a fix for this problem at the moment?

hiifeng commented 2 months ago

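Quoting mzz2017: "@hiifeng Thanks. Would you be willing to maintain that repository? It has fallen into disrepair."

I would be glad to maintain it, though my skills are limited, so I will need to ask you about the things I don't understand.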

jschwinger233 commented 2 months ago

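Quoting mzz2017: "@jschwinger233 Is there a fix for this problem at the moment?"

Not sure. Full support may require kernel changes, but it may also be possible to work around it by modifying our bpf program. The idea is to stop using (or simplify) bpf2bpf and switch to tail calls, which have better compatibility.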

mzz2017 commented 2 months ago

@hiifeng No problem, let's stay in touch.

mzz2017 commented 2 months ago

@jschwinger233 What is bpf2bpf? I suspect the problem is the bpf_timer callback function?

jschwinger233 commented 2 months ago

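@mzz2017 bpf2bpf is the function below:

https://github.com/daeuniverse/dae/blob/a75a2fffd73a18ac4f63857c4cbb08dcee99aa43/control/kern/tproxy.c#L641-L643

The special noinline annotation there prevents inlining. A bpf function calling another bpf function is what bpf2bpf means. It sounds basic, but the verifier only gained support for it quite late. Our route function is the only bpf2bpf call we have.

My understanding of this issue is:

  1. net.core.bpf_jit_enable may default to 0 (off) on arm, which did not matter before
  2. The newly added bpf_timer callback feature requires JIT, so JIT has to be turned on
  3. But on arm, JIT and bpf2bpf may have a bug that stops them from working together, so JIT cannot be turned on
  4. That is why I said that dropping bpf2bpf might solve the problem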
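
The two verifier messages seen in this thread line up with steps 2 and 3 of the list above. A minimal sketch of a log triage helper, assuming only those two message strings (copied from the logs in this issue); the function name `classify_bpf_error` and its output labels are hypothetical and are not produced by dae itself.

```shell
#!/bin/sh
# Hypothetical helper: reads dae log text on stdin and prints a label for
# the two verifier messages quoted in this thread.
classify_bpf_error() {
    log=$(cat)
    case "$log" in
        # Checked first: with JIT enabled, both messages appear together,
        # and this one is the more specific symptom (step 3).
        *"JIT doesn't support bpf-to-bpf calls"*)
            echo "jit-on-but-bpf2bpf-unsupported" ;;
        # Seen alone when the program was not JITed (steps 1 and 2).
        *"callbacks are not allowed in non-JITed programs"*)
            echo "jit-off-or-program-not-jited" ;;
        *)
            echo "unrecognized" ;;
    esac
}
```

Possible usage: `journalctl -u dae.service --no-pager | classify_bpf_error`. The labels only restate the reasoning in this comment; they do not prove where the underlying bug lies.
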
mzz2017 commented 2 months ago

@jschwinger233 I see, I've learned something.

zhb236623zhb commented 3 weeks ago

Quoting hiifeng's comment above, which reported that the 0.5.1 release started successfully and announced plans to publish an N1 kernel at https://github.com/daeuniverse/armbian-btf-kernel/releases.

My OneCloud still fails to start dae with 0.5.1. Could you publish a tutorial, starting with which Armbian version to use?

hiifeng commented 3 weeks ago

Quoting zhb236623zhb: "My OneCloud still fails to start dae with 0.5.1. Could you publish a tutorial, starting with which Armbian version to use?"

I have been rather busy with work lately and am away on a business trip. I will update my blog when I find the time.

zhb236623zhb commented 3 weeks ago

Quoting hiifeng: "I have been rather busy with work lately and am away on a business trip. I will update my blog when I find the time."

I have tried many Armbian versions, and also 0.5.1, v0.6.0rc1 and v0.6.0rc2; dae fails to start in every case. Looking forward to your update.