AltraMayor / gatekeeper

The first open-source DDoS protection system
https://github.com/AltraMayor/gatekeeper/wiki
GNU General Public License v3.0

1.2.0-dev: after running setup.sh, the following error occurs when compiling BPF #679

Closed: ShawnLeung87 closed this issue 5 months ago

ShawnLeung87 commented 8 months ago

    In file included from granted.c:29:
    In file included from /usr/local/include/rte_mbuf_core.h:22:
    /usr/local/include/rte_byteorder.h:30:16: error: invalid output constraint '=Q' in asm
                    : [x1] "=Q" (x)
                           ^
    1 error generated.
    make: *** [Makefile:17: granted.bpf] Error 1
    clang -O2 -g -target bpf -I../include -Wall -Wextra -Wno-int-to-void-pointer-cast -o granted.bpf -c granted.c
    In file included from granted.c:29:
    In file included from /usr/local/include/rte_mbuf_core.h:22:
    /usr/local/include/rte_byteorder.h:30:16: error: invalid output constraint '=Q' in asm
                    : [x1] "=Q" (x)
                           ^
    1 error generated.

Ubuntu 20.04 with kernel 5.13.16-051316-generic.
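
As a minimal sketch of what is going wrong (an illustration, not code from the Gatekeeper tree): the x86 "Q" constraint requires a register whose high byte is addressable (a/b/c/d), and the BPF backend has no such registers, so clang rejects the constraint when the byte-swap helper is compiled with -target bpf.

    /* repro.c -- illustration of the failure above.  The inline asm mirrors
     * DPDK's x86 rte_arch_bswap16(); the "=Q" output constraint only exists
     * on x86, so the BPF backend cannot satisfy it. */
    #include <stdint.h>

    static inline uint16_t bswap16_x86_asm(uint16_t _x)
    {
        uint16_t x = _x;

        asm volatile ("xchgb %b[x1],%h[x2]"
                      : [x1] "=Q" (x)
                      : [x2] "0" (x));
        return x;
    }

    uint16_t swap16(uint16_t v)
    {
        return bswap16_x86_asm(v);
    }

    /* clang -O2 -c repro.c              -> builds on x86-64
     * clang -O2 -target bpf -c repro.c  -> "invalid output constraint '=Q' in asm" */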

ShawnLeung87 commented 8 months ago

Has the bug from #672 not been fixed yet?

    diff --git a/bpf/Makefile b/bpf/Makefile
    index d98f52b..c426214 100644
    --- a/bpf/Makefile
    +++ b/bpf/Makefile
    @@ -14,7 +14,7 @@ copy: all
     	$(INSTALL) -m660 $(TARGETS) $(DESTDIR)
     
     %.bpf: %.c

I modified the BPF Makefile, but I still get the same error when recompiling.

AltraMayor commented 8 months ago

I'm in the middle of upgrading Gatekeeper v1.2 to DPDK v23.11. Once I finish this upgrade, the workaround you tried will work.

ShawnLeung87 commented 8 months ago

In /path/dpdk/include/rte_byteorder.h, I moved the #ifndef RTE_FORCE_INTRINSICS conditional so that it comes before the definition of "static inline uint16_t rte_arch_bswap16(uint16_t _x)", and the file now compiles:

      #ifndef RTE_FORCE_INTRINSICS
      static inline uint16_t rte_arch_bswap16(uint16_t _x)
      {
          uint16_t x = _x;
          asm volatile ("xchgb %b[x1],%h[x2]"
                    : [x1] "=Q" (x)
                    : [x2] "0" (x)
                    );
          return x;
      }
ShawnLeung87 commented 8 months ago

I have a new problem now: the outbound packets sent through Gatekeeper cannot get through. I have not added any interception policy; everything is allowed. Is it because you changed the KNI code? In packet captures I can see that the inbound packets arrive, but the outbound return packets are dropped.

We use granted.bpf. The error log shows:

    ERR Flow (src: 20.20.20.20, dst: 30.30.30.30) at index 0: [state: GK_BPF (3), flow_hash_value: 0x5238060, expire_at: 0xd4d88230d4fa, program_index=1, cookie=0000000000000000, 0000000000000000, 0000000000000000, 0000000000000000, 0000000000000000, 0000000000000000, 0000000000000000, 0000000000000000, grantor_ip: 10.0.2.11]

When the inbound packets pass through Gatekeeper but the outbound packets bypass it, the server can reach the Internet and the BPF policy filters normally. Version 1.2-dev needs to solve this outbound-packet problem.

AltraMayor commented 8 months ago

Hi @ShawnLeung87,

program_index=1 means that declined.bpf is associated with that flow, so Gatekeeper must drop all those packets. You can find the program indexes in the file lua/gk.lua. Could you check your Lua policy?

ShawnLeung87 commented 8 months ago

> Hi @ShawnLeung87,
>
> program_index=1 means that declined.bpf is associated with that flow, so Gatekeeper must drop all those packets. You can find the program indexes in the file lua/gk.lua. Could you check your Lua policy?

I've checked everything you mentioned; there is no BPF index problem. My successful test scenario is: the server's inbound packets pass through Gatekeeper, the outbound packets bypass it, and the server can access the Internet normally. The failing scenario is: both inbound and outbound packets pass through Gatekeeper, and the server is then unreachable.

ShawnLeung87 commented 8 months ago
(screenshot attached)

The link state of the network interface is displayed differently from version 1.1.0: it shows as "unknown". Normally it should show as up.

ShawnLeung87 commented 8 months ago

In the failing test scenario, both inbound and outbound packets pass through Gatekeeper. The following error log appears:

        Main/0 2024-03-21 01:27:19 NOTICE ipv4_flow_add(back, DstIP=10.0.2.1 UDP SrcPort=41120/0xffff DstPort=45232/0xffff): cannot validate IPv4 flow, errno=22 (Invalid argument), rte_flow_error_type=16: Not supported action.
        Main/0 2024-03-21 01:27:19 NOTICE Cannot register IPv4 flow on the back interface; falling back to software filters

        GK/6 2024-03-21 02:46:51 DEBUG acl: a packet failed to match any ACL rules, the whole packet is dumped below:
        dump mbuf at 0xcc3c60700, iova=0xcc3c60798, buf_len=2176
          pkt_len=98, ol_flags=0x182, nb_segs=1, port=1, ptype=0x10
          segment at 0xcc3c60700, data=0xcc3c60818, len=98, off=128, refcnt=1
          Dump data at [0xcc3c60818], len=98

The successful test scenario (the server's inbound packets pass through Gatekeeper, the outbound packets bypass it, and the server can access the Internet normally) does not produce this error log.

My preliminary suspicion is that the ACL policy on the back interface rejected the packet, and that the rte_flow_action error when registering flows on the back interface is because DPDK changed the rte_flow_action types. The DPDK version I am currently using is 21.05.
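
For reference, the "Not supported action" message comes from flow validation: before a rule is created it is presumably checked with rte_flow_validate(), and rte_flow_error_type=16 appears to be the action-related error type. Below is a hedged sketch using DPDK's public rte_flow API, not Gatekeeper's actual ipv4_flow_add(); the QUEUE action and the helper name are assumptions for illustration. When validation fails, the caller logs the error and falls back to software filtering, which matches the NOTICE lines above.

    /* Hedged sketch: validate an IPv4/UDP rte_flow rule on the back port and
     * fall back to software filtering when the NIC/PMD rejects it. */
    #include <stdio.h>
    #include <stdint.h>
    #include <rte_byteorder.h>
    #include <rte_flow.h>

    int
    try_hw_ipv4_udp_filter(uint16_t port_id, uint32_t dst_ip_be,
                           uint16_t src_port_be, uint16_t dst_port_be,
                           uint16_t rx_queue)
    {
        struct rte_flow_attr attr = { .ingress = 1 };

        struct rte_flow_item_ipv4 ipv4_spec = { .hdr.dst_addr = dst_ip_be };
        struct rte_flow_item_ipv4 ipv4_mask = {
            .hdr.dst_addr = RTE_BE32(0xffffffff),
        };
        struct rte_flow_item_udp udp_spec = {
            .hdr.src_port = src_port_be,
            .hdr.dst_port = dst_port_be,
        };
        struct rte_flow_item_udp udp_mask = {
            .hdr.src_port = RTE_BE16(0xffff),
            .hdr.dst_port = RTE_BE16(0xffff),
        };

        struct rte_flow_item pattern[] = {
            { .type = RTE_FLOW_ITEM_TYPE_ETH },
            { .type = RTE_FLOW_ITEM_TYPE_IPV4,
              .spec = &ipv4_spec, .mask = &ipv4_mask },
            { .type = RTE_FLOW_ITEM_TYPE_UDP,
              .spec = &udp_spec, .mask = &udp_mask },
            { .type = RTE_FLOW_ITEM_TYPE_END },
        };

        /* The QUEUE action here is an assumption; a PMD that reports
         * "Not supported action" fails exactly this validation step. */
        struct rte_flow_action_queue queue = { .index = rx_queue };
        struct rte_flow_action actions[] = {
            { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
            { .type = RTE_FLOW_ACTION_TYPE_END },
        };

        struct rte_flow_error err = { 0 };
        int ret = rte_flow_validate(port_id, &attr, pattern, actions, &err);
        if (ret != 0) {
            fprintf(stderr,
                "cannot validate flow (ret=%d, error_type=%d): %s; "
                "falling back to software filters\n",
                ret, (int)err.type, err.message ? err.message : "(none)");
            return -1;    /* caller installs a software (ACL) filter instead */
        }

        /* Validation passed; the rule can now be created with rte_flow_create(). */
        return 0;
    }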

AltraMayor commented 8 months ago

I need to finish my ongoing DPDK port before looking at this issue. I'll report back later.

AltraMayor commented 7 months ago

> The link state of the network interface is displayed differently from version 1.1.0: it shows as "unknown". Normally it should show as up.

Gatekeeper v1.2 uses a new kernel module to implement the KNI interfaces, and this unknown state is consistent with other kernel modules that implement virtual interfaces. For example, my work VPN interface shows the same unknown state. We may find small differences, but they should not interfere with the general working of Gatekeeper.

AltraMayor commented 7 months ago

> > Hi @ShawnLeung87, program_index=1 means that declined.bpf is associated with that flow, so Gatekeeper must drop all those packets. You can find the program indexes in the file lua/gk.lua. Could you check your Lua policy?
>
> I've checked everything you mentioned; there is no BPF index problem. My successful test scenario is: the server's inbound packets pass through Gatekeeper, the outbound packets bypass it, and the server can access the Internet normally. The failing scenario is: both inbound and outbound packets pass through Gatekeeper, and the server is then unreachable.

If I understood correctly, you mean that the packets going from the protected server to the Gatekeeper server are not being forwarded from the back network to the front network. Have you checked whether the routing table of the GK blocks has a prefix to forward those packets?

AltraMayor commented 7 months ago

> In the failing test scenario, both inbound and outbound packets pass through Gatekeeper. The following error log appears:
>
>         Main/0 2024-03-21 01:27:19 NOTICE ipv4_flow_add(back, DstIP=10.0.2.1 UDP SrcPort=41120/0xffff DstPort=45232/0xffff): cannot validate IPv4 flow, errno=22 (Invalid argument), rte_flow_error_type=16: Not supported action.
>         Main/0 2024-03-21 01:27:19 NOTICE Cannot register IPv4 flow on the back interface; falling back to software filters
>
>         GK/6 2024-03-21 02:46:51 DEBUG acl: a packet failed to match any ACL rules, the whole packet is dumped below:
>         dump mbuf at 0xcc3c60700, iova=0xcc3c60798, buf_len=2176
>           pkt_len=98, ol_flags=0x182, nb_segs=1, port=1, ptype=0x10
>           segment at 0xcc3c60700, data=0xcc3c60818, len=98, off=128, refcnt=1
>           Dump data at [0xcc3c60818], len=98
>
> The successful test scenario (the server's inbound packets pass through Gatekeeper, the outbound packets bypass it, and the server can access the Internet normally) does not produce this error log.
>
> My preliminary suspicion is that the ACL policy on the back interface rejected the packet, and that the rte_flow_action error when registering flows on the back interface is because DPDK changed the rte_flow_action types. The DPDK version I am currently using is 21.05.

Weren't there log lines that showed the dropped packet in hexadecimal? That information would've allowed us to see which packet was dropped.

AltraMayor commented 7 months ago

Branch v1.2.0-dev is now running with DPDK 23.11. So once you update your local Gatekeeper repository, make sure that you update your local copy of DPDK and compile it again.

ShawnLeung87 commented 7 months ago

1. With the DPDK 23.11 version of Gatekeeper, protected servers still cannot access the Internet through Gatekeeper's back network interface.

The log of the blocked packets is as follows:

    GK/5 2024-03-26 14:53:02 DEBUG acl: a packet failed to match any ACL rules, the whole packet is dumped below:
    dump mbuf at 0x11ffad4280, iova=0x11ffad4318, buf_len=2176
      pkt_len=98, ol_flags=0x182, nb_segs=1, port=1, ptype=0x10
      segment at 0x11ffad4280, data=0x11ffad4398, len=98, off=128, refcnt=1
      Dump data at [0x11ffad4398], len=98
    00000000: 90 E2 BA 8E C0 75 64 00 F1 6B 0A 01 08 00 45 00 | .....ud..k....E.
    00000010: 00 54 0A 7E 40 00 3F 01 CC C7 1E 1E 1E 1E 14 14 | .T.~@.?.........
    00000020: 14 14 08 00 38 97 18 2B 00 06 48 00 02 66 00 00 | ....8..+..H..f..
    00000030: 00 00 94 FE 09 00 00 00 00 00 10 11 12 13 14 15 | ................
    00000040: 16 17 18 19 1A 1B 1C 1D 1E 1F 20 21 22 23 24 25 | .......... !"#$%
    00000050: 26 27 28 29 2A 2B 2C 2D 2E 2F 30 31 32 33 34 35 | &'()*+,-./012345
    00000060: 36 37                                           | 67

BPF data for flow: BPF_INDEX_GRANTED = 0

    GK/7 2024-03-26 14:57:57 ERR Flow (src: 20.20.20.20, dst: 30.30.30.30) at index 0: [state: GK_REQUEST (0), flow_hash_value: 0xc68c807, expire_at: 0x2f4ed04f11f0, last_packet_seen_at: 0x2f497e859842, last_priority: 39, allowance: 7, grantor_ip: 10.0.2.11]

2. This version cannot set flow_ht_size to 250000000. The DPDK 21.05 version of Gatekeeper does not have this problem; it looks like DPDK's memory allocation limits have not been adjusted. The error is reported as follows:

    EAL: eal_memalloc_alloc_seg_bulk(): couldn't find suitable memseg_list
    Main/0 2024-03-26 13:56:40 ERR setup_gk_instance(lcore=4): rte_calloc_socket() failed to allocate flow entry table
    Main/0 2024-03-26 13:56:40 ERR Failed to setup gk instances for GK block at lcore 4

3. Forwarding route prefix: the FIB entry for IP prefix 0.0.0.0/0 with action FWD_GATEWAY_FRONT_NET (1), i.e. the default route, is missing. I have announced the default route to Gatekeeper, but when I query the FIB with gkctl show_fib.lua no default route is shown, only:

    FIB entry for IP prefix: 30.30.30.0/24 with action FWD_GRANTOR (0)
    Grantor IP address: 10.0.2.11
    Ethernet cache entry: [state: fresh, nexthop ip: 10.0.2.3, d_addr: 64:00:F1:6B:0A:01]

ShawnLeung87 commented 7 months ago

This problem (protected servers cannot access the Internet through Gatekeeper's back network interface) has been solved by adding a default route to the FIB.

ShawnLeung87 commented 7 months ago

This version cannot set flow_ht_size to 250000000; the DPDK 21.05 version of Gatekeeper does not have this problem, so DPDK's memory allocation limits probably have not been adjusted. The error is reported as follows:

    EAL: eal_memalloc_alloc_seg_bulk(): couldn't find suitable memseg_list
    Main/0 2024-03-26 13:56:40 ERR setup_gk_instance(lcore=4): rte_calloc_socket() failed to allocate flow entry table
    Main/0 2024-03-26 13:56:40 ERR Failed to setup gk instances for GK block at lcore 4

The problem of not being able to add 250 million flows has not been solved yet. Gatekeeper's gk.lua is configured with 4 cores by default.

AltraMayor commented 7 months ago

I recommend setting flow_ht_size to 200000000 while I work on the memory issue, so you can continue with your tests.

ShawnLeung87 commented 7 months ago

After modifying the following constants in rte_config.h, flow_ht_size can be set to 200 million. Before the modification, even with flow_ht_size set to 200 million, memory initialization would fail:

           /* DPDK's limits, doubled so the larger flow table fits. */
           #define RTE_MAX_MEMSEG_PER_LIST (8192 << 1)
           #define RTE_MAX_MEM_MB_PER_LIST (32768 << 1)
           #define RTE_MAX_MEMSEG_PER_TYPE (32768 << 1)
           #define RTE_MAX_MEM_MB_PER_TYPE (65536 << 1)
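
A rough reading of the doubled limits above (assuming the RTE_MAX_MEM_MB_* macros are in megabytes, as their names suggest): RTE_MAX_MEM_MB_PER_LIST goes from 32768 MB (32 GiB) to 65536 MB (64 GiB) per memseg list, and RTE_MAX_MEM_MB_PER_TYPE goes from 65536 MB (64 GiB) to 131072 MB (128 GiB) per memory type; the two RTE_MAX_MEMSEG_* values double the number of segments each list/type may hold.
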
ShawnLeung87 commented 7 months ago

Can this version integrate the BPF parameter extension patch from #643? I have already applied that patch to the Gatekeeper 1.2-dev build in my test environment.

AltraMayor commented 7 months ago

Extending the cookie of the BPFs is outside our short-term roadmap. I recommend you make a case for why the BPF cookie should be extended. I'm not aware of which kinds of attacks are blocked or which features are enabled by extended BPF cookies. If you can come up with a convincing case to increase the memory requirement, other deployers will want the feature, and proper implementation will eventually happen. The patch in issue #643 was meant to allow you to experiment with an extended cookie.

ShawnLeung87 commented 7 months ago

> Extending the cookie of the BPFs is outside our short-term roadmap. I recommend you make a case for why the BPF cookie should be extended. I'm not aware of which kinds of attacks are blocked or which features are enabled by extended BPF cookies. If you can come up with a convincing case to increase the memory requirement, other deployers will want the feature, and proper implementation will eventually happen. The patch in issue #643 was meant to allow you to experiment with an extended cookie.

We have now written our own BPF program for TCP reflection defense, and it defends noticeably better. With the default number of BPF cookie parameters, such a BPF program cannot even be compiled: within the default cookie size, a BPF program can only apply simple packet restrictions and cannot make targeted decisions based on counts of TCP flags.

After verification in our production environment, certain DDoS attack types really do require more BPF cookie parameters to implement the defense logic; the default number of parameters is no longer adequate for many DDoS attack types. With our enlarged BPF cookie, the defense success rate in the same TCP reflection scenario reaches 97%-99%, compared with 40%-50% for a BPF limited to the default number of parameters.
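
To make the sizing argument concrete, here is a schematic sketch (the struct is hypothetical, neither our production BPF nor Gatekeeper's actual cookie layout): the cookie printed in the flow logs above is eight 64-bit words, i.e. 64 bytes, and per-TCP-flag state for reflection detection already overflows that.

    /* Hypothetical per-flow state for TCP reflection defense -- illustration
     * only, not Gatekeeper's cookie definition.  The default cookie seen in
     * the logs above is 8 x 64 bits = 64 bytes. */
    #include <stdint.h>

    struct hypothetical_tcp_reflect_state {
        uint64_t budget_renew_at;   /* when the rate budgets are refilled */
        uint64_t budget_byte;       /* remaining byte budget              */
        uint64_t budget_pkt;        /* remaining packet budget            */
        uint64_t syn_seen;          /* per-flag counters: a flow that     */
        uint64_t synack_seen;       /* keeps sending SYN-ACK or RST       */
        uint64_t ack_seen;          /* without a matching handshake is a  */
        uint64_t rst_seen;          /* reflection suspect                 */
        uint64_t fin_seen;
        uint64_t last_seen_flags;   /* TCP flags of the last packet       */
    };

    /* 9 x 8 = 72 bytes: already larger than the default 64-byte cookie,
     * before any thresholds or ratios are added. */
    _Static_assert(sizeof(struct hypothetical_tcp_reflect_state) > 64,
                   "sized to overflow the default 64-byte cookie");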

AltraMayor commented 7 months ago

Please create an issue specifically to discuss the extension of the cookie of the BPFs. In addition to copying the text you already have above, add a fully functional BPF showing off your solution. Add comments to the code where appropriate. If we move forward with the cookie extension, we'll need to have a BPF in the folder bpf/ to showcase the extension. Moreover, explain the parameters of your BPF, list the values you pass to those parameters, and provide detailed examples of the attacks that motivate the BPF. The increased memory requirement is not small, so we need support from other deployers.

I've pushed a commit to branch v1.2.0-dev that increases DPDK's memory allocation. This seems to be the last problem in this issue. If so, we can close this issue.