florianl / go-tc

traffic control in pure go - it allows to read and alter queues, filters and classes
MIT License
451 stars 48 forks source link

tc_u32_sel struct alignment #14

Closed twaapo closed 4 years ago

twaapo commented 4 years ago

Hi! I think the kernel struct below has a 12byte aligment for whatever reason. Do you know?

pkt_cls.h:

struct tc_u32_sel {
    unsigned char       flags;
    unsigned char       offshift;
    unsigned char       nkeys;
    __be16          offmask;
    __u16           off;
    short           offoff;
    short           hoff;
    /* <-> above is 12bytes */
    __be32          hmask;
    struct tc_u32_key   keys[0];
};
filter code:
    sel := &tc.U32Sel{
        Flags:    TC_U32_TERMINAL,
        Offshift: 0xde,
        NKeys:    1,
        OffMask:  0x1122,
        Off:      0x3344,
        Offoff:   0xccdd,
        Hoff:     0xeeff,
        Hmask:    bits.ReverseBytes32(0xaabbccdd),
        Keys: []tc.U32Key{
            {Mask: bits.ReverseBytes32(0x11223344), Val: bits.ReverseBytes32(0xfacefeed), Off: 0xcc, OffMask: 0x0},
        },
    }

Current serialization causes funky filters:

greybox test (i3*) # tc -s -d -r f show dev eth1
filter parent 1: protocol ip pref 5000 u32 chain 0
filter parent 1: protocol ip pref 5000 u32 chain 0 fh 800:[80000000]  ht divisor 1
filter parent 1: protocol ip pref 5000 u32 chain 0 fh 800::c[8000000c]  order 12 key ht 800 bkt 0 flowid 1:1 skip_hw not_in_hw  (rule hit 0 success 0)
  match cefeedcc/223344fa at 0 (success 0 )
    hash mask bbccdd11 at -21778

"Off"-byte 0xcc seen on match value first "Val"-byte 0xfa seen on mask, etc.

Side note: On a little endian system the explicit be16 be32, require flipping in Go-code because everything is serialized as native aka. little.

florianl commented 4 years ago

Hi! And thanks for reporting the issue! I've fixed the endianess problem with the latest commit.

Where is the information from about the 12 byte alignment? The source in the kernel does not mention something like this. https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/uapi/linux/pkt_cls.h?h=v5.7.11#n241

Flo

twaapo commented 4 years ago

C normally aligns memory for faster access. My current theory is netlink and tc never serialize anything, only copy raw structs, hence the alignment in NL traffic in the middle of struct members.

Below demonstrates the struct alignment: pad.c

#include <stdio.h>
#include <stdint.h>
#include <string.h>

struct tc_u32_key {
    uint32_t ma;
    uint32_t va;
};

// this struct is 4byte aligned (actually probably mod8 because struct tc_u32_key at 8 bytes)
struct tc_u32_sel {
    unsigned char       flags;
    //unsigned char       offshift;
    //unsigned char       nkeys;
    char pad[3]; // this member will be added by compiler if omitted
    uint16_t offmask;
    uint16_t off;
    short           offoff;
    short           hoff;
    /* <-> above is 12bytes */
    uint32_t hmask;
    struct tc_u32_key   keys[0];
};

// this struct is 2byte aligned
// because of 16bit members
struct tc_u32_sel_mod2 {
    unsigned char       flags;
    //unsigned char       offshift;
    //unsigned char       nkeys;
    char pad[1]; // this member will be added by compiler if omitted
    uint16_t offmask;
    uint16_t off;
    short           offoff;
    short           hoff;
    /* <-> above is 12bytes */
    //uint32_t hmask;
    //struct tc_u32_key   keys[0];
};

// this struct is 1byte aligned
struct tc_u32_sel_mod1 {
    unsigned char       flags;
};
int main() 
{
    struct tc_u32_sel a;
    struct tc_u32_sel_mod2 b;
    struct tc_u32_sel_mod1 c;

    printf("size of tc_u32_sel in bytes : %d\n", 
            sizeof(a));
    printf ( "\n   Address of flags\t= %u", &a.flags);
    //printf ( "\n   Address of offshift\t= %u", &a.offshift);
    //printf ( "\n   Address of nkeys\t= %u", &a.nkeys);
    printf ( "\n   Address of offmask\t= %u", &a.offmask );
    printf ( "\n   Address of off\t= %u", &a.off);
    printf ( "\n   Address of offoff\t= %u", &a.offoff);
    printf ( "\n   Address of hoff\t= %u", &a.hoff);
    printf ( "\n   Address of hmask\t= %u", &a.hmask);
    printf ( "\n   Address of keys\t= %u", &a.keys);
    printf ("\n");
    printf("size of tc_u32_sel_mod2 in bytes : %d\n", 
            sizeof(b));
    printf ( "\n   Address of flags\t= %u", &b.flags);
    //printf ( "\n   Address of offshift\t= %u", &b.offshift);
    //printf ( "\n   Address of nkeys\t= %u", &b.nkeys);
    printf ( "\n   Address of offmask\t= %u", &b.offmask );
    printf ( "\n   Address of off\t= %u", &b.off);
    printf ( "\n   Address of offoff\t= %u", &b.offoff);
    printf ( "\n   Address of hoff\t= %u", &b.hoff);
    printf ("\n");
    printf("size of tc_u32_sel_mod1 in bytes : %d\n", 
            sizeof(c));
    printf ( "\n   Address of flags\t= %u", &c.flags);
    printf ("\n");
    return 0;
}

The fact that many structs in pkt_sched.h and pkt_cls.h have explicit pad fields, and these dont is rather confusing.

For my current go-project this erroneous hack is enough for the moment:

func validateU32SelOptions(info *U32Sel) ([]byte, error) {
    buf := new(bytes.Buffer)
    binary.Write(buf, nativeEndian, info.Flags)
    binary.Write(buf, nativeEndian, info.Offshift)
    binary.Write(buf, nativeEndian, info.NKeys)
    binary.Write(buf, nativeEndian, info.OffMask)
    binary.Write(buf, nativeEndian, info.Off)
    binary.Write(buf, nativeEndian, info.Offoff)
    binary.Write(buf, nativeEndian, info.Hoff)
    // fix alignment
    buf.WriteByte(0x00)
    binary.Write(buf, nativeEndian, info.Hmask)
    for _, v := range info.Keys {
        data, err := marshalStruct(v)
        if err != nil {
            return []byte{}, err
        }
        buf.Write(data)
    }
    return buf.Bytes(), nil
}

netlink traffic hex dumps from strace:

GO app [{{nla_len=56, nla_type=TCA_OPTIONS}, "\x24\x00\x05\x00\x01\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\xff\xff\x0a\xc8\x0c\x0c\x10\x00\x00\x00\x00\x00\x00\x00\x08\x00\x01\x00\x01\x00\x01\x00\x08\x00\x0b\x00\x01\x00\x00\x00"}, {{nla_len=8, nla_type=TCA_KIND}, "u32"}]}, iov_len=100}]

Go app, no fix [{{nla_len=56, nla_type=TCA_OPTIONS}, "\x24\x00\x05\x00\x01\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\xff\xff\x0a\xc8\x0c\x0c\x10\x00\x00\x00\x00\x00\x00\x00\x00\x08\x00\x01\x00\x01\x00\x01\x00\x08\x00\x0b\x00\x01\x00\x00\x00"}, {{nla_len=8, nla_type=TCA_KIND}, "u32"}]}, iov_len=100}]

strace -v -s 64 tc filter add dev eth1 protocol ip parent 1: prio 10 u32 match ip src 250.206.254.237 flowid 1:10 2>&1 |grep RTM_NEWTFILT

[{{nla_len=8, nla_type=TCA_KIND}, "u32"}, {{nla_len=56, nla_type=TCA_OPTIONS}, "\x08\x00\x01\x00\x10\x00\x01\x00\x08\x00\x02\x00\x00\xd0\x27\x00\x24\x00\x05\x00\x01\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\xff\xff\xfa\xce\xfe\xed\x0c\x00\x00\x00\x00\x00\x00\x00"}]}, iov_len=100}]

Interesting part: TC: \x24\x00\x05\x00\x01\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\xff\xff\xfa\xce\xfe\xed Go fixed: \x24\x00\x05\x00\x01\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\xff\xff\xfa\xce\xfe\xed Go, not fixed: \x24\x00\x05\x00\x01\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\xff\xff\xfa\xce\xfe\xed

In the current state of this libarary padding is added only after the struct data which conforms to the mod4 of netlink messages. All the above netlink messages are 100bytes long because of this.

Edit: Couldnt let this go, so I ended up looking at some tc and libnetlink code. Below is an exerpt from tc source tc/f_u32.c.

f_u32.c:1183

    if (sel_ok)
        addattr_l(n, MAX_MSG, TCA_U32_SEL, &sel,
              sizeof(sel.sel) +
              sel.sel.nkeys * sizeof(struct tc_u32_key));

Looks like indeed the struct is copied as is with padding to the send buffer. I dont know yet whether this is common practice, but I guess it makes sense in a level where both tc binary and the kernel are compiled on the same architecture.

man 3 libnetlink:

       addattr_l
              Add a variable length attribute of type type and with value
              data and alen length to netlink message n, which is part of a
              buffer of length maxlen.  data is copied.

addattr_l would then proceed to copy sizeof(struct tc_u32_sel) bytes, which would include the padding.

florianl commented 4 years ago

Hi! I had a closer look again and dived into the source of tc. And you're right there is an alignment applied. With the latest change in https://github.com/florianl/go-tc/commit/412a139fd321f52d8d87cb61a2ad807c13723464 the padding has been added in handling the U32Sel struct.

When testing with the filter you have provided, I get the following, which looks fine I think.

&tc.U32Sel{Flags:0x1, Offshift:0xde, NKeys:0x1, OffMask:0x1122, Off:0x3344, Offoff:0xccdd, Hoff:0xeeff, Hmask:0xddccbbaa, Keys:[]tc.U32Key{tc.U32Key{Mask:0x44332211, Val:0xedfecefa, Off:0xcc, OffMask:0x0}}}

Thanks again for reporting it!

florianl commented 4 years ago

Closing this issue as of no further response. Don't hesitate to reopen it, if needed or create a new one.