array to slice with generics, array type/length specialization with generics

zx2c4 commented 2 years ago

I've got a trie structure that works over 4 byte arrays and 16 byte arrays, for IPv4 and IPv6 respectively. I used to just use a []byte slice for this, and adjust accordingly based on len(x) when it mattered.

Actually, pretty much the only place it mattered was here:

func commonBits(ip1, ip2 []byte) uint8 {
    if len(ip1) == 4 {
        a := binary.BigEndian.Uint32(ip1)
        b := binary.BigEndian.Uint32(ip2)
        x := a ^ b
        return uint8(bits.LeadingZeros32(x))
    } else if len(ip1) == 16 {
        a := binary.BigEndian.Uint64(ip1)
        b := binary.BigEndian.Uint64(ip2)
        x := a ^ b
        if x != 0 {
            return uint8(bits.LeadingZeros64(x))
        }
        a = binary.BigEndian.Uint64(ip1[8:])
        b = binary.BigEndian.Uint64(ip2[8:])
        x = a ^ b
        return 64 + uint8(bits.LeadingZeros64(x))
    } else {
        panic("Wrong size bit string")
    }
}

So in converting this all away from slices and toward static array sizes, I made this new type constraint:

type ipArray interface {
    [4]byte | [16]byte
}

Then I broke out those two if clauses into their own functions:

func commonBits4(ip1, ip2 [4]byte) uint8 {
    a := binary.BigEndian.Uint32(ip1[:])
    b := binary.BigEndian.Uint32(ip2[:])
    x := a ^ b
    return uint8(bits.LeadingZeros32(x))
}

func commonBits16(ip1, ip2 [16]byte) uint8 {
    a := binary.BigEndian.Uint64(ip1[:8])
    b := binary.BigEndian.Uint64(ip2[:8])
    x := a ^ b
    if x != 0 {
        return uint8(bits.LeadingZeros64(x))
    }
    a = binary.BigEndian.Uint64(ip1[8:])
    b = binary.BigEndian.Uint64(ip2[8:])
    x = a ^ b
    return 64 + uint8(bits.LeadingZeros64(x))
}

So far, so good, but what is the implementation of commonBits?

func commonBits[B ipArray](ip1, ip2 B) uint8 {
    // ???
}

If you try to convert the array to a slice, the compiler will bark at you. You can use any(ip1).(type) in a switch, but then you get runtime overhead. I've figured out a truly horrific trick that combines two Go 1.17 features into an unholy mess:

func giveMeA4[B ipArray](b B) [4]byte {
    return *(*[4]byte)(unsafe.Slice(&b[0], 4))
}

func giveMeA16[B ipArray](b B) [16]byte {
    return *(*[16]byte)(unsafe.Slice(&b[0], 16))
}

func commonBits[B ipArray](ip1, ip2 B) uint8 {
    if len(ip1) == 4 {
        return commonBits4(giveMeA4(ip1), giveMeA4(ip2))
    } else if len(ip1) == 16 {
        return commonBits16(giveMeA16(ip1), giveMeA16(ip2))
    }
    panic("Wrong size bit string")
}

This... works, amazingly. Similarly, when I needed to adjust my randomized unit tests, I wound up going with code that looks like this:

var addr B
rand.Read(unsafe.Slice(&addr[0], len(addr)))

I asked some Go experts if there was a better way, and the answer I got was that generics aren't yet well suited for arrays. So, I'm opening this rather vague report in hopes that it can turn into a proposal for something useful.

CC @danderson @sebsebmc @josharian

ianlancetaylor commented 2 years ago

I think that at a high level you are looking for something like C++ template specialization: a way to write an implementation of a generic function to use for a specific type. The current generics support in Go does not provide a way to do that, which is intentional (https://go.googlesource.com/proposal/+/refs/heads/master/design/43651-type-parameters.md#omissions).

Please let me know if I misunderstand.

That said, if using any(ip1).(type) in a type switch does introduce runtime overhead (I haven't checked) we can consider compiler optimizations for that case. For example perhaps a type switch on a generic argument with a small number of cases should be used as an indication that we should stencil out those cases rather than using a dictionary. (And of course more generally if the constraint(s) only permit a couple of types we could consider stenciling out those types always.)

That is, perhaps we can approach this as a compiler optimization issue rather than as a language issue.

zx2c4 commented 2 years ago

The other funny aspect of this is that you can't slice the type, even though you can call len(x) on it.

type ipArray interface {
    [4]byte | [16]byte
}
func legal[B ipArray](x B) {
    _ = len(x)
}
func illegal[B ipArray](x B) {
    _ = x[:]
}

I'm not sure I understand why len(x) would be okay but x[:] would not. It's using the same information in both cases to derive the result. I wind up hacking around that using the unsafe.Slice trick above, but I don't quite see why that should be necessary?

ianlancetaylor commented 2 years ago

The inability to slice is a deficiency of the current implementation, not a part of the language design. There are a number of similar infelicities in 1.18.

zx2c4 commented 2 years ago

Oh that's good news. So it sounds like we've identified two compiler improvements that might actually be actionable without having to have a complicated language proposal:

1) Optimizing switch x := any(t).(type) to not require any runtime overhead for reasonably contoured cases. 2) Allow slicing types without having to resort to unsafe.Slice to force it.

The first would help with a sort of poorman's type specialization via a dispatcher that could disappear at compile time. This would help with the generic case.

The second would help with array specialization, among other things, which could also disappear at compile time depending on the context.

That seems like a straightforward way to address this, right?

ianlancetaylor commented 2 years ago

Sounds plausible to me.

davidmdm commented 2 years ago

This may be a bit naive of me, but if this proposal was accepted: https://github.com/golang/go/issues/45380 (type switch on parametric types) wouldn't this solve this issue?

zx2c4 commented 2 years ago

That's similar but it's a language proposal, right? The above is just about adding some small compiler optimization to accomplish the same thing.

karalabe commented 4 days ago

Kind ping, any hope of getting this fixed/implemented? Please see commented out line for what should work but doesn't.

https://go.dev/play/p/msqldSEx6yt

golang / go

array to slice with generics, array type/length specialization with generics #51740