golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.98k stars 17.67k forks source link

proposal: net: reconsider representation of IP #18804

Closed bradfitz closed 2 years ago

bradfitz commented 7 years ago

A net.IP is currently a []byte, which has the bad properties of:

Reconsider for Go 2.

jhorowitz commented 7 years ago

I maintain an IP manipulation lib that relies heavily on ips being a byte array. Out of curiosity, what would the alternative be? Also, how can I make sure that I can give input on / contribute to the new implementation?

bradfitz commented 7 years ago

@jhorowitz, when the time comes, this will be the place. Just stay subscribed.

We're not soliciting design ideas or feedback yet.

jhorowitz commented 7 years ago

@bradfitz Thanks!

mdlayher commented 6 years ago

As much as I appreciate that the net.IP type can store both IPv4 and IPv6 addresses, I find the API awkward to use and the need to check for nil repeatedly annoying.

I almost wonder if it would make sense to make net.IP an interface in Go 2, with different IPv4 and IPv6 implementations. The "IPv4 mapped IPv6" address that net.ParseIP produces with an IPv4 address is almost certainly not what the caller wants.

As previously mentioned, the need to check for nil instead of an ok Boolean to determine if an operation was successful (ParseIP, To4, To16) is also unusual and not Go-like.

I am mobile at the moment but managed to find this issue, and wanted to leave some thoughts while they were fresh in my mind. I would be happy to go into more detail if needed.

mikioh commented 6 years ago

I guess what you really want is some help for dealing with IPv6 Addressing of IPv4/IPv6 Translators defined in RFC 6052 instead of IPv4 mapped IPv6 addresses on circumstances using XLAT, and more help for IPv6 Address Synthesis described in RFC 7050.

Is my understanding correct? I'm fine to make IP address handling API more flexible with user-supplied context; I'm a XYZ-mo user, here I have a translation prefix via some mechanism such as PvD: thus any IP address comes from the prefix must be considered as an IPv4 address.

mdlayher commented 6 years ago

Sorry for the delay, forgot to reply sooner.

I'm a little confused by the terminology you've used (and am still fairly new to IPv6 as a whole), but here's what I mean:

https://play.golang.org/p/2xleMiTvFIW

I don't understand why these functions would default to this 16 byte representation, when it's almost certainly not what the caller would want. Mapping IPv4 addresses like this seems like an advanced use case that I wouldn't think the stdlib should need to handle. Of course, I could be missing something obvious.

This has bitten me a few times in the past, and now I find myself creating small helpers to check things like "is X bytes or Y string actually an IPv4 or IPv6 address"?

https://play.golang.org/p/Udt8YQdO6f2

It seems like it'd be less error prone if they were two separate types, instead of having to make sure you remember to call "To4" and "To16" appropriately, and remember what a nil return value means for both.

bradfitz commented 5 years ago

Concretely, I want the representation to be like a time.Time: a small, opaque, immutable value type. It could then be copied and retained at will, like a Time.

We could add methods for the required operations.

bradfitz commented 5 years ago

@mdlayher, I presume from your thumbs-up on the prior comment that you're cool with a Time-like representation rather than an interface as you previously proposed?

mdlayher commented 5 years ago

I think I'd really just like a better way to disambiguate between IPv4 and IPv6 addresses. An interface seemed like a decent way to do that, but I'd be curious to see what kind of API you have in mind.

jhorowitz commented 5 years ago

@mdlayher Do you find calling ip.To4 != nil to be too cumbersome or have you found the reslicing to be a performance bottleneck? I do agree the method is not super intuitive. In terms of the 16 byte representation, IPv6 reserves part of it's IP space to represent IPv4 addresses. If an IPv6 address has bytes 0-9 as zeros and 10-11 as 255 then the remaining 4 bytes represent an IPv4 address. I'm not sure why it defaults to the 16 byte representation but I have found that having that default makes comparisons easier when mixing IPv4 and IPv6 together.

@bradfitz What do you think of unifying IP and IPNet into one structure?

mdlayher commented 5 years ago

Apologies for the delay.

Do you find calling ip.To4 != nil to be too cumbersome or have you found the reslicing to be a performance bottleneck? I do agree the method is not super intuitive.

Yep, it's really non-obvious IMHO, and too easy to make silly mistakes like passing an IPv4 address as bytes to some system while stored in its packed, 16-byte IPv6 mapped representation.

In terms of the 16 byte representation, IPv6 reserves part of it's IP space to represent IPv4 addresses. If an IPv6 address has bytes 0-9 as zeros and 10-11 as 255 then the remaining 4 bytes represent an IPv4 address. I'm not sure why it defaults to the 16 byte representation but I have found that having that default makes comparisons easier when mixing IPv4 and IPv6 together.

I'm just not really sure why the standard library does this at all. Can't say I've ever come across a system where this representation was used. Does it even need to be supported by the standard library? (Honest question, I'm really not sure!) I suppose it depends on what the "Go 2" representation is like.

Even if it remains, I do feel strongly that net.IPv4 should return the 4-byte representation, as that's almost certainly what the caller wants.


To further reply to Brad's thread:

I presume from your thumbs-up on the prior comment that you're cool with a Time-like representation rather than an interface as you previously proposed?

@bradfitz have any ideas in mind for this yet? Were you thinking about something like:

type IP struct {
    high uint64
    low  uint64
}

... and then using this to represent both IPv4/6 addresses (IPv4 using the 16-byte mapped representation)? I'm okay with this, but I would argue in favor of more obvious methods of obtaining an IPv4 versus IPv6 address that don't require strange nil checks like the current To4 and To16 methods.

Here's a helper of sorts I usually end up copying and pasting between packages:

// checkIPv6 verifies that ip is an IPv6 address.
func checkIPv6(ip net.IP) error {
    if ip.To16() == nil || ip.To4() != nil {
        return fmt.Errorf("ndp: invalid IPv6 address: %q", ip.String())
    }

    return nil
}

... and I'd love to not have to do that. :)

bradfitz commented 5 years ago

Were you thinking about something like:

Something like that, or an array. But that can change over time (as time.Time has) so I'm more interested in discussing:

// Copy16 copies 16 bytes into dst. It panics if dst is not at least 16 bytes. If ip is an IPv4 address, its IPv4-in-IPv6 representation is copied, lorem ipsum .... 
func (ip IP) Copy6(dst []byte)
mdlayher commented 5 years ago

Sorry, hit some keyboard shortcut. Typing up a more coherent response...

mdlayher commented 5 years ago

My vote is to simplify the existing API surface and move things that 95% of folks won't need to x/net or outside the stdlib. I realize I'm proposing some big changes, and I don't know if this is something folks are willing to consider, but here are my thoughts:


a list of methods we'd have on an IP

I'm not totally convinced all of the existing IsX (IsGlobalUnicast, IsLinkLinkUnicast, etc.) methods are necessary in the stdlib. Maybe it'd be best to stick to the simplest ones, like IsLoopback, IsMulticast, and IsUnspecified. I don't feel strongly about this, but I generally think small API surfaces are nicer.

DefaultMask feels unnecessary in stdlib especially because it only applies to IPv4. I'd vote x/net/ipv4 or remove.

I would be curious to see how much MarshalText and UnmarshalText are actually used in the wild; I've pretty much exclusively used ParseIP and IP.String. Maybe these are used in stdlib for optimizations?

So in theory, I wonder if we could get away with just:

... and leave any other address classification logic to x/net or outside stdlib.

could we get rid of the existing Equal(IP) bool method? Would == be sufficient?

I think retaining Equal is a decent idea so that the internal representation can change later on if needed, but since you explicitly mentioned the goal of keeping it small and such, perhaps that escape hatch is no longer necessary.

If so, that means IP to 1.2.3.4 vs an IP to ::ff:ff:1.2.3.4 are not ==, so:

This has personally bitten me a few times in some code that was shipping around IP addresses as bytes in protobufs. I'd be okay with calling these two representations "not equal".

which Is4() bool / Is6() bool / ToMapped6() IP / etc APIs would be sufficient?

I like the explicitness of the first two methods. I personally would argue against supporting the notion of the mapped IPv4 in IPv6 representation unless there's some major use case for it that I'm not aware of.

or can we say that an IP is never in IPv4-in-IPv6 mapped form and if you want that, you need to do, say:

Feels like an x/net or outside of stdlib thing to me IMHO.

bradfitz commented 4 years ago

FYI: I've started implementing this at https://github.com/inetaf/netaddr because I needed it.

Lewiscowles1986 commented 3 years ago

Wrong repo (tab hell) apologies.

jzelinskie commented 3 years ago

I implemented a BitTorrent tracker a few years ago that supports both IPv4 and IPv6 with about zero understanding of IPv6 and that experience has given me some thoughts on net.IP. This post doesn't offer any new suggestions, but rather is my own anecdote just affirms some of problems and suggestions mentioned in this thread.

Despite there being a separate BitTorrent standard for IPv6, I didn't originally implement it. At first, I naively thought that by using net.IP, I unlocked support both IP stacks for free. Surprisingly, this was somewhat true--but only by dumb luck--that machines with both network stacks and lenient BitTorrent clients will chug along just fine with this design. Eventually I went to comply with the standard, but the idea that the standard library "took care of it for me" is surely a pitfall for newcomers.

Because of IPv4 addresses can be mapped into IPv6 and because IPv4 is still more ubiquitous than IPv6, I am forced to always check if ip := peer.IP.To4(); ip != nil { ... } at the network boundaries, which as @mdlayher described, is code without obvious intent and error-prone. So that I guarantee this action was performed, I created my own IP type that embeds a net.IP and an enumeration for the address family:

type IP struct {
    net.IP
    AddressFamily
}

Supporting both IP stacks in a given application often requires you checking the address family for high-level logic and not just at network boundaries. Even the storage layer of my program has to take into account address family so that we can fetch data for responses quickly. At one point, I had code that optimized responses not only based on IP stack, but also subnet. I mention this because I do not believe that a network library can hide all of the details of IP, so that should be a non-goal and obvious to the consumer of the library.

Speaking of the storage layer, my program's memory store is a giant map, which brings up the []byte can't be map keys problem. We actually end up serializing a bunch of data into a string and using that as the key. It costs us some CPU to serialize/deserialize, but the win is that the garbage collector can skip the maps entirely as there are no pointers to follow. This also avoids us needing string interning like what's used in inetaf/netaddr.

We have to defensively do append([]byte{}, IP) in a couple places to guarantee that we aren't using a reference to the net.IP because IP is mutable and a reference.

tv42 commented 3 years ago

@jzelinskie The compiler optimizes []byte to string conversions for map lookups:

m := make(map[string]T)
b := []byte{42, 13}
m[string(b)] = T{...}
t := m[string(b)]
jzelinskie commented 3 years ago

@tv42 Yep -- I've used this trick in the past. Our keys are now composed of unique id, a port, and an IP: https://github.com/chihaya/chihaya/blob/ff0fe9e28df467d30600d0244d7f9b93d2b6d41f/storage/memory/peer_store.go#L173-L180

jxsl13 commented 3 years ago

good proposal. I find the approach to use a byte slice interesting, but seemingly really error prone and also not really convenient to use. Would be interesting to see a Pythonesque ipaddress library, as networking is a major part of Go.

ecbaldwin commented 2 years ago

Late to the game here as I went mostly oblivious to the addition of netip to Go 1.18 until now. Makes me a little sad but that is on me.

I strongly believe that v4 and v6 should have their own types and packages (they are more different than they are the same!). I also needed a routing table implementation. If you are with me on either of these, you might look at the library that I have been working on. The ipv4 package is pretty complete, clean, and working very well but the documentation is a WIP. The ipv6 package is also a WIP but is close to being finished. I am hoping for a v1 release in the next month. It will also have an ip package to provide interfaces and helpers for contexts where ipv4 and ipv6 can be lumped together.

mdlayher commented 2 years ago

With the addition of net/netip in Go 1.18, should we close this out? Or relabel and retitle that net/netip should become the default representation of net.IP for a hypothetical Go 2.x?

ianlancetaylor commented 2 years ago

Thanks for pointing this out. I'm going to close this issue. We can start a new issue if there are still concerns after we have some real experience with net/netip.