Closed htdvisser closed 4 years ago
We may have to expose some extra information if the GS has multiple UDP endpoints that may have different behavior (different frequency plan for example). This can be discussed later, as I think this is a pretty advanced use case.
I think we should have an idea for this because we'll use this quite a lot in PCN. But do we need those endpoints as SRV records in the first place?
But do we need those endpoints as SRV records in the first place?
Do we need those endpoints to be registered in the first place: we do need to have a port mapping if non-default ports are used. Sure, we could specify that default port numbers are used if SRV records aren't present, but this would enable (encourage/not punish) client/sdk developers that take shortcuts by assuming that default ports are always used, which would result in those clients/sdks not supporting clusters that use non-default ports.
Do we need those endpoints as SRV records: No, we can also just register all endpoints in the Identity Server's Entity Registry, but then it becomes a CPOF like v2's Discovery Server. Or define a gRPC or HTTP endpoint (that is always on 80/443) for listing the ports for the services.
@johanstokking Let's revive this issue. As discussed, we should determine to what extent we can and want to align this with the LoRa Alliance DNS for JoinEUIs and NetIDs.
The LoRaWAN Backend Interfaces 1.0 specification (which I assume we still plan to support) specifies that for a JoinEUI of 00005E100000002F a server shall do a NAPTR lookup on f.2.0.0.0.0.0.0.0.1.e.5.0.0.0.0.joineuis.lora-alliance.org.
The result of this lookup is:
class | type | order | pref | flags | service | regexp | replacement |
---|---|---|---|---|---|---|---|
IN | NAPTR | 50 | 50 | S | LWN | | _lwn.operator.com |
IN | NAPTR | 90 | 50 | S | LWNS | | _lwn.operator.com |
The flags indicate the next lookup to perform:
flag | next action |
---|---|
S | SRV lookup of _lwn.operator.com |
A | A, AAAA or A6 lookup of _lwn.operator.com |
U | Use _lwn.operator.com as URI |
P | Additional NAPTR lookup |
The service means:
service | meaning |
---|---|
LWN | LoRaWAN server using HTTP |
LWNS | LoRaWAN server using HTTPS |
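The lookup name in the example above appears to be the JoinEUI's hex digits in reverse order, one DNS label per digit, followed by the zone. A minimal Go sketch of that derivation, assuming this reading of the example is right (the function name `lookupName` is made up for illustration, and the zone is passed in rather than hard-coded):

```go
package main

import (
	"fmt"
	"strings"
)

// lookupName builds the DNS name for a JoinEUI given as 16 hex digits:
// the digits are lowercased, reversed, and joined with dots, then the
// zone is appended. Hypothetical helper, not taken from any library.
func lookupName(joinEUI, zone string) (string, error) {
	if len(joinEUI) != 16 {
		return "", fmt.Errorf("JoinEUI must be 16 hex digits, got %d", len(joinEUI))
	}
	hex := strings.ToLower(joinEUI)
	labels := make([]string, 0, len(hex)+1)
	for i := len(hex) - 1; i >= 0; i-- {
		c := hex[i]
		if !(c >= '0' && c <= '9' || c >= 'a' && c <= 'f') {
			return "", fmt.Errorf("invalid hex digit %q", c)
		}
		labels = append(labels, string(c))
	}
	labels = append(labels, zone)
	return strings.Join(labels, "."), nil
}

func main() {
	name, err := lookupName("00005E100000002F", "joineuis.lora-alliance.org")
	if err != nil {
		panic(err)
	}
	fmt.Println(name)
}
```

For the JoinEUI from the example this yields the same name as quoted above.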
I'm assuming that the LoRa Alliance isn't going to operate/administer all of those DNS records, so I think we'll get an NS record for our EUI prefix. This likely means that we will control 5.d.3.b.0.7.joineuis.lora-alliance.org and have to operate/administer it. I don't think it's a good idea to do this in our existing DNS server, so I looked into self-hosting authoritative DNS servers. It turns out to be relatively easy to implement with github.com/miekg/dns, which is also used by coredns (k8s) and skydns/consul (which we can also look into).
Since we're already planning to register clusters in the identity server (#143) I think it would be a good idea to add a list of JoinEUI prefixes to this registration. Our DNS servers could then periodically fetch the full list of clusters, their addresses and JoinEUI prefixes, and serve DNS records for these. We could even consider skipping this extra component and exposing DNS directly on the Identity Servers.
The DNS lookup described in the LoRaWAN Backend Interfaces specification ends with an IP address of the Join Server for a JoinEUI. I don't think this is sufficient for us, since we (1) want to use gRPC instead of the Backend Interfaces API and (2) need to know a port number to connect to. In my opinion the perfect way to publish this kind of information using DNS is to use SRV records as described in the original issue above.
The LoRaWAN Backend Interfaces 1.0 specification (which I assume we still plan to support) specifies that for a JoinEUI of 00005E100000002F a server shall do a NAPTR lookup on f.2.0.0.0.0.0.0.0.1.e.5.0.0.0.0.joineuis.lora-alliance.org.
NAPTR records are dropped in 1.1. That version is still in draft until members start implementing it (fully, including hand-over roaming) and confirm that they did not encounter issues. So that's going to take a while. Until then, we should focus on 1.1 and not spend time on functionality that we know becomes obsolete.
On top of that, as far as I know, members make little to no use of DNS lookup at the moment, partly because of this complexity, partly because they have out-of-band agreements anyway and partly because of little technical DNS support from the LoRa Alliance.
I'm assuming that the LoRa Alliance isn't going to operate/administer all of those DNS records, so I think we'll get an NS record for our EUI prefix. This likely means that we will control 5.d.3.b.0.7.joineuis.lora-alliance.org and have to operate/administer it.
DNS delegation is one of the topics that is continuously being pushed forward. So far, what's in draft 1.1, is CNAME and A records only. Let's not bring any assumptions to the equation at this moment.
Since we're already planning to register clusters in the identity server (#143) I think it would be a good idea to add a list of JoinEUI prefixes to this registration. Our DNS servers could then periodically fetch the full list of clusters, their addresses and JoinEUI prefixes, and serve DNS records for these. We could even consider skipping this extra component and exposing DNS directly on the Identity Servers.
The DNS lookup described in the LoRaWAN Backend Interfaces specification ends with an IP address of the Join Server for a JoinEUI. I don't think this is sufficient for us, since we (1) want to use gRPC instead of the Backend Interfaces API and (2) need to know a port number to connect to. In my opinion the perfect way to publish this kind of information using DNS is to use SRV records as described in the original issue above.
I suggest going with the following phases;
NAPTR records are dropped in 1.1. [...] Until then, we should focus on 1.1 and not spend time on functionality that we know becomes obsolete.
So does this mean we will not implement 1.0 at all?
DNS delegation is one of the topics that is continuously being pushed forward. So far, what's in draft 1.1, is CNAME and A records only. Let's not bring any assumptions to the equation at this moment.
Then I guess I was confused by the spec mentioning NS records:
The NetID will be provisioned in the zone “NETIDS.lorawan.net”. The resource corresponding to the NetID could be provisioned in different DNS resource record formats (such as NS, CNAME, A, AAAA).
[...]
Similarly, the JoinEUI could be provisioned in the zone “JOINEUIS.lorawan.net” with different DNS resource record formats based on the requirements as follows:
I suggest going with the following phases [...]
I thought we previously already concluded that the first phase for the Backend Interfaces Join flow would be a configuration file or repository. I came up with something like this:
    - name: Name of the Join Configuration
      prefixes:
      - 0000000000000000/00
      - 0000000000000000/00
      # in case of DNS lookup:
      dns:
        resolver: 1.1.1.1
        records: CNAME
      # in case of static config:
      static:
        host: hostname.tld
        port: 1234
      protocol: ttn.lorawan-stack.v3 # or backend-interfaces-1.0, backend-interfaces-1.1, ...
      # in case of basic auth:
      basic_auth:
        username: username
        password: password
      # in case of token auth:
      bearer_token: XXX
      # in case of TLS:
      tls_config:
        ca_file: ...
        cert_file: ...
        key_file: ...
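To make the prefixes entries concrete, here is a rough Go sketch of how a JoinEUI could be matched against one of them. It assumes the format is 16 hex digits, a slash, and the number of leading bits that must match (so /00 matches every JoinEUI); that interpretation and the name `matchesPrefix` are assumptions, not something the config above specifies.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// matchesPrefix reports whether joinEUI falls within prefix, where prefix
// is assumed to look like "70B3D57ED0000000/36": a 64-bit base in hex and
// the number of leading bits to compare. Hypothetical helper.
func matchesPrefix(prefix string, joinEUI uint64) (bool, error) {
	base, bits, ok := strings.Cut(prefix, "/")
	if !ok {
		return false, fmt.Errorf("prefix %q has no length", prefix)
	}
	b, err := strconv.ParseUint(base, 16, 64)
	if err != nil {
		return false, fmt.Errorf("invalid prefix base: %w", err)
	}
	n, err := strconv.ParseUint(bits, 10, 8)
	if err != nil || n > 64 {
		return false, fmt.Errorf("invalid prefix length %q", bits)
	}
	// In Go, shifting a uint64 by 64 yields 0, so a /00 prefix gives an
	// empty mask and matches everything.
	mask := ^uint64(0) << (64 - n)
	return joinEUI&mask == b&mask, nil
}

func main() {
	ok, err := matchesPrefix("70B3D57ED0000000/36", 0x70B3D57ED0000001)
	if err != nil {
		panic(err)
	}
	fmt.Println(ok)
}
```

The first entry whose prefix matches would then select the dns or static settings to use for that JoinEUI.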
We are however getting a bit off-topic for this issue. The goal of this issue is to come up with a mechanism for cluster discovery and for getting port+protocol configuration from the domain name of a cluster deployment, so that (for example) the network_server_address of an end device registration can be resolved to the gRPC or HTTP endpoint of the Network Server.
I think it would be nice if we can do this in DNS and if it can be aligned with the DNS mechanism that is described in the Backend Interfaces spec. But I can also just start implementing all of this as RPCs in the Identity Server while we figure out if and how we want to expose this through DNS.
So does this mean we will not implement 1.0 at all?
We cherry pick from Backend Interfaces like other members. We don't do hand over roaming, we do (stateless) passive roaming in Packet Broker, we don't do 1.0 NAPTR records, we do 1.1 DNS lookup, we do support 1.0 and 1.1 messages for the flows that we implement, etc.
Then I guess I was confused by the spec mentioning NS records:
The NetID will be provisioned in the zone “NETIDS.lorawan.net”. The resource corresponding to the NetID could be provisioned in different DNS resource record formats (such as NS, CNAME, A, AAAA). [...] Similarly, the JoinEUI could be provisioned in the zone “JOINEUIS.lorawan.net” with different DNS resource record formats based on the requirements as follows:
DNS delegation is certainly on the roadmap and 1.1 opens the door for it, but we don't have to operate/administer it (for now) so we don't have to set that all up. In practice, there's no support for it from LoRa Alliance nor Afnic (yet), so even if we would have that in place, we can't use it.
I thought we previously already concluded that the first phase for the Backend Interfaces Join flow would be a configuration file or repository. I came up with something like this [...]
Yes, that fits nicely with my phases 1 and 2 and should be part of #833 (cc @rvolosatovs)
We are however getting a bit off-topic for this issue. The goal of this issue is to come up with a mechanism for cluster discovery and for getting port+protocol configuration from domain name of a cluster deployment, so that (for example) the network_server_address of an end device registration can be resolved to the gRPC or HTTP endpoint of the Network Server.
I think it would be nice if we can do this in DNS and if it can be aligned with the DNS mechanism that is described in the Backend Interfaces spec. But I can also just start implementing all of this as RPCs in the Identity Server while we figure out if and how we want to expose this through DNS.
I just don't think it should be aligned to Backend Interfaces, if we go for the DNS approach.
Also, in (private) networks we need this cluster discovery as well, and it's going to be pretty hard to impose DNS there knowing some of their enterprise environments. So making this part of IS (and potentially keeping DNS records from there) may be the best way to go.
Blocked on #143
There's some groundwork in #1392 to at least fall back to the default ports if the target doesn't contain any.
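For illustration, falling back to a default port when the target has none could look roughly like the Go sketch below. This is not the code from #1392; `withDefaultPort` is a made-up name and 8884 is only used as an example value.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"strings"
)

// withDefaultPort returns target unchanged if it already parses as
// host:port, and otherwise appends defaultPort. Hypothetical helper.
// Note: a target with a trailing colon and empty port ("host:") still
// parses as host:port and is returned as-is.
func withDefaultPort(target string, defaultPort int) string {
	if _, _, err := net.SplitHostPort(target); err == nil {
		return target
	}
	// Strip brackets from a bare IPv6 literal such as "[::1]" so that
	// JoinHostPort does not add a second pair.
	host := strings.TrimSuffix(strings.TrimPrefix(target, "["), "]")
	return net.JoinHostPort(host, strconv.Itoa(defaultPort))
}

func main() {
	fmt.Println(withDefaultPort("eu-west.thethings.network", 8884))
	fmt.Println(withDefaultPort("eu-west.thethings.network:1234", 8884))
}
```

A target that already carries a port is left alone, so explicit configuration always wins over the default.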
The middleware introduced in pkg/rpcmiddleware/discover is intended to contain the implementation for this issue.
The proposal here contains SRV records per component, which means that discover.WithTransportCredentials() and discover.WithInsecure() should take a ttnpb.ClusterRole. I did not account for this yet, but this is not hard to add. The downside is that callers may no longer be able to reuse connections and would have to keep them separate per component. As long as we don't have a final solution with service discovery per component in place and don't know exactly what we want, let's not prematurely account for that.
Additional groundwork in #1442 is pkg/rpcmiddleware/discover.DialContext(), which is going to discover services on the target and dial the right address with the right dial options.
We may want to consider adding all default dial options there, instead of requiring callers to set them. This also makes it easier to vary them based on the discovery result.
WARNING: 2020/06/19 21:59:32 grpc: addrConn.createTransport failed to connect to {0.0.0.0.0.0.0.d.e.7.5.d.3.b.0.7.join.thethings.industries <nil> 0 <nil>}. Err: connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for join.cloud.thethings.industries, *.join.cloud.thethings.industries, not 0.0.0.0.0.0.0.d.e.7.5.d.3.b.0.7.join.thethings.industries". Reconnecting...
I'll try figuring out a way to do the SRV lookup not in the dialer but before that, hopefully as an alternative dial option, otherwise before dialing. Validating the peer certificate using another SRV lookup doesn't seem like a good option.
Closed by #2779
Summary:
It would be really nice if we could do some kind of cluster/service discovery.
Why do we need this?
What is already there? What do you see now?
What is missing? What do you want to see?
Assuming that clusters will be registered in the Identity Server, we either need to put service information in there, or we need a different mechanism to discover service information.
How do you propose to implement this?
In https://github.com/TheThingsIndustries/lorawan-stack/issues/1131 I suggested to use DNS SRV records for service discovery.
A TTN cluster eu-west.thethings.network could be discovered through SRV records in DNS:
_ttn-v3-{gs,ns,as,js}-{grpc,http,mqtt}._{tcp,tls}.eu-west.thethings.network. <ttl> IN <priority> <weight> 8884/1884/443/80/8883/1883/...
Note that in this example, the SRV records only indicate a port mapping for the services exposed by the cluster (so these records will only have to be set once for a deployment). The target eu-west.thethings.network is assumed to be a load balancer in front of the cluster. Alternatively we could use the SRV records for load balancing.
We may have to expose some extra information if the GS has multiple UDP endpoints that may have different behavior (different frequency plan for example). This can be discussed later, as I think this is a pretty advanced use case.
What can you do yourself and what do you need help with?
Let's first think about cluster registration in the Identity Server: #143