Open ederuiter opened 4 months ago
I think the real issue is that the UDP protocol might be used in a different way from how it is intended to be used.
Each gateway must periodically send a `PULL_DATA` packet to keep the UDP route open (NAT and/or firewall). The normal `PULL_DATA` interval is 10 seconds. See also: https://github.com/Lora-net/packet_forwarder/blob/master/PROTOCOL.TXT#L289
This `PULL_DATA` is what updates the state of the connection within the ChirpStack Gateway Bridge:
https://github.com/chirpstack/chirpstack-gateway-bridge/blob/master/internal/backend/semtechudp/backend.go#L331
If Helium is not sending `PULL_DATA` packets, then I think it is the correct behavior that the ChirpStack Gateway Bridge invalidates the UDP connection. Due to the nature of UDP, you probably do not want to set the `gatewayCleanupDuration` to a very high value. A quick search on "udp nat timeout" and "udp firewall timeout" gives me numbers like 30 - 60 seconds.
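For readers unfamiliar with the keep-alive discussed here, the following is a minimal Go sketch of what a `PULL_DATA` datagram looks like on the wire, based on the layout in the Semtech PROTOCOL.TXT (protocol version 2, identifier 0x02, followed by the 8-byte gateway identifier). The function names `encodePullData` and `isPullData` are made up for illustration and are not taken from the packet forwarder or the ChirpStack Gateway Bridge.

```go
package main

// Values as described in the Semtech UDP protocol document (PROTOCOL.TXT).
const (
	protocolVersion byte = 0x02 // protocol version 2
	pullDataID      byte = 0x02 // PULL_DATA identifier
)

// encodePullData builds the 12-byte PULL_DATA datagram:
// version (1 byte) | random token (2 bytes) | identifier (1 byte) | gateway EUI (8 bytes).
// The token is opaque; the server echoes it back in its PULL_ACK.
// Hypothetical helper, for illustration only.
func encodePullData(token [2]byte, gatewayEUI [8]byte) []byte {
	buf := make([]byte, 12)
	buf[0] = protocolVersion
	copy(buf[1:3], token[:])
	buf[3] = pullDataID
	copy(buf[4:], gatewayEUI[:])
	return buf
}

// isPullData reports whether a received datagram has the shape of a
// version 2 PULL_DATA packet. Hypothetical helper, for illustration only.
func isPullData(pkt []byte) bool {
	return len(pkt) == 12 && pkt[0] == protocolVersion && pkt[3] == pullDataID
}
```

A gateway (or anything impersonating one) would send such a datagram from the same source port it expects downlinks on, roughly every 10 seconds, so that both the NAT mapping and the bridge's view of the gateway address stay fresh.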
Hmm, yeah, the Helium packet routers basically impersonate each gateway. I think they use a unique port number for each gateway so they can identify which gateway the packets are destined for. I agree it is a bit of a misuse of the protocol, but I can also see that it would not be feasible for the Helium packet routers to maintain active UDP connections to each LNS for every gateway that has received a packet for one of their devices (with ~400,000 active gateways).
In this specific case (Helium) we know that the ip:port pairs of the gateways are (mostly) stable, as they refer not to the individual gateways but to the Helium packet routers, which have public IPs and are not behind NAT, a firewall, etc. We could still have some timeout issues with load balancing on our side, but that is something we can (hopefully) tweak.
Would you accept a PR to make this setting configurable? That would allow us to easily work around this without having to recompile the gateway-bridge. I have also brought up this issue on the Helium side, and we are also discussing what they can do from their side.
NB: yes, there are other options than Semtech UDP for connecting ChirpStack LNSs to Helium:
1) via LoRaWAN roaming => unfortunately ChirpStack only supports roaming V1.0, and Helium uses V1.1
2) via the packet router (Helium-specific protocol) => this is probably the best option, but I am unsure how well this is supported on the Helium side; currently looking into this
Happy to accept a PR to make this option configurable (with the current value as the default), so that this can be adjusted.
With regards to a proper solution, I agree that 2) is probably the best option. E.g. we could create a `chirpstack-helium-bridge` that integrates with the Helium API and transforms the data into the ChirpStack MQTT format. I like 2) over 1) because it makes the architecture simpler and easier to debug. The roaming API makes things a lot more complex to debug, plus there are no inbound connections required from Helium > ChirpStack.
:+1: :100: agree
You can expect a PR from me or a colleague of mine to add `gatewayCleanupDuration` to the configuration in the next couple of days.
For the `chirpstack-helium-bridge`: Helium already has this: https://github.com/helium/helium-packet-router-ingest, which converts from the packet router protocol to GWMP/HTTP roaming. Should be easy enough :tm: to use this as a basis for it.
This would simplify deployments of Helium LNSs with ChirpStack a lot; I will talk to Helium about this and see how we can expedite it.
What happened?
Sometimes downlinks for class C devices are not sent immediately; they are delivered minutes later, after we receive another uplink.
What did you expect?
Since the idea of class C devices is that downlinks can be sent at any time, I would expect not to have to wait until we receive an uplink.
Details
We use ChirpStack to connect to the Helium network. This means gateways don't maintain an active connection to ChirpStack; instead we only get a packet when an uplink is sent from a device. This leads to issues, as chirpstack-gateway-bridge currently clears the gateway address information after 1 minute of inactivity ( https://github.com/chirpstack/chirpstack-gateway-bridge/blob/master/internal/backend/semtechudp/registry.go#L21 ). So when we want to send a downlink to a device, we need to do that within 1-2 minutes of the last uplink from the gateway; otherwise the address of the gateway is cleared and the downlink cannot be sent. This is not an issue for class A/B devices, as they are required to send their downlinks within that timeframe, but for class C devices it leads to issues.
For now we have deployed a local fix that increases the `gatewayCleanupDuration` to 24h to mitigate this issue. But this is not a permanent fix. Ideally the gateway address information also needs to be persisted and shared among all instances of the gateway-bridge for this region, as otherwise a reboot or load balancing could lead to the same issues.
Could you share your log output?
Your Environment
PS: Happy to help put together a PR, but let's first figure out how to approach this.