Closed adriansmares closed 2 years ago
We would like to keep the end device session in the source v3 cluster
Why?
This issue is strictly about situations in which the gateway cannot be moved from TTSCE to TTSC. Since the device address prefixes differ, you cannot use PB for roaming. In the interval of time between the re-registration and the re-join, uplinks would be lost if we did not keep the session in the source v3 cluster. This issue describes how to keep the old session in the source v3 cluster for 'uplink only', while ensuring that the target v3 cluster will be the one answering the eventual Join Request.
I see. LoRa Alliance now has device migration between networks on its radar as well, and session migration is one of the approaches. It would (only) work by (temporarily) routing traffic to both the old and the new destination.
That said, should we consider these two alternative approaches?
ZRANGE WITHSCORES and one HGETALL per session (current and pending). In TTSCE eu1, the current rate is a bit over 400 pps, so 1600 Redis commands extra per second in TTSC eu1 and eu2. This will align best with what LoRa Alliance comes up with.
(1) and (2) can be complementary.
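The load estimate above can be checked with a few lines of arithmetic. This is only a sketch: the function and parameter names are illustrative and not part of The Things Stack, and it assumes that every forwarded uplink costs one ZRANGE WITHSCORES plus one HGETALL for each of the two sessions (current and pending).

```python
def extra_redis_commands_per_second(
    uplinks_per_second: int,
    sessions_per_device: int = 2,   # current and pending session
    commands_per_session: int = 2,  # one ZRANGE WITHSCORES and one HGETALL
) -> int:
    """Estimate the extra Redis commands per second caused by forwarded uplinks."""
    return uplinks_per_second * sessions_per_device * commands_per_session


# A bit over 400 pps in TTSCE eu1 gives roughly 1600 extra commands per second.
print(extra_redis_commands_per_second(400))
```

The estimate scales linearly with the uplink rate, so doubling the forwarded traffic doubles the extra read load.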
We do not have a good experience with the word 'temporary' when it comes to migrations. Once the genie is out of the bottle, I don't think that we can turn off the traffic rule with ease, and this becomes debt. I'm ok with this kind of debt, but I think we should frame it as a permanent change from the start.
Matching-wise, we can handle the extra traffic, since matching is done on the read-only replica. We will also pay some CPU cost on the Network Server side (mainly unmarshalling and RPC overhead; for matching I expect either a hit or 0 results returned from Redis).
See if https://github.com/TheThingsNetwork/lorawan-stack/pull/5634 fits what you had in your mind.
Regarding (2), which I'll review next week, does it include disallowing activations?
It disables any form of scheduling on the Network Server side - be it a join accept, class A downlink or network initiated downlink.
Edit: The original issue body now contains the shortened instructions to disable downlinks.
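The behaviour described in this reply can be pictured with a small conceptual model. It is not the Network Server implementation; the class and function names are hypothetical, and the model only assumes what is stated above: with scheduling disabled, no join accept, class A downlink or network initiated downlink is scheduled, while uplinks for a known session are still processed.

```python
from dataclasses import dataclass
from enum import Enum


class Message(Enum):
    DATA_UPLINK = "data uplink"
    JOIN_REQUEST = "join request"


@dataclass
class SourceDevice:
    has_session: bool
    schedule_downlinks: bool


@dataclass
class Outcome:
    uplink_processed: bool    # uplink matched a session and is handled
    downlink_scheduled: bool  # any downlink (including a join accept) is scheduled


def handle(device: SourceDevice, message: Message) -> Outcome:
    """Model how the source cluster treats a message for this device."""
    if message is Message.JOIN_REQUEST:
        # A join accept is a downlink, so it is only sent when scheduling is enabled.
        return Outcome(uplink_processed=False,
                       downlink_scheduled=device.schedule_downlinks)
    # Data uplinks are processed whenever the session is still present.
    processed = device.has_session
    return Outcome(uplink_processed=processed,
                   downlink_scheduled=processed and device.schedule_downlinks)


migrating = SourceDevice(has_session=True, schedule_downlinks=False)
print(handle(migrating, Message.DATA_UPLINK))
print(handle(migrating, Message.JOIN_REQUEST))
```

In this model the source cluster keeps receiving data while never answering the device, which leaves the join accept to the target cluster.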
Summary
The v3-to-v3 migration path can contain additional debug steps until we have better support in ttn-lw-migrate.
Why do we need this?
In order to ease the migrations between v3 distributions.
What is already there? What do you see now?
https://www.thethingsindustries.com/docs/getting-started/migrating/migrating-from-ce-to-ch/migrate-active-session/
What is missing? What do you want to see?
For situations in which the gateways cannot easily be migrated (i.e. they are not accessible in order to have them point to the new v3 environment), and the end devices rejoin automatically if the Network Server does not respond to MAC commands, we have a possible way to avoid data loss. We may want to document this path.
How do you propose to document this?
The idea is as follows: We would like to keep the end device session in the source v3 cluster, but this source cluster should not be able to send downlinks to the end device, and it should not be able to process join requests. Given these two prerequisites, we can process all of the traffic from the end device while also forcing it to rejoin.
This can be achieved as follows:
Set mac_settings.schedule_downlinks to false on the end device in the source v3 cluster.
This can easily be automated using the CLI:
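The exact command from the issue is not preserved here, so the following is only a sketch of what such automation might look like. It builds one `ttn-lw-cli end-devices set` invocation per device without running anything. The flag spelling `--mac-settings.schedule-downlinks` is an assumption derived from the field name and should be verified against `ttn-lw-cli end-devices set --help`; the application and device IDs are hypothetical.

```python
import shlex
from typing import List


def build_disable_downlink_commands(application_id: str,
                                    device_ids: List[str]) -> List[List[str]]:
    """Build one CLI invocation per device that disables downlink scheduling."""
    return [
        [
            "ttn-lw-cli", "end-devices", "set", application_id, device_id,
            # Assumed flag name, derived from mac_settings.schedule_downlinks.
            "--mac-settings.schedule-downlinks=false",
        ]
        for device_id in device_ids
    ]


# Print the commands for review instead of executing them.
for argv in build_disable_downlink_commands("my-app", ["dev-1", "dev-2"]):
    print(shlex.join(argv))
```

Printing the commands first makes it possible to review them before running them against the source cluster, for example by passing each list to `subprocess.run`.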
This requires The Things Stack v3.21 (both CLI and stack).
Can you do this yourself and submit a Pull Request?
Can review. cc @johanstokking - what do you think about documenting this migration path? I think it should allow zero data loss for end devices that rejoin when no MAC responses are sent.