Non-persistent push state machine states conflict with RSH3

tcard commented 4 years ago

See https://github.com/ably/ably-java/issues/546 for details.

From there:

(This happens on Cocoa already.)

What happens here is that, when we get a device token, we 1. persist it and 2. emit a GotPushDeviceDetails event that provokes a transition from WaitingForPushDeviceDetails to WaitingForDeviceRegistration.

If, while at that state, the app is killed, we're back at WaitingForPushDeviceDetails, but even if we get a device token again, we have code that checks that it isn't the same token as the one persisted so that we don't emit a GotPushDeviceDetails again for the same token.

This would be correct if, once on WaitingForDeviceRegistration, we didn't "irregularly" go back to WaitingForPushDeviceDetails, but we do, because WaitingForDeviceRegistration is not persistent.

┆Issue is synchronized with this Jira Bug by Unito

maratal commented 3 years ago

Correcting invalid state during initialisation seems pretty legit to me, WDYT @QuintinWillison @ben-xD @lukasz-szyszkowski Shouldn't #967 close this issue?

ben-xD commented 3 years ago

Correcting invalid state during initialisation seems pretty legit to me, WDYT @QuintinWillison @ben-xD @lukasz-szyszkowski Shouldn't #967 close this issue?

The comment in https://github.com/ably/ably-cocoa/pull/967 says

This also serves as a workaround, so that the "proper" fix for #966 is not urgent.

I don't know if additional work has been done, but just seeing https://github.com/ably/ably-cocoa/pull/967 and https://github.com/ably/ably-cocoa/issues/966, I don't think we can say its finished?

QuintinWillison commented 3 years ago

The opening comment on this issue doesn't describe the problem well enough for me, nor does it suggest a fix. I'm finding this issue very difficult to parse and understand.

lukasz-szyszkowski commented 3 years ago

same here, it needs deeper diving into code - can't tell you if #967 should close it since it's marked as a workaround

jamienewcomb commented 2 years ago

@maratal can you make a decision if we can close this issue please

maratal commented 2 years ago

To better understand what Tony was trying to say you should replace event names to something more obvious:

So, his

What happens here is that, when we get a device token, we 1. persist it and 2. emit a GotPushDeviceDetails event that provokes a transition from WaitingForPushDeviceDetails to WaitingForDeviceRegistration.

If, while at that state, the app is killed, we're back at WaitingForPushDeviceDetails, but even if we get a device token again, we have code that checks that it isn't the same token as the one persisted so that we don't emit a GotPushDeviceDetails again for the same token.

This would be correct if, once on WaitingForDeviceRegistration, we didn't "irregularly" go back to WaitingForPushDeviceDetails, but we do, because WaitingForDeviceRegistration is not persistent.

Turns into

What happens here is that, when we get a device token, we 1. persist it and 2. emit a GotAPNSToken event that provokes a transition from WaitingForAPNSToken to WaitingForAblyRegistration.

If, while at that state, the app is killed, we're back at WaitingForAPNSToken, but even if we get a device token again, we have code that checks that it isn't the same token as the one persisted so that we don't emit a GotAPNSToken again for the same token.

This would be correct if, once on WaitingForAblyRegistration, we didn't "irregularly" go back to WaitingForAPNSToken, but we do, because WaitingForAblyRegistration is not persistent.

Probably worth it to replace this in the actual code too.

Anyways, I disagree with this issue because:

RSH3 states that "While this process should be implemented in whatever way better fits the concrete platform..." and "...this typically forces some kind of on-disk storage..." which is a very broad way to describe something that you can be in conflict with. Specs don't specify which states should persist and which not.
The solution proposed by Tony here

I think the cleanest solution here is to persist a representation of the awaited request as part of the "waiting for request completion" persistent states, and, on re-awaking the state machine on such states, make the associated requests again.

requires to implement requests storage mechanism similar to one in Firebase. Which is completely different client-server programming approach and should be implemented across the whole library or not implemented at all.

The code Tony refers to in "we have code that checks that it isn't the same token as the one persisted" is discussed here (see red arrow) and in any case is a defensive check against request duplication, not only for the case of an app re-launch after unexpected exit during request. And this check doesn't even work in some cases as discussed in slack from the link above, leading to creation of the second state machine. And this situation was probably erroneously mentioned by Tony in a commit message as related to this issue, because this issue mentions said check (red arrow), which is needed in any case. And without proper storage access queue there are always will be a risk of state machine duplication.

So, I vote to close this issue due to lack of strong points to keep it open. @jamienewcomb @QuintinWillison @lukasz-szyszkowski @lawrence-forooghian @paddybyers

lawrence-forooghian commented 2 years ago

Addressing your points, @maratal:

which is a very broad way to describe something that you can be in conflict with. Specs don't specify which states should persist and which not.

I agree that the specs are possibly vaguer here than they need to be (I'm not sure of the motivation behind the wording). But, for me, the important takeaway from RSH3 is "it should be taken into account that its lifetime is that of the app that runs it". That is to say, the state that an app exits in is the state that it should be in next time it's launched.

I think that to ignore this rule complicates the process of developing the push notification state machine. Now a client library developer has to not only understand the state machine described by the spec (which is complicated enough as it is), they also have to understand that there may exist platform-specific behaviour (i.e. undocumented state transitions). This reduces the effectiveness of having a shared features spec.
requires to implement requests storage mechanism similar to one in Firebase

What was the "storage mechanism" you had in mind? I'm not familiar with Firebase.

I wonder, though, whether any kind of storage mechanism is even required. Possibly the request can always be derived from the current push machine state (e.g. if the app starts in ARTPushActivationStateWaitingForDeviceRegistration state then you just need to POST to /push/deviceRegistrations with the device’s persisted details, etc.)
I'm afraid I don't understand how the issue of having multiple state machines is related to this issue. Is it a key point of your above argument? If so, please could you help me understand?

maratal commented 2 years ago

What was the "storage mechanism" you had in mind?

In Firebase you make some changes and it syncs it with the server whenever possible. You are not required to have a live connection to the server to post any changes there.

issue of having multiple state machines is related to this issue.

Tony referred to this issue as a possible cause of multiple state machines, which is not true, because duplication occurs due to the lack of synchronized access to the storage.

lawrence-forooghian commented 2 years ago

Tony referred to this issue as a possible cause of multiple state machines, which is not true, because duplication occurs due to the lack of synchronized access to the storage.

Ah, OK.

So, in case I wasn't clear in my previous message, I think this is an issue that we should fix. Keen to hear other opinions.

lawrence-forooghian commented 2 years ago

In Firebase you make some changes and it syncs it with the server whenever possible. You are not required to have a live connection to the server to post any changes there.

That sounds like much more than I think we'd need in order to address this issue.

lukasz-szyszkowski commented 2 years ago

I agree that the specs are possibly vaguer here than they need to be (I'm not sure of the motivation behind the wording). But, for me, the important takeaway from RSH3 is "it should be taken into account that its lifetime is that of the app that runs it". That is to say, the state that an app exits in is the state that it should be in next time it's launched.

I think that to ignore this rule complicates the process of developing the push notification state machine. Now a client library developer has to not only understand the state machine described by the spec (which is complicated enough as it is), they also have to understand that there may exist platform-specific behaviour (i.e. undocumented state transitions). This reduces the effectiveness of having a shared features spec

100% agree

If we don't want to change State Machine as @lawrence-forooghian proposed, we could probably persist token after this event WaitingForAblyRegistration. Is it makes sense?

maratal commented 2 years ago

Is it makes sense?

Sure, I've even made a PR with this change https://github.com/ably/ably-cocoa/pull/1179 But it has broken tests so horrendously, that I stepped back. I'll give it another shot, my understanding of this state machine is a bit better since then, plus we have way more improved tests now to deal with (kudos to @lawrence-forooghian 💪).

ably / ably-cocoa

Non-persistent push state machine states conflict with RSH3 #966