eclipse / paho.golang

Go libraries
Other
330 stars 92 forks source link

V0.20 (session persistence) Release Discussion #207

Closed MattBrittan closed 7 months ago

MattBrittan commented 9 months ago

Creating this issue to provide a place to discuss the next release (including session persistence). This release will contain major changes (a few of which are likely to break existing users code - permitted as we are still pre 1.0).

Due to the extent of the changes (particularly session persistence) we are keen to receive feedback from anyone using @master (the more people we know are using this successfully, the more confidence we will have that it's ready for release).

Issue creation prompted by an email to the paho-dev mailing list (an issue here seems the best place to discuss this).

MattBrittan commented 9 months ago

I'm now running @master in production with:

All connections run QOS1 and the remote clients utilise autopaho PublishViaQueue (with disk based persistence) and disk based session state.

To date the only real issue I've encountered was due to a corrupted queue (addressed in #194 and #199 ). This will have been caused by the power being pulled (was a component in a larger system being built in a customers factory, so regular power outages are to be expected); upcoming change to add Sync call in the queue/store may have helped prevent this too.

Apart from the one issue, I'd say that things are running better than the v3 client (got out-of-order packets fairly frequently with that following loss of connection, have not had the same issue with the V5 client). My app does perform checks for missing and out of order messages, and I've not seen any alerts, so fairly happy that the session management piece is working OK. The only downtime I've suffered was due to the corrupted queue issue (as this was a test site I was able to leave it as-is until the fix was available so have confirmation that the fix works).

vishnureddy17 commented 9 months ago

Thanks for all the work you've put in to get to this point, it's really appreciated!

Since the next release is going to break existing user code already, here are some additional breaking changes I think should be considered before the next release.

I can take a stab at one or both of these and submit a PR if that'd be useful.

MattBrittan commented 8 months ago

My thoughts on to-do's for the next release:

I'm ambivalent on the logging change (this is primarily for library development and is not something most users will need to touch). Anyway looks like a release is not far off; would like to see this happen in Jan.

MattBrittan commented 8 months ago
  • Fix the PingHandler (I believe it still does not function properly, see PingHandler will not time time out #137). While that fix is being done, revisit the Pinger interface to add a method that gets called to reset the timeout any time a packet is sent or received.
    I can take a stab at one or both of these and submit a PR if that'd be useful.

@vishnureddy17 would love to see a PR for this.

vishnureddy17 commented 8 months ago

@MattBrittan Sounds good. I'd be happy to do the pinger PR, and I can get that out next weekend (Jan 6-7).

Another change I think is important is to allow users to get back the ack on an async publish. One way to achieve this is to change AddToSession() in SessionManager to return a function that will block until the ack is received. The paho client would just return that function to the user. The function would be valid for use for as long as the session is.

This is quite important for my use-case. I can work on a PR for that and make an effort to get that out in January.

The autopaho manual ack issue (https://github.com/eclipse/paho.golang/issues/160) is something to look at as well, but I think it is less important since users can implement their own managed client if autopaho doesn't meet their needs.

MattBrittan commented 8 months ago

Another change I think is important is to allow users to get back the ack on an async publish.

My thought on this was to add callbacks to PublishOptions; we could then store the PublishOptions in state.clientPackets (making it easy to call when an associated event occurs). I prefer the callback approach because I think it's easier for the end user (and avoids the need to have a lot of go-routines hanging around waiting for acknowledgements). One note of caution is that the session may outlast the client instance (and I can't think of a good way of handling that!).

The autopaho manual ack issue (https://github.com/eclipse/paho.golang/issues/160) is something to look at as well

Yep - I would like to see a solution for this but struggle with the logic around reconnections (I'm thinking more of the logic that would be required in users code than the logic in the client). It would be good to see use-cases from someone who needs this functionality.

vishnureddy17 commented 8 months ago

One note of caution is that the session may outlast the client instance (and I can't think of a good way of handling that!).

I think that's expected and necessary. As a user, I'd want to know when the publish was handled in the session (regardless of how many reconnections happened in the interim). In the solution you suggest, that would mean that the callback can potentially be called long after a particular client is dead.

I see how that could be confusing to the user, and that behavior would need to be documented.

romansitina commented 8 months ago

Hello all, appreciate your work! Would love to see PingHandler fix in the next release - we recently found out that this is causing issues in our deployment, resulting in more than 5 minutes lags when connection problem is unnoticed.

MattBrittan commented 8 months ago

Update on the to-do list:

Done:

In progress:

Backlog / Undecided:

My thought is that once we get the pinger sorted I'll run this on a few production workloads for a week or so before cutting a release. After that release it would be good to pull a "Before V1" list because I think we are fairly close.

vishnureddy17 commented 8 months ago

Allow users to get back the ack on an async publish (Nice to have but can wait as it should not be a breaking change)

It wouldn't break users who are using paho defaults. It would be a breaking change to the SessionManager interface though, right?

Despite this, I think it's ok to defer this change to a future release as long as it's before V1. I don't anticipate that changes to SessionManager would be breaking to that many users. Just asking to clarify.

MattBrittan commented 8 months ago

It would be a breaking change to the SessionManager interface though, right?

You are right (however realistically I don't see anyone else implementing this interface for some time :-) ).

With the introduction of the new Pinger I think we have everything needed for the release. I'll push the current @master out to a few non-critical systems and leave it running for a week as a final test (happy to see further changes but will probably delay anything major).

Update 20th Jan: Further changes (pinger and clean shutdown process ref issue #227) will delay this slightly; I am running the current master (ef0065f) on a few machines in production without issue. Really appreciate any other feedback from testing (every use-case differs so I will not catch all bugs!).

MattBrittan commented 7 months ago

v0.20 released so I'll close this issue. Thanks very much to everyone who contributed!