centrifugal / centrifugo

Scalable real-time messaging server in a language-agnostic way. Self-hosted alternative to Pubnub, Pusher, Ably. Set up once and forever.

https://centrifugal.dev

Apache License 2.0

Centrifugo v2 #221

Closed FZambia closed 6 years ago

FZambia commented 6 years ago

Hello dear Centrifugers.

The work on Centrifugo 2 started at the end of 2017 and it's now almost done. It will serve the same purpose as Centrifugo v1 but won't be backwards compatible – migration to it will require adapting both the backend and frontend sides of your application (if you decide to migrate, of course). The changes are not too difficult. I will try to write more information later. For now you can look at the post describing some aspects of v2 and the reasons that led to some decisions. It's not fully up to date at the moment but the main ideas are the same.

Several highlights of v2:

Cleaner and more structured protocol defined in protobuf schema
binary Websocket support (Protobuf). Of course JSON still there
JWT for authentication instead of hand-crafted HMAC sign
~~GRPC client transport (not for browser)~~ (see below)
Prometheus integration and automatic export of stats to Graphite
Refactored Javascript (ES6), Go and gomobile client libraries.
Simplified API auth (got rid of request body signing)
GRPC for server API
Structured logging
Mechanism to merge several Websocket messages into one
Better recovery algorithm to fix several recovered flag false positives
Goreleaser for automatic releases to Github (previously I had to upload everything manually)
Based on new library centrifuge for Go language

Some things were removed from Centrifugo in v2 release:

publishing over Redis queue
admin websocket endpoint
client limited channels
websocket prepared message support

Some things you can help with as it's really hard to do everything myself:

improve the web interface, as it currently uses very old Javascript libraries and a gulp-based skeleton
become a maintainer for centrifuge-ios, centrifuge-android
help with docs
help with server API libraries for NodeJS, PHP, Ruby
help with updating examples repo
create new examples based on Centrifugo v2 or centrifuge library.

All these tasks require that you are already familiar with Centrifugo or want to dive deeper, as you need to understand how things work internally.

Over the next few days I am planning to work on docs - most of them must be written from scratch so I don't know how much time it will take. The docs prototype is located here. Centrifugo v2 itself is in the c2 branch.

At the moment I am looking for developers who are using Centrifugo and want to review Centrifugo v2 in its alpha and beta state. If you ever wanted something backwards incompatible to be added to Centrifugo core - this is the right moment to say so. Please contact me here, over email or on Gitter.

arrowcircle commented 6 years ago

Hey! Great news! Where can I find the protocol changes to make the ruby lib compatible?

FZambia commented 6 years ago

@arrowcircle hi!

I just wrote a chapter in the new docs about the API. In short - it's just a POST request with a JSON body to the /api endpoint and an optional API key set via the Authorization header. No signing is needed anymore. This commit to the Python cent library adapts the client to be used with the new Centrifugo - it can help to understand which changes are needed. Also note that token is renamed to sign, and timestamp is renamed to exp with changed semantics (it's now the connection expiration time in seconds instead of the current time in seconds). So helper functions will change a bit too.
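To make the request shape above concrete, here is a minimal Go sketch that builds (but does not send) such a request. The thread only states "POST with JSON body to /api, optional API key in the Authorization header"; the `apikey <key>` header format, the `method`/`params` body layout, and the names `newPublishRequest`, `apiCommand` and `publishParams` are assumptions made for illustration, so check the API chapter of the docs before relying on them.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// apiCommand is an assumed "method + params" envelope for a server API call.
type apiCommand struct {
	Method string      `json:"method"`
	Params interface{} `json:"params"`
}

// publishParams is an assumed payload for a publish command.
type publishParams struct {
	Channel string      `json:"channel"`
	Data    interface{} `json:"data"`
}

// newPublishRequest builds a POST request to the /api endpoint.
// No request signing is involved; the API key is optional.
func newPublishRequest(baseURL, apiKey, channel string, data interface{}) (*http.Request, error) {
	body, err := json.Marshal(apiCommand{
		Method: "publish",
		Params: publishParams{Channel: channel, Data: data},
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/api", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	if apiKey != "" {
		// Assumed header format; only "Authorization header" is stated in the thread.
		req.Header.Set("Authorization", "apikey "+apiKey)
	}
	return req, nil
}

func main() {
	req, err := newPublishRequest("http://localhost:8000", "my-api-key", "news", map[string]string{"text": "hello"})
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String(), req.Header.Get("Authorization"))
}
```

Passing the request to `http.DefaultClient.Do` would send it to a running server.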

I think most of the things are pretty final, though they can still change a bit after some feedback.

FZambia commented 6 years ago

So just to give some info about v2 status - at the moment I am trying to solve two questions:

1) Does a GRPC client transport based on bidirectional streaming have benefits over Websocket for Centrifugo use cases? My first measurements showed that Websocket is better in all aspects (server CPU, server memory, traffic) for our use cases. It is possible that the GRPC client transport won't be included in the release from the start, and there is a chance that it won't be used at all.

2) I want to find a better algorithm for message recovery after disconnect, particularly for the case when there were no active messages in the history cache and the client reconnects. In this case Centrifugo can't say exactly after reconnect whether all messages were recovered or not (the recovered flag is false in the subscription response). The idea is to consider all messages recovered if the disconnect time was no longer than history_lifetime and no more than history_size messages appeared in the cache.

FZambia commented 6 years ago

I removed GRPC bidirectional streaming client transport because:

GRPC requires more memory on server (4x compared to Websocket)
GRPC generates more traffic via interface than Websocket with protobuf (~20-30% more for Centrifuge protocol)
GRPC is much more CPU hungry on server side (2x-3x)

It's still possible to put it back in future if we find its advantages in some scenarios. Note that GRPC for server API is still here.

Also improved message recovery - new docs here https://centrifugal.github.io/centrifugo/server/recover/

masterada commented 6 years ago

Hello,

Great work. After browsing through the code I have some thoughts (I have never used centrifugo before, I'm just now checking the project to see if it fits my use case).

I see that the Engine is no longer pluggable.

Is there a reason for unexporting the engine methods?
Is there a reason for removing the plugin.go? I understand it doesn't make much sense in the library code, but I think it could still be included in the server project. Registering a new plugin and using it from config is much cleaner in my point of view than forking the server project to change 1 line for using a different engine.
I also realized that the engine interface consists of 3 parts: PUB/SUB mechanics, channel history and presence information. I think it could be separated into 3 different interfaces with minimal effort (history saving would need to be moved to a dedicated addHistory method, which could be called from the Node.publish method). RedisEngine could still implement all 3 interfaces, but pub/sub, history and presence handling could be swapped out independently.

My use case:

I need to write special presence handling. In v1 I could write a custom engine that has an embedded RedisEngine struct with the presence methods overridden. Now, with the Engine interface methods being unexported, it's no longer possible.
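The embedding approach described above can be sketched in Go as follows. Everything here is a stand-in: `Engine`, `RedisEngine` and `CustomEngine` are simplified hypothetical types, not the real Centrifugo v1 ones, and the "hide anonymous clients" rule is invented purely for illustration. The sketch works only because the interface methods are exported; with unexported methods, a type in another package could not satisfy the interface.

```go
package main

import "fmt"

// Engine is a hypothetical, trimmed-down engine interface with exported methods.
type Engine interface {
	Publish(channel string, data []byte) error
	Presence(channel string) (map[string]string, error)
}

// RedisEngine stands in for the real Redis engine and returns canned presence data.
type RedisEngine struct{}

// Publish is a no-op in this sketch.
func (e *RedisEngine) Publish(channel string, data []byte) error { return nil }

// Presence maps client ID to user ID; an empty user ID means an anonymous client.
func (e *RedisEngine) Presence(channel string) (map[string]string, error) {
	return map[string]string{"client-1": "user-1", "client-2": ""}, nil
}

// CustomEngine embeds RedisEngine, so Publish is promoted unchanged,
// and overrides only Presence.
type CustomEngine struct {
	*RedisEngine
}

// Presence applies the special handling on top of the embedded implementation:
// here it hides anonymous clients.
func (e *CustomEngine) Presence(channel string) (map[string]string, error) {
	all, err := e.RedisEngine.Presence(channel)
	if err != nil {
		return nil, err
	}
	visible := make(map[string]string, len(all))
	for clientID, userID := range all {
		if userID != "" {
			visible[clientID] = userID
		}
	}
	return visible, nil
}

func main() {
	var engine Engine = &CustomEngine{RedisEngine: &RedisEngine{}}
	presence, err := engine.Presence("news")
	if err != nil {
		panic(err)
	}
	fmt.Println(presence)
}
```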

Another note:

It would be really nice to have a mute client in channel feature in the server API, resulting in that client not getting the messages.

My use case:

Free clients join a channel. One of them starts paying. This one client will receive slightly different notifications on that channel. I know it's possible to leave that channel and join another for the paying session, but it would need to be initiated on frontend, and it would be really nice to be able to solve this on backend only. An alternate solution could be to be able to add "except_clients" id list to publish/broadcast messages.

FZambia commented 6 years ago

Hello @masterada! Thanks for the great feedback!

Is there a reason for unexporting the engine methods?

Yes, the reason here is that I don't know of any other Engine implementations and their requirements, so I decided to approach this with caution. I.e. my final goal is to make the engine interface fully exported and pluggable in the Centrifuge lib - but I don't want to export things right now so as not to break the public API later. So if someone is interested in having the engine exported we can find a proper way and moment to export it. Also see below.

Is there a reason for removing the plugin.go?

As Centrifugo will now use the Centrifuge lib, it's not that difficult to plug in whatever a developer wants. From my point of view this makes the library much more manageable and easier to maintain. Regarding the Centrifugo server implementation: adding a new plugin using code from plugin.go required the developer to rebuild the binary anyway. So I think there is not much difference in possibilities but the code is much cleaner now. Also in version 2 I tend to remove some parts that seem hacky to me and not globally useful - this is one of them.

I also realized that the engine interface consists of 3 parts...

I also noticed this and I actually have a secret gist regarding this. The problem with 3 parts is that it generally looks cleaner and more flexible but is not justified by reality, where we only have 2 main Engines in which everything is done either in memory or in Redis, so this separation can be a bit overkill.

If you look at my gist you will see that PUB/SUB mechanics are combined with channel history in the Broker interface. That's because from a performance and atomicity perspective it's a great win to save a message into history in the publish method of the PUB/SUB broker - in the case of Redis it allows doing this in one RTT to Redis (via a Lua script). I suppose there is some way to separate the engine into parts but still keep this property - but I just had no time and no use case to investigate this further and find a more correct and elegant component design than I already did. Personally I am for this separation - it's just not that simple.

Btw, this topic about correct engine separation is one of the reasons I don't want to export Engine interface right now.

It would be really nice to have a mute client in channel feature in the server API, resulting in that client not getting the messages. Free clients join a channel. One of them starts paying. This one client will receive slightly different notifications on that channel.

Could you elaborate more about this - why not using 2 different channels for this?

Actually I thought many times about having a server-side Subscribe() method in the Centrifuge library (not in Centrifugo for now, while there are no hooks to communicate with the backend) so the backend could subscribe a client to channels itself. But I have not yet found an elegant way to integrate this into the protocol and existing client libraries. I see that you have figured out Centrifuge/Centrifugo internals pretty well - so maybe you will have some ideas on this.

FZambia commented 6 years ago

I'll try to elaborate more on my points above as some of my thoughts were pretty chaotic.

As far as I understand you are suggesting to do sth like this:

type Engine interface{
    Broker
    HistoryKeeper
    PresenceKeeper
}

Both Memory and Redis engines will implement all methods of the Engine interface and thus will work. And if someone wants to switch a component it will be possible to call sth like node.SetBroker(BrokerImplementation) and control of PUB/SUB mechanics will be passed to this component.

In Node publish we can call:

node.historyKeeper.addHistory(...)
node.broker.Publish(...)

Instead of

node.engine.Publish(...)

If you look at Publish method of Redis engine you will see that it publishes to channel and saves history in one RTT to Redis. This is a property I want to keep for Redis engine. First idea is making addHistory noop in Redis Engine but this means that Redis Engine can't be used as one of history keepers if we swap PUB/SUB broker to sth else. The solution - make it configurable - noop addHistory in one case and addHistory which saves history in another case. This is not very beautiful.
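The trade-off above can be sketched in a few lines of Go. This is only an illustration under assumptions: the method sets of `Broker`, `HistoryKeeper` and `PresenceKeeper` are invented (and exported here, while the thread writes addHistory in lower case), `MemoryEngine` is a toy, and `NoopHistory` models the "make it configurable" case where a broker's Publish already saves history itself, as the Redis engine does in one round trip.

```go
package main

import "fmt"

// The three parts of the engine, with assumed method sets.
type Broker interface {
	Publish(channel string, data []byte) error
}

type HistoryKeeper interface {
	AddHistory(channel string, data []byte) error
}

type PresenceKeeper interface {
	Presence(channel string) ([]string, error)
}

// Engine is the composition of all three parts.
type Engine interface {
	Broker
	HistoryKeeper
	PresenceKeeper
}

// MemoryEngine is a toy engine that implements all three parts in memory.
type MemoryEngine struct {
	history   map[string][][]byte
	delivered int
}

func NewMemoryEngine() *MemoryEngine {
	return &MemoryEngine{history: make(map[string][][]byte)}
}

func (e *MemoryEngine) Publish(channel string, data []byte) error {
	e.delivered++ // a real broker would fan out to subscribers here
	return nil
}

func (e *MemoryEngine) AddHistory(channel string, data []byte) error {
	e.history[channel] = append(e.history[channel], data)
	return nil
}

func (e *MemoryEngine) Presence(channel string) ([]string, error) { return nil, nil }

// NoopHistory is plugged in when the broker's Publish already saves history.
type NoopHistory struct{}

func (NoopHistory) AddHistory(string, []byte) error { return nil }

// Node holds each part separately so they can be swapped independently.
type Node struct {
	broker   Broker
	history  HistoryKeeper
	presence PresenceKeeper
}

func NewNode(e Engine) *Node { return &Node{broker: e, history: e, presence: e} }

func (n *Node) SetBroker(b Broker)               { n.broker = b }
func (n *Node) SetHistoryKeeper(h HistoryKeeper) { n.history = h }

// Publish saves history first and then hands the message to the broker.
func (n *Node) Publish(channel string, data []byte) error {
	if err := n.history.AddHistory(channel, data); err != nil {
		return err
	}
	return n.broker.Publish(channel, data)
}

func main() {
	engine := NewMemoryEngine()
	node := NewNode(engine)
	_ = node.Publish("news", []byte("one"))

	// Switch history saving off at the node level; the broker keeps working.
	node.SetHistoryKeeper(NoopHistory{})
	_ = node.Publish("news", []byte("two"))

	fmt.Println(len(engine.history["news"]), engine.delivered)
}
```

The sketch also shows the cost discussed above: with separate calls, saving history and publishing are two steps rather than one atomic operation.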

Regarding muting and except_clients - your case can be solved by subscribing to two different channels - to both, even if the client has not started paying - you just don't publish new messages into the second channel until the right moment to start doing this. Maybe there is a problem that I just don't see.

Regarding to server-side subscribe. It's possible to subscribe on server-side but client will not have callback handler set to process messages coming from channel. Also there is a question about message recovery - can't imagine how to fit it into this model - looks like this must be a task for application code in this case.

masterada commented 6 years ago

Thanks for the detailed explanation. I completely understand your reasons for not wanting to sacrifice performance for a feature that might not even be needed (separating broker from history).

About subscribing clients on server side

Let's assume a js client subscribed to a public:news channel. He is handling "news" type messages. It might not even make sense to subscribe him to a public:groceries channel, because the client would need to handle these new types of messages, so we might as well just instruct the client to subscribe to public:groceries channels by itself.

On the other hand, it might make sense to subscribe the client to a gossips:news channel. It has the same kind of messages, the client already handles them, the only difference is that now the client will get more messages of the same kind. However it would still be confusing, the client subscribed to public:news, and suddenly it starts getting messages from gossips:news. I don't see a good, non-confusing way of implementing this, so let me suggest a different approach.

Message tagging

When publishing a message, there is an optional tag parameter. Each subscription (user-channel combination) has its own tags. When forwarding a message to a user, only forward it if the subscription has a tag matching the message's tag. Configuration could contain default tags (for namespaces) that are automatically added to each new subscription. Tags could be managed on either the client side or the server side (with the option to disable client side tag management).

So instead of:

client subscribes to public:news
if client has access to gossips, he subscribes to gossips:news as well (with proper authentication)

You could do:

client subscribes to news (and the subscription gets the default public tag)
if client wants to read gossips as well, client tags the subscription with gossips

Or:

client subscribes to news (and the subscription gets the default public tag)
backend tags the subscription with gossips tag (calling something like /tag?user=<USER ID>&channel=news)

It would solve my use case as well:

client connects to channel, gets notifications (and the subscription gets the default free tag)
backend removes the free tag and adds the paying tag

It's not the same as subscribing the client from backend, but I think this could be easier to use than using multiple subscription from the client to get public and access restricted messages of the same type.
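A rough Go sketch of the tag matching proposed above, for the free/paying case. This is a hypothetical model of the proposal, not an existing Centrifugo feature: `Subscription`, `Retag` and `Recipients` are invented names, and treating an empty message tag as "deliver to everyone" is an assumption based on the tag parameter being optional.

```go
package main

import "fmt"

// Subscription is a user-channel combination with its own tag set.
type Subscription struct {
	ClientID string
	Tags     map[string]bool
}

// Retag replaces the whole tag set of a subscription.
func (s *Subscription) Retag(tags ...string) {
	s.Tags = make(map[string]bool, len(tags))
	for _, t := range tags {
		s.Tags[t] = true
	}
}

// Recipients returns, in subscription order, the client IDs whose subscription
// carries the message tag. An empty tag matches every subscription.
func Recipients(subs []*Subscription, tag string) []string {
	var out []string
	for _, s := range subs {
		if tag == "" || s.Tags[tag] {
			out = append(out, s.ClientID)
		}
	}
	return out
}

func main() {
	a := &Subscription{ClientID: "a"}
	b := &Subscription{ClientID: "b"}
	a.Retag("free") // default tag applied on subscribe
	b.Retag("free")
	subs := []*Subscription{a, b}

	fmt.Println(Recipients(subs, "free"))

	// Backend removes the free tag and adds the paying tag for client b.
	b.Retag("paying")
	fmt.Println(Recipients(subs, "free"), Recipients(subs, "paying"))
}
```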

Regarding your case can be solved subscribing on two different channels...: it does not work if there can be more than 1 client on the same paying channel (it's doable with user restricted channels but a bit more complicated). Also it's problematic for me to get the free messages as well while the user is in paying status (can be solved by filtering on frontend, but again, more complex).

Of course all this is just a suggestion. If you like it, I can help with the implementation. If you don't, it will still be a great project :D

FZambia commented 6 years ago

Possible solutions

Still not sure I understand your difficulties right.

1) You can have 2 channels - one for free events and one for paid events. As soon as user starts paying it subscribes on paid channel stream and receives both free and paid events from 2 streams. And on client side you have the same publication handler for events from both channel subscriptions.

2) Another option you mentioned in your first post - resubscribing on paid channel as soon as user starts paying. In this case on backend you publish free events to both channels (free channel and paid channel) and paid events only to paid channel. So you have 2 separate streams - one for free users and one for paid users.
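Option 2 puts the routing decision on the backend. A minimal Go sketch of that decision, with hypothetical channel names (`events:free`, `events:paid`) and a hypothetical `Event` type:

```go
package main

import "fmt"

// Hypothetical channel names for the two streams.
const (
	freeChannel = "events:free"
	paidChannel = "events:paid"
)

// Event is a hypothetical application event.
type Event struct {
	Name     string
	PaidOnly bool
}

// targetChannels returns the channels the backend should publish an event to:
// free events go to both streams, paid events only to the paid stream.
func targetChannels(e Event) []string {
	if e.PaidOnly {
		return []string{paidChannel}
	}
	return []string{freeChannel, paidChannel}
}

func main() {
	events := []Event{
		{Name: "score", PaidOnly: false},
		{Name: "analysis", PaidOnly: true},
	}
	for _, e := range events {
		fmt.Println(e.Name, targetChannels(e))
	}
}
```

With this routing each client needs exactly one subscription at a time, at the cost of publishing free events twice.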

Summary 😀

It's pretty hard to discuss this on Github, because I have a feeling that I still don't understand your use case right and am suggesting unviable solutions :) What are your thoughts on my points here? If you feel that I don't understand you right then maybe we could discuss this in chat on Gitter.

masterada commented 6 years ago

You are right, I didn't think about the issue of persisting tag information.

I will try to clarify my circumstances:

We are developing a platform, and want to keep the usage of this platform as simple as possible. That means as simple frontend code as possible.
Change between paid and free status is not always initiated by client. It might come from a backend event (eg user runs out of money).

Point 2 means the only viable workflow is the following:

client subscribes to a user restricted operation channel (so we can notify it to join/leave paid/free channels)
user joins free channel
when a backend event needs to trigger a subscription change, it sends a message on the operation channel (either sending the sign here that the client can use to join the paid channel, or instructing the client to request a sign and then join the channel)

(This is very similar to your suggestions. )

In order to keep this simple, everything more than a one time subscription to 1 or more channels is unacceptable.

I can of course solve this issue by providing my own library that wraps the centrifuge js client and adds the above mentioned functionality, hiding these details from the users of our platform (and by users i mean frontend developers).

So to summarize:

If backend side subscription works, but it requires extra effort from frontend developers, it's a no-go for me.
If tags work, but it requires extra effort from frontend developers, it's a no-go for me.
Temporary unsubscription (aka muting) could work (if there is no client side logic associated with it), but I now see it has the same issue as tags (the need to persist muted state)

With tags the following workflow could be implemented in the centrifuge server and client:

backend sets tag via the server api (it's not an add/delete, backend must specify the new tags exactly)
server notifies client to request a tag change, providing it with a sign to do so (a sign that's based on all newly active tags)
client library updates tags seamlessly
client library saves the new tag sign for the channel, so it can use it for reconnect

Could even hide the tags feature from the client library, by sending the client an updated subscription data + sign (with tag info that the client doesn't need to know about), which it uses to upgrade its existing subscription (and for later reconnect if needed).

But this complex client side logic might not be worth the feature. It's always a tough call to draw the line between features and simplicity :)

One more thing that popped into my mind while writing this: have you considered using JWT? It's a standardized solution that encapsulates data and its signature, basically used for the same purpose as you use the signs for.

FZambia commented 6 years ago

A quick question - I did not understand from your post - is one subscription to a private channel acceptable for you? I.e. 1 subscription to a channel that requires asking the backend for a sign and possibly tags (the request to the backend will be sent every time the user subscribes)?

masterada commented 6 years ago

Yes, if it's a one time thing (eg: during site load on traditional websites, or opening a page in a single page application). What's not ok is handling the free/paid status change in the 3rd party code in any way. In other words: if the developer who uses our platform needs to write any code that reacts to the paid/free status change, that's not ok.

I want to completely hide the fact from the platform's users that there is even a free/paid status. I want them to subscribe to one channel and keep processing the messages without caring about whether they are free or paid. If in the backend it's solved by 2 channels I don't care, I just want to hide this implementation detail completely.

FZambia commented 6 years ago

Yep, thanks! I considered JWT before - but it seemed hard to support it across languages. Actually Centrifugo was born before JWT gained its popularity. Now it looks like there are tons of libs implementing the RFC spec, so this looks reasonable. Though it still needs a bit of investigation, as all libs have their own API to generate tokens - hopefully the resulting string is spec compliant and the Go server can verify and decode it regardless of the language that was used to generate it :)

It seems also that using JWT will allow to simplify integration with Centrifugo where we don't have helper libraries and be more flexible when we want to add features to Centrifugo-specific data (like tags from this discussion) - because at moment we have to add this to all helper libraries.

Back to tags. Adding more stuff to protocol like updating subscription state seems a very complex solution. It's possible to implement but you are right that it makes things more difficult and hard to debug. Sure there could be a better way. Some ideas:

use disconnect API command to disconnect user. In this case client will automatically reconnect and thus will have a chance to get actual tags from backend during private channel subscription process. Downside is that it will reconnect with delay but I think it's possible to add new fields to disconnect command like reconnect_delay: true, reconnect_after: 0 to control disconnect behaviour.
use unsubscribe API command with a new field that will tell client that it must unsubscribe and then subscribe again (smth like resubscribe: true): so will get actual tags from backend during private channel subscription process.

Both approaches never guarantee delivery (as Centrifugo is at most once delivery transport) but should work in practice in normal circumstances. And actually your suggested approach updating subscription state has the same guarantees.

Does this make sense for you?

FZambia commented 6 years ago

BTW this all can be paired with connection check mechanism to ensure valid client state.

Update: no, this is wrong as connection check does not operate with subscriptions.

FZambia commented 6 years ago

I investigated JWT a bit - looks like it suits pretty well. Generated token in Python:

jwt.encode({"user": "42", "exp": 121010101010, "tags": ["a", "b"]}, key="secret")

Then decoded in Go:

package main

import (
    "fmt"

    "github.com/dgrijalva/jwt-go"
)

type ConnClaims struct {
    User string   `json:"user"`
    Info string   `json:"info"`
    Tags []string `json:"tags"`
    jwt.StandardClaims
}

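func main() {
    s := "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ1c2VyIjoiNDIiLCJleHAiOjEyMTAxMDEwMTAxMCwidGFncyI6WyJhIiwiYiJdfQ.fUoNhGoYgXwJd9D9K_hloFo0MkwUgQyIrDQJDN0Akp8"

    token, err := jwt.ParseWithClaims(s, &ConnClaims{}, func(token *jwt.Token) (interface{}, error) {
        // Accept only HMAC-signed tokens before handing out the shared secret.
        if _, ok := token.Method.(*jwt.SigningMethodHMAC); !ok {
            return nil, fmt.Errorf("unexpected signing method: %v", token.Header["alg"])
        }
        return []byte("secret"), nil
    })
    if err != nil {
        // token may be nil when parsing fails, so check the error before using it.
        fmt.Println(err)
        return
    }

    if claims, ok := token.Claims.(*ConnClaims); ok && token.Valid {
        fmt.Printf("%v %v %v\n", claims.User, claims.Tags, claims.StandardClaims.ExpiresAt)
    } else {
        fmt.Println("invalid token or unexpected claims type")
    }
}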

An interesting idea here is adding tags to user connection itself instead of subscription. This will allow to set tags on connect and filter publications based on user tags and not on subscription tags. This is less flexible in general but will allow to not use private subscriptions. The only problem here is updating tags on the fly. This is easy to do during connection check request. But to change tags immediately after they changed on backend some sort of signal required - maybe new API refresh command that will force active user connections to refresh token from backend thus updating tags. Maybe sth else? In this case looks like it can be paired with connection check to ensure valid tag state.

masterada commented 6 years ago

I read the centrifuge js lib code, and had the exact same thought - using refresh to update the tags, and an option to force client refresh from the server api. I don't think it's an issue to have user scoped tags instead of subscription tags - it's still possible to prefix the tag name with the channel name if needed. It might be a good idea though to make guest tags configurable (a static list of tags that apply to guests).

If you are looking at jwt, I suggest you check out go-jose instead. It implements all of jws, jwe and jwt (jwt-go only implements jws + jwt), even though you will probably not need the jwe part. I also found it a bit easier to use. Here is an example of usage (parse+validate).

We use jwt in php, go and nodejs, so far the only difficulty we ran into is that some libraries accept the key in base64 format (eg: php), while others use it as-is (eg: go). It caused us some headache :)

FZambia commented 6 years ago

@masterada I've created pull request to https://github.com/centrifugal/centrifuge (#6) with JWT support.

I had time to think more about the tags idea while adding JWT. In general I still like what tags can provide in terms of channel configuration. But our final implementation ideas here are not very robust, unfortunately.

Imagine a situation where tags are set via the user connection token. Then at some moment the tags change. If the user is offline at this moment he won't get the updated tags and will reconnect with the same token after going online (if the token has not expired). Not asking for a token on reconnect is important so as not to DDoS the application backend with CPU-intensive tasks (for example when a Centrifugo node restarts). This means that the user will have old tags until the next token refresh. Maybe we should just provide an option to refresh the token on every client reconnect.

From this perspective having tags information in the private subscription token is more robust, as the private subscription token is requested every time a client resubscribes. This means that on a Centrifugo node restart there will be lots of private subscription requests after every client reconnect. But this is a reasonable compromise that we already had before: people use this, and not everyone actually uses private subscriptions. To update tags on the fly some sort of signal is still required (disconnect/resubscribe maybe), and refreshing the subscription token on expiration also looks like a good idea. This requires quite a lot of work - I am not sure I can spend time on it at the moment. But it seems possible to add at any moment later.

So I am not sure about best way to add this feature yet.

FZambia commented 6 years ago

If you are looking at jwt, I suggest you check out go-jose instead. It implements all of jws, jwe and jwt (jwt-go only implements jws + jwt), even though you will probably not need the jwe part. I also found it a bit easier to use. Here is an example of usage (parse+validate).

In the Centrifuge case we have to handle token expiration in a special way to support the refresh workflow. I looked at go-jose and have not found a straightforward way to check that the only problem with a token is that it's expired.

masterada commented 6 years ago

I see your point about tags and refresh. Still, tags could be a private channel only feature.

I decided I will use the centrifuge library in a new project, because I will need to change private channel subscriptions very often, and I think the short refresh interval I would need to do this with token style is more of a performance overhead than using backend webhooks from centrifuge server for authentication. I will try to include the minimal code to be able to support tags with the library, and create a pull request. But before that I need to dig in some more :)

FZambia commented 6 years ago

@masterada ok, feel free to ask any questions on Gitter and via personal messages if you prefer. As you can see I was able to implement subscription expiration - implementation is not ideal but I think it's pretty sufficient for this moment.

FZambia commented 6 years ago

Centrifugo v2.0.0-alpha.2 was just released - this is the first public pre-release, I hope someone will give it a try and share feedback.

FZambia commented 6 years ago

Centrifugo v2.0.0-beta.1 released

FZambia commented 6 years ago

So Centrifugo v2 released - release notes are here. Thanks a lot to everyone who helped during development: @masterada @mogol @Inpassor @furdarius @wlredeye and others.

There are still lots of things to do in the transition to v2 - updating the remaining libraries, updating several examples that still use v1, fixing bugs (sure there are some). But the important step has just been made :)