centrifugal / centrifugo

Scalable real-time messaging server in a language-agnostic way. Self-hosted alternative to Pubnub, Pusher, Ably. Set up once and forever.
https://centrifugal.dev
Apache License 2.0

[bug] The ID specified in XADD is equal or smaller than the target stream top item #768

Closed oleg-smith closed 8 months ago

oleg-smith commented 10 months ago

Describe the bug

I am seeing this error while publishing via the REST API.

The error does not appear every time.

{
  "level": "error",
  "channel": "user_notifications_8835df4f-be05-4a38-896f-6ce29e89f44c",
  "error": "Error running script (call to f_33ea4857923b0bb12fdf1417bdfd7b5eab7c177f): @user_script:23: ERR The ID specified in XADD is equal or smaller than the target stream top item",
  "time": "2024-01-29T15:47:07Z",
  "message": "error publishing message in engine"
}

Centrifugo config:

{
  "admin": true,
  "allow_history_for_client": true,
  "allow_presence_for_client": true,
  "allow_presence_for_subscriber": true,
  "allow_publish_for_client": true,
  "allow_subscribe_for_client": true,
  "allow_user_limited_channels": true,
  "engine": "redis",
  "history_size": 1000,
  "history_ttl": "2160h",
  "log_level": "trace",
  "namespaces": [],
  "presence": true,
  "proxy_publish": true,
  "proxy_publish_timeout": "5s",
  "proxy_rpc_timeout": "5s",
  "publish": true
}

I am using a single-node Redis cluster.

Versions

Centrifugo version is 5.1.1
Client library used is centrifuge-<???> of version <???>
Operating system is <???>

Steps to Reproduce

Publish a message using the REST API with an API token.

FZambia commented 10 months ago

As discussed in the Centrifugo Telegram group, the root cause here was that history_ttl is 2160h (90 days), which is greater than the default history_meta_ttl value used by Centrifugo (720h, i.e. 30 days). The history meta TTL must be greater than the history TTL to avoid errors like this.

The documentation contained an old description of the defaults used in Centrifugo v4 (previously the history meta TTL was 90 days by default). I fixed this and added more clarifications on proper configuration in https://github.com/centrifugal/centrifugal.dev/commit/51fb304f7e4af406e5bc03c593ffb987aaf82b27

I need to think more about whether such situations can be avoided; we can at least validate the TTL values on start.
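
The start-up validation mentioned above could look roughly like the following Python sketch. It is only an illustration, not Centrifugo's actual implementation (Centrifugo itself is written in Go). `parse_duration` and `check_history_ttls` are hypothetical helpers, the parser handles only the `ms`, `s`, `m` and `h` units, only top-level options are checked, and the `720h` fallback is the default history_meta_ttl quoted in this thread.

```python
import re
from typing import List

# Seconds per unit for the duration suffixes this sketch understands.
_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}
_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|s|m|h)")


def parse_duration(text: str) -> float:
    """Convert a duration string such as "2160h" or "1h30m" to seconds."""
    pos = 0
    total = 0.0
    for match in _PART.finditer(text):
        if match.start() != pos:
            # Unparsed characters between two parts, e.g. "1h x 30m".
            raise ValueError(f"invalid duration: {text!r}")
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):
        raise ValueError(f"invalid duration: {text!r}")
    return total


def check_history_ttls(config: dict, default_meta_ttl: str = "720h") -> List[str]:
    """Return a list of problems; an empty list means the TTLs look consistent."""
    history_ttl = config.get("history_ttl")
    if not history_ttl:
        return []  # history is not configured, nothing to compare
    meta_ttl = config.get("history_meta_ttl", default_meta_ttl)
    if parse_duration(meta_ttl) <= parse_duration(history_ttl):
        return [
            f"history_meta_ttl ({meta_ttl}) must be greater than "
            f"history_ttl ({history_ttl})"
        ]
    return []
```

With the config from this issue, `check_history_ttls({"history_ttl": "2160h"})` reports one problem, because the 720h fallback is below 2160h. Adding an explicit history_meta_ttl above 2160h (for example `"2400h"`, an arbitrary value chosen for illustration) makes the check pass.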

VanderY commented 9 months ago

Hello, @FZambia!

I was getting the same error. I did what you suggested and increased history_meta_ttl to be greater than history_ttl, but I am still getting this error.

Maybe you can give me some advice on what to do about this?

FZambia commented 9 months ago

@VanderY hello, I think you are getting errors for already existing streams. Fixing the configuration won't help with those automatically; it only helps with newly created streams. You are using the Redis engine, right? If so, then theoretically flushing the state in Redis may help to get rid of such errors. If you are using auto recovery and using Centrifugo correctly on the client side, then flushing the Redis state should not cause a serious issue for clients: they will receive insufficient state advice, so the state must be loaded from your main database. But I suggest testing this scenario and deciding whether it's acceptable. Also, make sure Redis is used only for Centrifugo; otherwise you may need to clean up only the Centrifugo keys.

If you need to keep the information in Redis, then the only way here is manual migration of keys. But using Centrifugo as persistent storage is not a recommended pattern, so I believe that's not your use case.
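
If Redis is shared with other data, cleaning up only the Centrifugo keys could be sketched in Python as follows. This assumes that all Centrifugo keys share one prefix (`centrifugo` here is an assumption; check the `redis_prefix` option of your deployment) and that `client` is a redis-py style client exposing `scan_iter` and `unlink`. The helper name `drop_prefixed_keys` is hypothetical. It defaults to a dry run that only counts matching keys, so nothing is removed until `dry_run=False` is passed.

```python
def drop_prefixed_keys(client, prefix: str = "centrifugo", dry_run: bool = True) -> int:
    """Count (and optionally remove) every key that starts with `prefix`.

    `client` is assumed to behave like redis.Redis: scan_iter() walks the
    keyspace incrementally with SCAN and unlink() removes a key without
    blocking the server. The prefix is used as a glob pattern, so it should
    not contain glob characters such as * ? [ ].
    """
    matched = 0
    for key in client.scan_iter(match=prefix + "*", count=500):
        matched += 1
        if not dry_run:
            # One key per call, so keys living in different cluster
            # slots never end up in the same command.
            client.unlink(key)
    return matched


# Usage sketch (requires the redis package and a reachable server):
#   import redis
#   client = redis.Redis(host="localhost", port=6379)
#   print(drop_prefixed_keys(client))          # dry run: only counts keys
#   drop_prefixed_keys(client, dry_run=False)  # actually removes them
```

Running the dry run first and checking the count against expectations is a cheap safeguard before removing anything.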

FZambia commented 8 months ago

Found one more related bug which could result in similar issues with the Memory Engine; fixed in https://github.com/centrifugal/centrifuge/pull/366

FZambia commented 8 months ago

The config validation and the fix above are now part of v5.3.0. I also created an issue in the underlying Centrifuge library to return an error during publish, though this is not very relevant to Centrifugo.