**Open** · bartlomiej-korpus opened 2 months ago
> Hey @bartlomiej-korpus, this is definitely tricky in a multi-node situation. I should note that Absinthe cannot guarantee that messages are never dropped. I think sharding based on the topic to make race conditions unlikely is a solid plan. I don't see a straightforward way, however, to preserve guaranteed ordering in the `async: true` scenario, but that may just be a trade-off we need to document.
### Environment

### Expected behavior

When pushing several updates, one after another, using `Absinthe.Subscription.publish`, they should arrive in the same order they were sent.

### Actual behavior

They might arrive in a different order.
### Relevant Schema/Middleware Code
Given a subscription like this:
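The original schema snippet was not captured here; a minimal sketch of such a subscription field, assuming a hypothetical `entity_updated` field keyed by entity id, might look like:

```elixir
# Hypothetical schema sketch; the field and argument names are
# illustrative, not the reporter's actual schema.
subscription do
  field :entity_updated, :entity do
    arg :id, non_null(:id)

    config fn args, _info ->
      # All updates for a given entity share one topic, e.g. "entity_updated:1"
      {:ok, topic: "entity_updated:#{args.id}"}
    end
  end
end
```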
and code like this:
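The publishing code was also lost in extraction; a representative sequence of calls (module and payload names are assumptions) could be:

```elixir
# Two rapid publishes to the same topic for entity 1.
# Absinthe.Subscription.publish/3 takes the pubsub endpoint, the
# result payload, and the topic(s) keyed by subscription field.
Absinthe.Subscription.publish(MyAppWeb.Endpoint, %{id: 1, version: 1},
  entity_updated: "entity_updated:1")

Absinthe.Subscription.publish(MyAppWeb.Endpoint, %{id: 1, version: 2},
  entity_updated: "entity_updated:1")
```

Subscribers may then observe `version: 2` before `version: 1`.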
If consecutive calls to `publish` happen very quickly, updates to entity 1 might be delivered in the wrong order.

I observe this in a multi-node Phoenix setup using `phoenix_pubsub_redis` to relay messages between instances. The Phoenix Redis adapter itself guarantees message ordering.
The bug originates in how the `mutation_result` is routed in `Absinthe.Subscription`:

https://github.com/absinthe-graphql/absinthe/blob/3d0823bd71c2ebb94357a5588c723e053de8c66a/lib/absinthe/subscription.ex#L220

The actual result payload is used to calculate the target shard. Since payloads differ, the chosen shards most probably differ too. From there on, the subscription proxies operate concurrently, so ordering cannot be guaranteed.

There is a second place in the code that compounds the problem:

https://github.com/absinthe-graphql/absinthe/blob/84d6efafc2cbc6ef970b655cc0f1167739c3fd8d/lib/absinthe/subscription/proxy.ex#L47
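To make the shard routing concrete: hashing the payload can send successive updates for the same topic to different shards, while hashing the topic keeps them together. A minimal sketch of the two strategies (pool size and payloads are made up; the real routing lives at the line linked above):

```elixir
pool_size = 8
topic = "entity_updated:1"

payload_v1 = %{id: 1, version: 1}
payload_v2 = %{id: 1, version: 2}

# Current behavior (sketch): shard derived from the result payload.
# Different payloads usually hash to different shards, so the two
# updates can be processed by different, concurrent proxy processes.
shard_v1 = :erlang.phash2(payload_v1, pool_size)
shard_v2 = :erlang.phash2(payload_v2, pool_size)

# Topic-based alternative (sketch): every update for "entity_updated:1"
# hashes to the same shard, preserving per-topic ordering.
shard_topic = :erlang.phash2(topic, pool_size)
```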
Here, the actual channel publish happens asynchronously via `Task.Supervisor`, which introduces another race condition: even if the payloads are routed to the same shard, once the tasks are spawned we cannot be sure how the BEAM will schedule them. This one, however, can be resolved by flipping the `async` flag that was recently merged.

The only workaround I can think of for now is `pool_size: 1, async: false`.

I'd love to help prepare a fix if this is indeed considered a bug and we can agree on a desired solution.
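For reference, the workaround would be applied where `Absinthe.Subscription` is started in the supervision tree. A sketch, assuming the options can be passed as a keyword list (check your Absinthe version for the exact child spec; `MyAppWeb.Endpoint` is illustrative):

```elixir
# In the application supervisor.
# pool_size: 1 forces all publishes through a single proxy shard;
# async: false makes the proxy push synchronously, trading
# throughput for ordering.
children = [
  MyAppWeb.Endpoint,
  {Absinthe.Subscription, pubsub: MyAppWeb.Endpoint, pool_size: 1, async: false}
]
```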