basis-company / nats.php

nats jetstream client for php
126 stars 30 forks source link

receive ping after publish #68

Closed ro0NL closed 6 months ago

ro0NL commented 7 months ago

Hi,

Sometimes we notice following behavior in workers:

12:23:47 DEBUG     [nats] send SUB handler.11949d0c c894bf04
12:23:47 DEBUG     [nats] send PUB $JS.API.CONSUMER.MSG.NEXT.test.test handler.11949d0c 26
{"batch":1,"no_wait":true}
12:23:49 DEBUG     [nats] receive PING
12:23:49 DEBUG     [nats] send PONG
12:23:54 DEBUG     [nats] sleep
12:23:59 DEBUG     [nats] sleep
12:24:08 DEBUG     [nats] send PING
12:24:08 DEBUG     [nats] sleep
12:24:13 DEBUG     [nats] sleep
12:24:18 DEBUG     [nats] sleep
12:24:23 DEBUG     [nats] send PING
12:24:23 DEBUG     [nats] sleep

which goes on endless.

Im trying to understand what's happening, and hoping for some clues here :)

ro0NL commented 7 months ago

will debug our replicas further

nekufa commented 7 months ago

Hi @ro0NL I've notice this can happen when consumer expiration configuration exceed client timeout. Looks like, we need to teach the client ping skip when it see that consumer is fetching a message with huge expiration timeout. I've tried to fix that in 0.18.1, if you use previous client, please upgrade and give me feedback. Hope that was clear, english is not my native language..

ro0NL commented 7 months ago

@nekufa thanks for noticing

i wasnt able to reproduce locally yet, as it does a proper reconnect and then continues

only in prod i noticed the endless ping/pong/sleep loop

we currently run v0.18.1

however i deviated from Consumer::handle(): https://github.com/etrias-nl/php-toolkit/blob/5a8d575cdf2b29af83f28887bb0bf62e3ee1dcb1/src/Messenger/Transport/NatsTransport.php#L67-L104

notable changes are 'no_wait' => true and 120 timeout

i suspected using PHP_INT_MAX for timeout created a possibility for endless looping

ro0NL commented 6 months ago

ive notived we receive a ping with all possible configs (no_wait/expires vs small/large timeout)

we need to teach the client ping skip when it see that consumer is fetching a message

i just dont understand why we get a ping even :/

ro0NL commented 6 months ago

@nekufa still issues :(

16:15:44 DEBUG     [messenger] send CONNECT
16:15:44 DEBUG     [messenger] send SUB handler.b6de490c bf07f53d
16:15:44 DEBUG     [messenger] send PUB $JS.API.CONSUMER.MSG.NEXT.test.test handler.b6de490c 26
{"batch":1,"no_wait":true}
16:15:47 DEBUG     [messenger] sleep
16:15:52 DEBUG     [messenger] sleep
16:15:57 DEBUG     [messenger] send PING
16:15:57 DEBUG     [messenger] sleep
16:16:02 DEBUG     [messenger] sleep
16:16:07 DEBUG     [messenger] sleep
16:16:12 DEBUG     [messenger] send PING

given a 10s ping interval

will continue debugging tomorrow

nekufa commented 6 months ago

@ro0NL check latest release with refactored internals, public api is unchanged

ro0NL commented 6 months ago

@nekufa we still see issues :(

14:13:40 DEBUG     [messenger] send HPUB test _INBOX.7748dd3c7af163d014b9648cd8389412 91 351
14:13:42 DEBUG     [messenger] receive PING
14:13:42 DEBUG     [messenger] send PONG
14:13:42 DEBUG     [messenger] sleep
14:13:45 DEBUG     [messenger] send PING
14:13:45 ERROR     [messenger] Socket read timeout

i just dont understand what happens

ro0NL commented 6 months ago

do we really need to reply with pong when receiving ping?