Open smtlaissezfaire opened 6 years ago
What web server are you running?
I was using thin, but I'm open to switching. This is an app that's just for personal consumption.
Try puma, unicorn or passenger, way less hacky
@SamSaffron you'll not like it, but well, I have to try ;)
So, as you know, I'm running my own Docker image for discourse, and not the official tool.
By starting discourse with rails s, I'm starting puma.
As a user in the admin interface, I see these long polls of 25s. When I change, say, the title of the discourse instance and then reload the page, there is no change, even though the PUT returns 200. But if I wait for this long poll to happen, then yes, the change is persisted.
I think it is the same bug, but I might be wrong in my analysis. If it is the same bug, what do you recommend I do?
Thanks a lot for your help!
Quick question ... what browser are you using? Does the same issue happen in other browsers?
@SamSaffron taking over from @pierreozoux. This happens independently of the browser (tested on Firefox and Chromium).
So each time I make a change in the admin interface I need to wait at least 25 seconds, and even if I wait it can be a bit random.
For instance, if I reload without cache, I go back to the initial parameter; reload again and I see my change; reload again and I'm back to the initial parameter.
We are also experiencing strange behavior when we change the logo. Sometimes it's the right logo, sometimes it's the default discourse logo. I don't know if that is related.
As Pierre mentioned, we are using our own docker image: https://github.com/libresh/docker-discourse/edit/master/Dockerfile
Thanks for your help
My guess is that you have some nginx buffering or other type of buffering going on.
@SamSaffron nginx buffering is off. I thought it could come from the haproxy that sits in front of our nginx, but actually the issue is fixed if I run discourse with unicorn and not puma.
Wow, that is somewhat odd. It would indicate a bug of sorts in puma's hijack implementation; perhaps raise it on puma?
For us the issue is closed; we won't investigate further. I think @unteem spent 2 full days on that :) I guess we should open the issue on puma, but we don't have the resources :/
I'd be curious to know why discourse is using unicorn and not puma :) (maybe for that kind of reason)
And thanks a lot for your kind support!
@pierreozoux I know for sure that both @schneems and @evanphx care dearly about making puma as robust as possible, and they definitely want the hijack implementation not to stall; this is critical for web sockets and other features.
If @unteem has any kind of repro here it would be very handy and save others multiple days of debugging.
As to why Discourse uses unicorn and not puma in clustered mode: we very much like the fact that rogue requests take out a single worker and a single request rather than potentially a large number of unrelated requests, plus the automatic shielding against rogue gems that don't release the GIL in C extensions is nice. Unicorn has treated us nicely, but yeah, memory is a challenge. However, rack hijack having issues was never a factor in our decision to use unicorn vs puma.
The puma hijacking implementation explicitly performs no buffering so I'm unsure why you'd see a 25s delay, but I don't know much of anything about the message_bus code. I'm happy to look at any specific usage of the hijacking to try and see if there could be an issue though.
@evanphx
We just write directly to the socket as stuff happens, using chunked encoding:
https://github.com/SamSaffron/message_bus/blob/master/lib/message_bus/client.rb#L226-L257
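For anyone not familiar with that code, here is a generic sketch of the pattern it relies on: taking over the socket via rack hijack and writing chunked-encoding frames by hand. This is illustrative only (the class, payloads, and timing are made up), not the gem's actual implementation.

```ruby
# Illustrative only: a minimal rack full-hijack + chunked-encoding endpoint.
# The real implementation lives in message_bus/client.rb (linked above).
class HijackChunkDemo
  def call(env)
    io = env['rack.hijack'].call # take over the raw socket from the server
    Thread.new do
      begin
        io.write("HTTP/1.1 200 OK\r\n" \
                 "Content-Type: text/plain\r\n" \
                 "Transfer-Encoding: chunked\r\n\r\n")
        3.times do |i|
          chunk = "tick #{i}\r\n"
          # chunked framing: hex length, CRLF, payload, CRLF
          io.write("#{chunk.bytesize.to_s(16)}\r\n#{chunk}\r\n")
          sleep 1
        end
        io.write("0\r\n\r\n") # terminating chunk
      ensure
        io.close rescue nil
      end
    end
    [200, {}, []] # ignored by the server once the socket is hijacked
  end
end
```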
Not sure this is a specific bug in puma though, until we make some sort of test harness that works in unicorn/passenger and fails in puma, which I was hoping @unteem could help with.
That all looks just fine. Those write calls hit the socket directly and the data is sent without buffering. Probably the best bet is to use tcpdump to verify that the data is being sent back to the client properly, though.
@evanphx send us your public ssh key by mail or here, I'll give you access to a VM with a reproducible test based on discourse. Contact@indie.host
I had similar issues with puma. When I disabled clustered mode, it suddenly started working.
Sorry, forgot to comment back; we solved it by switching back to unicorn.
So using unicorn instead of puma?
I'm hitting this issue as well on puma; any more info on this?
For me, I was running puma with its defaults (0:16) and over time notifications would go from instant to 25 seconds. I've since switched to unicorn as well.
An update: I'm hitting this on unicorn too, again 25 second delays on some workers. If I had to guess, it's due to some kind of leak, perhaps threading related. To counteract this, I'm going to use unicorn-worker-killer.
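For reference, unicorn-worker-killer is typically wired into config.ru ahead of the app; a rough sketch, with thresholds that are arbitrary examples:

```ruby
# config.ru -- recycle unicorn workers before a leak gets out of hand
# (thresholds below are arbitrary examples, tune for your app)
require 'unicorn/worker_killer'

# restart a worker after it has served 3072-4096 requests
use Unicorn::WorkerKiller::MaxRequests, 3072, 4096
# restart a worker once its memory use crosses roughly 192-256 MB
use Unicorn::WorkerKiller::Oom, (192 * 1024**2), (256 * 1024**2)

require ::File.expand_path('../config/environment', __FILE__)
run Rails.application
```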
I'm wondering if people with this issue are not calling MessageBus.after_fork as mentioned here?
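For a forking server, the hook being referred to looks roughly like this; file names follow the usual conventions, so adjust to your setup:

```ruby
# config/puma.rb -- reconnect message_bus in every forked worker
on_worker_boot do
  MessageBus.after_fork
end

# unicorn's equivalent hook (e.g. config/unicorn.conf.rb):
# after_fork do |server, worker|
#   MessageBus.after_fork
# end
```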
I'm running Puma in clustered mode and I was having this issue until I added the aforementioned code to my puma configuration. Removing the above code also reliably causes delays.
Upon re-reading the instructions, it does mention that this can solve non-immediate delivery for forking/threading app servers, but it seems like it should be recommended as standard configuration rather than as optional? I probably overlooked it initially because of the way it was written.
This was not the case for me.
I'm seeing long polling events getting chunked, and the JS client receives all of the messages on the channel, but only every 25 seconds.
If, on the other hand, I run something like this in a console:
I see the results immediately.
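The console snippet isn't shown in the comment; a server-side check of that general shape might look like the sketch below (channel and payload are made up). Subscribing and publishing in the same process bypasses the Rack transport entirely, which would explain why results show up immediately there.

```ruby
# Hypothetical console check, not the commenter's actual snippet:
# subscribe and publish directly on the server side.
MessageBus.subscribe('/test') do |msg|
  puts "#{Time.now} -> #{msg.data.inspect}"
end

MessageBus.publish('/test', 'hello') # the block above fires right away
```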
I couldn't find any config variables that were set with 25 seconds. Any idea why this would be the case?
I tried setting
after which I start to get messages "immediately" (every second and a half).
Also, I tried setting MessageBus.alwaysLongPoll = true; and MessageBus.enableChunkedEncoding = false; without any change in behavior. I can confirm that this happens in both Safari and Chrome on OS X (haven't tried FF yet).
I'm integrating into Sinatra, calling use MessageBus::Rack::Middleware in my config.ru, and using thin v1.7.2. Using message_bus gem version 2.1.1.
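For context, that setup amounts to something like the config.ru below; the app class, route, and channel are illustrative, not the actual app.

```ruby
# config.ru -- Sinatra app with message_bus mounted as Rack middleware
require 'sinatra/base'
require 'message_bus'

class App < Sinatra::Base
  get '/publish' do
    MessageBus.publish('/test', 'hello from sinatra') # made-up channel/payload
    'ok'
  end
end

use MessageBus::Rack::Middleware
run App
```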