cosent / plonesocial.microblog

ARCHIVED - USE PLONEINTRANET. Simple microblogging for Plone

instances not up to date status list #13

Open djay opened 10 years ago

djay commented 10 years ago

We are load-balancing across Zope instances with 2 threads each, 4 on each server. I'm seeing strange behaviour where status updates do not appear, either directly after entering a status or when refreshing the microblog page several times as it round-robins through the various Zope instances. Eventually all the instances get up to date, but it takes a few minutes.

djay commented 10 years ago

It looks like the issue was related to the two servers having clocks that were out of sync by 3 minutes, so one server was refusing to show future updates. Perhaps the code could be made more robust: when showing the latest updates, it could also include future updates to handle clock problems?

gyst commented 10 years ago

On 18/06/14 13:26, Dylan Jay wrote:

> It looks like the issue was related to the two servers have clocks out by 3 min. So one server was refusing to show future updates. Perhaps the code could be made more robust by when showing the latest updates, to include future updates to handle clock problems?

Thanks for the bug report, Dylan.

I checked the code but there's no filter that excludes future status updates.

More probably this has to do with the in-memory batch queuing mechanism and is possibly related to #1.

It's difficult to find and fix this kind of Heisenbug without a reproducible test case.

Anything you can add to diagnose this would be helpful. If you see this recurring, it would be informative if you could temporarily disable the batch queuing to see whether that removes the issue. You can do that by setting plonesocial.microblog.statuscontainer.MAX_QUEUE_AGE=0.

Please don't do that if your traffic exceeds 20 statusupdate insertions per second or you'll get commit conflicts.
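
For illustration, here is a minimal sketch of one way to apply that setting at runtime, for example from a debug prompt or a small patch module loaded at startup. The helper name `disable_batch_queue` is hypothetical and not part of plonesocial.microblog; only the module path and the `MAX_QUEUE_AGE` name come from the comment above. The sketch assumes the container reads the module-level value when it decides whether to flush its queue.

```python
import importlib


def disable_batch_queue(module_name="plonesocial.microblog.statuscontainer"):
    """Set MAX_QUEUE_AGE to 0 on the named module (hypothetical helper).

    Returns True if the module could be imported and patched,
    False if it is not importable in this process.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    # Assumption: a queue age of 0 disables batching, as described above.
    module.MAX_QUEUE_AGE = 0
    return True
```

Module globals are per process, so a change like this would have to be applied in every Zope instance behind the load balancer, and it is lost on restart.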

:*CU

 Guido Stevens  |  +31.43.3618933  |  http://cosent.nl

 social knowledge technology

djay commented 10 years ago

Syncing the clocks of the servers solved the problem. I don't see how it could be a write queue problem, since the writes were visible on some of the instances. The code for getting the keys to show, including the max and min, does involve time.time, so I suspect that is involved.
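
To make the suspicion concrete, here is a minimal sketch of the failure mode I have in mind. It is not the actual plonesocial.microblog code: `visible_keys` and `skew_tolerance` are hypothetical names, and the keys are assumed to be plain timestamps in seconds. If the upper bound of a key range is taken from the local clock, an update written by a server whose clock runs ahead falls outside the range until the local clock catches up.

```python
import time


def visible_keys(keys, now=None, skew_tolerance=0.0):
    """Return timestamp keys at or below now + skew_tolerance, newest first."""
    if now is None:
        now = time.time()
    upper = now + skew_tolerance
    return sorted((k for k in keys if k <= upper), reverse=True)


# Server A's clock runs 180 seconds ahead of server B's clock.
server_b_now = 1000000.0
written_by_a = server_b_now + 180.0
keys = [server_b_now - 60.0, written_by_a]

# Strict upper bound: server B hides the update written by server A.
strict = visible_keys(keys, now=server_b_now)

# With a 5 minute tolerance the "future" update is included.
tolerant = visible_keys(keys, now=server_b_now, skew_tolerance=300.0)
```

In this sketch `strict` contains only the older key, while `tolerant` contains both, which would match what we saw: the update shows up on the lagging server only after about 3 minutes.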