cjb / codex-blackboard

Meteor app for coordinating solving for our MIT Mystery Hunt team
GNU Affero General Public License v3.0

Ringhunters chat should not suck. #192

Open cscott opened 9 years ago

cscott commented 9 years ago

So this hunt, we had to periodically purge history from ringhunters chat in order to maintain website responsiveness. That sucked! For the record, here is the script I used (prune.mdb):

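use meteor;
// Show the newest message in the main chat room.
db.messages.find({room_name:"general/0"}).sort({timestamp:-1}).limit(1).forEach(printjson);
// Show the page record for the room that has no successor (next: null).
db.pages.findOne({room_name:"general/0",next:null});
// Uncomment to delete every message older than the cutoff (milliseconds since the epoch).
//db.messages.remove({room_name:"general/0",timestamp:{$lt:1421552912802}});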

Then run this with meteor mongo < prune.mdb.
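
The cutoff in the commented-out remove is a hard-coded millisecond timestamp. A minimal sketch of deriving such a filter from a retention window is below. `buildPruneFilter` and its parameters are hypothetical names; only the `room_name` and `timestamp` fields come from the script above.

```javascript
// Hypothetical helper, not part of codex-blackboard.
// Builds the query object for "messages in this room older than the window".
function buildPruneFilter(roomName, nowMs, keepMs) {
  if (keepMs < 0) throw new RangeError("keepMs must be non-negative");
  return {
    room_name: roomName,
    timestamp: { $lt: nowMs - keepMs },
  };
}

// Example: keep only the last six hours of the main chat room.
const SIX_HOURS_MS = 6 * 60 * 60 * 1000;
const pruneFilter = buildPruneFilter("general/0", Date.now(), SIX_HOURS_MS);
console.log(JSON.stringify(pruneFilter));
```

The printed object has the same shape as the argument to db.messages.remove in the script, so it could be pasted there after checking the cutoff by eye.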

Startup connection storms were also The Worst. We managed these by:

  1. Taking down nginx on codexian.us (which forwarded to meteor)
  2. Restarting meteor, letting it come all the way up
  3. Restarting nginx.

I think this worked in part because meteor handles a dropped connection slightly differently than a hot code swap. For a dropped connection some sort of randomized backoff & retry was used, which spaced out the reconnection enough to allow meteor to recover from the restart.
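
To illustrate why randomized backoff spreads out a reconnection storm, here is a generic sketch of exponential backoff with jitter. This is not Meteor's actual reconnect code, which was not inspected here; `backoffDelay`, its parameters, and the constants are all assumptions chosen for the example.

```javascript
// Illustrative only: NOT Meteor's real reconnect logic.
// "Full jitter" exponential backoff: pick a random delay between 0 and
// min(maxMs, baseMs * 2^attempt).
function backoffDelay(attempt, baseMs, maxMs, random = Math.random) {
  const ceiling = Math.min(maxMs, baseMs * Math.pow(2, attempt));
  return random() * ceiling;
}

// Simulate five clients on their third retry. Each one draws its own delay,
// so they do not all hit the freshly restarted server at the same instant.
for (let client = 0; client < 5; client++) {
  const delay = Math.round(backoffDelay(3, 1000, 60000));
  console.log(`client ${client}: retry in ${delay} ms`);
}
```

A hot code swap, by contrast, would have every client reload at roughly the same moment, which matches the storm described above.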

Ideas for next year:

  1. Run a test hunt in early December to figure out whether we still have a load issue after the annual meteor rebase.
  2. Fill up the db with messages for testing, since part of the issue seems to be related to the size of ringhunters chat.
  3. Separate the ringhunters chat db from the puzzle chat db, so that ringhunters doesn't bring down puzzles. (This would be annoying to code, sigh.)
  4. Check to see if the followup processing is slowing things down. Perhaps we need to throttle that explicitly: add a followupDirty: true field whenever we insert a new message, then once every throttle period search for the earliest and latest timestamps with followupDirty: true, and replay that range (or, technically, the range between the messages immediately preceding and following it) when updating followups.
  5. Throttle should take into account how long it takes to synchronously execute the throttled computation. The underscore _.throttle appeared to run every minute, regardless of how long it took to do its update task. We might need to write our own version of throttle, which sleeps X minutes after the synchronous task completes before rerunning the update.
  6. Perhaps the $or in the messages-in-range-nick subscription is confusing minimongo into doing expensive searches through all messages. Try separating that out into two subscriptions (one for PMs sent by me, and one for PMs sent to me).
  7. Use a real database, instead of mongo?
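
A rough sketch of the custom throttle from idea 5, assuming the throttled task is synchronous. `throttleAfterCompletion` is a hypothetical name, not an underscore or Meteor API. The `schedule` parameter is an assumption added so the timer can be swapped out in tests.

```javascript
// Hypothetical helper, not part of underscore or codex-blackboard.
// Runs `task` immediately, then refuses to run it again until `waitMs` has
// elapsed AFTER the task finished, so a slow task never runs back to back.
function throttleAfterCompletion(task, waitMs, schedule = setTimeout) {
  let busy = false;     // true while the task runs or the cool-down is pending
  let pending = false;  // a trigger arrived while busy; rerun once afterwards

  function run() {
    busy = true;
    pending = false;
    try {
      task();           // synchronous work, e.g. the followup update
    } finally {
      // The cool-down timer starts only now, after the task has completed.
      schedule(() => {
        busy = false;
        if (pending) run();
      }, waitMs);
    }
  }

  return function trigger() {
    if (busy) {
      pending = true;   // coalesce any number of triggers into one rerun
    } else {
      run();
    }
  };
}
```

Usage would look something like `const update = throttleAfterCompletion(recomputeFollowups, 60 * 1000)`, where `recomputeFollowups` stands in for whatever function does the update, with `update()` called on each message insert.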

cscott commented 8 years ago

Additional idea: make the prune-db script "the new normal" by adding a process to serialize messages to an on-disk log file after a certain time has passed (messages more than N minutes old, for example). Then somehow hack the client to fetch those static files to fill in the back history. Perhaps as a start just piggyback on the paging feature, and fall back to two different paging mechanisms?

To do: figure out how to integrate read-file to a static file w/ the existing db mechanism.
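
One possible shape for the archiving step, as a sketch only. The function names, the JSON-lines file format, and the idea of splitting by a single cutoff are all assumptions; nothing here touches the real db or the paging code.

```javascript
// Hypothetical sketch; names and the JSON-lines format are assumptions.
// Splits messages into those old enough to archive and those kept live.
function splitForArchive(messages, nowMs, maxAgeMs) {
  const cutoff = nowMs - maxAgeMs;
  const archive = [];
  const live = [];
  for (const msg of messages) {
    (msg.timestamp < cutoff ? archive : live).push(msg);
  }
  // Oldest first, so the log file reads in chat order.
  archive.sort((a, b) => a.timestamp - b.timestamp);
  return { archive, live };
}

// One JSON object per line, so the log can be appended to and parsed line by line.
function toLogLines(messages) {
  return messages.map((m) => JSON.stringify(m) + "\n").join("");
}

// Inverse of toLogLines: what a client would do after fetching the static file.
function fromLogLines(text) {
  return text
    .split("\n")
    .filter((line) => line !== "")
    .map((line) => JSON.parse(line));
}
```

The server side would write the output of `toLogLines` to a static file and then remove the archived messages from the db; the client would call something like `fromLogLines` on the fetched file when the user pages back past the live history.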

Torgen commented 6 years ago

It looks like you have an index of {room: 1, timestamp: -1}, but you observe the collection in timestamp ascending order. Maybe adding a {room: 1, timestamp: 1} index as well would speed things up.

Torgen commented 6 years ago

Did ringhunters chat suck this year? It was great for us, but we were much smaller.

JeffJakubowski commented 6 years ago

My only complaint was the scrollback was pretty short.

I went to look at our main chat after wrapup to read commentary and could only see the last couple dozen comments.

Jeff

cscott commented 6 years ago

I didn't actually see any problem this year. The chromebooks sometimes struggled, but they are very slow.

I think we no longer observe the collection in reverse index order, now that we are using client-side followups?

When I looked at the code, I thought that the reverse index should still work fine for the codepaths we were looking at, since as long as the room name is fixed, a reverse index and a forward index should be equivalent. I could have overlooked something, though.
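
The equivalence argument can be shown with a toy model. This is a plain sorted array standing in for a compound index, not MongoDB's actual index implementation, and the `room` and `timestamp` field names are only those used in the discussion above.

```javascript
// Toy model of a compound index {room: 1, timestamp: -1}: an array of
// [room, timestamp] keys sorted by room ascending, then timestamp descending.
function buildIndex(messages) {
  return messages
    .map((m) => [m.room, m.timestamp])
    .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : b[1] - a[1]));
}

// With the room fixed, its keys form one contiguous run in the index.
// Walking the run forwards gives timestamps descending; walking the same
// run backwards gives them ascending, with no extra sort needed.
function scanRoom(index, room, ascending) {
  const start = index.findIndex((key) => key[0] === room);
  if (start === -1) return [];
  let end = start;
  while (end < index.length && index[end][0] === room) end++;
  const out = [];
  if (ascending) {
    for (let i = end - 1; i >= start; i--) out.push(index[i][1]);
  } else {
    for (let i = start; i < end; i++) out.push(index[i][1]);
  }
  return out;
}

const toyIndex = buildIndex([
  { room: "general/0", timestamp: 3 },
  { room: "puzzles/1", timestamp: 1 },
  { room: "general/0", timestamp: 1 },
  { room: "general/0", timestamp: 2 },
]);
console.log(scanRoom(toyIndex, "general/0", true));
console.log(scanRoom(toyIndex, "general/0", false));
```

If a query does not pin the room to a single value, the two index orders stop being interchangeable, which is the kind of case that could have been overlooked.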

Torgen commented 6 years ago

So will you remove the server-side follow-ups as an option, or leave it with a switch?
