Meteor-Community-Packages / meteor-user-status

Track user connection state and inactivity in Meteor.

DB/Container performance issues on new container creation (from this package) #158

Closed by mostlyharmless2024 2 years ago

mostlyharmless2024 commented 2 years ago

Our application manages ~100 users per container, but whenever we scale up (or worse, deploy), the new containers perform an "update many" on our users collection that takes 20+ seconds, seemingly lagging Meteor <-> Mongo connectivity on ALL containers and causing intense CPU spikes/freezing:

[Screenshot from 2022-05-16 showing the CPU spikes during new container creation]

I boiled it down to this code:

https://github.com/Meteor-Community-Packages/meteor-user-status/blob/02d1c05fdf576fa6e985595a18031ec6619e51b4/server/status.js#L135

which fires an update with { multi: true } on ALL users:

```js
return Meteor.users.update(
  selector,
  {
    $set: { 'status.online': false },
    $unset: { 'status.idle': null, 'status.lastActivity': null },
  },
  { multi: true }
);
```
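As a possible mitigation (a minimal sketch, assuming the selector at the linked line filters on status.online; worth verifying against the actual code), making sure that field is indexed should keep this startup reset from scanning every user document:

```js
// Minimal sketch (assumption: the reset's selector filters on 'status.online').
// An index on that field keeps the startup multi-update from scanning every
// user document and starving other queries while it runs.
import { Meteor } from 'meteor/meteor';

Meteor.startup(() => {
  Meteor.users
    .rawCollection()
    .createIndex({ 'status.online': 1 })
    .catch((err) => console.error('index creation failed', err));
});
```

That doesn't remove the multi-update itself, but it should stop it from blocking every other query for 20+ seconds.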

It's not clear to me what the implications of removing this are... Does anyone with a deeper understanding of this package have any ideas/suggestions? Are we safe to install this package locally and just comment out this functionality? (It appears so, but I don't know all the implications...)

Thanks!

copleykj commented 2 years ago

So basically as far as I can tell this is resetting the online status for every single user every time a server starts up. It seems to me that commenting this out could cause mismatches in the online/offline state of some users.

I don't believe this package was designed to handle scaling very well. It doesn't have any way to know if a single server is coming online, if/when a server has gone offline, or which server is in charge of orchestrating everything.

Conversely, socialize:user-presence was designed precisely for this scenario. It allows for graceful handling of servers coming online, going offline, and fresh starts.
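For context on what that kind of design looks like, here is a rough sketch of the general server-aware presence pattern (illustration only, not the socialize:user-presence code; the collection names are made up): each server registers itself with a heartbeat, sessions are tagged with the owning server's id, and a booting server only cleans up sessions whose owner has stopped heartbeating.

```js
// Illustration only, not the socialize:user-presence code. Each server gets an
// id and a heartbeat; sessions are tagged with the owning server's id; a
// booting server only cleans up sessions whose owner stopped heartbeating.
import { Meteor } from 'meteor/meteor';
import { Mongo } from 'meteor/mongo';
import { Random } from 'meteor/random';

const serverId = Random.id();
const Servers = new Mongo.Collection('example_presence_servers');   // made-up name
const Sessions = new Mongo.Collection('example_presence_sessions'); // made-up name

Meteor.startup(() => {
  // Register this server and refresh its heartbeat every 10 seconds.
  const beat = () => Servers.upsert(serverId, { $set: { ping: new Date() } });
  beat();
  Meteor.setInterval(beat, 10 * 1000);

  // Clean up only the sessions owned by servers that stopped pinging,
  // instead of resetting the status of every user in the database.
  const cutoff = new Date(Date.now() - 30 * 1000);
  const deadServerIds = Servers.find({ ping: { $lt: cutoff } }).map((s) => s._id);
  Sessions.remove({ serverId: { $in: deadServerIds } });
  Servers.remove({ _id: { $in: deadServerIds } });
});

Meteor.onConnection((connection) => {
  Sessions.insert({ serverId, connectionId: connection.id, createdAt: new Date() });
  connection.onClose(() => Sessions.remove({ connectionId: connection.id }));
});
```

The key difference from the startup reset above is that a new container never touches users it doesn't own.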

mostlyharmless2024 commented 2 years ago

Thanks for this reply - gotta dig into socialize:user-presence more. We attempted to transition to it last week; everything seemed normal in our testing, but once it was live we had all kinds of build failure issues related to simple-schema and collection2.

Generally not thrilled about the number of dependencies user-presence has... thinking we'll have to adjust the user-status package to support multiple containers etc.
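One possible adjustment (a hypothetical sketch, not anything currently in the package; the collection and function names are made up) would be to let only one container run the expensive reset per deploy window, using a lock document:

```js
// Hypothetical sketch (not part of user-status): let only one container run
// the expensive startup reset per deploy window, using a lock document whose
// fixed _id guarantees that only the first insert succeeds.
import { Meteor } from 'meteor/meteor';
import { Mongo } from 'meteor/mongo';

const StatusLocks = new Mongo.Collection('example_status_reset_locks');

function tryAcquireResetLock(windowMs = 60 * 1000) {
  // Drop a stale lock from an earlier deploy so the reset can run again.
  StatusLocks.remove({
    _id: 'startup-reset',
    acquiredAt: { $lt: new Date(Date.now() - windowMs) },
  });
  try {
    StatusLocks.insert({ _id: 'startup-reset', acquiredAt: new Date() });
    return true; // this container won the lock
  } catch (err) {
    return false; // another container already holds it
  }
}

Meteor.startup(() => {
  if (tryAcquireResetLock()) {
    // Run the (vendored) status reset here; all other containers skip it.
  }
});
```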

> It seems to me that commenting this out could cause mismatches in the online/offline state of some users.

So, my understanding/testing so far indicates that once a container goes offline, the connection "status" isn't updated for the users that were online on that container.

But the second they reconnect to a new container, their status is updated appropriately.

I see this in our prod database profiler: a big 20-second update looking at all users (updating a few hundred), and then hundreds of one-off updates for each user "coming online" again.
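For anyone who wants to reproduce this, a rough mongo shell sketch for surfacing those slow updates in the profiler (the database name and time thresholds below are placeholders, and profiling has to be enabled):

```js
// Mongo shell sketch: surface slow update operations on the users collection.
// 'meteor' is a placeholder database name; adjust slowms/millis to taste.
db.setProfilingLevel(1, { slowms: 1000 });

db.system.profile
  .find({ ns: 'meteor.users', op: 'update', millis: { $gt: 5000 } })
  .sort({ ts: -1 })
  .limit(5)
  .pretty();
```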

Gotta test more, but thanks for the input!