LemmyNet / lemmy

🐀 A link aggregator and forum for the fediverse
https://join-lemmy.org
GNU Affero General Public License v3.0

[Bug]: Number of active users in community for day, week and month are incorrect #4306

Closed ancientmarinerdev closed 9 months ago

ancientmarinerdev commented 9 months ago

Summary

When I go on this community that is around a day old https://lemm.ee/c/survivalgames

The active user counts are out of sync: weekly users is greater than monthly users. Eventually daily users is also greater than monthly users. Something isn't quite right with the calculation. Seeing something like this makes you doubt the metrics, so they need to be verified for accuracy.

[Screenshots: Screenshot_20231220_003139, Screenshot_20231219_232637]

Steps to Reproduce

  1. Create new community that is around a day old. Such as the following: https://lemm.ee/c/survivalgames
  2. Check the active users
  3. See the discrepancy
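The check in step 3 can be written down as a small sanity test: a shorter window should never report more active users than a longer one. This is only a sketch. The field names (`users_active_day` and so on) are assumed to match what the API returns for a community's counts, and the numbers in the example are made up for illustration, not taken from the screenshots.

```python
def check_active_counts(counts: dict) -> list[str]:
    """Return the violated orderings; an empty list means the counts are consistent."""
    # Assumed field names, ordered from the shortest window to the longest.
    windows = [
        "users_active_day",
        "users_active_week",
        "users_active_month",
        "users_active_half_year",
    ]
    problems = []
    # Compare each window with the next longer one.
    for shorter, longer in zip(windows, windows[1:]):
        if counts[shorter] > counts[longer]:
            problems.append(
                f"{shorter}={counts[shorter]} > {longer}={counts[longer]}"
            )
    return problems


# Illustrative values only: weekly is larger than monthly here.
example = {
    "users_active_day": 3,
    "users_active_week": 12,
    "users_active_month": 9,
    "users_active_half_year": 12,
}
print(check_active_counts(example))
```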

Technical Details

None.

Version

0.19.0

Lemmy Instance URL

lemm.ee

dessalines commented 9 months ago

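Two things:

* Make sure the scheduled task is running correctly and hasn't crashed. It should update the active counts hourly.

* Can you verify that lemm.ee's migrations ran correctly? We updated the SQL function for community and site_aggregates.

ancientmarinerdev commented 9 months ago

> Two things:
>
> * Make sure the scheduled task is running correctly and hasn't crashed. It should update the active counts hourly.
>
> * Can you verify that lemm.ee's migrations ran correctly? We updated the SQL function for community and site_aggregates.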

I'm just a user, not an admin. I'm assuming only @sunaurus can do this?

If the scheduled task runs hourly, why would the weekly and monthly figures differ? Are they calculated at the same time?

Demigodrick commented 9 months ago

FWIW it looks correct now

ancientmarinerdev commented 9 months ago

It may have righted itself, but it still happened: logically the monthly total cannot be less than the weekly total, yet it was. What is the reason for this? A dodgy calculation? Caching? Numbers calculated at different times?

dessalines commented 8 months ago

It recalculates every 15 minutes.

sunaurus commented 8 months ago

The active users query is quite expensive, and it is regularly hitting the statement timeout on lemm.ee during periods of high load, so this is expected at the moment.

For lemm.ee, I am intending to make the database server more powerful this weekend, so this should help a bit already.
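To illustrate how a timeout can produce "weekly greater than monthly", here is a minimal sketch. It assumes each window's count is refreshed by its own statement, so one statement can time out while the others succeed; the function and variable names are hypothetical and the numbers are invented.

```python
def refresh_counts(stored: dict, fresh: dict, timed_out: set) -> dict:
    """Apply freshly computed counts, keeping the stale value for any
    window whose query timed out (assumption: one query per window)."""
    return {
        window: stored[window] if window in timed_out else fresh[window]
        for window in stored
    }


# A day-old community: stored counts are still at their initial values.
stored = {"day": 0, "week": 0, "month": 0}
# What the queries would return if all of them finished.
fresh = {"day": 5, "week": 8, "month": 8}

# The (more expensive) monthly query hits the statement timeout.
after = refresh_counts(stored, fresh, timed_out={"month"})
print(after)
```

In this sketch the monthly figure stays stale until a later run completes, which would match the counts looking correct again some time afterwards.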

It might also be a good idea to try and optimize the calculation on Lemmy side, for example through some denormalization (like storing last_activity timestamps for each user directly, rather than calculating it every 15 minutes for all users).
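A rough sketch of that denormalization idea, kept in memory for brevity: every post, comment or vote bumps a single timestamp per user, and the active count for any window is a scan over those timestamps. The class and method names are hypothetical and this is not how Lemmy currently works. A per-community count would need the timestamp keyed by (community, user) instead.

```python
from datetime import datetime, timedelta


class ActivityTracker:
    """Keeps one last-activity timestamp per user (hypothetical design)."""

    def __init__(self) -> None:
        self.last_activity: dict[int, datetime] = {}

    def record(self, user_id: int, when: datetime) -> None:
        # Only move the timestamp forward, so late-arriving events are harmless.
        previous = self.last_activity.get(user_id)
        if previous is None or when > previous:
            self.last_activity[user_id] = when

    def active_count(self, now: datetime, window: timedelta) -> int:
        # A user is active if their last activity falls inside the window.
        cutoff = now - window
        return sum(1 for t in self.last_activity.values() if t >= cutoff)


tracker = ActivityTracker()
now = datetime(2023, 12, 20, 0, 0)
tracker.record(1, now - timedelta(hours=2))   # active today
tracker.record(2, now - timedelta(days=3))    # active this week
tracker.record(3, now - timedelta(days=20))   # active this month

day = tracker.active_count(now, timedelta(days=1))
week = tracker.active_count(now, timedelta(days=7))
month = tracker.active_count(now, timedelta(days=30))
print(day, week, month)
```

Since all windows are derived from the same set of timestamps, a longer window can never report fewer users than a shorter one as long as the counts are read together.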

Nutomic commented 8 months ago

We removed db timeouts in https://github.com/LemmyNet/lemmy/pull/4301 so that might be enough.

dessalines commented 8 months ago

> It might also be a good idea to try and optimize the calculation on Lemmy side, for example through some denormalization (like storing last_activity timestamps for each user directly, rather than calculating it every 15 minutes for all users).

The postgres functions are called site_aggregates_activity and community_aggregates_activity, and they seem to be fairly simple union and count queries.
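For anyone unfamiliar with the shape of such a query, here is a simplified sketch of a union-and-count, run against an in-memory SQLite database rather than Postgres. The two-table schema and the `active_users` helper are assumptions made for the example and are not the actual definitions of the functions named above.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory database with a simplified, assumed schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE post (creator_id INTEGER, community_id INTEGER, published TEXT)"
    )
    conn.execute(
        "CREATE TABLE comment (creator_id INTEGER, community_id INTEGER, published TEXT)"
    )
    conn.executemany(
        "INSERT INTO post VALUES (?, ?, ?)",
        [(1, 1, "2023-12-19T10:00:00"), (2, 1, "2023-11-01T10:00:00")],
    )
    conn.executemany(
        "INSERT INTO comment VALUES (?, ?, ?)",
        [(1, 1, "2023-12-19T11:00:00"), (3, 1, "2023-12-18T09:00:00")],
    )
    return conn


def active_users(conn: sqlite3.Connection, community_id: int, since: str) -> int:
    """Count distinct users who posted or commented in a community since `since`."""
    row = conn.execute(
        """
        SELECT COUNT(*) FROM (
            SELECT creator_id FROM post
            WHERE community_id = ? AND published > ?
            UNION  -- UNION (not UNION ALL) removes duplicate users
            SELECT creator_id FROM comment
            WHERE community_id = ? AND published > ?
        )
        """,
        (community_id, since, community_id, since),
    ).fetchone()
    return row[0]


conn = make_demo_db()
weekly = active_users(conn, 1, "2023-12-13T00:00:00")
quarterly = active_users(conn, 1, "2023-10-01T00:00:00")
print(weekly, quarterly)
```

The timestamps are ISO 8601 strings, so plain string comparison orders them correctly in this sketch.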

Also, I was wrong: currently the active counts are updated every hour, and it's the hot ranks that are updated every 15 minutes. I'd be fine with making those run less often than every hour though, maybe every 4 hours or something. @sunaurus, if you would, please open an issue for that one.