FoldingAtHome / fah-issues

Reduce time period for determining 'Active clients' in the OS stats report #1558

Closed anand-bhat closed 3 years ago

anand-bhat commented 4 years ago

Your Environment


Expected Behavior

The OS stats report should use a shorter time period for determining active clients.


Current Behavior

The OS stats report treats any client that has returned a WU in the past 50 days as active and includes it in the report. This may have made sense in the past, when WUs ran for a long time, but with current WUs taking no longer than a couple of days on the slowest hardware, this definition ought to be revisited.


Possible Solution

Consider using a 1-day, 3-day, or 7-day period for active hosts in the OS stats report.


Steps To Reproduce

[image attachment not preserved]


Context

With a number of clients and cloud-based volunteers running containerized instances of the client without preserving machine IDs across installs, a 50-day window for active users leads to inflated counts in the OS stats report.
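To illustrate the inflation, here is a minimal sketch that counts distinct machine IDs whose last WU return falls inside a sliding window. The function name, the data layout, and the sample hosts are all hypothetical; this is not how the stats server is known to be implemented.

```python
from datetime import datetime, timedelta


def count_active_clients(last_return_by_machine, now, window_days):
    """Count machine IDs whose most recent WU return is inside the window."""
    cutoff = now - timedelta(days=window_days)
    return sum(1 for ts in last_return_by_machine.values() if ts >= cutoff)


now = datetime(2020, 6, 1)

# Hypothetical data: one long-lived desktop, plus one container host that is
# recreated daily and therefore reports under a fresh machine ID every day.
returns = {"desktop-1": now - timedelta(days=1)}
for day in range(1, 41):
    returns[f"container-{day}"] = now - timedelta(days=day)

# Only two physical machines exist, but the count depends heavily on the window.
for window in (50, 7, 3, 1):
    print(window, count_active_clients(returns, now, window))
```

With this made-up data, the 50-day window reports every discarded container ID as a separate active client, while the shorter windows come much closer to the two machines that actually exist.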


UofM-MartinK commented 4 years ago

There are many issues with that sliding window as you described (not only cloud, just clusters & supercomputers in general), and 50 days is too long. But just shortening that window will only partly fix that.

Also, simply making this window shorter introduces a continuity break in how the performance of FAH is and was measured. But this is inevitable in the long run, I fear.

Instead of just shrinking the window, I was thinking along the lines of "aggregating" those "unique SlotIDs".

For example like this: Sum up the WUs for all SlotIDs of the same type (i.e. performance or performance bracket) which contributed less than, say, 10 WUs (in those 50 days), and divide that number by the median WUs completed by the remaining SlotIDs of that type.

This way, those "unique SlotIDs" would be counted in a manner similar to "regular SlotIDs", which rarely change.
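A rough sketch of the aggregation described above, for a single slot type. The function name, the `min_wus` parameter, and the sample numbers are hypothetical, and adding the aggregated estimate to the count of regular SlotIDs is an assumption about how the two would be combined.

```python
from statistics import median


def effective_slot_count(wus_per_slot, min_wus=10):
    """Estimate the slot count for one slot type over the window.

    SlotIDs with fewer than `min_wus` WUs are pooled: their WUs are summed
    and divided by the median WU count of the remaining (regular) SlotIDs.
    """
    regular = [n for n in wus_per_slot if n >= min_wus]
    short_lived = [n for n in wus_per_slot if n < min_wus]
    if not regular:
        # No regular SlotIDs to use as a reference; fall back to the raw count.
        return float(len(short_lived))
    # Assumption: the pooled estimate is added to the regular SlotID count.
    return len(regular) + sum(short_lived) / median(regular)


# Hypothetical window: three regular slots and 25 throwaway SlotIDs
# that returned 2 WUs each before their machine ID was discarded.
wus = [40, 50, 60] + [2] * 25
print(len(wus), effective_slot_count(wus))
```

Here the raw count is 28 SlotIDs, but the 25 short-lived ones together did about as much work as one regular slot, so the estimate is 4.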

PantherX commented 4 years ago

FYI, we will also need to consider the implications of ARM devices once support for them has been released. Plus, we need to factor in large Projects and their longer expiration dates to ensure that the fix is future-proof.

UofM-MartinK commented 4 years ago

Good point. A "sliding window" twice the length of the longest (planned) expiration date would be a good start?

My aforementioned "proposal" would group by "Slot Types" before combining "unique SlotIDs". Instead of a fixed limit of e.g. 10 WUs for deciding which "unique SlotIDs" get pooled, this threshold could also be determined from the distribution - this way, if ARM slots return WUs much more slowly, say one per week, they would still be counted as independent slots, while those that return only one a month would be aggregated. For fast CPUs and GPUs, these limits would automatically be higher/shorter.
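One way this could look, as a sketch only: group by slot type and derive the threshold from each type's own distribution. Taking the threshold as a fixed fraction of the per-type median is an assumption made here for illustration; the function name, the `fraction` parameter, and the sample data are hypothetical.

```python
from collections import defaultdict
from statistics import median


def effective_counts_by_type(slots, fraction=0.25):
    """slots: iterable of (slot_type, wu_count) pairs for the window."""
    by_type = defaultdict(list)
    for slot_type, wus in slots:
        by_type[slot_type].append(wus)

    result = {}
    for slot_type, counts in by_type.items():
        # Assumed rule: pool SlotIDs below a fraction of this type's median.
        threshold = fraction * median(counts)
        regular = [n for n in counts if n >= threshold]
        short_lived = [n for n in counts if n < threshold]
        reference = median(regular)
        if reference == 0:
            # Degenerate data (no WUs at all); fall back to the raw count.
            result[slot_type] = float(len(counts))
        else:
            result[slot_type] = len(regular) + sum(short_lived) / reference
    return result


# Hypothetical 50-day window: ARM slots returning weekly (7 WUs) or
# monthly (1 WU), and GPU slots that are either long-lived or short-lived.
slots = (
    [("arm", 7)] * 4 + [("arm", 1)] * 3
    + [("gpu", 200)] * 5 + [("gpu", 5)] * 4
)
print(effective_counts_by_type(slots))
```

The weekly ARM slots stay independent while the monthly ones are pooled, and the GPU threshold ends up much higher without being configured separately. One caveat of this particular rule: if short-lived SlotIDs make up the majority of a type, the median itself drops and little gets pooled, so a more robust reference value would be needed.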

anand-bhat commented 3 years ago

I'm closing this as the window for stats was reduced to 3 days -- https://foldingathome.org/2020/09/27/updating-our-cpu-and-gpu-counts/. Feel free to create a new issue for any enhancements in determining what an "active" client is or any other changes to FLOPS reporting.