Open craigbox opened 5 months ago
This could be due to HLL (HyperLogLog). I can change the metrics (for Istio only) to use exact distinct counts instead of the approximate counts that HLL gives, but this will require creating custom SQL just for Istio. It can be done in a day or two, but not earlier than about a week or two from now.
Can you recheck and LMK if this is still needed? I've optimised some metrics recently and they no longer use HLL, so this might be OK already. If not, LMK and I'll iterate on this when I can.
Companies table:
Company | Contributions |
---|---|
Solo.io | 12905 |
Google LLC | 9679 |
DaoCloud Network Technology Co. Ltd. | 7427 |
International Business Machines Corporation | 5980 |
Huawei Technologies Co. Ltd | 5757 |
Microsoft Corporation | 3182 |
Tetrate.io | 2917 |
Ericsson | 1448 |
Salesforce.com Inc. | 1303 |
Red Hat Inc. | 1141 |
Sum of developers table:
Company | Sum of Contributions |
---|---|
Solo.io | 12513 |
Google LLC | 9518 |
DaoCloud Network Technology Co. Ltd. | 7442 |
International Business Machines Corporation | 5822 |
Huawei Technologies Co. Ltd | 5375 |
Microsoft Corporation | 3168 |
Tetrate.io | 2981 |
Ericsson | 1408 |
Salesforce.com Inc. | 1309 |
Red Hat Inc. | 1124 |
Closer, and currently in the same order/ballpark.
(Edit: initial miscalculation around Google was my error.)
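For anyone who wants to repeat the comparison, here is a minimal sketch that lines up the two tables above and reports the gap per company. The figures are copied from the tables; the variable and function names are only illustrative.

```python
# Figures copied from the "Companies table" above.
companies_table = {
    "Solo.io": 12905,
    "Google LLC": 9679,
    "DaoCloud Network Technology Co. Ltd.": 7427,
    "International Business Machines Corporation": 5980,
    "Huawei Technologies Co. Ltd": 5757,
    "Microsoft Corporation": 3182,
    "Tetrate.io": 2917,
    "Ericsson": 1448,
    "Salesforce.com Inc.": 1303,
    "Red Hat Inc.": 1141,
}

# Figures copied from the "Sum of developers table" above.
developer_sums = {
    "Solo.io": 12513,
    "Google LLC": 9518,
    "DaoCloud Network Technology Co. Ltd.": 7442,
    "International Business Machines Corporation": 5822,
    "Huawei Technologies Co. Ltd": 5375,
    "Microsoft Corporation": 3168,
    "Tetrate.io": 2981,
    "Ericsson": 1408,
    "Salesforce.com Inc.": 1309,
    "Red Hat Inc.": 1124,
}


def ranking(table):
    """Return company names ordered by contributions, largest first."""
    return sorted(table, key=table.get, reverse=True)


# Gap of the developer sums relative to the companies table.
diffs = {name: developer_sums[name] - companies_table[name] for name in companies_table}
rel_diffs = {name: diffs[name] / companies_table[name] for name in companies_table}
same_order = ranking(companies_table) == ranking(developer_sums)

if __name__ == "__main__":
    for name in ranking(companies_table):
        print(f"{name:45s} {diffs[name]:+6d} {rel_diffs[name]:+.1%}")
    print("same ranking:", same_order)
```

On these figures the ranking should come out identical in both tables, with the largest relative gap (Huawei) a little under 7%.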
I will TAL on Friday or Monday.
No rush, Istio doesn't need this until January!
Hmm, the first link gives the sum of all contributions (this one), while the other one gives values per developer, and you are summing them manually, right?
I'll check whether both use HLL or both don't. Actually, I will also update Istio to use exact counts: HLL was used to save cycles, but that makes more sense in the All CNCF instance, which has a lot of data. Here we can just use the exact counts approach (as Istio isn't as huge as the All CNCF instance). Let me dive into it. Maybe the query conditions are slightly different on those two dashboards too?
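To illustrate why an HLL-based metric and an exact one can disagree slightly, here is a minimal, generic HyperLogLog sketch in Python compared against an exact distinct count. This is not the DevStats implementation; the class, the register count (`p=12`), the SHA-1 hash, and the made-up `dev-N` identifiers are all assumptions chosen for the demonstration.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog distinct counter (illustrative only)."""

    def __init__(self, p=12):
        # p index bits give m = 2**p registers; the alpha constant below
        # assumes m >= 128, i.e. p >= 7.
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")          # 64-bit hash value
        idx = h >> (64 - self.p)                       # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)          # remaining 64 - p bits
        # Position of the leftmost 1-bit in the remaining bits (1-based).
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            return self.m * math.log(self.m / zeros)
        return raw


# 30,000 events from 10,000 distinct (made-up) developer ids.
events = [f"dev-{i % 10000}" for i in range(30000)]

exact = len(set(events))          # exact COUNT(DISTINCT ...)
hll = HyperLogLog()
for event in events:
    hll.add(event)
estimate = hll.count()            # approximate distinct count
rel_error = abs(estimate - exact) / exact
```

With 4096 registers the typical error is on the order of a couple of percent, so two dashboards where only one uses HLL can reasonably show slightly different totals for the same period.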
One was using HLL while the other was not. I will sync them now and regenerate the data, then I'll let you know when it's finished.
Also pls note that statistics across DevStats are not calculated "on the fly" but synced at a given point in time and saved in tables (so the Grafana UI later does just a simple select from those "calculated" tables). If the calculation for "last year" happened at different times for two metrics, they can be slightly out of sync, but the difference shouldn't be high. After the manual sync that I'll do now, they should be as close to each other as possible.
I've regenerated the data. I don't have a script to sum all developers to check those values, so PTAL again pls. Hope this is OK now.
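A script for that check could look roughly like the sketch below, which sums per-developer rows by company from a CSV export. The column names (`developer`, `company`, `contributions`) and the sample rows are hypothetical; the real export from the developer activity view may use different headers.

```python
import csv
import io
from collections import defaultdict


def sum_by_company(csv_file, company_col="company", value_col="contributions"):
    """Sum per-developer contribution counts for each company.

    Column names are assumptions; adjust them to match the actual export.
    Returns (company, total) pairs sorted by total, largest first.
    """
    totals = defaultdict(int)
    for row in csv.DictReader(csv_file):
        totals[row[company_col].strip()] += int(row[value_col])
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Made-up sample rows, only to show the expected shape of the input.
SAMPLE = """developer,company,contributions
alice,Example Corp,120
bob,Example Corp,30
carol,Sample Inc,75
"""

totals = sum_by_company(io.StringIO(SAMPLE))
```

For a real export, pass an open file instead of the sample string, for example `open("export.csv", newline="")`, where `export.csv` is a placeholder file name.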
@lukaszgryglicki has regenerated our database as of ~15 minutes ago, so this data is as fresh as it comes.
The companies table reports the top 5 contributors to Istio in the last 12 months as:
However, if one exports the data from the Developer activity counts by company view for the same period, the summation is this:
Note how some companies show fewer contributions in the second list, and some have more.
Istio uses this data as part of its governance process, and last week, the order of the top 5 results shown here actually differed depending on which metric you used.
Can you help us understand why these values are different?