adobe / helix-run-query

service that executes queries on BigQuery datasets generated by Helix-Logging
Apache License 2.0
6 stars 11 forks

[RUM] Filter fake weights from EVENTS_V5 #1157

Closed lydiapuric closed 1 day ago

lydiapuric commented 2 days ago

Description When collecting RUM data, customers can set fake weights that differ from the expected values 1, 10, 100, and 1000. Because the data is sampled, the weight is critical for calculating traffic, so fake weights produce gigantic, incorrect traffic numbers.
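To illustrate why a single bogus weight distorts the numbers, here is a minimal sketch. It assumes that estimated traffic is the sum of the weights of the sampled events (each sampled event stands in for `weight` real page views). The function name `estimate_traffic` and the sample lists are hypothetical and not part of helix-run-query; only the fake weight value is taken from this issue.

```python
def estimate_traffic(weights):
    """Estimate page views from sampled events: each event counts `weight` times."""
    return sum(weights)


# Hypothetical sample: five events, each sampled at 1-in-100.
honest = [100, 100, 100, 100, 100]

# The same sample plus one event carrying a fake weight reported in this issue.
tampered = honest + [18446744073709552000]

print(estimate_traffic(honest))    # 500
print(estimate_traffic(tampered))  # 18446744073709552500
```

Under this assumption, one tampered event is enough to dominate the total, no matter how many legitimate events were collected.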

To Reproduce Steps to reproduce the behavior:

  1. Go to the RUM dashboard
  2. Look at 'www.bigfishgames.com': the numbers are gigantic and wrong

Different fake weights were recorded, e.g. on June 25th and 26th for domain www.bigfishgames.com: 100000000000000000000, 18446744073709552000, 4294967297, -100000000000000000000.

Expected behavior Only the following weights are valid: 1, 10, 20, 100, 1000. Note: All other weights, e.g. 101, are considered invalid and should be ignored in EVENTS_V5.
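The rule above amounts to an allow-list check. The following is only a sketch of that rule in Python. The real fix would presumably live in the query logic behind EVENTS_V5, which is not shown in this issue. The event shape (dicts with `host` and `weight` keys) and the names `VALID_WEIGHTS` and `filter_valid` are assumptions made for illustration.

```python
# Allow-list taken from the "Expected behavior" section of this issue.
VALID_WEIGHTS = frozenset({1, 10, 20, 100, 1000})


def filter_valid(events):
    """Keep only events whose weight is one of the allowed values."""
    return [e for e in events if e["weight"] in VALID_WEIGHTS]


# Hypothetical events; the fake weights are the ones quoted in this issue.
events = [
    {"host": "www.bigfishgames.com", "weight": 100},
    {"host": "www.bigfishgames.com", "weight": 101},
    {"host": "www.bigfishgames.com", "weight": 4294967297},
    {"host": "www.bigfishgames.com", "weight": -100000000000000000000},
    {"host": "www.bigfishgames.com", "weight": 20},
]

kept = filter_valid(events)
print([e["weight"] for e in kept])  # [100, 20]
```

An exact allow-list is stricter than a range check: a value such as 101 falls inside the range 1 to 1000 but is still rejected.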

Screenshots

rum_dashboard_fake_weights

Version: 14.21.4

Additional context ...

@trieloff

lydiapuric commented 2 days ago

Here are all the hosts and days where we have outliers (weight < 1 or weight > 100). For June 24: dev.bhgfinancial.com, www.iubesteinghetata.ro, www.rwgenting.com, quantum.microsoft.com, www.woo.pt, www.bigfishgames.com, www.rwkijal.com, www.pluralsight.com, www.nos.pt
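A per-day list of affected hosts like the one above could be produced along these lines. This is a sketch, not the query that was actually run: the event shape, the helper names, and the `example.com` record are hypothetical, and only the outlier condition (weight < 1 or weight > 100) and the www.bigfishgames.com weights come from this issue.

```python
from collections import defaultdict


def is_outlier(weight):
    """Outlier condition quoted in the comment: weight < 1 or weight > 100."""
    return weight < 1 or weight > 100


def outlier_hosts_by_day(events):
    """Map each day to the sorted list of hosts with at least one outlier weight."""
    hosts = defaultdict(set)
    for e in events:
        if is_outlier(e["weight"]):
            hosts[e["day"]].add(e["host"])
    return {day: sorted(names) for day, names in hosts.items()}


# Hypothetical events for illustration.
events = [
    {"day": "June 25", "host": "www.bigfishgames.com", "weight": 4294967297},
    {"day": "June 25", "host": "www.bigfishgames.com", "weight": 100},
    {"day": "June 26", "host": "www.bigfishgames.com", "weight": -100000000000000000000},
    {"day": "June 26", "host": "example.com", "weight": 1000},
]

report = outlier_hosts_by_day(events)
print(report)
```

Note that this threshold condition also flags a weight of 1000, which the issue description lists as valid, so it is broader than the allow-list in the expected behavior.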