Closed namimody closed 2 months ago
@tdlowden Is this an outcome of sampling?
Yes, I have actually been working with our Google vendors today to assess this. As of this morning, our GA account is sampling at astronomical levels (using <1% of sessions) in places where we could previously always get a report based on 100% of sessions. I'm not sure whether something is going on at Google, but this issue was apparently already in place when the reports were run for our data downloads last night. We're looking for the cause.
It's due to sampling.
On Mon, Nov 21, 2016 at 2:42 PM, Eric Mill wrote:
> @tdlowden Is this an outcome of sampling?
As an update on this, it looks like the problem was still occurring when the reports were run again last night, but from what I can tell in the GA interface, the issue seems to have been rectified as of 8:20 am. Hopefully, tomorrow our reports won't be subject to a sample, or at least not one as great as the sampling that was happening.
First off - belatedly, thank you very much for the issue, @namimody. I wanted to check in with a brief update.
We've known that this is an outstanding issue and, unfortunately, it's not resolved. My understanding is that we may not have any luck getting GA to turn off sampling this far down the rabbit hole. We'll have to continue balancing where to draw the line between including more results in the data downloads and avoiding sampling.
@tdlowden - in the meantime, what are your thoughts on adding a sentence to the footer, and possibly including an asterisk (*) pointing to said disclaimer next to the datasets that have this issue?
Closing stale issue.
BUG
Current Behavior
Seems improbable that many agencies would have the exact same number of visits, pageviews, users, exits to their sites...
Download from 11.21.16 (pink highlighted columns have improbable data): https://docs.google.com/spreadsheets/d/11wvYC1HyRZ3E5yZs1zj_etGqi8ysAkFu3V5g2MttoHA/edit?usp=sharing
Desired Behavior
Data collection methods should be revisited for number of visits, pageviews, users, exits.
Steps to Replicate
1) Go to https://analytics.usa.gov/data/
2) Download the "Visits to all domains over 30 days" CSV
3) Sort the "number of visits", "pageviews", "users", and "exits" columns from A-Z, then note the duplicate field entries.
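The manual sort in the steps above could also be scripted. Below is a minimal sketch, assuming the downloaded CSV has one row per domain and metric columns named `visits`, `pageviews`, `users`, and `exits`. Those header names, the `duplicate_values` helper, and the sample rows are all hypothetical, so adjust them to match the real file.

```python
import csv
import io
from collections import Counter

# Hypothetical column names; change these to match the header row of the real CSV.
METRIC_COLUMNS = ["visits", "pageviews", "users", "exits"]


def duplicate_values(csv_text, columns=METRIC_COLUMNS, min_repeats=2):
    """Return {column: {value: row_count}} for values shared by at least min_repeats rows."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    report = {}
    for column in columns:
        # Count how many rows carry each value, skipping missing or empty cells.
        counts = Counter(row[column] for row in rows if row.get(column))
        repeated = {value: n for value, n in counts.items() if n >= min_repeats}
        if repeated:
            report[column] = repeated
    return report


# Made-up rows for illustration only; this is not real analytics.usa.gov data.
sample = """domain,visits,pageviews,users,exits
a.gov,1200,3400,900,1200
b.gov,1200,3400,900,1200
c.gov,57,80,41,57
"""

print(duplicate_values(sample))
```

A few repeated values are expected by chance, especially among low-traffic domains, so the interesting signal is many domains sharing the same large numbers across several columns at once.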
Why This Matters
This data is only helpful if it is correct!