department-of-veterans-affairs / va.gov-team

Public resources for building on and in support of VA.gov. Visit complete Knowledge Hub:
https://depo-platform-documentation.scrollhelp.site/index.html
282 stars 203 forks source link

Migrate GA Domo datasets to BigQuery #11513

Closed joanneesteban closed 3 years ago

joanneesteban commented 4 years ago

Issue Description

How might we switch over GA Domo datasets to BigQuery?


Tasks

Acceptance Criteria

joanneesteban commented 4 years ago

Sprint 32 Tasks

In conjunction with #6356

jonwehausen commented 4 years ago

Update 8/12

Next Steps

jonwehausen commented 4 years ago

Google Sheet Tracking Migration Items

joanneesteban commented 4 years ago

@jonwehausen @bsmartin-ep do we have updates on:

Document what ETL funnels might look like Document how time to complete might look like

and if we can do them for the KPI dashboard switchovers next sprint? It looks like Caregivers has one of these metrics.

bsmartin-ep commented 4 years ago

Monthly - OMB Dashboard has been migrated to BigQuery,

The ETL process that creates the final output table used in all the cards has been updated to use three components:

jonwehausen commented 4 years ago

Thank you @bsmartin-ep!

Also side point / update for @joanneesteban: For prior needs to Magic ETL before BigQuery for things like data type changes, scrubbing, parsing, etc, we're able to do more of these basic ETL needs within our SQL queries directly, so the data is ready to go when pulled into DOMO. Of course this assumes the use of those functions doesn't create a major cost inflation, which we have yet to see.

joanneesteban commented 4 years ago

Thank you for the updates, @bsmartin-ep @jonwehausen ! Can we test out the time to complete ETL with Caregiver on Staging?

assumes the use of those functions doesn't create a major cost inflation, which we have yet to see.

Would like to see if effort levels are lower, data integrity is there, and ^cost isn't incredibly inflated.

Also, idea for super low-lift (and by super low-lift, I mean, just one metric instead of many, so who knows) Prometheus MVP would be the one "system availability" metric on the OMB dashboard.

bsmartin-ep commented 4 years ago

Card 'Total Pageviews' on the VEO dashboard has been switched over to the BQ - Generic - Overall PVs, Sessions, Users, Bounces by Day dataset.

I added a Beast Mode field to filter the datatable to anything before the previous month so that we don't display incomplete month data erroneously as a drop.

(CASE
    WHEN
        YEAR(`DATE`) < YEAR(CURRENT_DATE()) OR
        (YEAR(`DATE`) = YEAR(CURRENT_DATE()) AND MONTH(`DATE`) < MONTH(CURRENT_DATE()))
    THEN 'Y'                                  
    ELSE 'N'
END)
bsmartin-ep commented 4 years ago

@jonwehausen -

2 x Total Pageviews and 2 x Users cards (OMB and Monthly Council) have been migrated from dataset Va.gov & Vets.gov PVs and Users.

This required creating a modernized BQ dataset and a Beast Mode field (dataset-scoped) to only show data before the previous month.

Pageviews look pretty close.

Users are dramatically different as you'd expect going from GA --> BQ

Users

Before: image

After: image

I think that's it for these old GA datasets...

joanneesteban commented 4 years ago

That is a very dramatic difference...is that also after de-duping?

bsmartin-ep commented 4 years ago

@joanneesteban / @jonwehausen -

Cards have been updated to new monthly BQ datasets. Looking a lot closer now.

Old New BQ
image image

Also updated the pageview cards.

Old New BQ
image image

Not quite the exact match on Pageviews I was expecting, but curiously our "new" number matches the July 2020 numbers more closely:

image

(89.93M vs the new 92.39M)

joanneesteban commented 4 years ago

Great! Thanks, @bsmartin-ep ! Do you know why the July numbers for page views would have changed more than the others?

@amycesal , since these reports will start to go out next week, if we're ready to switch over the datasets to BigQuery let's start thinking about how to add annotations that display what the changes are and the value.

bsmartin-ep commented 4 years ago

Do you know why the July numbers for page views would have changed more than the others?

I think I figured out the issue with pageviews @joanneesteban. The old GA report is configured to run on the last day of the month minus one day (and not the first of the next month like it should probably be). So we've been truncating the last day of pageview data from our monthly total.

HOWEVER - it's also re-fetching all data back to 2018 every time it runs, so when the report runs, it fixes the last day of the previous month.

joanneesteban commented 4 years ago

Audit to do

bsmartin-ep commented 4 years ago

@joanneesteban -

Pageview dataset has been fixed and back-filled to correct July 2020.

image

I also adjusted the scheduling and back-filled this one:

This one also had the issue, but I don't think it should be running anymore. It's pointing to the old vets.gov site and is only (somehow) contributing a few ambient pageviews every month. Maybe a cached copy on archive.org?

joanneesteban commented 4 years ago

Thanks, @bsmartin-ep ! Yes, if that's grabbing vets.gov info, cached copy is probably better.

bsmartin-ep commented 4 years ago
Dashboard Name Type Old New BQ
COVID-19 - VA.gov Response COVID https://va-gov.domo.com/page/1872052027 https://va-gov.domo.com/page/2077110052
Auth Experience - KPIs - WIP KPIs n/a https://va-gov.domo.com/page/492918243
Caregivers - KPIs - WIP KPIs n/a https://va-gov.domo.com/page/1905395623
COVID Chatbot - KPIs KPIs https://va-gov.domo.com/page/62901069 https://va-gov.domo.com/page/986858114
eBenefits - KPIs - WIP KPIs https://va-gov.domo.com/page/177811372 https://va-gov.domo.com/page/1726972169
Facility Locator - KPIs KPIs https://va-gov.domo.com/page/444997301 https://va-gov.domo.com/page/1129885232
Medical Device Reordering Tool - KPIs KPIs https://va-gov.domo.com/page/1133411197 https://va-gov.domo.com/page/737992383
VAOS - KPIs KPIs https://va-gov.domo.com/page/565662008 https://va-gov.domo.com/page/1769944412
Monthly- Digital Modernization Council Dashboard Recurring https://va-gov.domo.com/page/95303793 https://va-gov.domo.com/page/42290444
Monthly- OMB dashboard Recurring https://va-gov.domo.com/page/1475431113 https://va-gov.domo.com/page/1587068891
Monthly- VEO Dashboard Recurring https://va-gov.domo.com/page/1817321505 https://va-gov.domo.com/page/730309340
bsmartin-ep commented 4 years ago

@bmcgrady-ep -

eBenefits is ready for your QA.

Dashboard Name Type Old New BQ
eBenefits - KPIs - WIP KPIs https://va-gov.domo.com/page/177811372 https://va-gov.domo.com/page/1726972169

Some important things...

  1. The old datasets were running (maybe incorrectly) as a rolling 60 day REPLACE. I fixed them to start appending a few days ago. Nontheless, the BQ and GA datasets may have different overall date ranges. Pretty close (especially if you filter down to a matching data range at the card level) but not exact.

  2. The bounce rate will be dramatically lower on the BQ dashboard due to the artificially high bounce rate of the dedicated eBenefits GA view we were using. The more comparable bounce rate would be if you filter pages in the All View to just these pages and looked at the Bounce rate for them:

    AND REGEXP_CONTAINS(
            hits.page.pagePath,
            r'^(eauth\.va\.gov|www\.ebenefits\.va\.gov)'
    )
    AND NOT REGEXP_CONTAINS(
        hits.page.pagePath,
        r'(/mhv-portal-web|/web/myhealthevet/)'
    )

Thanks and let me know what you find.

bmcgrady-ep commented 4 years ago

@bsmartin-ep - Overall, QA looks good. Few things to note:

  1. BQ card for PDF Downloads does not have a summary for Total PDF Downloads like the GA version does (screenshot)
  2. I filtered down to just this week for the Outbound Links cards and the data doesn't match super close. Maybe just confirm that it is correct?
bsmartin-ep commented 4 years ago

@bmcgrady-ep -

  1. Fixed
  2. Look OK after I back-filled?

Can you also note if these look OK?

bsmartin-ep commented 4 years ago

@bmcgrady-ep -

VAOS dashboard ready for your review. Thanks!

Dashboard Name Type Old New BQ
VAOS - KPIs KPIs https://va-gov.domo.com/page/565662008 https://va-gov.domo.com/page/1769944412
bsmartin-ep commented 4 years ago

@bmcgrady-ep -

BAM2 Medical Device dashboard is also ready for your review. Thanks!

Dashboard Name Type Old New BQ
Medical Device Reordering Tool - KPIs KPIs https://va-gov.domo.com/page/1133411197 https://va-gov.domo.com/page/737992383
bmcgrady-ep commented 4 years ago

@bsmartin-ep VAOS Dashboard Review

bmcgrady-ep commented 4 years ago

BAM2 Medical Device Dashboard Review

bsmartin-ep commented 4 years ago

@bmcgrady-ep -

Both ready for re-review.

VAOS

  1. Fixed
  2. This needed backfilled and attached to the correct segment. Looks a lot better!

BAM2

  1. Fixed
  2. I got these closer. The conversion funnel stuff will be a bit different since it's re-built in BQ. Thoughts?
bmcgrady-ep commented 4 years ago

@bsmartin-ep VAOS - The Number and frequency of Veterans returning to VAOS card in the GA dataset looks to be double counting.

BAM2 - The Appendix still looks a little off

bsmartin-ep commented 4 years ago

@bmcgrady-ep -

Chatbot KPI dashboard ready for your review:

Dashboard Name Type Old New BQ
COVID Chatbot - KPIs KPIs https://va-gov.domo.com/page/62901069 https://va-gov.domo.com/page/986858114

Thanks!

bsmartin-ep commented 4 years ago

@bmcgrady-ep -

FYI - I'm blocked on the Facility Locator KPI page, so nothing to review on that yet.

  1. We don't have the same site search keyword stuff on the All View, so we can't do a direct migration to that. I think % of sessions with more > 1 search is a good replacement metric, but we need those events recording first.
  2. We should switch to the BigQuery time-to-complete stuff in lieu of the (over-sampled) GA user timing. That's also dependent on the new fl- events.
  3. We have a card tracking certain CTA - Button clicks for an event that stopped recording in August. Checking with Michelle on that.

Moving this issue to QA now. Thinking with all this new data (and our blockers) it might be a good idea to kick off a redesign of the FL KPI page.

bmcgrady-ep commented 4 years ago

@bsmartin-ep COVID Chatbot Review

bsmartin-ep commented 4 years ago

Thanks for reviewing @bmcgrady-ep.

Inbound

Looks good (other than the scale getting bumped up) now that I backfilled the GA one.

image vs image

Percentage of mobile

Forgot to restrict it to mobile and modified the query to look at the immediate next event. Looking good!

image vs image

Left chart

This one? What differences are you seeing?

image vs image

bmcgrady-ep commented 4 years ago

@bsmartin-ep - Okay now the Left chart is looking good. Sorry for the delayed response. Are there any other dashboard you need reviewed?

jonwehausen commented 4 years ago

@bsmartin-ep Wanted to get in writing thoughts / next steps on our current solution to overcome our daily export / query timing discrepancy issue, here are potential next steps:

As a quicker win to our end goal, we could first: 1) Add this snippet to all of our current queries in domo

IF (
  SELECT
    COUNT(1) > 0
  FROM
    `vsp-analytics-and-insights.176188361.__TABLES__`
  WHERE
    table_id = CONCAT('ga_sessions_', FORMAT_DATE(
        '%Y%m%d',
        DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
    ))
) IS FALSE THEN
  RETURN;
ELSE

2) Adjust the scheduling in domo to be "Advanced" for every 30 minutes beginning at 7:45 am EST

Following, we could 1) Convert and save all of our current queries to views in BQ 2) Write a stored procedure to do the IF-ELSE check in BigQuery first 3) Adjust the queries in DOMO to query each view (simplify the DOMO query) 4) Verify advanced scheduling is still configured to every :30 minutes beginning at 7:45 am

cc: @joanneesteban

joanneesteban commented 3 years ago

Thanks for the work on this! Closing this ticket.