cagov / data-orchestration

Orchestration tooling for the CalData Data Services and Engineering team
MIT License

`calinnovate_dynamodb` dataset change #27

Closed reganking closed 1 year ago

reganking commented 1 year ago
@reganking as we discussed a bit last week, in this new setup the benefits recommender dataset and the feedback form dataset use the same Fivetran connector and load into the same schema. This lets us share some cloud infrastructure and reduce some cognitive load. The old BigQuery dataset is still working, but I'd like to turn it off and migrate to the `calinnovate_dynamodb` dataset.

Can you check whether your dashboards and analyses can be pointed at `dse-product-analytics-prd-bqd.calinnovate_dynamodb.benefits_recommendation_api_production_events_table_pigrtv_2_dywtm`? It should be a drop-in replacement.
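One way to sanity-check the "drop-in replacement" claim is to compare the column names and types of the old and new tables. The sketch below is only illustrative: it assumes the two schemas have already been fetched into plain `{column_name: type}` dictionaries, and the function name and example columns are hypothetical, not taken from the actual tables.

```python
def schema_diff(old_columns, new_columns):
    """Compare two {column_name: type} mappings and report differences."""
    old_names, new_names = set(old_columns), set(new_columns)
    return {
        # Columns dashboards may rely on that the new table lacks.
        "missing_in_new": sorted(old_names - new_names),
        # Columns that only exist in the new table.
        "added_in_new": sorted(new_names - old_names),
        # Columns present in both tables but with a different type.
        "type_changed": sorted(
            name
            for name in old_names & new_names
            if old_columns[name] != new_columns[name]
        ),
    }


# Hypothetical schemas, for illustration only.
old = {"event": "STRING", "page_url": "STRING", "timestamp": "TIMESTAMP"}
new = {"event": "STRING", "page_url": "STRING", "timestamp": "STRING"}
print(schema_diff(old, new))
```

An empty result in all three lists would suggest that existing queries can be repointed without changes; anything else is worth checking before switching.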

Originally posted by @ian-r-rose in https://github.com/cagov/data-orchestration/issues/24#issuecomment-1448468614

reganking commented 1 year ago

@ian-r-rose I will test now and let you know if I have an issue.

Update: Troubleshooting Looker

Doing:

Planned:

Cc: @mediajunkie @aaronhans @britt-allen

ian-r-rose commented 1 year ago

Yeah, #24 wasn't quite ready to be closed. But I appreciate you looking into it @reganking!

reganking commented 1 year ago

The Looker component test fails wherever custom fields are required. Custom fields such as PST, clicks, and views need to be added to the data source in the new dashboard. Reviewing: exporting/importing calculated fields between Looker Studio dashboards.

This is a Looker Studio issue and not an issue with the new dataset / new gBQ project or connection.

@ian-r-rose I propose closing this unless you know something about preserving calculated fields, or want to keep it open to track the removal of the old project (ideal, I'd think).

ian-r-rose commented 1 year ago

> This is a Looker Studio issue and not an issue with the new dataset / new gBQ project or connection.

@reganking Is it possible to instead keep the same data source with custom fields, but to edit the connection information there with the new table location?

reganking commented 1 year ago

Cool, that seems to work. Thanks @ian-r-rose. Just got a notice about two changed fields. Checking through now ...

reganking commented 1 year ago

@ian-r-rose, I'm noticing something interesting in both the old and new dataset locations. It seems that we're missing data from Nov 28-Dec 6. You can see this on the time series dashboard.

Do you know if you may have truncated the start of data?

reganking commented 1 year ago

There was a notice about "changed semantic configuration" on the page_url field. Aaron made a field change that maybe coincides with the Dec 6th date.

ian-r-rose commented 1 year ago

> @ian-r-rose, I'm noticing something interesting in both the old and new dataset locations. It seems that we're missing data from Nov 28-Dec 6. You can see this on the time series dashboard.
>
> Do you know if you may have truncated the start of data?

I'm not sure what could be causing this. We did have some issues with the Fivetran sync around Nov 28, but I believe they were resolved around the 29th.

The new table should be pulling fresh from DynamoDB, so I wonder if that missing data is reflected in the underlying table?
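To check whether a gap like this exists in the underlying data rather than only in the dashboard, one option is to list the calendar days in a range that have no events at all. This is a minimal sketch, assuming the event dates have already been extracted from the table into Python `date` objects; the function name, the year, and the sample dates are hypothetical.

```python
from datetime import date, timedelta


def missing_days(event_dates, start, end):
    """Return every day in [start, end] that has no event."""
    seen = set(event_dates)
    gaps = []
    day = start
    while day <= end:
        if day not in seen:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps


# Hypothetical sample: events on Nov 27 and Dec 7 only (year assumed).
dates = [date(2022, 11, 27), date(2022, 12, 7)]
for gap in missing_days(dates, date(2022, 11, 27), date(2022, 12, 7)):
    print(gap.isoformat())
```

If the raw table shows no gaps for the period, the missing data is more likely caused by a filter or field definition in the dashboard than by the sync.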

reganking commented 1 year ago

Figured it out. Aaron added the `page_url` value after Dec 6. Prior to that it's null. I need to filter on `display_url` for the Nov 28-Dec 5 data.
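The fallback described here can be sketched as a coalesce: use `page_url` when it is populated and `display_url` otherwise, so one filter covers both periods. This is an illustrative sketch only; it assumes events are available as dictionaries with those two keys, and the helper names and sample URLs are hypothetical.

```python
def effective_url(event):
    """Prefer page_url; fall back to display_url when page_url is null or empty."""
    return event.get("page_url") or event.get("display_url")


def events_for_url(events, url):
    """Keep events whose effective URL matches, regardless of which field holds it."""
    return [event for event in events if effective_url(event) == url]


# Hypothetical events: an older one without page_url, a newer one with it.
events = [
    {"page_url": None, "display_url": "https://example.com/a"},
    {"page_url": "https://example.com/a", "display_url": "https://example.com/a"},
]
print(len(events_for_url(events, "https://example.com/a")))
```

In Looker Studio the same idea would live in a calculated field on the data source, so charts do not need separate filters for the earlier and later date ranges.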

ian-r-rose commented 1 year ago

Great, thanks for following up! Do you have everything you need @reganking? Is it safe to move forward with the calinnovate_dynamodb dataset?

reganking commented 1 year ago

Yes, thanks @ian-r-rose. All instances of the old project have been replaced with calinnovate_dynamodb.

ian-r-rose commented 1 year ago

Thanks for the quick turnaround @reganking!