dbt-labs / snowplow

Data models for snowplow analytics.
https://hub.getdbt.com/dbt-labs/snowplow/latest/
Apache License 2.0

Feature: support embedded page_view_id #52

Closed. jtcohen6 closed this 2 years ago

jtcohen6 commented 5 years ago

Background

Many of our recent Snowplow installations have resulted in a single event stream table, with a schema matching Snowplow's canonical event model.

In these cases, we do not need to look up the page_view_id in a separate table containing web page context; the ID just needs to be un-arrayed and un-nested from the contexts object on the main events table. This change also enables a more fully incremental build, since page_view_id and collector_tstamp are available together on the same row from the start.
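To make "un-arrayed and un-nested" concrete, here is a minimal Python sketch of the plucking logic. It is an illustration only, not how the package does it (in practice this would be warehouse-specific SQL). It assumes the contexts column holds Snowplow's self-describing JSON wrapper, with a data array of attached contexts, and that the web page context stores the page view ID under id. The function name extract_page_view_id and the sample values are hypothetical.

```python
import json
from typing import Optional

# Schema URI prefix of Snowplow's web page context; the version suffix is ignored.
WEB_PAGE_SCHEMA_PREFIX = "iglu:com.snowplowanalytics.snowplow/web_page/jsonschema/"


def extract_page_view_id(contexts_json: Optional[str]) -> Optional[str]:
    """Return the web page context's id from a contexts JSON string, else None."""
    if not contexts_json:
        return None
    try:
        contexts = json.loads(contexts_json)
    except json.JSONDecodeError:
        return None
    if not isinstance(contexts, dict):
        return None

    # Un-array: "data" is expected to be a list of attached contexts.
    attached = contexts.get("data")
    if not isinstance(attached, list):
        return None

    for context in attached:
        if not isinstance(context, dict):
            continue
        schema = context.get("schema")
        if isinstance(schema, str) and schema.startswith(WEB_PAGE_SCHEMA_PREFIX):
            # Un-nest: the page view ID sits under the context's own "data" object.
            payload = context.get("data")
            return payload.get("id") if isinstance(payload, dict) else None
    return None


# Hypothetical sample row: one unrelated context followed by the web page context.
sample = json.dumps({
    "schema": "iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-0",
    "data": [
        {"schema": "iglu:com.example/other_context/jsonschema/1-0-0",
         "data": {"foo": "bar"}},
        {"schema": "iglu:com.snowplowanalytics.snowplow/web_page/jsonschema/1-0-0",
         "data": {"id": "pv-123"}},
    ],
})

print(extract_page_view_id(sample))  # prints "pv-123"
```

Because the ID is read from the same row as collector_tstamp, no join against a separate web page context table is needed, which is what makes the incremental build simpler.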

N.B. OTF = "On The Fence" = I considered multiple approaches and picked one without being sure it's the best. Open to input.

Changelog

Comments

drewbanin commented 5 years ago

OTF Add canonical_event and canonical_event_update seeds that are exact replicas of event/event_update merged with web_page/web_page_update

Yep, I think this is appropriate. Good call.

OTF Whether to include a cross-db macro to grab values from Snowplow contexts, or to include page-view plucking by default in the snowplow_web_events_tmp, or to do neither and leave it up to the installer (status quo).

I think we should leave this up to the user, but conceivably it will make sense to provide helper models / macros for very typical use cases (like Snowflake or Spectrum nested fields).

drewbanin commented 5 years ago

@jtcohen6 do you want me to re-review this one?

jtcohen6 commented 5 years ago

@drewbanin Yessir. I've made a few more changes—related though likely beyond the initial scope of this PR—in order to support my experimentation with external tables. Namely:

I believe these changes are relevant. To my mind, the primary use case for this PR's functionality is when Snowplow data is loaded or queried, in its canonical event structure, directly from external storage.

Failing tests

I would also appreciate your eye on the failing CircleCI tests, whose operative error appears to be:

ERROR: google-api-core 1.14.2 has requirement setuptools>=34.0.0, but you'll have setuptools 28.8.0 which is incompatible.

All tests are passing for me locally.