jtcohen6 closed this issue 1 year ago.
@jtcohen6 these are good questions! I've seen systems support configuration for GA (and similar) tracking by accepting a tracking ID and auto-tracking events. I bet we could do something like that, but of course I know there are some Snowplow shops using dbt in the world too :)
I don't think we're going to want to allow arbitrary JS injection -- I can see that being pretty brittle. If you know folks that are interested in adding tracking to the docs, I'd love to hear what they think about this here!
Thanks for creating this issue, @jtcohen6!
If documentation is a differentiator for why dbt outclasses competing solutions, then we need a way to validate that assertion in retrospect. Doing so with data is both possible and compelling.
From my perspective, I'd like a holistic view on how people are engaging with the data and information my team curates. For our stack, that's:
If I see that people are building dashboards on a table but not reading its documentation, then either they're not familiar with the docs or the docs are not valuable. That's useful feedback that I can glean without interrupting my users. It's also actionable because it helps us decide how much time we should spend documenting things.
To meet that use case, I would ideally like page view data from all documentation repositories (e.g. Confluence, dbt docs, Notion) loaded into my data warehouse. I could then marry this with the query audit logs and Tableau usage data for the same users to create a holistic picture of their interactions with data.
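Once page views and usage logs sit side by side in the warehouse, the "building dashboards on a table but not reading its documentation" check is essentially an anti-join. Below is a minimal sketch. It assumes both sources have already been reduced to hypothetical `(userId, uniqueId)` records keyed on the dbt node's unique id, and it leaves out the step of mapping audit-log table names to those ids. In practice this would more likely be a SQL query in the warehouse; it is shown in TypeScript here for consistency:

```typescript
// Hypothetical, already-normalized records; these shapes are assumptions, not dbt outputs.
interface DocPageView {
  userId: string;
  uniqueId: string; // dbt node id, e.g. "model.my_project.orders"
}

interface TableUsage {
  userId: string;
  uniqueId: string; // same id space, derived from query audit logs or BI usage data
}

// Anti-join: (user, node) pairs that show usage but no matching docs page view.
// Each pair is returned once, in first-seen order.
function usedWithoutDocs(views: DocPageView[], usage: TableUsage[]): TableUsage[] {
  const key = (r: { userId: string; uniqueId: string }): string =>
    `${r.userId}\u0000${r.uniqueId}`;
  const viewed = new Set(views.map(key));
  const emitted = new Set<string>();
  return usage.filter((u) => {
    const k = key(u);
    if (viewed.has(k) || emitted.has(k)) return false;
    emitted.add(k);
    return true;
  });
}
```

Grouping the result by `uniqueId` would then show which models are used heavily but rarely read about, which is the "how much time should we spend documenting this" signal.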
The data I'm interested in for dbt docs is primarily for the dbt Cloud use case:
Nice to have:
This would be plenty of info for me to see who is aware of the documentation and who is really using the documentation.
I'm a fan of Google Tag Manager for decoupling release cycles between marketing analytics and product features. My inclination is to use this as a solution for managing page view tracking.
To enable the event tracking above, you would push events to the `dataLayer` variable any time something interesting happened, and it would be simple to create triggers in GTM.
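As a rough illustration of that pattern, here is a minimal sketch of the docs site pushing an event onto a `dataLayer` array. The `event` key is what GTM Custom Event triggers match on. The event name `dbt_docs_model_view`, the `unique_id` field, and the `pushDocsEvent` helper are hypothetical, and the code takes any host object rather than `window` so it runs outside a browser:

```typescript
// Hypothetical event shape: "event" is the key GTM Custom Event triggers match on;
// every other field name here is illustrative, not part of dbt.
interface DataLayerEvent {
  event: string;
  [key: string]: unknown;
}

// Anything that can hold a dataLayer array (in a browser this would be `window`).
interface DataLayerHost {
  dataLayer?: DataLayerEvent[];
}

function pushDocsEvent(host: DataLayerHost, evt: DataLayerEvent): void {
  // Mirrors GTM's own bootstrap (`window.dataLayer = window.dataLayer || []`) so that
  // events pushed before the container script loads are queued rather than lost.
  host.dataLayer = host.dataLayer ?? [];
  host.dataLayer.push(evt);
}

// Example: the docs site reports that a model page was opened.
const host: DataLayerHost = {};
pushDocsEvent(host, { event: "dbt_docs_model_view", unique_id: "model.my_project.orders" });
```

A GTM trigger on the `dbt_docs_model_view` custom event could then forward it to GA, Segment, or anything else, without the docs site knowing which tool is on the other end.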
But loading a GTM container gives the opportunity for arbitrary JS injection, so there are a few possibilities:
Personally, I think option 3 is a pretty cool concept: it's consistent with the open source ideology and flexible enough to support a variety of users. Happy to help if we decide to go that route!
If this data is tracked and loaded in a consistent way for all dbt docs users, it should be possible to create plug-and-play solutions for basic analytics (e.g. a dbt package and a Looker/Tableau/Data Studio dashboard). These could serve both as useful tools for supporting a business and as good demonstrations of how dbt works in practice. Additionally, for users (like me) who want to marry this with other data, loading it into a data warehouse enables that while following an ELT approach.
Thanks for this really thorough writeup @mferryRV! I really buy what you're saying here.
I think that of the three options you outlined, the first one might be the most tractable for us. I can definitely imagine the docs site pushing events onto a window variable in a structured (and documented) way, then allowing users to slurp up those events however they see fit.
I too really like GTM, but I'm not certain that it's appropriate for every org using dbt docs out in the wild. A solution that lets folks leverage GTM with minimal effort without requiring it feels like a good hybrid approach to me.
I like that you touched on the user id component in dbt Cloud. We definitely do have user ids that we can expose in these events, and there are good ways to map these user ids onto identities of dbt Cloud users.
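One way to read "pushing events onto a window variable in a structured (and documented) way" is as a small queue. The docs site writes to it, and any tracker can subscribe, with events emitted before the tracker loaded replayed on subscription. Below is a minimal sketch. The event names, the fields (including the optional `userId` for dbt Cloud), and the `DocsEventQueue` class are all hypothetical:

```typescript
// Hypothetical schema for events the docs site could emit; names are illustrative only.
type DocsEventName = "page_view" | "model_view" | "dag_view" | "search";

interface DocsEvent {
  name: DocsEventName;
  path: string; // route within the docs site
  userId?: string; // e.g. a dbt Cloud user id, when one is available
  timestamp: number; // ms since epoch
}

type Listener = (evt: DocsEvent) => void;

class DocsEventQueue {
  private readonly events: DocsEvent[] = [];
  private readonly listeners: Listener[] = [];

  emit(evt: DocsEvent): void {
    this.events.push(evt);
    // Iterate over a copy so a listener that unsubscribes mid-emit is handled safely.
    for (const listener of [...this.listeners]) listener(evt);
  }

  // Replays past events first, so a tracker loaded late still sees the first page view.
  // Returns an unsubscribe function.
  subscribe(listener: Listener): () => void {
    for (const evt of this.events) listener(evt);
    this.listeners.push(listener);
    return () => {
      const i = this.listeners.indexOf(listener);
      if (i >= 0) this.listeners.splice(i, 1);
    };
  }
}
```

In a browser the site might expose an instance as something like `window.dbtDocsEvents` (name hypothetical). A GTM, GA, or Segment loader would call `subscribe` and translate each event into its own format. Replay on subscribe is the design choice that lets trackers load asynchronously without a race against the first page view.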
The following things are super easy to track:
I think scroll depth is comparatively harder to implement. Something like Snowplow implements this out of the box, but I shudder to think about calculating scroll depth in a cross-browser way ourselves. If you're able to share, can you tell me which system you'd likely use to record these events?
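The arithmetic behind scroll-depth tracking is small. The cross-browser pain is mostly in reading the inputs reliably. Below is a minimal sketch. It assumes the caller supplies the scroll offset, viewport height, and document height (in a browser, typically `window.scrollY`, `window.innerHeight`, and `document.documentElement.scrollHeight`) and that milestones of 25/50/75/100% are what's wanted. `scrollDepthPercent` and `ScrollDepthTracker` are hypothetical names:

```typescript
// Assumed reporting thresholds, in percent of page height.
const MILESTONES = [25, 50, 75, 100] as const;

// Share of the page that has been on screen, measured at the bottom of the viewport.
function scrollDepthPercent(scrollTop: number, viewportHeight: number, documentHeight: number): number {
  if (documentHeight <= viewportHeight) return 100; // whole page fits on screen
  const seen = Math.min(scrollTop + viewportHeight, documentHeight);
  return Math.round((seen / documentHeight) * 100);
}

// Reports each milestone at most once per page.
class ScrollDepthTracker {
  private readonly fired = new Set<number>();
  private readonly onMilestone: (pct: number) => void;

  constructor(onMilestone: (pct: number) => void) {
    this.onMilestone = onMilestone;
  }

  update(scrollTop: number, viewportHeight: number, documentHeight: number): void {
    const pct = scrollDepthPercent(scrollTop, viewportHeight, documentHeight);
    for (const m of MILESTONES) {
      if (pct >= m && !this.fired.has(m)) {
        this.fired.add(m);
        this.onMilestone(m);
      }
    }
  }
}
```

Calling `update` from a throttled `scroll` listener would emit one event per milestone. Because the docs site is a single-page app, it would also need a fresh tracker on each route change.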
We are really interested in this feature as well, but most of our requirements are already covered in the comments above, @drewbanin, so just +1 from me.
Thanks for reminding me about this, @nehiljain!
@drewbanin - we would be comfortable using either Segment or GA to track these events.
I just wanted to pick this thread back up. I've been mulling over some implementation considerations for a little while now. I think the most compelling version of this for users of the docs site would be to allow JavaScript snippets (e.g. a GA tracking pixel, Snowplow tracking code, or a GTM import) in the docs site.
I think that's something we can't readily (or really, don't want to) do. The big issue here is that we run the dbt Docs website in dbt Cloud, and it's a terribly bad idea to allow folks to write custom JS that runs for other users inside of the dbt Cloud application. You could, for instance, make requests to authenticated endpoints on behalf of another user if we allowed arbitrary JS snippets. We also know that some folks are running, or plan to run, the docs site in other hosted applications, so this isn't specifically a dbt Cloud constraint -- it's more that it limits the appropriate deployment models for this site too greatly.
I think the next best thing would be to allow the configuration of the docs website with either:
I'd be in favor of prioritizing a change like this for the v0.18.0 (Marian Anderson) release of dbt. We'd likely want to start with a GA integration that fires pageview events (and maybe usage information, like viewing the DAG) to a configured GA account.
From there, we can expand support out for other tools like Segment and Snowplow if folks ask for it!
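Because the docs site is a single-page app whose routes live in the URL fragment, a configured GA integration would need to report a "virtual" page view on every route change rather than on page load. Below is a minimal sketch of turning a route into a page-view payload. It assumes routes of the form `#!/model/model.my_project.orders` and a hypothetical `TrackingConfig` with a single provider. None of these names are existing dbt settings, and the provider union is where Segment or Snowplow could be added later:

```typescript
// Hypothetical docs-site tracking config; not an existing dbt setting.
interface TrackingConfig {
  provider: "google_analytics";
  trackingId: string;
}

interface PageView {
  page_path: string;
  resource_type: string | null; // e.g. "model", "source"; null for non-resource pages
  unique_id: string | null; // e.g. "model.my_project.orders"
}

interface PageViewHit extends PageView {
  tracking_id: string;
}

// Assumes routes look like "#!/<resource_type>/<unique_id>" (e.g. "#!/model/model.my_project.orders").
function toPageView(hash: string): PageView {
  // Strip the leading "#" or "#!" the router puts in front of the route.
  const path = hash.replace(/^#!?/, "") || "/";
  const parts = path.split("/");
  const resourceType = parts[1] ?? "";
  const uniqueId = parts.slice(2).join("/");
  const isResourcePage = resourceType !== "" && uniqueId !== "";
  return {
    page_path: path,
    resource_type: isResourcePage ? resourceType : null,
    unique_id: isResourcePage ? uniqueId : null,
  };
}

function pageViewHit(config: TrackingConfig, hash: string): PageViewHit {
  return { tracking_id: config.trackingId, ...toPageView(hash) };
}
```

In a browser, the site could call `pageViewHit` from a `hashchange` listener and hand the result to the configured provider. With gtag.js that would presumably map onto a `gtag("event", "page_view", { page_path: ... })` call. Keeping route parsing separate from sending is what would make adding other providers cheap.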
Y'all buy that?
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.
Reopening based on more interest in https://discourse.getdbt.com/t/is-user-tracking-possible-in-dbt-docs/6148
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.
Analytics teams may wish to embed their own tracking snippets (Snowplow, GA, etc) on the docs site. This would enable them to:
I'm wondering if there's an approach that can work across all deployment types:
Questions: