dbt-labs / snowplow

Data models for snowplow analytics.
https://hub.getdbt.com/dbt-labs/snowplow/latest/
Apache License 2.0
126 stars 45 forks source link

made utm split_part lower case #21

Closed SeanFLynch closed 6 years ago

drewbanin commented 6 years ago

@SeanFLynch cool! This is really close, but I think it has one unintended consequence in its current state.

By lower-casing the full url, this PR will also lowercase the utm values. This code will errantly represent the following utm_medium as somethingwithcaps instead of SomethingWithCaps:

https://url.com/page?utm_medium=SomethingWithCaps

This could break things like channel mapping code in sneaky and pernicious ways. An ideal solution here would case-insensitive match utm_medium (eg. Utm_Medium), but would preserve the case of the specified value.

If your urls are tagged with incorrectly cased utm parameters, it might be a better idea to pre-process the urls in your project before passing them to the Snowplow package. Happy to help you with that offline.

Here's an example algorithm which could accomplish something similar to your PR without lowercasing the utm values:

  1. Lowercase the whole url
  2. Find the index of the lower({{ utm_param }}) occurrence in the url
  3. Calculate the length of {{ utm_param }}=
  4. Find the substring starting at index_of_utm_param + length_of_parameter_string and terminating at the end of the string for the original url
  5. Split that substring at the & character and take the first value

So, more complicated, but definitely doable!

drewbanin commented 6 years ago

@SeanFLynch closing this, but lmk if you'd like to revisit!