fivetran / dbt_twilio_source

Fivetran's Twilio source dbt package
https://fivetran.github.io/dbt_twilio_source/
Apache License 2.0
0 stars 2 forks source link

Twilio numeric fields cast as strings #7

Open raphaelvarieras opened 1 month ago

raphaelvarieras commented 1 month ago
          @raphaelvarieras that is a good callout, but would you mind creating a new issue for that? I'd prefer to tackle in it another release to ensure we have ample time to figure out each field we'd want to (and logically should) cast as an integer, as I see that other tables have quite a few numeric fields cast as strings (ex: `duration` and `price` in `CALL` and `count` and `usage` in `USAGE_RECORD`)

_Originally posted by @fivetran-jamie in https://github.com/fivetran/dbt_twilio_source/issues/6#issuecomment-2108454742_

fivetran-jamie commented 1 month ago

thank you 🙌

fivetran-jamie commented 1 month ago

so, we need to investigate the following tables' fields, which are coming in from the connector as varchars/strings but sound (and look) like they should be numeric types:

i'll talk to the data PMs to understand why this might be the case from the connector-side of things, but otherwise this would be straightforward to implement ion the package. in the final CTE of each staging model we'd wanna cast each of these fields as a {{ dbt.type_int() }} or {{ dbt.type_numeric() }} or float or whatever is most appropriate for the field

raphaelvarieras commented 1 month ago

I already got some answers from Fivetran support. Apparently this is because the specifications provided by Twilio for their API is that these are provided as strings. So it seems maybe there are circumstances under which the API might return non-numeric answers?? I'm just not sure that is enough of a reason to compromise the ease with which to perform analysis using these models. I imagine many will need to add a casting operator in their project if it's not part of this package.

fivetran-jamie commented 1 month ago

Yeah I do see that Twiliio lists these fields as strings in the API Docs, but I completely agree with you that the fields are really only valuable as actual numerics. Going to investigate some more and will report back 🫡

fivetran-jamie commented 1 month ago

@raphaelvarieras We're currently trying to get in contact with Twilio to get some clarification on these fields, but it could be helpful for you to open a case with Twilio directly as well, as they might prioritize getting to customer questions over partner ones.

fivetran-jamie commented 1 month ago

So finally got some answers (somewhat) from Twilio:

GENERAL ADVICE ON TREATING STRING FIELDS AS NUMBERS When dealing with API responses, it's common to encounter fields that are represented as strings but contain numerical data. This often happens for several reasons, including but not limited to:

Ensuring precision for very large numbers that might exceed the limits of numerical types in some programming languages.

Maintaining compatibility across different systems where the interpretation of numerical types might vary.

With this, I think the package can safely apply the below updates in the relevant staging models:

  1. Scrub out non-numeric characters (it doesn't sound like there will typically be any but just in case)
  2. Cast each of the fields as {{ dbt.type_numeric() }}

@raphaelvarieras We'll plan ot take this on next sprint, which starts 6/13, but if you have availability to open a PR we will happily review and merge before then!

fivetran-reneeli commented 6 days ago

Thanks @fivetran-jamie and @raphaelvarieras for the context and scoping, will start working on an update to the package