joeschmid opened this issue 5 years ago
@joeschmid thanks for the kind words! We're always looking to make Target-Redshift better, so we really appreciate questions like this.
There is currently no supported way to do what you're asking. There have been conversations in the past about building up tooling to detect data widths so that we can leverage tighter constraints inside Redshift and avoid penalties for things like TEXT columns everywhere instead of VARCHAR(20), etc.
There is some work coming down the pipe which will make a number of these improvements simpler in the future, but what the "future" here means is pretty up in the air.
Given this, I don't think the most expedient way for you to resolve your issue is to wait for this feature.
I'd be happy to help walk you through what changes I would expect you'd need to make to get things working if that's useful to you?
@AlexanderMann thanks very much for the update and explanation. That all makes sense. If you wouldn't mind walking through the changes to get this scenario working, I'd appreciate it. (And maybe any others who come across similar issues would see the explanation here and it would help them out.)
@joeschmid no problem. So I will start by saying that the way to "get this working" is to fork this repo, and start trying to get what you're after working. I'm also not sure if it'll "work" or end up being a 🐰 🕳
Worth noting, Stitch also doesn't "support" this: https://www.stitchdata.com/docs/destinations/redshift/#data-limits
Integer range: -9223372036854775808 to 9223372036854775807. Integer values outside of this range will be rejected and logged in the _sdc_rejected table.
Make all integers NUMERIC(20, 0). Prolly be straightforward and simple.
Column widths will balloon for all integers. Redshift (last I checked) allocates the full declared width of a column for every value in it, whereas PostgreSQL only consumes memory for the width of the data actually present in each row.
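As a quick sanity check (my aside, not from the original thread): NUMERIC(20, 0) is wide enough for this use case because the largest unsigned 64-bit value has exactly 20 decimal digits:

```python
# Redshift's BIGINT is a signed 64-bit integer; MySQL's
# bigint unsigned can exceed it. NUMERIC(20, 0) covers both.
INT64_MAX = 2**63 - 1    # 9223372036854775807, Redshift BIGINT max
UINT64_MAX = 2**64 - 1   # 18446744073709551615, MySQL bigint unsigned max

print(UINT64_MAX > INT64_MAX)   # True: these values overflow BIGINT
print(len(str(UINT64_MAX)))     # 20: they still fit in NUMERIC(20, 0)
```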
In these lines, you're just going to make a mapping from JSONSchema's integer type to Redshift's NUMERIC(20, 0): https://github.com/datamill-co/target-redshift/blob/master/target_redshift/redshift.py#L97-L118
For more examples of what that'd look like, check in here: https://github.com/datamill-co/target-postgres/blob/master/target_postgres/postgres.py#L806-L870
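For a rough sense of the shape of that change, here's an illustrative sketch; the function and dict names below are made up and do not match the actual target-redshift internals, but the edit boils down to swapping what the integer type maps to:

```python
# Hypothetical sketch of the mapping change in a fork of
# target_redshift/redshift.py. Names here are illustrative,
# not the real target-redshift API.
def json_schema_to_redshift_type(json_schema_type):
    mapping = {
        'boolean': 'BOOLEAN',
        'integer': 'NUMERIC(20, 0)',  # was BIGINT; avoids int64 overflow
        'number': 'DOUBLE PRECISION',
        'string': 'VARCHAR(MAX)',
    }
    return mapping.get(json_schema_type, 'VARCHAR(MAX)')

print(json_schema_to_redshift_type('integer'))
```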
@joeschmid I'm not sure if you resolved this, but a hack (for anyone else looking at this issue) would be to replicate that column as a text/string type, then use a SQL transform after replication, such as a view, to parse it into a custom numeric type.
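A minimal sketch of that hack, assuming the column was replicated as text into a table raw.events with a column big_id (all table, schema, and column names here are hypothetical):

```sql
-- Hypothetical names: parse the replicated text column into a
-- wide numeric via a view, without touching the loader itself.
CREATE VIEW analytics.events_typed AS
SELECT
    id,
    CAST(big_id AS NUMERIC(20, 0)) AS big_id
FROM raw.events;
```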
Thanks for the work on this project! We're just trying out Singer for moving data from MySQL to Redshift. In MySQL we have a column of type bigint(18) unsigned. Some values in this column don't fit in Redshift's bigint column type, and we get errors like Overflow (Long valid range -9223372036854775808 to 9223372036854775807). Typically we declare such a Redshift column as NUMERIC(20, 0) to hold these values. Is there a way to tell target-redshift to use that type for a particular Redshift column?