datamill-co / target-redshift

A Singer.io Target for Redshift
MIT License
23 stars 17 forks source link

Default column length #9

Closed mirelagrigoras closed 5 years ago

mirelagrigoras commented 5 years ago

When working with a large number of columns, setting the column length as the maximum varchar length( 65,535 characters) could cause an error: target_postgres.postgres.PostgresError: ('Exception writing records', QueryCanceledError('Intermediate result row exceeds database block size\n',)) This is why I suggest decreasing the default (also, this would decrease the table size, so, the cost of our Redshift cluster). Yes, this is still a limitation, but sometimes is better to start by using a smaller column length and increase it afterwards(if needed) than the opposite.

AlexanderMann commented 5 years ago

I like this approach since values would get rejected still providing feedback to users.

@mirelagrigoras what would you think about providing a default from the configuration and using that? ie, a user could specify default_column_width.

@awm33 thoughts here?

mirelagrigoras commented 5 years ago

To me, this seems to be a great idea.This way, the target could be customized by the user according to their needs.

În Sâm, 9 feb. 2019, 01:26 Alexander Mann <notifications@github.com a scris:

I like this approach since values would get rejected still providing feedback to users.

@mirelagrigoras https://github.com/mirelagrigoras what would you think about providing a default from the configuration and using that? ie, a user could specify default_column_width.

@awm33 https://github.com/awm33 thoughts here?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/datamill-co/target-redshift/pull/9#issuecomment-461981646, or mute the thread https://github.com/notifications/unsubscribe-auth/AtMRVGeedfRbEPzpzOHpfFQh6E_jSTPgks5vLgesgaJpZM4avkpL .

mirelagrigoras commented 5 years ago

I like this approach since values would get rejected still providing feedback to users.

@mirelagrigoras what would you think about providing a default from the configuration and using that? ie, a user could specify default_column_width.

@awm33 thoughts here?

I added the default length in the configuration file. Let me know what you think @awm33 @AlexanderMann
Thanks!

awm33 commented 5 years ago

@mirelagrigoras This great! Thank you for your contribution! I just noticed a one line issue, if you could take a look

awm33 commented 5 years ago

@mirelagrigoras Great! Approved! We'll get this merged and deployed soon

mirelagrigoras commented 5 years ago

@awm33 Actually, let me explain what was the logic when I set the value to self.MAX_LENGTH, because I think this is the correct way to handle the situation. So, from what I understand, in the schema there might be a key "maxLength". If there is, max_length will be sent to that value, otherwise, it will be set to self.default_column_length. However, if the key exists in the dictionary and its value is greater than the default, then we should set max_length to self.MAX_VARCHAR since setting it to a lower value could cause the import of the data into the table to fail (resulting into a String length exceeds DDL length error). I migth be wrong, so please, let me know if I misunderstood something.

awm33 commented 5 years ago

@mirelagrigoras Ah, I see, so if maxLength is set, it overrides the default. If that's the intent, then I think I would have expected:

max_length = schema.get('maxLength', self.default_column_length)
if max_length > self.MAX_VARCHAR:
    max_length = self.MAX_VARCHAR

This way, there is a default, and it can be overridden by maxLength in the schema, but we don't necessarily set it to MAX_VARCHAR when it's greater than the default. For example if the default was 2000, and maxLength was 2500, we would set it to 64k.

mirelagrigoras commented 5 years ago

@awm33 That is a great catch.This should be the expected behaviour.I have changed the condition.Thank you for your time!

În Dum, 10 feb. 2019, 00:25 Andrew Madonna <notifications@github.com a scris:

@mirelagrigoras https://github.com/mirelagrigoras Ah, I see, so if maxLength is set, it overrides the default. If that's the intent, then I think I would have expected:

max_length = schema.get('maxLength', self.default_column_length) if max_length > self.MAX_VARCHAR: max_length = self.MAX_VARCHAR

This way, there is a default, and it can be overridden by maxLength in the schema, but we don't necessarily set it to MAX_VARCHAR when it's greater than the default. For example if the default was 2000, and maxLength was 2500, we would set it to 64k.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/datamill-co/target-redshift/pull/9#issuecomment-462084900, or mute the thread https://github.com/notifications/unsubscribe-auth/AtMRVOELsZe1OhDzb8Gyla90DEPaVLy-ks5vL0q9gaJpZM4avkpL .

AlexanderMann commented 5 years ago

@mirelagrigoras looks good. I'm going to go ahead and merge this. At the beginning of next week, if there are no other changes, I'll get a release of this ready to go!

AlexanderMann commented 5 years ago

@mirelagrigoras version 0.0.3 is now live!

mirelagrigoras commented 5 years ago

Thank you both so much!

În Lun, 18 feb. 2019, 17:32 Alexander Mann <notifications@github.com a scris:

@mirelagrigoras https://github.com/mirelagrigoras version 0.0.3 is now live! https://pypi.org/project/target-redshift/0.0.3/

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/datamill-co/target-redshift/pull/9#issuecomment-464776478, or mute the thread https://github.com/notifications/unsubscribe-auth/AtMRVFaCx5E-BqC5T_U7CrpV0z5CeG4zks5vOsd7gaJpZM4avkpL .