gluent / goe

GOE: a simple and flexible way to copy data from an Oracle Database to Google BigQuery.
Apache License 2.0

Remove synthetic bucket partitioning #20

Closed by nj1973 5 months ago

nj1973 commented 11 months ago

We support a synthetic bucket partition column on a Hadoop backend so that data can be read in parallel through a connector. As the connector no longer exists, we no longer need to support this synthetic partition column.
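
For readers unfamiliar with the idea, here is a minimal Python sketch of what a synthetic bucket column does in general. It is not GOE's implementation: the function names, the use of CRC32 as the hash, and the row format are all assumptions made for illustration.

```python
import zlib


def synthetic_bucket(value, num_buckets):
    """Map a column value to a bucket id in the range [0, num_buckets).

    Hypothetical helper: CRC32 is an assumed hash chosen because it is
    deterministic across processes; the real hash function may differ.
    """
    if num_buckets < 1:
        raise ValueError("num_buckets must be at least 1")
    digest = zlib.crc32(str(value).encode("utf-8"))
    return digest % num_buckets


def group_by_bucket(rows, hash_col, num_buckets):
    """Group rows (dicts) by the synthetic bucket derived from hash_col.

    Each bucket could then be read by a separate parallel reader.
    """
    buckets = {bucket_id: [] for bucket_id in range(num_buckets)}
    for row in rows:
        buckets[synthetic_bucket(row[hash_col], num_buckets)].append(row)
    return buckets
```

Because the bucket id is derived purely from the hash column, removing the bucketing logic does not by itself mean the hash column metadata is unused elsewhere.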

We need to be careful when removing the hidden hash column option/metadata, though: I believe it is used for functionality other than synthetic bucketing. Need to check.

While doing this, also remove any num location files options, defaults, configuration and code references. Location files and the number of buckets are intertwined in the code base and would be best removed in one activity.

abb9979 commented 8 months ago

Includes the following variables:

nj1973 commented 7 months ago

It looks like num_buckets_threshold is used by Synapse too. Perhaps the bucket_hash_col option and DEFAULT_BUCKETS_THRESHOLD should be renamed as part of this change.