Remove synthetic bucket partitioning

nj1973 commented 11 months ago

We support a synthetic bucket partition column on a Hadoop backend, which was added to allow parallel reading of data through a connector. As the connector no longer exists, we no longer need to support this synthetic partition column.

We need to be careful when removing the hidden hash column option/metadata, though. I believe it is used for functionality other than synthetic bucketing. This needs to be checked.

While doing this, also remove any num location files options, defaults, configuration and code references. Location files and number of buckets are intertwined in the code base and are best removed in one activity.

abb9979 commented 8 months ago

This includes the following variables:

DEFAULT_BUCKETS
DEFAULT_BUCKETS_MAX
DEFAULT_BUCKETS_THRESHOLD
NUM_LOCATION_FILES
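
To help scope the removal, the references to these variables could be listed with a small script. This is only a rough sketch: the function name `find_references` and the choice to scan every readable text file under a directory are assumptions for illustration, not something that exists in the code base. It matches whole words only, so `DEFAULT_BUCKETS` is not reported for lines that only mention `DEFAULT_BUCKETS_MAX`.

```python
import os
import re
from typing import Iterator, Sequence, Tuple

# Variable names taken from the list above; extend with option names as needed.
VARIABLES = (
    "DEFAULT_BUCKETS",
    "DEFAULT_BUCKETS_MAX",
    "DEFAULT_BUCKETS_THRESHOLD",
    "NUM_LOCATION_FILES",
)


def find_references(
    root: str, names: Sequence[str] = VARIABLES
) -> Iterator[Tuple[str, int, str, str]]:
    """Yield (path, line_number, name, line) for each whole-word match under root.

    Hypothetical helper for illustration only.
    """
    # \b on both sides keeps DEFAULT_BUCKETS from matching inside DEFAULT_BUCKETS_MAX.
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip version control metadata.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                with open(path, encoding="utf-8") as handle:
                    for line_number, line in enumerate(handle, start=1):
                        for match in pattern.finditer(line):
                            yield path, line_number, match.group(1), line.rstrip("\n")
            except (UnicodeDecodeError, OSError):
                # Binary or unreadable file: ignore it and move on.
                continue


if __name__ == "__main__":
    for path, line_number, name, line in find_references("."):
        print(f"{path}:{line_number}: {name}: {line.strip()}")
```

Run from the repository root, this prints one line per reference, which gives a checklist of code, defaults and configuration to review before deleting or renaming anything.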

nj1973 commented 7 months ago

It looks like num_buckets_threshold is used by Synapse too. Perhaps the bucket_hash_col option and DEFAULT_BUCKETS_THRESHOLD should be renamed as part of this change.

gluent / goe

Remove synthetic bucket partitioning #20