Fix and improve hash token distribution algorithm

There are 2^32 distinct "hash tokens" in our hash space, but we were using INT32_MAX (2^32 - 1) in the code instead. Because of this, shard counts which one might expect to divide evenly into the space (such as 16, 32, or 256) had fewer tokens than they should have. The remainder of the tokens were stuffed into the last shard, causing uneven load.

Though fixing the INT32_MAX bug solves the above case, it still doesn't deal with the remainder, which can be as large as shardCount - 1. We could continue stuffing it into the top shard, but I find it nicer to have all shard sizes be within one token of one another.

We previously divided the shard count into the hash token count to get a "hash token increment" and added that increment each iteration: this gives something like shardIndex * (hashCount / shardCount). By changing the grouping to (shardIndex * hashCount) / shardCount, the issue with distributing the remainder goes away entirely and we get "nice" shards.

citusdata / pg_shard

Fix and improve hash token distribution algorithm #145