AIworx-Labs / chocolate

A fully decentralized hyperparameter optimization framework
http://chocolate.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Duplicate samples for chocolate.Bayes #36

Open c3-dennis opened 4 years ago

c3-dennis commented 4 years ago

Reproducible Example:

import chocolate as choco

def objective_function(alpha, l1_ratio):
    return alpha + l1_ratio

# Discrete space: alpha in {0.1, 0.2}, l1_ratio in {0.5, 0.6, 0.7, 0.8, 0.9}.
space = {
    "alpha": choco.quantized_uniform(0.1, 0.3, 0.1),
    "l1_ratio": choco.quantized_uniform(0.5, 1.0, 0.1)
}

conn = choco.DataFrameConnection()
sampler = choco.Bayes(conn, space)

samples = []
for i in range(20):
    # Draw a candidate, record it, then report its loss back to the sampler.
    token, params = sampler.next()
    samples.append((token, params))
    loss = objective_function(**params)
    sampler.update(token, loss)

for sample in samples:
    print(sample)

Output:

({'_chocolate_id': 0}, {'alpha': 0.1, 'l1_ratio': 0.8})
({'_chocolate_id': 1}, {'alpha': 0.2, 'l1_ratio': 0.9})
({'_chocolate_id': 2}, {'alpha': 0.1, 'l1_ratio': 0.5})
({'_chocolate_id': 3}, {'alpha': 0.2, 'l1_ratio': 0.5})
({'_chocolate_id': 4}, {'alpha': 0.1, 'l1_ratio': 0.8})
({'_chocolate_id': 5}, {'alpha': 0.2, 'l1_ratio': 0.9})
({'_chocolate_id': 6}, {'alpha': 0.2, 'l1_ratio': 0.6})
({'_chocolate_id': 7}, {'alpha': 0.1, 'l1_ratio': 0.8})
({'_chocolate_id': 8}, {'alpha': 0.1, 'l1_ratio': 0.7})
({'_chocolate_id': 9}, {'alpha': 0.1, 'l1_ratio': 0.6})
({'_chocolate_id': 10}, {'alpha': 0.1, 'l1_ratio': 0.5})
({'_chocolate_id': 11}, {'alpha': 0.1, 'l1_ratio': 0.5})
({'_chocolate_id': 12}, {'alpha': 0.1, 'l1_ratio': 0.5})
({'_chocolate_id': 13}, {'alpha': 0.1, 'l1_ratio': 0.5})
({'_chocolate_id': 14}, {'alpha': 0.1, 'l1_ratio': 0.5})
({'_chocolate_id': 15}, {'alpha': 0.1, 'l1_ratio': 0.5})
({'_chocolate_id': 16}, {'alpha': 0.1, 'l1_ratio': 0.5})
({'_chocolate_id': 17}, {'alpha': 0.1, 'l1_ratio': 0.5})
({'_chocolate_id': 18}, {'alpha': 0.1, 'l1_ratio': 0.5})
({'_chocolate_id': 19}, {'alpha': 0.1, 'l1_ratio': 0.5})

Note the repetition: ids 0, 4, and 7 draw identical parameters, as do ids 1 and 5, and ids 10 through 19 all repeat {'alpha': 0.1, 'l1_ratio': 0.5}.
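For what it's worth, the duplicates can be tallied directly; this snippet just hard-codes the parameter pairs from the output above (it does not call chocolate):

```python
from collections import Counter

# (alpha, l1_ratio) pairs reproduced from the output above, ids 0-19.
params = [
    (0.1, 0.8), (0.2, 0.9), (0.1, 0.5), (0.2, 0.5), (0.1, 0.8),
    (0.2, 0.9), (0.2, 0.6), (0.1, 0.8), (0.1, 0.7), (0.1, 0.6),
] + [(0.1, 0.5)] * 10

counts = Counter(params)
# Points that were sampled more than once.
duplicates = {p: n for p, n in counts.items() if n > 1}
print(duplicates)  # → {(0.1, 0.8): 3, (0.2, 0.9): 2, (0.1, 0.5): 11}
```

So 16 of the 20 draws landed on just three points, in a space with only 10 distinct points in total.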

Comments: I took a peek at the implementation and found the following:

  1. During the bootstrapping phase (n=10 by default), there is no duplicate protection for randomly drawn samples.

  2. During the Gaussian process phase, there doesn't seem to be duplicate protection either. This is probably fine, since repeated draws there could indicate convergence, but I thought I would bring it up anyway.

I can't think of a scenario where this duplication would be desirable, so I am reporting it as an issue.
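One possible mitigation for the bootstrap phase would be rejection-style resampling: re-draw when a candidate repeats an already-seen point, giving up after a bounded number of retries. The sketch below is purely illustrative; `dedup_draw` is a hypothetical helper, not part of chocolate's API, and the scripted `candidates` iterator stands in for the sampler's random draw:

```python
def dedup_draw(draw, seen, max_retries=10):
    """Re-draw until the candidate is not in `seen`, up to max_retries.

    `draw` is any zero-argument callable returning a hashable point.
    If every retry hits a seen point, the last duplicate is returned,
    so the caller never blocks forever on an exhausted space.
    """
    point = draw()
    for _ in range(max_retries):
        if point not in seen:
            break
        point = draw()
    seen.add(point)
    return point

# Demo: a scripted draw sequence containing repeats, like the issue's output.
candidates = iter([(0.1, 0.8), (0.2, 0.9), (0.1, 0.8), (0.1, 0.5),
                   (0.2, 0.9), (0.1, 0.7)])
seen = set()
points = [dedup_draw(lambda: next(candidates), seen) for _ in range(4)]
print(points)  # → [(0.1, 0.8), (0.2, 0.9), (0.1, 0.5), (0.1, 0.7)]
```

The bounded retry count matters because a quantized space like the one above has only finitely many points; once it is exhausted, duplicates are unavoidable and the helper should return rather than loop.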