marcofavoritobi closed this pull request 2 years ago
~~TODO: after all samplers have been updated according to the new BaseSampler class, update the seed getter/setter with Optional[int]~~ done
Halton sampler "visual proof" of functional equivalence (i.e. low-discrepancy).
Script to reproduce the plots:
import numpy as np
import matplotlib.pyplot as plt

from black_it.samplers.halton import HaltonSampler
from black_it.search_space import SearchSpace

N = 10000
sampler = HaltonSampler(batch_size=8, random_state=0)
param_grid = SearchSpace(
    parameters_bounds=np.array([[0, 1], [0, 1]]).T,
    parameters_precision=np.array([0.00001, 0.00001]),
    verbose=False,
)
# sample N points on the unit square (no previously existing points/losses)
new_params = sampler.sample_batch(N, param_grid, np.zeros((0, 2)), np.zeros((0, 2)))
plt.scatter(*zip(*new_params), s=0.1)
plt.show()
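For context on why the scatter should look uniformly "filled in", the low-discrepancy property of the Halton sequence comes from the radical-inverse construction: the n-th point reflects the digits of n written in a different prime base for each dimension. The following is a minimal, self-contained sketch of that construction for illustration only; it is not the HaltonSampler implementation.

# Sketch of a 2D Halton point set via the radical-inverse (van der Corput) map;
# bases 2 and 3 are the usual choice for two dimensions. Illustration only.
import numpy as np

def radical_inverse(n: int, base: int) -> float:
    """Reflect the base-`base` digits of n around the radix point."""
    inv, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, base)
        denom *= base
        inv += digit / denom
    return inv

def halton_2d(n_points: int) -> np.ndarray:
    return np.array(
        [[radical_inverse(i, 2), radical_inverse(i, 3)] for i in range(1, n_points + 1)]
    )

points = halton_2d(10000)  # should fill the unit square without visible stripes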
Ok great, the two figures do indeed look alike, while before you fixed that bug the new version was showing some regular stripe patterns.
Edit: forget it, I mistook the figures for R-sequence samples, while they actually refer to the Halton sampler, which was working even before.
Before 44109a1:
After 44109a1:
Looks wrong...
Code snippet:
import numpy as np
import matplotlib.pyplot as plt

from black_it.samplers.r_sequence import RSequenceSampler
from black_it.search_space import SearchSpace

N = 10000
sampler = RSequenceSampler(batch_size=8, random_state=0)
param_grid = SearchSpace(
    parameters_bounds=np.array([[0, 1], [0, 1]]).T,
    parameters_precision=np.array([0.00001, 0.00001]),
    verbose=False,
)
# sample N points on the unit square (no previously existing points/losses)
new_params = sampler.sample_batch(N, param_grid, np.zeros((0, 2)), np.zeros((0, 2)))
plt.scatter(*zip(*new_params), s=0.1)
plt.show()
Before:
After:
The error was that, according to the definition of the sequence, only the last (converged) value of the phi vector must be used.
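For reference, in the R-sequence the generalized golden ratio phi_d is the positive root of x^(d+1) = x + 1, usually obtained as the fixed point of the iteration phi ← (1 + phi)^(1/(d+1)); only that final converged value should be used to build the irrational step vector. The sketch below is an illustration of the construction, not the RSequenceSampler code.

# Sketch of the R_d low-discrepancy sequence (Roberts, 2018).
# phi_d is found by fixed-point iteration; note that only the last,
# converged value of the iteration is used to build the step vector.
import numpy as np

def compute_phi(d: int, n_iter: int = 32) -> float:
    phi = 1.0
    for _ in range(n_iter):
        phi = (1.0 + phi) ** (1.0 / (d + 1))
    return phi  # the converged value, not the intermediate ones

def r_sequence(n_points: int, d: int, seed: float = 0.5) -> np.ndarray:
    phi = compute_phi(d)
    alpha = 1.0 / phi ** np.arange(1, d + 1)   # irrational step vector
    n = np.arange(1, n_points + 1).reshape(-1, 1)
    return (seed + n * alpha) % 1.0            # points in [0, 1)^d

points = r_sequence(10000, d=2)  # should look similar to the Halton scatter above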
@AldoGl I brought in all the relevant changes from #2 and fixed the bug in the R-sequence sampler. Is anything else missing from this PR?
BestBatchSampler could be improved, as some operations (the shock of the parameters) are not vectorized; a rough idea is sketched below. I would leave that as future work, since it is good enough as it is now.
If @muxator has some time, it would be useful to have his opinion as well.
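To illustrate the kind of vectorization meant above, here is a generic sketch of applying a parameter shock to a whole batch with a single NumPy call instead of looping over candidates. Names and shapes are hypothetical and not taken from BestBatchSampler.

# Generic illustration of vectorizing a per-candidate parameter "shock".
import numpy as np

rng = np.random.default_rng(0)
batch = rng.random((8, 2))   # hypothetical batch of 8 candidate parameter vectors

# loop-based version: one candidate at a time
shocked_loop = np.array([p + rng.normal(0.0, 0.05, size=p.shape) for p in batch])

# vectorized version: one call over the whole batch
shocked_vec = batch + rng.normal(0.0, 0.05, size=batch.shape)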
Thanks Marco, great and very timely work.
The only improvement I can think of is to remove the call digitize_data(sampled_points, search_space.param_grid) from the return statement of all sample_batch methods, and plug it in as a discretisation step inside the sample method of the BaseSampler. For instance, at line 116 of the base.py module we might have a line with samples = digitize_data(samples, search_space.param_grid).
This would simplify the creation of a new sampler, as the user could rely on the base class for discretisation and would not need to think about it.
In case we agree, @muxator might have a smart way of applying this change to all past commits; as an alternative, we might apply this improvement on top of all commits (probably this is the best choice).
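A rough sketch of the proposed change is shown below. The import path and the sample signature are assumptions for illustration, not the actual base.py code; the sample_batch call mirrors the signature used in the snippets above.

# Hypothetical sketch of the suggestion: discretise once in BaseSampler.sample,
# so that concrete samplers no longer call digitize_data in their sample_batch.
from black_it.utils.base import digitize_data  # assumed import path

class BaseSampler:
    ...

    def sample(self, search_space, existing_points, existing_losses):
        samples = self.sample_batch(
            self.batch_size, search_space, existing_points, existing_losses
        )
        # single shared discretisation step, instead of one per sampler
        return digitize_data(samples, search_space.param_grid)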
Thank you for the suggestion. I have some thoughts on that. A counterargument to removing digitize_data from sample_batch is that that method should return a batch of parameters belonging to the desired search space; this is somewhat an implicit requirement due to the parameter search_grid of the abstract method sample_batch. A good addition would be to state this requirement in the docstring (and in this case, adding a check in sample seems too much and undesirable for performance reasons). Another point is that, if some samplers already return digitized data, as is the case for the random uniform sampler, then a further digitization in sample is redundant; we could first check whether the params returned by sample_batch are digitized or not, but that is still an overhead.
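To make the overhead argument concrete, such a check would look roughly like the sketch below (the import path is assumed, and even this test costs an extra pass over the batch).

# Sketch: checking whether sample_batch output is already on the grid
# before digitising again; the check itself is an extra pass over the points.
import numpy as np
from black_it.utils.base import digitize_data  # assumed import path

def ensure_digitized(samples, search_space):
    digitized = digitize_data(samples, search_space.param_grid)
    if np.array_equal(samples, digitized):
        return samples  # e.g. the random uniform sampler's output
    return digitized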
Ok, I am convinced by what you say. I think we can merge the PR as it is, if @muxator also agrees.
LGTM.
The CI failure is due to a transient error in npm. Subsequent runs of the failed job succeeded; I do not know why they are not showing here.
Tests finally passing, merging now!
Proposed changes
Remove single_sample as an abstract method, and instead make sample_batch the core abstract method to implement.

Fixes
n/a

Types of changes
What types of changes does your code introduce? Put an x in the boxes that apply.

Checklist
Put an x in the boxes that apply.
- I'm making a pull request against the develop branch (left side). Also, you should start your branch off our develop.

Further comments
n/a