Closed dnguyen1196 closed 4 years ago
We discussed this on Slack, but I'll add it here for posterity. Thanks for raising the issue. My alternative proposal is to set the labels of unlabeled data to np.nan:
import numpy as np
import torch
ys = torch.tensor([0., 0.3, 0.5, np.nan])
k = 2
# Mask out the unlabeled (NaN) entries before taking the top k
torch.topk(ys[~torch.isnan(ys)], k)
This lets you use the conventional batching functions and so on, just as with a normal tensor.
That said, I'm not sure this is as big a problem as you might think.
See the semi-supervised BO experiment object I created: as constructed, self.old_data is always labeled, because we reveal the data to ourselves by moving it from self.new to self.old.
@miretchin
In the semi-supervised experiment, some of the y labels will be None. torch.tensor(ys) will raise an error if the array contains None, so we need a way to work with arrays that contain None. Immediately, torch.max no longer works, but we can always replace it with max([y for y in ys if y is not None]) for now.
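A minimal sketch of the None-filtering workaround described above, assuming ys is a plain Python list in which None marks an unlabeled point. The values are made up for illustration, and the default=None guard is an addition that is not part of the original suggestion.

```python
# Hypothetical label list: None marks an unlabeled point.
ys = [0.0, 0.3, None, 0.5, None]

# Keep only the labeled values. Calling max() on the raw list would raise
# a TypeError as soon as it compares a float with None.
labeled = [y for y in ys if y is not None]

# default=None covers the case where nothing has been labeled yet.
best = max(labeled, default=None)
print(best)  # 0.5
```

The filtered list is shorter than ys, so any code that relies on positions in ys needs to keep track of the original indices separately.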
plotting_active.py
And when collecting the results at the end
One point of concern is that the current code uses results_data[:len(x)] = np.maximum.accumulate(ys[x].cpu().squeeze()). It is therefore unclear whether using results_data[:len(y_not_none)] (which might have fewer elements) will break the downstream data visualization code.
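One option that would sidestep the length mismatch, sketched here under the assumption that unlabeled entries are stored as np.nan (as proposed earlier in the thread) rather than None: np.fmax returns the non-NaN operand when only one of its inputs is NaN, so np.fmax.accumulate yields a running maximum with the same length as the input. The ys values below are made up for illustration.

```python
import numpy as np

# Hypothetical labels in acquisition order; NaN marks an unlabeled point.
ys = np.array([0.1, np.nan, 0.3, np.nan, 0.2, 0.5])

# np.maximum.accumulate would propagate NaN from the first unlabeled entry
# onward. np.fmax skips NaN whenever the other operand is a number.
running_best = np.fmax.accumulate(ys)

# The output has one entry per input, so the slice length is unchanged.
results_data = np.zeros(len(ys))
results_data[:len(ys)] = running_best
print(running_best)  # [0.1 0.1 0.3 0.3 0.3 0.5]
```

If the sequence starts with unlabeled points, the leading entries stay NaN until the first labeled value appears, which plotting code would still need to handle.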
experiment.py
The blind pick needs to re-pick if it selects an unlabeled data point.
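A minimal sketch of such a re-pick. The function name blind_pick and the list-based ys are hypothetical and are not taken from experiment.py. Instead of redrawing in a loop, the sketch restricts the random draw to labeled indices, which gives the same distribution as redrawing until a labeled point comes up.

```python
import random

def blind_pick(ys, rng=random):
    """Return the index of a uniformly chosen labeled point.

    Hypothetical helper: None marks an unlabeled point.
    """
    labeled_idx = [i for i, y in enumerate(ys) if y is not None]
    if not labeled_idx:
        raise ValueError("no labeled points to pick from")
    return rng.choice(labeled_idx)

# Made-up labels; indices 1 and 3 are the only labeled points.
ys = [None, 0.3, None, 0.5]
picked = blind_pick(ys, random.Random(0))
print(picked, ys[picked])
```

Passing a seeded random.Random instance keeps the pick reproducible across runs.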
experiment.py/__init__
Also, the function that slices the tensor.