Open Malnammi opened 5 years ago
@Malnammi I have a question about the exploitation weight. The weight increases as the cluster coverage increases. At some point, wouldn't we want there to be diminishing returns for an active cluster?
@agitter my idea for the exploitation weight was:
When computing exploitation weights for the clusters, a cluster with no highly active predictions (none exceeding the threshold) gets a default Activity_i of zero. Its weight is then determined entirely by its coverage (i.e. W_i_exploit <= 0.5), and it will be outranked by any cluster with Activity_i > 0.
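A rough sketch of that ranking, not the repository's code. It assumes an equal 0.5/0.5 split between Activity_i and Coverage_i, both in [0, 1]; the split is only inferred from the W_i_exploit <= 0.5 bound. With a plain weighted sum, a fully covered inactive cluster (weight 0.5) could still beat a weakly active, poorly covered one, so the sketch sorts on (Activity_i > 0, weight) to make the "outranked by any active cluster" rule explicit. The function names and the example clusters are hypothetical.

```python
def exploitation_weight(activity, coverage):
    """Assumed form: equal-weight average of activity and coverage, both in [0, 1]."""
    return 0.5 * activity + 0.5 * coverage


def rank_for_exploitation(clusters):
    """Rank cluster ids, best first.

    clusters maps cluster id -> (activity, coverage).
    Clusters with activity > 0 always come before clusters with activity == 0;
    within each group, clusters are ordered by exploitation weight.
    """
    def sort_key(cluster_id):
        activity, coverage = clusters[cluster_id]
        return (activity > 0, exploitation_weight(activity, coverage))

    return sorted(clusters, key=sort_key, reverse=True)


# Hypothetical clusters: "a" is inactive but fully covered,
# "b" is weakly active with no coverage, "c" is active and half covered.
clusters = {"a": (0.0, 1.0), "b": (0.1, 0.0), "c": (0.6, 0.5)}
ranking = rank_for_exploitation(clusters)
```

Here "a" has the second-highest raw weight (0.5) but is still ranked last because its Activity_i is zero.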
This also raises another issue: what do we do if all the clusters have Activity_i = 0? Do we want to weight based on Coverage_i alone? Or stop exploiting and focus more on exploration until our model becomes more confident?
We discussed that activity prediction ranges are model-dependent; e.g. on small datasets a random forest typically gives predictions in a low range such as [0, 0.4]. The current implementation has a temporary remedy for this: the thresholding parameter is set as a quantile rather than an absolute value. Specifically, with a quantile of 0.5, the highly active unlabeled molecules are those with predictions >= the median of the unlabeled predictions.
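For illustration, a minimal stdlib-only sketch of quantile-based thresholding. The function names and the example predictions are made up, and the linear interpolation between order statistics is an assumption about how the quantile is computed.

```python
def activity_threshold(predictions, quantile=0.5):
    """Return the value at the given quantile of the unlabeled predictions.

    Uses linear interpolation between neighboring order statistics.
    """
    if not predictions:
        raise ValueError("predictions must not be empty")
    if not 0.0 <= quantile <= 1.0:
        raise ValueError("quantile must be in [0, 1]")
    ordered = sorted(predictions)
    position = quantile * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def highly_active(predictions, quantile=0.5):
    """Keep the predictions at or above the quantile threshold."""
    threshold = activity_threshold(predictions, quantile)
    return [p for p in predictions if p >= threshold]


# Hypothetical random forest predictions confined to a low range.
preds = [0.05, 0.10, 0.20, 0.30, 0.40]
threshold = activity_threshold(preds)   # median of the predictions
selected = highly_active(preds)         # predictions >= median
```

With an absolute threshold such as 0.5, none of these predictions would count as highly active; the quantile keeps the cutoff relative to whatever range the model produces.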
I see, so the coverage is used to estimate confidence, not diminishing returns.
> what do we do if all the clusters have Activity_i = 0?
My initial thought is that it would make sense to focus on exploration, as you suggested.
For the activity prediction ranges, this temperature scaling method is the one Jay tested: https://arxiv.org/pdf/1706.04599.pdf
I'm not certain that it is relevant for us.
The current strategy assigns exploitation and exploration weights to clusters in the following manner:
Exploitation weight: favors clusters with high activity and high density of labeled data (coverage).
Exploration weight: favors clusters with low coverage and high uncertainty. We also have the option of selecting exploration clusters randomly or selecting a set of dissimilar clusters.
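As a hedged illustration of the exploration side only (not the linked implementation): the sketch below assumes coverage and uncertainty are both scaled to [0, 1] and combines them with an equal 0.5/0.5 split, which is a guess made for symmetry. The function names, the seed parameter, and the example values are hypothetical. The dissimilar-cluster option is omitted because it would need a cluster distance measure.

```python
import random


def exploration_weight(coverage, uncertainty):
    """Assumed form: high when coverage is low and uncertainty is high."""
    return 0.5 * (1.0 - coverage) + 0.5 * uncertainty


def pick_exploration_clusters(clusters, k, strategy="weighted", seed=None):
    """Select k cluster ids for exploration.

    clusters maps cluster id -> (coverage, uncertainty).
    strategy "weighted" takes the k highest exploration weights;
    strategy "random" samples k clusters uniformly without replacement.
    """
    ids = list(clusters)
    k = min(k, len(ids))
    if strategy == "random":
        return random.Random(seed).sample(ids, k)
    if strategy == "weighted":
        ranked = sorted(ids, key=lambda cid: exploration_weight(*clusters[cid]), reverse=True)
        return ranked[:k]
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical clusters as (coverage, uncertainty).
clusters = {"a": (0.9, 0.1), "b": (0.2, 0.8), "c": (0.5, 0.5)}
top_two = pick_exploration_clusters(clusters, k=2)
random_two = pick_exploration_clusters(clusters, k=2, strategy="random", seed=0)
```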
The current code for this method is here: link
Hyperparameter configs are here: link
Here are some pending issues with this: