It makes sense to implement a heuristic for determining how many processes to start (bookkeeping and data copying have a computation cost) and how much computation each one should perform.
There are three problems:
- Choosing a batch_size, converted to a computation_budget or something similar.
- Speed changes during permutation sampling as the effective dataset gets smaller.
- Utility evaluations take longer for some subsets than for others.
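Before any adaptive scheme, a simple probing heuristic could set the process count and an initial budget by timing a few utility evaluations; `evaluate_utility` and `sample_subset` are hypothetical stand-ins for the surrounding sampling code:

```python
import os
import time

def initial_parallelism(evaluate_utility, sample_subset,
                        target_seconds=1.0, n_probe=3):
    """Probe a few utility evaluations to pick a starting budget.

    Gives each worker roughly target_seconds of work so that process
    start-up, bookkeeping and data-copying overhead is amortised.
    A sketch under assumed names, not a tuned implementation.
    """
    start = time.perf_counter()
    for _ in range(n_probe):
        evaluate_utility(sample_subset())
    # guard against a timer resolution of zero on very cheap utilities
    per_eval = max((time.perf_counter() - start) / n_probe, 1e-9)
    budget = max(1, int(target_seconds / per_eval))
    n_procs = os.cpu_count() or 1
    return n_procs, budget
```

The target wall time per worker is the knob: larger values amortise overhead better but react more slowly to the speed changes listed above.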
Ideas:
Use N processes to estimate T(cb^(t) + d) with d ~ N(0, s=0.1), then perform a gradient-descent step to obtain the next estimate cb^(t+1).
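The perturbation idea could be sketched as a finite-difference gradient step on the measured cost T(cb); `measure_time`, the step size, and the lower bound are assumptions for illustration:

```python
import random

def budget_step(measure_time, cb, n_probes=4, sigma=0.1, lr=0.5):
    """One gradient-descent step on the measured cost T(cb).

    measure_time(cb) is assumed to return a (noisy) time-per-result for
    computation budget cb. Each probe is independent, so in practice
    each could run in its own process.
    """
    grad = 0.0
    for _ in range(n_probes):
        d = random.gauss(0.0, sigma)  # d ~ N(0, s=0.1)
        # symmetric finite-difference estimate of dT/dcb
        grad += (measure_time(cb + d) - measure_time(cb - d)) / (2 * d)
    grad /= n_probes
    return max(0.01, cb - lr * grad)  # keep the budget positive
```

On a noiseless quadratic cost the symmetric difference is exact, so the step moves straight toward the minimum; with real timing noise, a smaller learning rate and more probes would be needed.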
Implement a policy via PPO or contextual bandits. The latter assumes that the position is not part of the state, i.e. the action cb^(t) is an absolute value and not a delta to the previous value (which would probably be more stationary).
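The contextual-bandit variant could look like the following epsilon-greedy sketch, where the context (e.g. the remaining dataset fraction, bucketed) accounts for the speed changes as sampling progresses, and the action is an absolute budget as noted above; class and parameter names are hypothetical:

```python
import random
from collections import defaultdict

class BudgetBandit:
    """Epsilon-greedy contextual bandit over discrete budget choices.

    A sketch of the idea, not a tuned implementation: value estimates
    are incremental means of an observed reward such as results/second.
    """

    def __init__(self, budgets, eps=0.1):
        self.budgets = budgets
        self.eps = eps
        self.value = defaultdict(float)  # (context, budget) -> reward estimate
        self.count = defaultdict(int)

    def choose(self, context):
        if random.random() < self.eps:
            return random.choice(self.budgets)  # explore
        return max(self.budgets, key=lambda b: self.value[(context, b)])

    def update(self, context, budget, reward):
        key = (context, budget)
        self.count[key] += 1
        # incremental mean of the observed reward
        self.value[key] += (reward - self.value[key]) / self.count[key]
```

Keeping the actions absolute (rather than deltas) makes each (context, budget) arm's reward comparable across rounds, at the price of a coarser action grid.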