Closed yiwan-rl closed 3 years ago
You're right. A similar alternative is to check for convergence over a window of N (potentially equal to |S|) updates.
How do we feel about the approach in our evaluation experiment program?
For sync algorithm, it checks if the mean-absolute update to the values across all states is less than a tolerance.
For async, it performs this check over the last |S| updates.
The current approach looks good. Perhaps later we might modify the search control procedure in the async case to pick states arbitrarily (based on the discussion in #16). Then we might average the the last N updates, where N is a large number. Hmm, I guess N could still be equal to |S|. Thoughts?
@abhisheknaik96 Since Rich hates convergence checks in general, I think this will never be part of the code that is holding us up. When we start doing async updates in a random order we can relax this condition to ensure we are not stopping too early.
Closing for now.
The current way to determine convergence is to see if one state's update is below some threshold. The async algorithm may have the magnitude of one state's update being below the threshold while the convergence is still not achieved -- other states' values may still be changed.
We need to change this condition for async algorithms. For example, we can say there is a convergence when the updates to all states in the previous sweep are below the threshold.