abhisheknaik96 / differential-value-iteration

Experiments in creating the ultimate average-reward planning algorithm
Apache License 2.0

The convergence condition needs to be revised for async algorithms. #11

Closed by yiwan-rl 3 years ago

yiwan-rl commented 3 years ago

The current way to determine convergence is to check whether a single state's update is below some threshold. An async algorithm can have the magnitude of one state's update fall below the threshold while convergence has not actually been achieved: other states' values may still be changing.

We need to change this condition for async algorithms. For example, we could declare convergence only when the updates to all states in the previous sweep are below the threshold.
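A minimal sketch of that sweep-based check, assuming a hypothetical `bellman_update(values, s)` that updates state `s` in place and returns the change it made (the names here are illustrative, not the repo's actual API):

```python
def sweep_converged(values, bellman_update, tolerance=1e-6):
    """Run one full async sweep; report convergence only if every
    state's update in this sweep stayed below the tolerance."""
    max_change = 0.0
    for s in range(len(values)):
        change = bellman_update(values, s)  # hypothetical in-place update
        max_change = max(max_change, abs(change))
    return max_change < tolerance
```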

abhisheknaik96 commented 3 years ago

You're right. A similar alternative is to check for convergence over a window of the last N updates (with N potentially equal to |S|).
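A sketch of what such a window check might look like (the class and names are hypothetical): track the magnitudes of the last N per-state updates and report convergence only once the window is full and every entry is below the threshold.

```python
from collections import deque

class WindowedConvergenceCheck:
    """Report convergence once the last n update magnitudes
    are all below the tolerance (n = |S| is a natural choice)."""
    def __init__(self, n, tolerance=1e-6):
        self.updates = deque(maxlen=n)
        self.tolerance = tolerance

    def record(self, change):
        self.updates.append(abs(change))

    def converged(self):
        window_full = len(self.updates) == self.updates.maxlen
        return window_full and max(self.updates) < self.tolerance
```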

btanner commented 3 years ago

How do we feel about the approach in our evaluation experiment program?

For the sync algorithms, it checks whether the mean absolute update to the values across all states is below a tolerance.

For the async algorithms, it performs the same check over the last |S| updates.
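For concreteness, a hedged sketch of those two checks (names and signatures are illustrative, not the evaluation program's actual code):

```python
import numpy as np

def sync_converged(v_old, v_new, tolerance=1e-6):
    # Sync: one sweep updates all states at once, so compare
    # the full value vectors directly.
    return np.mean(np.abs(np.asarray(v_new) - np.asarray(v_old))) < tolerance

def async_converged(recent_updates, num_states, tolerance=1e-6):
    # Async: recent_updates holds the magnitudes of the last |S|
    # per-state updates, e.g. the deque from the window sketch above.
    if len(recent_updates) < num_states:
        return False
    return np.mean([abs(u) for u in recent_updates]) < tolerance
```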

abhisheknaik96 commented 3 years ago

The current approach looks good. Perhaps later we might modify the search control procedure in the async case to pick states arbitrarily (based on the discussion in #16). Then we could average the last N updates, where N is some large number. Hmm, I guess N could still be equal to |S|. Thoughts?
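If search control does move to arbitrary state selection, the windowed check still applies; only the choice of N is in question. A usage sketch under that assumption, reusing the hypothetical `WindowedConvergenceCheck` and `bellman_update` from the earlier sketches, with `values`, `num_states`, and `tolerance` presumed defined:

```python
import numpy as np

rng = np.random.default_rng(0)
check = WindowedConvergenceCheck(n=num_states, tolerance=1e-6)
while not check.converged():
    s = rng.integers(num_states)             # arbitrary search control
    check.record(bellman_update(values, s))  # hypothetical update
```

With random selection, a window of |S| updates no longer guarantees that every state was visited, so N may need to be larger to avoid stopping too early.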

btanner commented 3 years ago

@abhisheknaik96 Since Rich hates convergence checks in general, I think this will never be part of the code that is holding us up. When we start doing async updates in a random order, we can revisit this condition to ensure we are not stopping too early.

Closing for now.