The pseudocode for the Online DFS Agent contains the line
else a ← an action b such that result[s', b] = POP(unbacktracked[s'])
The Python implementation is not consistent with this line:
else:
    # else a <- an action b such that result[s', b] = POP(unbacktracked[s'])
    unbacktracked_pop = self.unbacktracked.pop(s1)
    for (s, b) in self.result.keys():
        if self.result[(s, b)] == unbacktracked_pop:
            self.a = b
            break
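The loop iterates over every (s, b) key in result, not only the keys whose state is s'. It therefore picks an action b such that result[s, b] = POP(unbacktracked[s']) for an arbitrary state s, whereas the pseudocode requires result[s', b] = POP(unbacktracked[s']).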
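One possible way to match the pseudocode is to restrict the search to keys whose state equals s'. The following is only a minimal sketch, not the repository's code: the helper name backtrack_action is hypothetical, result is assumed to be a dict keyed by (state, action), and unbacktracked is assumed to map each state to a list whose front holds the most recently added predecessor.

```python
from typing import Dict, Hashable, List, Optional, Tuple

State = Hashable
Action = Hashable


def backtrack_action(
    result: Dict[Tuple[State, Action], State],
    unbacktracked: Dict[State, List[State]],
    s1: State,
) -> Optional[Action]:
    """Return an action b such that result[s1, b] == POP(unbacktracked[s1]).

    Hypothetical helper: assumes the front of unbacktracked[s1] is the state
    to backtrack to, and returns None if no recorded action leads there.
    """
    target = unbacktracked[s1].pop(0)  # POP from the front of the list for s1
    for (s, b), outcome in result.items():
        # Only consider actions recorded for the current state s1 (s').
        if s == s1 and outcome == target:
            return b
    return None


# ("C", "up") also leads to "A", but it belongs to state "C", so it is skipped.
result = {("C", "up"): "A", ("A", "right"): "B", ("B", "left"): "A"}
unbacktracked = {"B": ["A"]}
action = backtrack_action(result, unbacktracked, "B")
```

In this example the unfiltered loop from the current implementation would stop at ("C", "up") because it is the first key whose outcome is "A", while the filtered version selects "left", the action actually available in state "B".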