Generic-Search - Githubissues

norvig commented 7 years ago

After a suggestion by Rob Holte, I'm considering changes to the generic search pseudocode at https://github.com/aimacode/aima-pseudocode/blob/master/md/Tree-Search-and-Graph-Search.md

I'm trying to sort through the implementation choices for OPEN and CLOSED (which we currently call "frontier" and "explored").

We need to support three operations: (1) check if a state has been encountered before in the interior of the search tree. (2) check if a state is on the frontier, and if so, fetch the cost and path to it. (3) pop the best path from the frontier.

Implementation Choice 1:

explored is a set of states that have been expanded
frontier is a priority queue of paths, and also a hashtable of {state: path} mappings.
code: if x not in explored and (x not in frontier or cost(x) < cost(frontier[x]): frontier.add(x)

Note: "path" and "node" are synonyms.

Implementation Choice 2:

reached is a hash table of {state: path}
frontier is a priority queue of paths
code: if x not in reached or cost(x) < cost(reached[x]): frontier.add(x)

It looks like Choice 2 is simpler. Is reached the best name? Maybe best_path?

What do you think @ctjoreilly @MrDupin @redblobgames ?

redblobgames commented 7 years ago

explored is a set, either represented explicitly with a set data type or implicitly as the set of keys of a table/map. You need the explicit set if there isn't any table, but since we almost always have a table of some sort (parent pointers/paths or costs) the implementation is simpler if we reuse that instead of maintaining a separate set.

However I don't have a good sense of what's easier for your students to understand.

What I do in my code is an (implicit) visited set instead of the explored set. The visited set is explored ∪ frontier. That way I don't need to store paths/costs/etc. in both bestpath and frontier. I store it in one place. I don't know if this generalizes across all the algorithms you want to support.

norvig commented 7 years ago

I think @redblobgames is right -- that a single explored ∪ frontier is a good idea. Holte suggests calling this reached which I think is a good name (as is visited).

antmarakis commented 7 years ago

Before I go on, I think I am misunderstanding something about the pseudocode. Shouldn't there be a check for the path (child/c) containing a goal state when updating solution? In 3e such a check is made, but in 4e I don't see a check. Maybe it is done implicitly and I didn't catch it? I was assuming the code would be something like:

if c contains a goal state and cost(c) < cost(solution) then
    solution = c

I would go for Choice 1 if the check wasn't so verbose. It is closer to the traditional OPEN/CLOSED sets that get taught in classrooms (so it will at the very least be more familiar to students).

I agree that the single reached/visited set seems better.

norvig commented 7 years ago

@MrDupin is right; I had a copy/paste error and got the goal test wrong. Fixing now,

antmarakis commented 7 years ago

I am moving my comments from here so that other users can read/comment.

Initial Comment

I prefer the new version.

From my understanding, there is some added cost in computation (since the cost of a path needs to be calculated each time, in cost(c) and cost(solution) - although I can see that we can store the cost to each path without much hassle), and it also seems that there is an added memory cost (storing paths instead of nodes).

Despite that, I think this is a more "complete" algorithm, since we not only get the cost of the solution, but the path itself. In most search algorithms, to find the final path there needs to be employed some sort of auxiliary functionality (usually setting the parent of each node and then backtracking from the final node). Here we return the path, and the cost can be calculated later using the cost function. This is neat.

Also, I feel like this approach is closer to what search in general entails. We want to search for the best path in a set of possible paths, and this approach does that in a cleaner way. Previously we were basically expanding nodes and keeping score of the cost-so-far, which is conceptually a bit further away from the problem at hand.

Regarding Python, this will be easy to implement + we get to print the path to the goal state instead of just its cost, without the use of external functions. This, I feel, will be beneficial to students.

Finally, and this is just a personal anecdote, when I was studying search algorithms I always wondered why didn't we store paths instead of nodes, since it seemed easier.

All in all, I am for this change.

In reply to this comment

About the "what is c" thing, I assumed it was actually both a state and a path, and that it is written a bit "fluidly". The successors(parent) method extends the path in all possible ways. That means, it adds a state to the path whenever adding a state is possible. What it returns is both the extended path and the state.

If that is indeed the case, maybe it needs to be written more clearly? Maybe something like for path, state in successors(parent) or assume c is an object and c.path and c.state its properties. That would clear things up. Re-reading the pseudocode I also got a bit confused as to what c is.

aimacode / aima-pseudocode

Generic-Search #59

Initial Comment

In reply to this comment