Closed bhairavmehta95 closed 6 years ago
I would rather we not have the map generator output a graph of tiles, because that would make it much harder to hand-edit map files. I also tend to prefer to design file formats in a way that avoids storing redundant information. In this case, the connectivity graph is redundant because the layout and types of the tiles is sufficient to deduce it.
In terms of planning, I think that the first step is probably to generate the control points for all the curves for each tile. These points should ideally be pre-translated and rotated to minimize the amount of work that has to be done at each step()
by the simulator. We want to write numpy code to efficiently find the closest curve, and the closest point on that curve, eg: find the closest point to all possible curves in parallel.
Computing the reward seems somewhat non-trivial with multiple curves. Right now, there is only one curve. If you're in the wrong lane, you get a bad reward because you're far from the curve. You also get penalized if your heading doesn't align with the tangent of the curve. If we pick the closest curve to the agent's current position, then if you go too far into the wrong lane, you will be closest to the left lane curve. I suppose filtering the curves based on the heading might eliminate that problem, but there may be weird corner cases, such as what happens if you are nearly perpendicular to the curves. All of this being said, maybe your idea of simply taking the max will work fine in practice. It's too bad we don't have more solid RL code to test that our reward functions are actually learnable.
Sidenote: we could also possibly consider ditching the curves? We could design a reward function that tracks which lane the agent is in, and encourages the bot to go as fast as possible, to minimize turning/wobbliness, and to avoid crossing into the other lane while on tiles that have lanes. Though I suppose the issue with that approach is that curved tiles are... curved, and so you need some kind of curve to separate the lanes.
Back to the planning question. I think we need to address the fundamental issues first: curves, reward function for lane following, etc. NPCs should wait until another PR, because that will come with its own set of challenges, such as how to get multiple NPC robots to negotiate an intersection (it's possible for them to get into a deadlock or to get all queued up in the same lane). The navigation talk itself should probably wait too.
In terms of planning, I think that the first step is probably to generate the control points for all the curves for each tile. These points should ideally be pre-translated and rotated to minimize the amount of work that has to be done at each step() by the simulator. We want to write numpy code to efficiently find the closest curve, and the closest point on that curve, eg: find the closest point to all possible curves in parallel.
I will get started!
I sketched a DuckietownNav
class for the navigation task: https://github.com/duckietown/gym-duckietown/commit/1174b87d820c97020f45ad2879e92514055d241f
Have we confirmed that we're doing goal tile vs goal pos? Or is this mostly a placeholder?
Goal tile is what I was told it would be in conversations on Slack, but we can easily change this if needed.
We've finished most of the stuff needed for AIDO, so no new major features will be added until the end of the first competition.
As discussed, we need to make changes to the simulator such that both
are supported, which means that we will have to restructure / build things like (Grouped by topic):
Bezier Curves
Need to generate Multiple Bezier Curves
Why? These will serve as rewards for the agent, paths for the NPC = Non Controllable Duckies driving around during the Navigation w. Dynamic Obstacles task) How? We can define the control points (each curve needs 4) for each curve on each tile. This means that we will need to manually define:
12 possible curves 4 points per curves = 48 points per 4way intersection tile 6 possible curves 4pts ea. = 24 points per 3way intersection tile 2 possible curves * 4pts ea. = 8 points per straight / curve tile.
We only need to manually define these once and store them, and then rotate and translate the curves per orientation / translation of the tile itself.
It might be helpful to store these in bins, based on which direction you enter the tile from.
Picking which of those curves is actually the one we're "following"
Why? Whether it be for the NPC (to find out which waypoints to drive to, waypoints being the four points of the curve that we choose the NPC to drive on, more on that below) or the agent (needed to calculate the reward).
How? Taking the most complicated tile - the four way intersection, we have twelve possible curves to follow on any given tile. Using the heading vector of the object (agent/NPC), we can rule out 9 of them by choosing the "bin" of curves (if we're on a 4way intersection or 3way intersection tile) that aligns most with the heading, or (if we're on a curve / straight), using some other sort of logic.
To compute a reward for an agent: now that we have the candidate curves - 3 or less - we compute the same reward based on each Bezier curve (as we currently do), and give the agent the max of them.
To pick a path for a NPC: randomly select a curve from the candidate curves, and use a controller (DD/something else) to follow the waypoints associated with that curve.
Dynamic Obstacles
If we have a method for picking the right candidate curves, we can randomly select a curve to follow at each tile for each NPC, and use the objects position as a proxy of when to switch following one curve on tile A to another on tile B.
Connecting the Paths
We need to make sure we can transition from one tile to another efficiently, so it might be nice to have the Map Generator output a directed graph of the connected tiles (i.e where valid transitions are allowed). This might be useful if we decide to give turn by turn directions along with the navigation task, or if people decide to do that themselves.