Trajectories depend on car speed, which the NN can't figure out from a single image. We could add a GRU or feed two frames to the model. A less costly approach (in terms of compute) would be to feed the speed at t-1 in sim, and the measured speed in real life, into the last FC layer of the network.
We could then visualize predicted trajectories for a range of speeds and compare losses with and without the speed info.
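A minimal sketch of what that speed-conditioned head could look like, assuming a resnet18 backbone; the class name, output size and the fact that speed comes in as a (B, 1) tensor are my assumptions, not part of the current code:

    import torch
    import torch.nn as nn
    import torchvision

    class SpeedConditionedResNet(nn.Module):
        """resnet18 backbone whose final FC layer also receives the car speed."""

        def __init__(self, n_outputs: int = 2):
            super().__init__()
            self.backbone = torchvision.models.resnet18(weights=None)
            n_features = self.backbone.fc.in_features
            # drop the original classification head; we concatenate the speed
            # to the image features and feed both to a new FC layer
            self.backbone.fc = nn.Identity()
            self.head = nn.Linear(n_features + 1, n_outputs)

        def forward(self, image: torch.Tensor, speed: torch.Tensor) -> torch.Tensor:
            # image: (B, 3, H, W); speed: (B, 1), speed at t-1 in sim or measured speed IRL
            features = self.backbone(image)
            return self.head(torch.cat([features, speed], dim=1))

Training with and without the speed input (e.g. by zeroing the speed column) would give the loss comparison mentioned above.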
Put the model back on CPU after training, see if it fixes TensorRT not knowing about MPS.
Act as an excellent engineer, the type that can write Haskell and CUDA kernels but also Python, the type that manages to write clear, readable code and communicate about it. Also never forget I believe in you <3.
I need you to write me a torch.utils.data.Dataset class to load my custom dataset.
The dataset is laid out as follows:
- root_dir/images: the images, numbered from 0000.png to 10000.png when len(dataset) == 10_000.
- root_dir/rl_trajectories.txt: a numpy array of shape (10000, 7). Each row is (car.pos_x, car.pos_y, car.yaw, steering_command, speed_command, end_of_sequence). The car position is stored in a global frame.
Please ask any clarifying question before generating the code.
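A minimal sketch of such a Dataset, assuming the images load fine with torchvision.io and that rl_trajectories.txt can be read back with np.loadtxt; the class name, the transform argument and the [0, 1] normalization are my choices:

    import os
    import numpy as np
    import torch
    from torch.utils.data import Dataset
    from torchvision.io import read_image

    class TrajectoryDataset(Dataset):
        """Loads (image, trajectory row) pairs from root_dir."""

        def __init__(self, root_dir: str, transform=None):
            self.root_dir = root_dir
            self.transform = transform
            # (N, 7) array, one row per image, as described above
            self.trajectories = np.loadtxt(os.path.join(root_dir, "rl_trajectories.txt"))

        def __len__(self) -> int:
            return len(self.trajectories)

        def __getitem__(self, idx: int) -> dict:
            img_path = os.path.join(self.root_dir, "images", f"{idx:04d}.png")
            image = read_image(img_path).float() / 255.0  # (C, H, W) in [0, 1]
            if self.transform is not None:
                image = self.transform(image)
            row = torch.from_numpy(self.trajectories[idx]).float()
            return {"image": image, "trajectory": row}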
I need you to write code to project a trajectory in 3D space (a set of points), expressed relative to a camera, onto the image. Use the following function to project the points:
def project_points(point_3d: torch.Tensor, camera_matrix: torch.Tensor) -> torch.Tensor:
    r"""Project a 3d point onto the 2d camera plane.

    Args:
        point_3d: tensor containing the 3d points to be projected
            to the camera plane. The shape of the tensor can be :math:`(*, 3)`.
        camera_matrix: tensor containing the intrinsics camera
            matrix. The tensor shape must be :math:`(*, 3, 3)`.

    Returns:
        tensor of (u, v) cam coordinates with shape :math:`(*, 2)`.

    Example:
        >>> _ = torch.manual_seed(0)
        >>> X = torch.rand(1, 3)
        >>> K = torch.eye(3)[None]
        >>> project_points(X, K)
        tensor([[5.6088, 8.6827]])
    """
    # projection eq. [u, v, w]' = K * [x y z 1]'
    # u = fx * X / Z + cx
    # v = fy * Y / Z + cy
    # project back using depth dividing in a safe way
    xy_coords: torch.Tensor = convert_points_from_homogeneous(point_3d)
    return denormalize_points_with_intrinsics(xy_coords, camera_matrix)
Please ask any clarifying question then finish writing the following code:
# plot trajectory on the images
cam_offset = (0.105, 0, 0.170)  # x, y, z in meters from center of mass. x is forward
cam_rotation = (0, 12, 0) # roll, pitch, yaw in deg
cam_focal = 0.87 # mm
sample = ds[0]
img = ds[0]["image"] # torch tensor
traj = ds[0]["trajectory"] # (n, 2) x, y coordinates relative to car center of mass
# plot the trajectory on the image
# 1. transform the points from the car frame to the camera frame using cam_offset and cam_rotation
# WRITE CODE here
# 2. project the 3d points onto the image and plot them
# WRITE CODE here
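A possible sketch of the two missing steps. It assumes the car frame is x-forward / y-left / z-up, the camera optical frame is x-right / y-down / z-forward, the image tensor is (C, H, W), and it invents a PIXEL_SIZE_MM constant to turn the 0.87 mm focal length into pixels; the sign of the pitch rotation and the axis re-ordering depend on the simulator's conventions and may need flipping:

    import math
    import torch
    import matplotlib.pyplot as plt

    # build the intrinsics matrix; cam_focal is in mm, so converting it to pixels
    # needs the sensor's pixel pitch -- PIXEL_SIZE_MM is an assumption, replace it
    PIXEL_SIZE_MM = 0.003
    H, W = img.shape[-2:]
    f_px = cam_focal / PIXEL_SIZE_MM
    K = torch.tensor([[f_px, 0.0, W / 2],
                      [0.0, f_px, H / 2],
                      [0.0, 0.0, 1.0]])

    # 1. transform the points from the car frame to the camera frame
    # lift the (x, y) trajectory points to 3d on the ground plane (z = 0)
    points_car = torch.cat([traj, torch.zeros(traj.shape[0], 1)], dim=1)  # (n, 3)
    # express the points relative to the camera position
    points_cam = points_car - torch.tensor(cam_offset)
    # rotate by the camera pitch (roll and yaw are zero here); sign may need flipping
    pitch = math.radians(cam_rotation[1])
    R_pitch = torch.tensor([
        [math.cos(pitch), 0.0, math.sin(pitch)],
        [0.0, 1.0, 0.0],
        [-math.sin(pitch), 0.0, math.cos(pitch)],
    ])
    points_cam = points_cam @ R_pitch.T
    # re-order axes from the car convention (x fwd, y left, z up)
    # to the camera optical convention (x right, y down, z fwd)
    points_optical = torch.stack(
        [-points_cam[:, 1], -points_cam[:, 2], points_cam[:, 0]], dim=1
    )

    # 2. project the 3d points onto the image and plot them
    uv = project_points(points_optical, K)  # (n, 2) pixel coordinates
    plt.imshow(img.permute(1, 2, 0).cpu().numpy())
    plt.scatter(uv[:, 0].numpy(), uv[:, 1].numpy(), c="r", s=4)
    plt.show()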
Would be nice to do a sweep to benchmark inference speed of different models, to see what we could use beyond resnet18. Looks like resnet18 is still a good choice, though.
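A rough sketch of what such a sweep could look like on CPU, assuming a reasonably recent torchvision (get_model) and a 224x224 input; the model list and input shape are placeholders:

    import time
    import torch
    import torchvision

    def benchmark(model: torch.nn.Module, input_shape=(1, 3, 224, 224), n_iters: int = 100) -> float:
        """Return the mean forward-pass time in milliseconds."""
        model.eval()
        x = torch.randn(*input_shape)
        with torch.inference_mode():
            for _ in range(10):  # warmup
                model(x)
            start = time.perf_counter()
            for _ in range(n_iters):
                model(x)
        return (time.perf_counter() - start) / n_iters * 1000

    for name in ["resnet18", "resnet34", "mobilenet_v3_small", "efficientnet_b0"]:
        model = torchvision.models.get_model(name, weights=None)
        print(f"{name}: {benchmark(model):.1f} ms")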
Ok so trajectory prediction in sim seems alright but looks bad on real images. I think one issue is that I didn't sample enough "recovery trajectories": trajectories that go from a bad state back to the optimal trajectory. One reason for that is that I terminate the episode if the car has even one wheel outside the track, making it impossible to recover from harder cases, as that would require lightly crossing the lines. However, I can't allow wheels outside the track as-is, because sometimes there are obstacles outside the track.
Act as an excellent engineer and never forget I believe in you.
I am writing a Gym wrapper for my Reinforcement Learning env. Here is a draft of the code:
class RescaleWrapper(gym.Wrapper):
    """Rescale observation and action space between -1 and 1"""

    def __init__(self, env: gym.Env):
        super().__init__(env)
        self.observation_space = Box(low=np.zeros_like(self.env.observation_space.low) - 1, high=np.zeros_like(self.env.observation_space.low) + 1)
        self.action_space = Box(low=np.zeros_like(self.env.action_space.low) - 1, high=np.zeros_like(self.env.action_space.low) + 1)

    def step(self, action):
        # Clip and rescale action, using self.env.action_space.low/high
        # CODE HERE
        obs, reward, terminated, truncated, info = self.env.step(action)
        # Clip and rescale observation, using self.env.observation_space.low/high
        # CODE HERE
        return obs, reward, terminated, truncated, info
I want it to clip and rescale the observations and actions. E.g. if obs = [1, 15] and min obs = [0, 0], max obs = [1, 10], obs should be rescaled to [1, 1]. Please ask any clarifying questions and finish writing the code (replace # CODE HERE).
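A possible completion, assuming the 5-tuple step API from the draft (gymnasium, or gym >= 0.26); the rescale helper name is mine:

    import numpy as np
    import gymnasium as gym  # or `import gym` for gym >= 0.26
    from gymnasium.spaces import Box

    def rescale(x, low, high, new_low, new_high):
        """Linearly map x from [low, high] to [new_low, new_high]."""
        return new_low + (x - low) * (new_high - new_low) / (high - low)

    class RescaleWrapper(gym.Wrapper):
        """Rescale observation and action space between -1 and 1."""

        def __init__(self, env: gym.Env):
            super().__init__(env)
            self.observation_space = Box(
                low=-np.ones_like(env.observation_space.low),
                high=np.ones_like(env.observation_space.high),
            )
            self.action_space = Box(
                low=-np.ones_like(env.action_space.low),
                high=np.ones_like(env.action_space.high),
            )

        def step(self, action):
            # clip to [-1, 1] then rescale to the wrapped env's action range
            action = np.clip(action, -1.0, 1.0)
            action = rescale(action, -1.0, 1.0,
                             self.env.action_space.low, self.env.action_space.high)
            obs, reward, terminated, truncated, info = self.env.step(action)
            # clip to the wrapped env's observation range then rescale to [-1, 1]
            obs = np.clip(obs, self.env.observation_space.low, self.env.observation_space.high)
            obs = rescale(obs, self.env.observation_space.low, self.env.observation_space.high,
                          -1.0, 1.0)
            return obs, reward, terminated, truncated, info

With the example above, obs = [1, 15] is first clipped to [1, 10] and then mapped to [1, 1]. Note that reset() would need the same observation clipping/rescaling for the wrapper to be consistent.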