Armandpl / skyline

code I wrote to win the 2023 Renault Digital 1/10th roborace
https://twitter.com/armand_dpl/status/1670922434445291521

write robust embedded code #9

Closed Armandpl closed 11 months ago

Armandpl commented 1 year ago

The current embedded code is a single spaghetti Python script. It served us well and it's time to let it go.

The ideal system has a few components:

Now I have no idea how to write this. Couple of ideas:

Exploratory tasks:

Armandpl commented 1 year ago

maybe I could export my model in the ONNX format and load it as-is on the jetson? in which case I might be able to avoid installing torch? and if I can avoid torch, maybe I can have poetry set everything up?

maybe I should have a setup.sh script to do things like install poetry, disable the GUI, etc.

Armandpl commented 1 year ago

decision: regarding dependencies, I think the easiest would be to switch to a jetson orin and use poetry. might be worth looking into containers, but I haven't found a good example of a robotic system using containers yet.

Armandpl commented 1 year ago

decision: for now let's just implement a 0mq bus

requirements: Act as an excellent embedded engineer. I am building an autonomous 1/10th RC car to compete in a race. The control software runs on a Jetson Nano on the car. I would like to build a bus (similar to a CAN bus) to split my code into modules and make it readable, easy to debug, and robust (handling bugs, crashes, and edge cases). Here are my requirements

Please ask any clarifying questions, and let me know if anything among the requirements doesn't make sense (I'm not an expert here). Then answer the questions in the requirements, then write the code. When writing the code, explain which zmq patterns you are using (e.g. the Suicidal Snail pattern, the Clone pattern, etc.)
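Whatever zmq sockets end up carrying it, the bus needs an agreed-on message envelope. A minimal sketch of one possible encoding, with topic, metadata, and payload as separate frames the way a zmq multipart message would carry them (all names here are hypothetical, not from the actual codebase):

```python
import json
import time


def encode_msg(topic: str, payload: dict) -> list:
    """Encode a message as [topic, metadata, payload] byte frames.

    The topic frame comes first so a zmq SUB socket could filter on
    it as a prefix; metadata carries a timestamp for latency debugging.
    """
    meta = {"ts": time.time(), "topic": topic}
    return [topic.encode(), json.dumps(meta).encode(), json.dumps(payload).encode()]


def decode_msg(frames: list):
    """Inverse of encode_msg: returns (topic, metadata, payload)."""
    topic, meta, payload = frames
    return topic.decode(), json.loads(meta), json.loads(payload)


# round-trip example
frames = encode_msg("sensors.imu", {"yaw_rate": 0.12})
topic, meta, payload = decode_msg(frames)
```

A real implementation would likely swap JSON for something faster (msgpack, raw structs) for high-rate topics like camera frames, but the three-frame shape stays the same.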

Armandpl commented 1 year ago

doing some profiling: the NN takes ~21 ms for inference and the pre-processing takes ~15.8 ms, so we can't run inference at 30 fps (33 ms budget per frame)

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    20                                           @profile
    21                                           def main():
    22         1          0.0      0.0      0.0      camera = CSICamera(
    23         1          0.0      0.0      0.0          width=640, 
    24         1          0.0      0.0      0.0          height=360, 
    25         1          0.0      0.0      0.0          capture_width=1280,
    26         1          0.0      0.0      0.0          capture_height=720,
    27         1        925.9    925.9      2.7          capture_fps=30,
    28                                               )
    29                                           
    30         1          0.1      0.1      0.0      print("go")
    31      1000          2.0      0.0      0.0      for i in range(1000):
    32      1000      17475.2     17.5     51.0          image = camera.read()
    33      1000      15845.7     15.8     46.3          image = preprocess(image)
    34                                           
    35         1          0.1      0.1      0.0      print("done")

Total time: 15.773 s
File: bench_preprocessing.py
Function: preprocess at line 43

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    43                                           @profile
    44                                           def preprocess(image):
    45      1000         17.2      0.0      0.1      image = image[CROP_TOP:CROP_TOP+CROP_H, CROP_LEFT:CROP_LEFT+CROP_W]
    46      1000       8198.5      8.2     52.0      image = PIL.Image.fromarray(image)
    47      1000       2873.7      2.9     18.2      image = transforms.functional.resize(image, (224, 224))
    48      1000       4352.4      4.4     27.6      image = transforms.functional.to_tensor(image).cuda().half()
    49      1000        306.9      0.3      1.9      image.sub_(mean[:, None, None]).div_(std[:, None, None])
    50      1000         24.4      0.0      0.2      return image[None, ...]

Converting from np.array to PIL and back to tensor seems wasteful. Is it doing the BGR-to-RGB conversion? Let's save the PIL image and check.
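For reference, the BGR-to-RGB swap on its own doesn't need PIL or even a copy: reversing the last numpy axis returns a view with the channels swapped. A minimal sketch (frame shape assumed, not from the actual camera code):

```python
import numpy as np

# stand-in for a BGR frame as an OpenCV-backed camera.read() would return it
bgr = np.zeros((360, 640, 3), dtype=np.uint8)
bgr[..., 0] = 255  # fill the blue channel

# reversing the channel axis swaps B and R; this is a zero-copy view
rgb = bgr[..., ::-1]
```

If a later step needs contiguous memory (e.g. `torch.from_numpy`), wrap it in `np.ascontiguousarray`, at which point the cost is comparable to `cv2.cvtColor`.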

Armandpl commented 1 year ago
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    20                                           @profile
    21                                           def main():
    22         1          0.0      0.0      0.0      camera = CSICamera(
    23         1          0.0      0.0      0.0          width=640, 
    24         1          0.0      0.0      0.0          height=360, 
    25         1          0.0      0.0      0.0          capture_width=1280,
    26         1          0.0      0.0      0.0          capture_height=720,
    27         1        877.6    877.6      2.6          capture_fps=30,
    28                                               )
    29                                           
    30         1          0.1      0.1      0.0      print("go")
    31      1000          2.7      0.0      0.0      for i in range(1000):
    32      1000      22356.4     22.4     65.4          image = camera.read()
    33      1000      10963.5     11.0     32.1          image = preprocess(image)
    34                                           
    35         1          0.1      0.1      0.0      print("done")

Total time: 10.8717 s
File: bench_preprocessing.py
Function: preprocess at line 54

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    54                                           @profile
    55                                           def preprocess(image):
    56                                               # Crop
    57      1000         18.2      0.0      0.2      image = image[CROP_TOP:CROP_TOP+CROP_H, CROP_LEFT:CROP_LEFT+CROP_W]
    58                                           
    59                                               # Convert BGR to RGB
    60      1000        252.1      0.3      2.3      image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    61                                           
    62                                               # Convert to tensor
    63      1000         35.0      0.0      0.3      image = torch.from_numpy(image)
    64      1000       3019.9      3.0     27.8      image = transforms.functional.convert_image_dtype(image, torch.float32)
    65                                           
    66                                               # Move channel dimension to the beginning
    67      1000         55.0      0.1      0.5      image = image.permute(2, 0, 1)
    68                                           
    69      1000       5071.4      5.1     46.6      image = transforms.functional.resize(image, (224, 224))
    70      1000       2037.1      2.0     18.7      image = image.cuda().half()
    71                                           
    72      1000        353.8      0.4      3.3      image.sub_(mean[:, None, None]).div_(std[:, None, None])
    73                                           
    74      1000         29.2      0.0      0.3      return image[None, ...]

Armandpl commented 1 year ago

Ok, so it seems like I can crop and resize using nvvidconv. It was already resizing the camera stream, so we were in fact resizing twice.

return (f'nvarguscamerasrc sensor-id={self.capture_device} ! video/x-raw(memory:NVMM), width={self.capture_width}, height={self.capture_height}, '
        f'format=(string)NV12, framerate=(fraction){self.capture_fps}/1'
        f' ! nvvidconv top={self.CROP_TOP} bottom={self.CROP_BOTTOM} left={self.CROP_LEFT} right={self.CROP_RIGHT} ! video/x-raw, width=(int){self.width}, height=(int){self.height}, '
        'format=(string)BGRx ! videoconvert ! appsink')

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    20                                           @profile
    21                                           def main():
    22         1          0.0      0.0      0.0      camera = CSICamera(
    23         1          0.0      0.0      0.0          width=224, 
    24         1          0.0      0.0      0.0          height=224, 
    25         1          0.0      0.0      0.0          capture_width=1280,
    26         1          0.0      0.0      0.0          capture_height=720,
    27         1        869.5    869.5      2.5          capture_fps=30,
    28                                               )
    29                                           
    30         1          0.2      0.2      0.0      print("go")
    31      1000          2.2      0.0      0.0      for i in range(1000):
    32      1000      27819.8     27.8     81.4          image = camera.read()
    33      1000       5492.3      5.5     16.1          image = preprocess(image)
    34                                           
    35         1          0.1      0.1      0.0      print("done")

Total time: 5.40918 s
File: bench_preprocessing.py
Function: preprocess at line 52

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    52                                           @profile
    53                                           def preprocess(image):
    54                                               # Crop
    55                                               # image = image[CROP_TOP:CROP_TOP+CROP_H, CROP_LEFT:CROP_LEFT+CROP_W]
    56                                           
    57                                               # Convert BGR to RGB
    58      1000        120.4      0.1      2.2      image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    59                                           
    60                                               # Convert to tensor
    61      1000         37.6      0.0      0.7      image = torch.from_numpy(image)
    62      1000       2770.1      2.8     51.2      image = transforms.functional.convert_image_dtype(image, torch.float32)
    63                                           
    64                                               # Move channel dimension to the beginning
    65      1000         52.2      0.1      1.0      image = image.permute(2, 0, 1)
    66                                           
    67                                               # image = transforms.functional.resize(image, (224, 224))
    68      1000       2071.5      2.1     38.3      image = image.cuda().half()
    69                                           
    70      1000        327.8      0.3      6.1      image.sub_(mean[:, None, None]).div_(std[:, None, None])
    71                                           
    72      1000         29.6      0.0      0.5      return image[None, ...]

So now 5.5 ms (pre-processing) + 21 ms (inference) = 26.5 ms, i.e. ~37 Hz, which should be enough for what we do. We could maybe shave a few more ms off the pre-processing using dusty-nv/jetson-utils, or use a smaller network, e.g. SqueezeNet, but in sim 30 Hz is enough to control the car, so this should be good. I still need to profile adding a GRU on top of the resnet, but I expect it to stay doable at 30 Hz; if not, SqueezeNet it is.
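Spelled out, the frame-time arithmetic behind that conclusion:

```python
preprocess_ms = 5.5                      # measured above
inference_ms = 21.0                      # measured earlier
total_ms = preprocess_ms + inference_ms  # 26.5 ms per frame
max_hz = 1000.0 / total_ms               # ~37.7 Hz achievable
budget_ms = 1000.0 / 30                  # ~33.3 ms available at 30 fps

# the loop fits the 30 Hz budget with ~6.8 ms of headroom
assert total_ms < budget_ms
```

That ~7 ms of headroom is what a GRU on top of the resnet would have to fit into.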

Armandpl commented 1 year ago

pre-commit doesn't run on the jetson nano, not sure why, maybe the Python 3.6. Need to find a way to lint, sort imports, etc.