Run on Jetson Xavier - Githubissues

diennv commented 4 years ago

Hi, I tested LaneNet on a Jetson Xavier board and achieved around 3.5 FPS.

Is that reasonable? Here are the logs from my terminal console:

    name: Xavier major: 7 minor: 2 memoryClockRate(GHz): 1.377
    pciBusID: 0000:00:00.0
    totalMemory: 31.18GiB freeMemory: 22.43GiB
    2020-05-07 11:18:14.521921: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
    2020-05-07 11:18:15.641323: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
    2020-05-07 11:18:15.641513: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0
    2020-05-07 11:18:15.641622: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N
    2020-05-07 11:18:15.641956: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 25541 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
    W0507 11:18:15.644113 20471 deprecation.py:323] From /home/stpc-xavier02/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use standard file APIs to check for files with this prefix.
    I0507 11:18:15.647047 20471 saver.py:1270] Restoring parameters from ./model/tusimple_lanenet_vgg/tusimple_lanenet_vgg.ckpt
    I0507 11:18:27.435727 20471 main.py:90] Single imgae inference cost time: 10.56070s
    I0507 11:18:27.436313 20471 main.py:91] FPS: 0.09469s
    Gtk-Message: 11:18:27.516: Failed to load module "canberra-gtk-module"
    I0507 11:18:27.836342 20471 main.py:90] Single imgae inference cost time: 0.28785s
    I0507 11:18:27.837048 20471 main.py:91] FPS: 3.47409s
    I0507 11:18:28.157964 20471 main.py:90] Single imgae inference cost time: 0.28836s
    I0507 11:18:28.158652 20471 main.py:91] FPS: 3.46794s
    I0507 11:18:28.475359 20471 main.py:90] Single imgae inference cost time: 0.28969s
    I0507 11:18:28.475979 20471 main.py:91] FPS: 3.45193s
    I0507 11:18:28.793320 20471 main.py:90] Single imgae inference cost time: 0.28872s
    I0507 11:18:28.793936 20471 main.py:91] FPS: 3.46354s
    I0507 11:18:29.113700 20471 main.py:90] Single imgae inference cost time: 0.28869s
    I0507 11:18:29.114314 20471 main.py:91] FPS: 3.46389s
    I0507 11:18:29.432626 20471 main.py:90] Single imgae inference cost time: 0.28792s
    I0507 11:18:29.433183 20471 main.py:91] FPS: 3.47319s
    I0507 11:18:29.745673 20471 main.py:90] Single imgae inference cost time: 0.28946s
    I0507 11:18:29.746284 20471 main.py:91] FPS: 3.45467s
    I0507 11:18:30.059857 20471 main.py:90] Single imgae inference cost time: 0.28839s
    I0507 11:18:30.060433 20471 main.py:91] FPS: 3.46747s
    I0507 11:18:30.379509 20471 main.py:90] Single imgae inference cost time: 0.28805s
    I0507 11:18:30.380318 20471 main.py:91] FPS: 3.47167s
    I0507 11:18:30.688016 20471 main.py:90] Single imgae inference cost time: 0.28903s
    I0507 11:18:30.688735 20471 main.py:91] FPS: 3.45982s

MaybeShewill-CV commented 4 years ago

@diennv It's reasonable:)

diennv commented 4 years ago

@MaybeShewill-CV Thank you. Have a nice day.

MaybeShewill-CV commented 4 years ago

@diennv You're welcome :)

ArtlyStyles commented 4 years ago

What command do you run? It took me 16 s to run a single image. Maybe I should run a whole directory?

diennv commented 4 years ago

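I read frames from a video. FPS is calculated from the processing time of one frame (only the inference call is timed). See the code below:

    import time

    import cv2

    # cap (cv2.VideoCapture), sess, input_tensor, binary_seg_ret,
    # instance_seg_ret and log are set up earlier in the script
    frame_idx = 0
    while True:
        ret, frame = cap.read()
        if not ret:
            break

        frame_idx += 1
        # resize to the network input size (width 512, height 256)
        frame = cv2.resize(frame, (512, 256), interpolation=cv2.INTER_CUBIC)
        image_vis = frame  # keep the resized frame for drawing
        # normalize pixel values from [0, 255] to [-1, 1]
        frame = frame / 127.5 - 1.0
        t_start = time.time()
        binary_seg_image, instance_seg_image = sess.run(
            [binary_seg_ret, instance_seg_ret],
            feed_dict={input_tensor: [frame]}
        )
        t_cost = time.time() - t_start
        log.info('Single image inference cost time: {:.5f}s'.format(t_cost))
        log.info('FPS: {:.5f}'.format(1 / t_cost))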
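The log above shows the first call taking 10.56 s while later frames take about 0.29 s, so a one-off single-image run mostly measures that slow first call, which may explain the 16 s mentioned above. A minimal sketch, assuming you wrap your own `sess.run` call in a function: `benchmark` is a hypothetical helper (not part of LaneNet) that skips the first `warmup` calls and reports the mean latency and FPS over the rest.

```python
import time


def benchmark(infer, inputs, warmup=1):
    """Time `infer` on each input and report steady-state latency and FPS.

    The first `warmup` calls are run but excluded from the statistics,
    because the first call usually includes one-off setup work and is
    much slower than the following ones.
    Returns (mean_seconds_per_frame, fps) over the remaining calls.
    """
    costs = []
    for i, x in enumerate(inputs):
        t_start = time.perf_counter()
        infer(x)
        t_cost = time.perf_counter() - t_start
        if i >= warmup:  # only keep post-warm-up measurements
            costs.append(t_cost)
    if not costs:
        raise ValueError("need more inputs than warm-up iterations")
    mean_cost = sum(costs) / len(costs)
    return mean_cost, 1.0 / mean_cost
```

For example, `benchmark(lambda f: sess.run([binary_seg_ret, instance_seg_ret], feed_dict={input_tensor: [f]}), frames)`, where `frames` is a list of preprocessed frames. Averaging over several frames also smooths out the small per-frame jitter visible in the log.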

ArtlyStyles commented 4 years ago

I achieved about 6 FPS after converting the model to TensorRT and running it on Xavier.

diennv commented 4 years ago

Hi Artly,

Thank you for the information, and congratulations on successfully converting the model to TensorRT.

I looked at your GitHub code, but I'm not clear on how to use it. If you don't mind, could you give me some instructions for converting the model? Thank you so much.

ArtlyStyles commented 4 years ago

I have updated the repo. Please check. I am now getting about 9 FPS using C++.

MaybeShewill-CV commented 4 years ago

@ArtlyStyles @diennv For faster inference you'd better switch LaneNet's backbone to a real-time semantic segmentation architecture such as BiSeNetV2 https://github.com/MaybeShewill-CV/bisenetv2-tensorflow :)

ghost commented 3 years ago

@ArtlyStyles @diennv Are you achieving that FPS on the AGX or the NX version of the Xavier? I have an NX in 15 W power mode, and if I run it on a TuSimple image in a loop, it detects all lanes in each frame but takes about 1 second per frame (even with all the polynomial curve-fitting post-processing disabled).

Watching jtop while it runs, I see GPU utilization alternate between 0% and 90% (it is not consistent; maybe it's caching results for the same image?): [screenshot: gpu-util]

And if you are achieving that FPS on an NX, which versions of TensorFlow and other dependencies did you install (besides converting to TensorRT, which I haven't tried yet since the procedure for doing so is unclear)?

evbo commented 3 years ago

It's hard to tell, but I'm guessing "Run Time" in your image is the total elapsed time, including inference and post-processing? If so, you're comparing apples to oranges, since inference runs on the GPU but post-processing involves quite a few for loops in Python, which are taxing on the CPU.

That said, I converted to TensorRT using the new ONNX intermediate format and, with BiSeNetV2 on a Xavier NX running the latest NVIDIA L4T r32.4.4, I'm getting an inference time of about 0.02 s, or roughly 50 FPS.

But as I said above, the post-processing step is still the limiting factor, taking just under 1 second. Unfortunately, that's where TensorRT won't save you ;)
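Following the point above about comparing like with like, here is a minimal sketch that times GPU inference and CPU post-processing separately for one frame. `time_stages`, `infer` and `postprocess` are hypothetical names standing in for your own `sess.run` wrapper and clustering/curve-fitting code.

```python
import time


def time_stages(frame, infer, postprocess):
    """Run one frame through inference and post-processing, timing each.

    `infer` and `postprocess` are placeholders for your own functions,
    e.g. a sess.run call (GPU) and the clustering/curve fitting (CPU).
    Returns (result, {"inference": seconds, "postprocess": seconds}).
    """
    t0 = time.perf_counter()
    outputs = infer(frame)        # network forward pass
    t1 = time.perf_counter()
    result = postprocess(outputs)  # clustering, curve fitting, drawing...
    t2 = time.perf_counter()
    return result, {"inference": t1 - t0, "postprocess": t2 - t1}
```

Logging the two numbers separately shows whether a faster engine (such as a TensorRT build) or faster post-processing will help more.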

ArtlyStyles commented 3 years ago

@evbo 0.02 s is very impressive. I was on the Xavier AGX and used VGG; my time was about 90 ms per frame. The post-processing could be faster if you write your own C++ code. I implemented my own DBSCAN, curve fitting, etc., and my post-processing time is under 10 ms. When you use DBSCAN, you do not need to feed it all the pixels. You can select a subset, for example by skipping every other pixel in each row and skipping every other row. Then your DBSCAN will be much faster.
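The C++ implementation mentioned above is not shown in the thread; the following is only a Python/NumPy sketch of the pixel-subsampling idea. It assumes a binary lane mask of shape (H, W) and a per-pixel instance embedding of shape (H, W, D), like the outputs of LaneNet's two branches; the function name and step parameters are hypothetical.

```python
import numpy as np


def subsample_lane_pixels(binary_mask, embedding, row_step=2, col_step=2):
    """Pick a strided subset of foreground pixels and their embeddings.

    binary_mask: (H, W) array, nonzero where a lane pixel was predicted.
    embedding:   (H, W, D) array of per-pixel instance embeddings.
    Returns (coords, features): coords is (N, 2) of (row, col) in the
    full-resolution image, features is (N, D).
    """
    # keep every `row_step`-th row and every `col_step`-th column
    sub_mask = binary_mask[::row_step, ::col_step]
    rows, cols = np.nonzero(sub_mask)
    # map subsampled indices back to full-resolution coordinates
    rows = rows * row_step
    cols = cols * col_step
    coords = np.stack([rows, cols], axis=1)
    features = embedding[rows, cols]
    return coords, features
```

The returned `features` can then go to a clusterer such as scikit-learn's `DBSCAN(eps=..., min_samples=...).fit(features)`, whose `labels_` line up row for row with `coords`. With a stride of 2 in both directions, only about a quarter of the foreground pixels are clustered.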

@zoombinis I was on the AGX and used C++ code to run the converted UFF model.

ArtlyStyles commented 3 years ago

I was able to use lanenet on Xavier to drive a robocar autonomously along sidewalks without using any other algorithms. https://twitter.com/SmallpixelCar/status/1297556145993707521

evbo commented 3 years ago

@ArtlyStyles nice, you must have trained your own model for that? Did you ever figure out how to position your camera to work on tusimple?

btw there's a great clustering library I'm using instead of DBSCAN (for Python and C++): https://github.com/src-d/kmcuda

Also, the ONNX format is available for TensorRT/TensorFlow (it fully supports LaneNet): https://github.com/onnx/tensorflow-onnx

With my setup on the NX I got upwards of 15 FPS for inference + post-processing.

ArtlyStyles commented 3 years ago

@evbo yes, I trained my own model. Not sure what you meant by "position camera to work on tusimple".

For kmcuda, which clustering did you use?

evbo commented 3 years ago

@ArtlyStyles Their kmeans_cuda (not the KNN one). The downside is that, unlike with the scikit-learn package, you need to guess how many clusters there are.

Not sure how others handle this, but I guess a fairly high number and then heuristically filter clusters based on size and shape. Not as good as HNet but very fast!
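A minimal NumPy sketch of the "guess high, then filter" idea, assuming you already have one cluster id per sampled pixel (for example from k-means run with a deliberately large k) plus each pixel's (row, col). The function name, the thresholds, and the use of vertical extent as the shape test are illustrative assumptions, not the exact heuristic described above.

```python
import numpy as np


def filter_clusters(assignments, coords, min_pixels=50, min_height=20):
    """Keep only clusters that look like lanes.

    assignments: (N,) integer cluster id per sampled pixel.
    coords:      (N, 2) (row, col) of the same pixels.
    Assumed heuristics (tune for your data):
      - size: at least `min_pixels` members
      - shape: spans at least `min_height` image rows, since lanes are
        long and roughly vertical in a forward-facing camera
    Returns {cluster_id: (M, 2) coords} for the surviving clusters.
    """
    kept = {}
    for cid in np.unique(assignments):
        members = coords[assignments == cid]
        if len(members) < min_pixels:  # too small: likely noise
            continue
        height = members[:, 0].max() - members[:, 0].min() + 1
        if height < min_height:  # too short: blob, not a lane
            continue
        kept[int(cid)] = members
    return kept
```

With kmcuda, the per-pixel ids would come from the assignments array that its README shows `kmeans_cuda` returning alongside the centroids; check the README for the exact call. Raising `min_pixels` drops noise clusters, while `min_height` drops short, blob-like ones.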

For the camera I thought maybe you were aiming it at the road to follow the white lane beside the road edge at an offset (maybe no training required?) but it makes sense that you trained it on sidewalk features.

ArtlyStyles commented 3 years ago

@evbo Oh, for me, I always select two lanes, even on the freeway. So in my training images, I always select the left and right lanes, or only the left lane if there is no right lane in the image. So k-means should work for me too.

edwardchang0112 commented 3 years ago

Hi @diennv @evbo ,

I'd like to ask for support. I also use a Jetson platform but get an error when running this project. On my NVIDIA Jetson NX:

• JetPack Version (valid for Jetson only): 4.5-b129

• TensorRT Version: 7.1.3

When I run

python3 tools/test_lanenet.py --weights_path model/tusimple_lanenet/tusimple_lanenet.ckpt --image_path data/tusimple_test_image/0.jpg

I get:

2021-06-25 04:05:06.716212: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Traceback (most recent call last):
  File "tools/test_lanenet.py", line 19, in <module>
    import tensorflow as tf
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/__init__.py", line 101, in <module>
    from tensorflow_core import *
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/__init__.py", line 36, in <module>
    from tensorflow._api.v1 import compat
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/_api/v1/compat/__init__.py", line 23, in <module>
    from tensorflow._api.v1.compat import v1
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/_api/v1/compat/v1/__init__.py", line 673, in <module>
    from tensorflow_estimator.python.estimator.api._v1 import estimator
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/__init__.py", line 10, in <module>
    from tensorflow_estimator._api.v1 import estimator
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/_api/v1/estimator/__init__.py", line 12, in <module>
    from tensorflow_estimator._api.v1.estimator import inputs
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/_api/v1/estimator/inputs/__init__.py", line 10, in <module>
    from tensorflow_estimator.python.estimator.inputs.numpy_io import numpy_input_fn
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/inputs/numpy_io.py", line 26, in <module>
    from tensorflow_estimator.python.estimator.inputs.queues import feeding_functions
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/inputs/queues/feeding_functions.py", line 40, in <module>
    import pandas as pd
  File "/usr/lib/python3/dist-packages/pandas/__init__.py", line 58, in <module>
    from pandas.io.api import *
  File "/usr/lib/python3/dist-packages/pandas/io/api.py", line 19, in <module>
    from pandas.io.packers import read_msgpack, to_msgpack
  File "/usr/lib/python3/dist-packages/pandas/io/packers.py", line 68, in <module>
    from pandas.util._move import (
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfd in position 0: invalid start byte

Has anyone run into this issue? Please share any comments with me!

MaybeShewill-CV / lanenet-lane-detection

Run on Jetson Xavier #383