Suggestion: convert Keras/Tflite to be multi-threaded on inference

autorope / donkeycar

Open source hardware and software platform to build a small scale self driving car.

http://www.donkeycar.com

MIT License

3.17k stars 1.3k forks source link

Suggestion: convert Keras/Tflite to be multi-threaded on inference #770

Closed cloud-rocket closed 1 year ago

cloud-rocket commented 3 years ago

We are not utilizing multi-core capability to run the most CPU consuming task and as a result getting low FPS. (See stats here https://github.com/autorope/donkeycar/issues/690).

I barely measure 19hz on Nano with tflite - based on the resulting calculation (52ms avg for tflite step) FPS cannot be higher (1000 / 52 = 19).

I suggest the following:

Push run_threaded inputs to ordered queue "A" (ordered based on push timestamp)
Pop from ordered queue B on run_threaded output to return the result (block if empty)
Run Keras/tflite inside update while reading inputs from queue "A" and pushing outputs to queue "B" (based on the same input timestamp order)

As a result get almost x4 FPS on Pi4 or almost x8 FPS on Nano!!!!

What do you think? I can push the code, but I don't have any working environment (my models are not yet working) - so I need somebody to test it in a real environment.

/cc: @DocGarbanzo , @sctse999, @tikurahul

DocGarbanzo commented 3 years ago

I get full CPU utilisation when running inference on RPi, because numpy and probably tf too are using multithreading internally.

tikurahul commented 3 years ago

I get > 45 Hz easily. I am not sure where the bottlenecks are in your case.

cloud-rocket commented 3 years ago

I get > 45 Hz easily. I am not sure where the bottlenecks are in your case.

@tikurahul - What is your setup and TF version?

tikurahul commented 3 years ago

I use a Jetson Nano, so my experience may not be super applicable here. :smile:

DocGarbanzo commented 3 years ago

Actually after a recent RPi software upgrade, tf has become much slower for '.h5' models, that's probably the reason why you are seeing this. Do you see the same affect also for .tflite models?

cloud-rocket commented 3 years ago

I am working on a generic multi-threaded solution (please see -https://github.com/cloud-rocket/donkeycar/blob/add-multithreaded-keras-pilot/donkeycar/parts/keras.py)

But for some reason I only experience performance degradation (on both h5 and tflite options) - still not yet understand why...

DocGarbanzo commented 3 years ago

Can you check with the latest version on dev. Set the variable CREATE_TENSOR_RT = True and use the .trt tensorrt model on the nano?

Ezward commented 1 year ago

I'm going to close this. Both Tensorflow and Tensorflow Lite support multi-threaded inference. Here is Google's page on how to profile a model and the levers for increasing performance, including throwing more threads at it. https://www.tensorflow.org/lite/performance/best_practices#tweak_the_number_of_threads