I get full CPU utilisation when running inference on RPi, because numpy and probably `tf` too are using multithreading internally.
I get > 45 Hz easily. I am not sure where the bottlenecks are in your case.
@tikurahul - What is your setup and TF version?
I use a Jetson Nano, so my experience may not be super applicable here. :smile:
Actually, after a recent RPi software upgrade, tf has become much slower for `.h5` models; that's probably why you are seeing this. Do you see the same effect with `.tflite` models?
I am working on a generic multi-threaded solution (please see https://github.com/cloud-rocket/donkeycar/blob/add-multithreaded-keras-pilot/donkeycar/parts/keras.py).
But for some reason I only see performance degradation (with both the h5 and tflite options) - I don't understand why yet...
Can you check with the latest version on dev? Set the variable `CREATE_TENSOR_RT = True` and use the `.trt` TensorRT model on the Nano.
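Something like this in `myconfig.py` (rough sketch; the comment is my reading of what the dev branch does with it):

```python
# myconfig.py (donkeycar config override file)
# On the dev branch this should make training also export a TensorRT-optimized
# .trt model next to the usual .h5/.tflite artifacts, which the drive command
# can then be pointed at.
CREATE_TENSOR_RT = True
```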
I'm going to close this. Both Tensorflow and Tensorflow Lite support multi-threaded inference. Here is Google's page on how to profile a model and the levers for increasing performance, including throwing more threads at it. https://www.tensorflow.org/lite/performance/best_practices#tweak_the_number_of_threads
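For example, with the TF Lite Python API the thread count is set when the interpreter is constructed (minimal sketch; the model filename `pilot.tflite` and the thread count of 4 are placeholders):

```python
import numpy as np
import tensorflow as tf

# num_threads is the lever the best-practices page describes; 4 matches the
# physical core count of a Pi 4 or a Jetson Nano.
interpreter = tf.lite.Interpreter(model_path="pilot.tflite", num_threads=4)
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

frame = np.zeros(inp["shape"], dtype=inp["dtype"])  # dummy input frame
interpreter.set_tensor(inp["index"], frame)
interpreter.invoke()
result = interpreter.get_tensor(out["index"])
```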
We are not utilizing multi-core capability to run the most CPU-consuming task, and as a result we are getting low FPS (see stats here: https://github.com/autorope/donkeycar/issues/690).
I barely measure 19 Hz on the Nano with tflite - given the measured 52 ms average for the tflite step, FPS cannot be higher (1000 / 52 ≈ 19).
I suggest the following (sketch after the list):

- `run_threaded` pushes its inputs to an ordered queue "A" (ordered by push timestamp)
- `run_threaded` returns the result from an output queue "B" (blocking if it is empty)
- `update` reads inputs from queue "A" and pushes outputs to queue "B" (preserving the same input timestamp order)

As a result you get almost x4 FPS on a Pi4 or almost x8 FPS on a Nano!!!!
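Rough sketch of the idea (the class name `QueuedPilot` and `infer_fn` are placeholders, not the actual donkeycar part; TF releases the GIL during inference, so plain Python worker threads are enough to fill the cores):

```python
import time
import queue
import itertools
import threading

class QueuedPilot:
    """Sketch of a pipelined pilot part: inference runs in worker threads."""

    def __init__(self, infer_fn, workers=4):
        self.infer_fn = infer_fn            # placeholder for the model call
        self.in_q = queue.PriorityQueue()   # queue "A", ordered by push timestamp
        self.out_q = queue.PriorityQueue()  # queue "B", same timestamp order
        self.seq = itertools.count()        # tie-breaker for equal timestamps
        self.on = True
        for _ in range(workers):
            threading.Thread(target=self.update, daemon=True).start()

    def run_threaded(self, *inputs):
        # Push inputs to queue "A" tagged with the push timestamp, then
        # block until a result is available on queue "B".
        self.in_q.put((time.time(), next(self.seq), inputs))
        ts, _, result = self.out_q.get()    # blocks if empty
        return result

    def update(self):
        # Worker loop: read from "A", run inference, push to "B" keyed by
        # the same timestamp so results come back in input order.
        while self.on:
            ts, seq, inputs = self.in_q.get()
            self.out_q.put((ts, seq, self.infer_fn(*inputs)))

    def shutdown(self):
        self.on = False
```

The result returned for the current frame is actually the oldest finished one, so throughput goes up while each individual prediction still takes ~52 ms - the gain comes from pipelining, at the cost of a frame or two of extra latency.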
What do you think? I can push the code, but I don't have a working environment (my models are not working yet), so I need somebody to test it in a real environment.
/cc: @DocGarbanzo , @sctse999, @tikurahul