cyrusbehr / tensorrt-cpp-api

TensorRT C++ API Tutorial

Using engine models from multiple threads #57

Closed mostafajalal20 closed 4 months ago

mostafajalal20 commented 5 months ago

I wrapped the engine initialization in a class, and I want to run more than one engine at the same time, each with a different .engine file. Does running the inference engines on different CPU threads result in parallel execution on the GPU? How can I do this?

thomaskleiven commented 4 months ago

Yes, you can definitely run multiple inference engines at the same time with different .engine files by putting each one on a separate CPU thread. Just make sure each thread has its own CUDA stream for issuing its GPU work. That way each thread manages its own inference engine and CUDA stream, and the inference can run in parallel on the GPU.
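
Here's a minimal, self-contained sketch of that pattern. The real TensorRT inference is stood in for by a trivial async copy, and the buffer sizes and thread count are placeholders, but the stream handling is the part that matters:

```cpp
#include <cuda_runtime.h>
#include <thread>
#include <vector>

void worker(size_t nBytes) {
    cudaStream_t stream;
    cudaStreamCreate(&stream);          // this thread's private stream

    void *devBuf = nullptr;
    cudaMalloc(&devBuf, nBytes);
    std::vector<char> hostBuf(nBytes);

    // Stand-in for enqueueing inference: work issued on this thread's
    // stream runs independently of the other threads' streams.
    cudaMemcpyAsync(devBuf, hostBuf.data(), nBytes,
                    cudaMemcpyHostToDevice, stream);

    cudaStreamSynchronize(stream);      // waits on this stream only
    cudaFree(devBuf);
    cudaStreamDestroy(stream);
}

int main() {
    // e.g. one thread per .engine file
    std::thread t1(worker, 1 << 20);
    std::thread t2(worker, 1 << 20);
    t1.join();
    t2.join();
}
```

The key point is that cudaStreamSynchronize waits only on the calling thread's own stream, so the threads never serialize each other's GPU work.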

mostafajalal20 commented 4 months ago

I wrote a wrapper around your engine class and create an object from that wrapper for each thread. How can I make sure that every engine object has its own CUDA stream? Could you please show me some sample code?

thomaskleiven commented 4 months ago

I'm not a maintainer or author of this project, and I haven't done exactly what you're asking for before.

If you already have a wrapper and a vector of size n_threads containing unique_ptr to Engine objects, then each instance would be responsible for its own CUDA stream. This logic seems to be defined in engine.h, so each Engine instance should manage its CUDA stream independently.
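
To make that ownership pattern concrete, here's a rough sketch. The Engine below is a stub standing in for the real one (its constructor/destructor mimic what engine.h appears to do with its stream), so take it as an illustration rather than this project's actual API:

```cpp
#include <cuda_runtime.h>
#include <memory>
#include <thread>
#include <vector>

// Stub standing in for this project's Engine class: each instance
// creates and destroys its own CUDA stream.
class Engine {
public:
    Engine()  { cudaStreamCreate(&m_stream); }
    ~Engine() { cudaStreamDestroy(m_stream); }
    void runInference() {
        // Real code would enqueue TensorRT work on m_stream here.
        cudaStreamSynchronize(m_stream);
    }
private:
    cudaStream_t m_stream{};
};

int main() {
    const size_t nThreads = 4;

    // One Engine per thread; no two threads ever touch the same
    // instance, so no two threads ever share a CUDA stream.
    std::vector<std::unique_ptr<Engine>> engines;
    for (size_t i = 0; i < nThreads; ++i)
        engines.push_back(std::make_unique<Engine>());

    std::vector<std::thread> threads;
    for (size_t i = 0; i < nThreads; ++i)
        threads.emplace_back([&engines, i] { engines[i]->runInference(); });
    for (auto &t : threads) t.join();
}
```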

cyrusbehr commented 4 months ago

Yes, @thomaskleiven is correct. Note, however, that the current implementation requires you to use the same GPU index for all parallel instances, since cudaSetDevice is only called once (you'd otherwise need to modify the code to set the CUDA device before any CUDA operation). If your machine only has a single GPU, you don't need to worry about this.
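
If you did want one engine per GPU, the modification would roughly look like the sketch below (not tested against this codebase; the worker body is illustrative). The current device is per-thread state in the CUDA runtime, so each thread must call cudaSetDevice before any of its CUDA work:

```cpp
#include <cuda_runtime.h>
#include <thread>
#include <vector>

void worker(int gpuIndex) {
    // Must come before stream creation, allocations, or inference on
    // this thread; otherwise the thread defaults to device 0.
    cudaSetDevice(gpuIndex);

    cudaStream_t stream;
    cudaStreamCreate(&stream);          // created on gpuIndex
    // ... load the engine and run inference on this device/stream ...
    cudaStreamSynchronize(stream);
    cudaStreamDestroy(stream);
}

int main() {
    int nGpus = 0;
    cudaGetDeviceCount(&nGpus);

    // One worker thread per available GPU.
    std::vector<std::thread> threads;
    for (int g = 0; g < nGpus; ++g)
        threads.emplace_back(worker, g);
    for (auto &t : threads) t.join();
}
```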