Closed: mostafajalal20 closed this issue 4 months ago.
Yes, you can definitely run multiple inference engines at the same time with different `.engine` files by putting each one on a separate CPU thread. Just make sure each thread creates its own CUDA stream for its GPU work. That way, each thread manages its own inference engine and CUDA stream, and the work from the different engines can run in parallel on the GPU.
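The per-thread pattern could be sketched roughly like this. This is a minimal, hypothetical example (it is not this repo's `Engine` API, and it needs the CUDA toolkit to build); the inference call is elided:

```cpp
#include <cuda_runtime.h>
#include <cstdio>
#include <thread>
#include <vector>

// Each worker thread owns its own CUDA stream, so GPU work enqueued
// by different threads can overlap instead of serializing.
void worker(int id) {
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // ... deserialize this worker's .engine file and run inference,
    //     enqueueing all GPU work on `stream` ...

    cudaStreamSynchronize(stream);  // wait only for this worker's GPU work
    cudaStreamDestroy(stream);
    std::printf("worker %d done\n", id);
}

int main() {
    const int n_threads = 3;  // assumption: one engine per thread
    std::vector<std::thread> threads;
    for (int i = 0; i < n_threads; ++i)
        threads.emplace_back(worker, i);
    for (auto& t : threads)
        t.join();
}
```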
I wrote a wrapper for your engine class and created one object from that wrapper per thread. How can I make sure that every object of the engine class has its own CUDA stream? Would you please show me some sample code?
I'm not a maintainer or author of this project, and I haven't done exactly what you're asking for before. If you already have a wrapper and a vector of size `n_threads` containing `unique_ptr`s to `Engine` objects, then each instance would be responsible for its own CUDA stream. This logic seems to be defined in `engine.h`, so each `Engine` instance should manage its CUDA stream independently.
Yes, @thomaskleiven is correct. The current implementation would, however, require you to use the same GPU index for all parallel instances, since `cudaSetDevice` is only called once (you'd otherwise need to modify it to set the CUDA device before any CUDA operation). If your machine only has a single GPU, then you don't need to worry about this.
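If you did modify the code for multiple GPUs, the per-thread device selection would look roughly like this. This is a sketch under the assumption of one engine per GPU, not the repo's actual code; note that `cudaSetDevice` applies per host thread, so each thread must call it before any other CUDA operation:

```cpp
#include <cuda_runtime.h>
#include <thread>
#include <vector>

void worker(int device_id) {
    // cudaSetDevice is a per-thread setting, so call it first in each thread.
    cudaSetDevice(device_id);

    cudaStream_t stream;
    cudaStreamCreate(&stream);  // this stream now belongs to device_id

    // ... deserialize this worker's .engine file and run inference here ...

    cudaStreamSynchronize(stream);
    cudaStreamDestroy(stream);
}

int main() {
    int n_devices = 0;
    cudaGetDeviceCount(&n_devices);

    std::vector<std::thread> threads;
    for (int i = 0; i < n_devices; ++i)
        threads.emplace_back(worker, i);  // one engine per GPU (assumption)
    for (auto& t : threads)
        t.join();
}
```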
I wrapped the initialization of the engines in a class, and I want to use more than one engine at the same time with different .engine files. Does running the inference engines on different CPU threads produce separate streams of work on the GPU? How can I do this?