NVIDIAGameWorks / FleX


Would running multiple FleX instances on a single GPU help with efficiency? Is that even possible? #61

Closed Amir-Arsalan closed 5 years ago

Amir-Arsalan commented 5 years ago

I don't know how to profile this, so I wonder: would running multiple instances of FleX on a single GPU be beneficial in any way? I can certainly run multiple instances of FleX on a single GPU, but I'm not sure whether the GPU actually runs them simultaneously or queues them and processes them one by one.

mmacklin commented 5 years ago

Hi Amir,

You can run multiple Flex scenes on a single GPU, but they will typically just be processed sequentially. With some trickery you could create multiple libraries with different CUDA contexts and launch them simultaneously, and the driver would interleave the execution, allowing some overlap. I wouldn't expect much benefit from this approach though.

Cheers, Miles

Amir-Arsalan commented 5 years ago

@mmacklin Could you please explain a little bit more about this trick? What do you mean by "create multiple libraries"? Does that mean I need to change a small bit of the C code and compile different versions of FleX?

mmacklin commented 5 years ago

One way would be to create multiple Flex libraries, i.e.: call NvFlexInit() multiple times, then update each one in parallel from different CPU threads. That would in theory allow each library to update in an interleaved / overlapping way on a single GPU. I doubt it would be much faster than doing sequential updates though.

Amir-Arsalan commented 5 years ago

@mmacklin I think that's what I meant by running multiple instances of FleX. What I do is run NvFlexDemoReleaseCUDA_x64 multiple times from different CPUs. As far as I know, NvFlexDemoReleaseCUDA_x64 has a call to NvFlexInit(). Is that a valid way of doing what you just said?

mmacklin commented 5 years ago

Yep, running multiple copies of the NvFlexDemoReleaseCUDA_x64 process will essentially do what I mentioned. It's also a pretty good way to do multi-process updates, e.g.: if you have N GPUs you can launch N process copies with each one assigned a different GPU (e.g.: using -device=1..N on the command line).
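
For anyone wanting to script the multi-process launch described above, here is a minimal Python sketch. It only builds one command line per GPU and starts the processes side by side. The executable path, the helper names `build_commands` and `launch_all`, and the device indices are assumptions for illustration; check which indices your build of the demo accepts for `-device=`, since the thread writes 1..N while CUDA device ordinals usually start at 0.

```python
import subprocess
from typing import List, Sequence


def build_commands(executable: str, devices: Sequence[int]) -> List[List[str]]:
    """Return one argv list per device, each with its own -device flag."""
    return [[executable, f"-device={d}"] for d in devices]


def launch_all(commands: Sequence[Sequence[str]]) -> List[int]:
    """Start every process without blocking, then wait for all of them.

    Returns the exit codes in the same order as the commands.
    """
    procs = [subprocess.Popen(list(cmd)) for cmd in commands]
    return [p.wait() for p in procs]


if __name__ == "__main__":
    # Hypothetical path and device list; adjust both to your setup.
    cmds = build_commands("./NvFlexDemoReleaseCUDA_x64", [0, 1])
    for cmd in cmds:
        print(" ".join(cmd))
    # Uncomment to actually start the processes:
    # exit_codes = launch_all(cmds)
```

Passing the same device index several times gives the single-GPU case discussed earlier in the thread, where the driver decides how the processes share the GPU and little speedup should be expected.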