Trying to Find Bottleneck When Using Nvidia Jetson Nano

voloved commented 3 years ago

Hi,

Great work on this! It's amazing to see this working! I am testing this software on a 4 GB NVIDIA Jetson Nano Developer Kit. It takes about 1 minute to synthesize a waveform, and I am trying to figure out what the bottleneck could be.

I originally tried this code on my Windows machine (Ryzen 7 2700X), where synthesizing a waveform took about 10 seconds. That test used the CPU for inference.

On the Jetson, it's using the GPU: "Found 1 GPUs available. Using GPU 0 (NVIDIA Tegra X1) of compute capability 5.3 with 4.1Gb total memory."

At first it did seem to be RAM-limited, so I created a swap file to fill the gap. After that, RAM usage did not change much during synthesis. I can see the swap file being read during synthesis, and disk read time may be slowing everything down. However, one of the four CPU cores also appeared to be at 100% load, which makes me think I'm CPU-bottlenecked.
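One way to tell whether a busy core is actually computing or mostly waiting on swap reads is to sample `/proc/stat` twice and compare per-core busy time against iowait time. The sketch below assumes a Linux system (the Jetson runs Linux) and the standard `/proc/stat` field order (user, nice, system, idle, iowait, irq, softirq, steal). The helper names are illustrative, not part of this project.

```python
import time
from typing import Dict, List, Tuple


def parse_proc_stat(text: str) -> Dict[str, List[int]]:
    """Return {core_name: first 8 tick counters} for each per-core 'cpuN' line."""
    cores = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the aggregate "cpu" line and non-CPU lines such as "intr" or "ctxt".
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            # user, nice, system, idle, iowait, irq, softirq, steal
            # (guest time is already counted in user/nice, so it is not summed).
            ticks = [int(x) for x in fields[1:9]]
            ticks += [0] * (8 - len(ticks))  # pad older kernels with fewer fields
            cores[fields[0]] = ticks
    return cores


def core_usage(before: str, after: str) -> Dict[str, Tuple[float, float]]:
    """Per-core (busy %, iowait %) between two /proc/stat snapshots."""
    a, b = parse_proc_stat(before), parse_proc_stat(after)
    usage = {}
    for core, old in a.items():
        if core not in b:
            continue
        delta = [new - prev for prev, new in zip(old, b[core])]
        total = sum(delta)
        if total == 0:
            usage[core] = (0.0, 0.0)
            continue
        idle, iowait = delta[3], delta[4]
        busy = total - idle - iowait
        usage[core] = (100.0 * busy / total, 100.0 * iowait / total)
    return usage


def sample(interval: float = 1.0, path: str = "/proc/stat") -> Dict[str, Tuple[float, float]]:
    """Read /proc/stat, wait `interval` seconds, read it again, and compare."""
    with open(path) as f:
        before = f.read()
    time.sleep(interval)
    with open(path) as f:
        after = f.read()
    return core_usage(before, after)


if __name__ == "__main__":
    # Run this in a second terminal while synthesis is in progress.
    for core, (busy, iowait) in sorted(sample().items()):
        print(f"{core}: busy {busy:5.1f}%  iowait {iowait:5.1f}%")
```

A core that shows high busy time with low iowait is doing real work (for example, Python-side or CPU-only operations). High iowait points at the swap file instead.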

I figured that since this project uses PyTorch, a 128-CUDA-core GPU would be faster than an 8-core CPU. I may be missing some fundamentals, though, especially since one of my CPU cores is at 100% usage. Is synthesis constrained by both the CPU and the GPU, or does it rely mostly on the GPU?

Here are screenshots of jtop (monitoring GPU, CPU, and RAM) taken just before and just after the program finished synthesizing.

Before:

5.5 GB of memory used: 3.4 GB RAM, 2.089 GB swap file on disk
CPU 1 at 100%
CPU 2 at 25%
GPU at 40%

[Screenshot: beforeSynthDone]

After:

5.5 GB of memory used: 3.4 GB RAM, 2.089 GB swap file on disk
CPU 1 at 12%
CPU 2 at 98%
GPU at 0%

[Screenshot: afterSynthDone]

Thank you! voloved

ghost commented 3 years ago

Hi, thank you for submitting such a detailed issue with screenshots. What is the performance difference when vocoding in --cpu mode compared to GPU?
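To answer this with numbers, one could time the same vocoder call in both modes. The sketch below is a stdlib-only timing helper. `vocode`, `mel`, and the `device` argument in the usage comment are hypothetical placeholders, not this project's API. The optional `sync` callable matters on the GPU: CUDA kernels are launched asynchronously, so without something like `torch.cuda.synchronize()` the clock can stop before the GPU work has finished.

```python
import statistics
import time
from typing import Callable, Optional


def median_runtime(
    fn: Callable[[], object],
    repeats: int = 3,
    warmup: int = 1,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Median wall-clock seconds of fn() over `repeats` runs, after `warmup` untimed runs."""
    # Warm-up runs absorb one-time costs (lazy CUDA init, memory allocation, caching).
    for _ in range(warmup):
        fn()
        if sync is not None:
            sync()
    times = []
    for _ in range(repeats):
        if sync is not None:
            sync()  # drain any queued GPU work before starting the clock
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for kernels launched by fn() to finish
        times.append(time.perf_counter() - start)
    return statistics.median(times)


# Hypothetical usage (vocode and mel are placeholders, not this repo's API):
# cpu_s = median_runtime(lambda: vocode(mel, device="cpu"))
# gpu_s = median_runtime(lambda: vocode(mel, device="cuda"), sync=torch.cuda.synchronize)
```

If the CPU and GPU timings on the Jetson come out close, that suggests most of the time is spent in work the GPU is not accelerating.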

Some operations in the vocoder model need to be performed on the CPU, so it is plausible that they are the bottleneck. You can step through the model and refer to the PyTorch documentation to see whether a particular operation is performed on the CPU or the GPU.
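A profiler can do this stepping-through systematically. The sketch below assumes a PyTorch version that ships `torch.profiler` (1.8.1 or later). `profile_call` is a hypothetical helper name, and the toy `nn.Sequential` stands in for the vocoder. It profiles an arbitrary zero-argument callable rather than a single `forward`, so Python-level loops around the model are captured too. Sorting by total CPU time surfaces the operators keeping a core busy. When CUDA is available, the table also has CUDA time columns, and operators with no CUDA time did not launch GPU kernels.

```python
from typing import Callable

import torch
from torch import nn
from torch.profiler import ProfilerActivity, profile


def profile_call(fn: Callable[[], object], row_limit: int = 15) -> str:
    """Profile one call of fn() under no_grad and return a per-operator summary table."""
    activities = [ProfilerActivity.CPU]
    if torch.cuda.is_available():
        activities.append(ProfilerActivity.CUDA)
    with torch.no_grad(), profile(activities=activities) as prof:
        fn()
        if torch.cuda.is_available():
            torch.cuda.synchronize()  # make sure queued kernels finish inside the profile
    # Sort by total CPU time to find the operators that keep a CPU core busy.
    return prof.key_averages().table(sort_by="cpu_time_total", row_limit=row_limit)


if __name__ == "__main__":
    # Toy stand-in model. On the Jetson, wrap the real synthesis call instead,
    # keeping the model and its inputs on the same device.
    toy = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, 80))
    x = torch.randn(1, 80)
    print(profile_call(lambda: toy(x)))
```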

ghost commented 3 years ago

Closing as this issue is too specialized to be of use to most users.

CorentinJ / Real-Time-Voice-Cloning, issue #794