brentspell / hifi-gan-bwe

Unofficial implementation of HiFi-GAN+ from the paper "Bandwidth Extension is All You Need" by Su, et al.
MIT License
Very high CPU and memory usage with hifi-synth #5

Closed. Baiyuetribe closed this issue 2 years ago.

Baiyuetribe commented 2 years ago

hifi-synth hifi-gan-bwe-10-42890e3-vctk-48kHz input.mp3 output.wav

The command above uses too much CPU and memory. Can I enable GPU acceleration?

brentspell commented 2 years ago

Yes, if you have a CUDA-enabled GPU, you can enable it by loading the model and data onto the GPU before running inference.

https://github.com/brentspell/hifi-gan-bwe/blob/584f28697b69eb7bec51eb2bd5a3d3b7bba79495/hifi_gan_bwe/scripts/synth.py#L42

model = models.BandwidthExtender.from_pretrained(args.model).cuda()

https://github.com/brentspell/hifi-gan-bwe/blob/584f28697b69eb7bec51eb2bd5a3d3b7bba79495/hifi_gan_bwe/scripts/synth.py#L55

audio = np.stack([model(torch.from_numpy(x).cuda(), sample_rate).cpu() for x in audio.T]).T

brentspell commented 2 years ago

This would also be a useful feature to have in the library, so I'll look into adding it to the script.

brentspell commented 2 years ago

This has been released in version 0.1.13 of the library. You can modify your command above as follows to run synthesis on a CUDA-capable GPU:

hifi-synth --device=cuda hifi-gan-bwe-10-42890e3-vctk-48kHz input.mp3 output.wav

Please reopen the issue if you run into any problems with this change.

Baiyuetribe commented 2 years ago

CUDA takes effect, but the GPU runs out of memory. The source audio file is only 4 MB.

Baiyuetribe commented 2 years ago

Also, I can't reopen the issue; you seem to have disabled it.

brentspell commented 2 years ago

Ah yes, an 11MB MP3 file would be quite a bit of audio, which explains the OOM on the GPU. It would be nice if the synth script would break up the audio into segments, process them, and then reassemble the resulting audio. If I get some time this week, I'll add that to the script.
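As a rough check on why a file of this size can exhaust GPU memory, the sketch below estimates how much audio an 11 MB MP3 holds. The 128 kbps constant bitrate is an assumption (the thread does not state the bitrate), the 48 kHz output rate is taken from the model name, and both helper names are hypothetical.

```python
def mp3_duration_seconds(file_bytes: int, bitrate_kbps: float) -> float:
    """Approximate duration of a constant-bitrate MP3 from its file size."""
    return file_bytes * 8 / (bitrate_kbps * 1000)


def output_samples(duration_s: float, sample_rate: int = 48000) -> int:
    """Number of samples per channel at the given output sample rate."""
    return int(round(duration_s * sample_rate))


# "11MB" from the thread, read here as decimal megabytes.
size = 11 * 1000 * 1000
# 128 kbps is an assumed bitrate, not a value reported in the issue.
duration = mp3_duration_seconds(size, bitrate_kbps=128)
samples = output_samples(duration)

print(f"{duration / 60:.1f} min, {samples / 1e6:.1f} M samples per channel")
```

Under these assumptions, 11 MB is about 11.5 minutes of audio, or roughly 33 million output samples per channel. A lower bitrate would mean an even longer recording for the same file size.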

brentspell commented 2 years ago

I added streaming/cross-fading support to the hifi-synth script in version 0.1.14 of the library. By default, the script windows the signal into 30-second frames with a 25 ms cross-fading overlap, but both values can be customized when you call the script. Thanks again for the report, and I hope this helps.
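The general idea of windowing with cross-fading can be sketched as follows. This is a minimal illustration, not the library's actual implementation: the function name `process_in_frames` and the `process` callback are hypothetical, the fade is assumed to be linear, and `process` is assumed to return the same number of samples it receives (the real model changes the sample rate, which this sketch does not handle).

```python
import numpy as np


def process_in_frames(audio, sample_rate, process,
                      frame_seconds=30.0, fade_seconds=0.025):
    """Run `process` on overlapping frames and cross-fade the results."""
    frame = int(round(frame_seconds * sample_rate))
    fade = int(round(fade_seconds * sample_rate))
    if frame <= 0 or not 0 <= fade < frame:
        raise ValueError("need frame > 0 and 0 <= fade < frame")

    hop = frame - fade  # consecutive frames overlap by `fade` samples
    total = len(audio)
    out = np.zeros(total, dtype=np.float64)
    # Linear fade-in weights; the matching fade-out is 1 - ramp, so the
    # two weights sum to 1 at every sample of an overlap region.
    ramp = np.linspace(0.0, 1.0, fade, endpoint=False)

    for start in range(0, total, hop):
        segment = audio[start:start + frame]
        # np.array copies, so the caller's input is never modified.
        chunk = np.array(process(segment), dtype=np.float64)
        if len(chunk) != len(segment):
            raise ValueError("this sketch assumes a length-preserving process")

        n = len(chunk)
        is_last = start + frame >= total
        if start > 0:
            chunk[:fade] *= ramp            # fade in over the overlap
        if not is_last:
            chunk[n - fade:] *= 1.0 - ramp  # fade out into the next frame
        out[start:start + n] += chunk       # overlap-add
        if is_last:
            break
    return out
```

Only one frame is passed to `process` at a time, so peak memory depends on the frame length rather than on the length of the whole recording. With an identity `process`, the output matches the input up to floating-point rounding, which is a convenient sanity check for the fade weights.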