When I start the API in ogg streaming mode and the synthesized audio is large, a stack overflow occurs intermittently while the NumPy array is being written to the OGG audio file.
Next, I'll show you how to reproduce the problem.
The command to start the API is as follows:
python.exe api.py -sm n -d cuda -p 11014 -a 127.0.0.1 -mt ogg
The log is as follows:
D:\AkagawaTsurunaki\WorkSpace\PycharmProjects\RVC-Boss\GPT-SoVITS\api.py:648: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
logger.warn(f"未指定SoVITS模型路径, fallback后当前值: {sovits_path}")
WARNING: 未指定SoVITS模型路径, fallback后当前值: GPT_SoVITS/pretrained_models/s2G488k.pth
D:\AkagawaTsurunaki\WorkSpace\PycharmProjects\RVC-Boss\GPT-SoVITS\api.py:651: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
logger.warn(f"未指定GPT模型路径, fallback后当前值: {gpt_path}")
WARNING: 未指定GPT模型路径, fallback后当前值: GPT_SoVITS/pretrained_models/s1bert25hz-2kh-longer-epoch=68e-step=50232.ckpt
INFO: 未指定默认参考音频
INFO: 半精: True
INFO: 流式返回已开启
INFO: 编码格式: ogg
...
DEBUG:jieba_fast:Loading model cost 0.444 seconds.
DEBUG:jieba_fast:Prefix dict has been built succesfully.
0%| | 0/1500 [00:00<?, ?it/s]D:\AkagawaTsurunaki\Models\GPT-SoVITS\GPT_SoVITS\AR\modules\patched_mha_with_cache.py:452: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
attn_output = scaled_dot_product_attention(
81%|████████ | 1215/1500 [00:15<00:03, 76.61it/s]
T2S Decoding EOS [198 -> 1414]
D:\AkagawaTsurunaki\ProgramFiles\Anaconda\envs\GPTSoVits\lib\site-packages\torch\functional.py:660: UserWarning: stft with return_complex=False is deprecated. In a future pytorch release, stft will return complex tensors for all inputs, and return_complex=False will raise an error.
Note: you can still call torch.view_as_real on the complex output to recover the old return format. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\SpectralOps.cpp:879.)
return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined]
Process finished with exit code -1073741571 (0xC00000FD)
The program exits abnormally. The exit code -1073741571 (0xC00000FD) is the Windows status code for a stack overflow.
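The link between the signed exit code and its hexadecimal form can be checked in a few lines. This is only a small sketch; the helper name `to_ntstatus` is made up for illustration.

```python
# Windows NTSTATUS value that signals a stack overflow.
STATUS_STACK_OVERFLOW = 0xC00000FD


def to_ntstatus(exit_code: int) -> int:
    """Reinterpret a signed 32-bit process exit code as an unsigned value."""
    return exit_code & 0xFFFFFFFF


code = to_ntstatus(-1073741571)
print(hex(code))                      # hexadecimal form of the exit code
print(code == STATUS_STACK_OVERFLOW)  # whether it matches the stack overflow status
```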
This is the original code of `pack_ogg` in `api.py`:
def pack_ogg(audio_bytes, data, rate):
    with sf.SoundFile(audio_bytes, mode='w', samplerate=rate, channels=1, format='ogg') as audio_file:
        audio_file.write(data)
    return audio_bytes
and I changed it to:
def pack_ogg(audio_bytes, data, rate):
    # Author: AkagawaTsurunaki
    # Issue:
    # Stack overflow probabilistically occurs
    # when the function `sf_writef_short` of `libsndfile_64bit.dll` is called
    # using the Python library `soundfile`
    # Note:
    # This is an issue related to `libsndfile`, not this project itself.
    # It happens when you generate a large audio tensor (about 499804 frames in my PC)
    # and try to convert it to an ogg file.
    # Related:
    # https://github.com/RVC-Boss/GPT-SoVITS/issues/1199
    # https://github.com/libsndfile/libsndfile/issues/1023
    # https://github.com/bastibe/python-soundfile/issues/396
    # Suggestion:
    # Or split the whole audio data into smaller audio segment to avoid stack overflow?
    def handle_pack_ogg():
        with sf.SoundFile(audio_bytes, mode='w', samplerate=rate, channels=1, format='ogg') as audio_file:
            audio_file.write(data)

    import threading
    # See: https://docs.python.org/3/library/threading.html
    # The stack size of this thread is at least 32768
    # If stack overflow error still occurs, just modify the `stack_size`.
    # stack_size = n * 4096, where n should be a positive integer.
    # Here we chose n = 4096.
    stack_size = 4096 * 4096
    try:
        threading.stack_size(stack_size)
        pack_ogg_thread = threading.Thread(target=handle_pack_ogg)
        pack_ogg_thread.start()
        pack_ogg_thread.join()
    except RuntimeError as e:
        # If changing the thread stack size is unsupported, a RuntimeError is raised.
        print("RuntimeError: {}".format(e))
        print("Changing the thread stack size is unsupported.")
    except ValueError as e:
        # If the specified stack size is invalid, a ValueError is raised and the stack size is unmodified.
        print("ValueError: {}".format(e))
        print("The specified stack size is invalid.")
    return audio_bytes
This way, converting larger audio data to OGG format should not cause stack overflow issues.
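The same idea can be written as a small reusable helper. The sketch below is not the code in the PR: the name `run_with_stack_size` is hypothetical, and it additionally restores the previous stack size setting so that threads created later are not affected. It uses only the standard `threading` module.

```python
import threading


def run_with_stack_size(func, stack_size=4096 * 4096):
    """Run func() in a new thread with the given stack size and return its result."""
    result = {}

    def target():
        try:
            result["value"] = func()
        except BaseException as exc:  # hand the exception back to the caller
            result["error"] = exc

    previous = threading.stack_size()   # current setting (0 means platform default)
    threading.stack_size(stack_size)    # applies to threads started from now on
    try:
        worker = threading.Thread(target=target)
        worker.start()
    finally:
        # Restore the old setting once the worker has been started.
        threading.stack_size(previous)
    worker.join()
    if "error" in result:
        raise result["error"]
    return result.get("value")
```

In `pack_ogg`, the call would look roughly like `run_with_stack_size(handle_pack_ogg)`. The default of 4096 * 4096 bytes is the value chosen above and may need tuning on other machines.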
The root cause is that soundfile overflows the stack when calling libsndfile_64bit.dll. As far as I know, increasing the thread stack size is a relatively simple and feasible workaround at present.
I have reported this issue to libsndfile and am waiting for it to be fixed.
I submitted a PR. I tested it by synthesizing the largest audio that GPT-SoVITS could produce and converting it to OGG format, and the program worked fine.
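The alternative raised in the code comment, splitting the audio into smaller segments, could be sketched as follows. This is an untested idea, not part of the PR: the helper names and the default chunk size of 65536 frames are assumptions, and I have not verified that chunked writes avoid the overflow in libsndfile.

```python
def iter_chunks(data, chunk_frames):
    """Yield consecutive slices of data with at most chunk_frames frames each."""
    if chunk_frames <= 0:
        raise ValueError("chunk_frames must be positive")
    for start in range(0, len(data), chunk_frames):
        yield data[start:start + chunk_frames]


def write_in_chunks(write, data, chunk_frames=65536):
    """Call write(chunk) for each chunk and return the number of calls made."""
    calls = 0
    for chunk in iter_chunks(data, chunk_frames):
        write(chunk)
        calls += 1
    return calls
```

Inside the `with sf.SoundFile(...) as audio_file:` block, `audio_file.write(data)` would become `write_in_chunks(audio_file.write, data)`, since successive `write` calls on an open `SoundFile` append frames.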
My solution is based on "Stack Overflow while writing OGG", Issue #396 in bastibe/python-soundfile (https://github.com/bastibe/python-soundfile/issues/396).
For more details, please see "Stack overflow probabilistically occurs when the function `sf_writef_short` of `libsndfile_64bit.dll` is called using the Python library `soundfile`", Issue #1023 in libsndfile/libsndfile (https://github.com/libsndfile/libsndfile/issues/1023).