Summary
The web backend server for GPT-SoVITS lacks proper sanitization of user input in the UVR5 module, which leads to a remote OS command injection vulnerability. This flaw allows attackers to execute arbitrary commands, compromising the system and causing critical security risks.
Due to this vulnerability and the web server's default public exposure, the GPT-SoVITS server is unsuitable for deployment in public production environments until the vulnerability is patched.
Affected Versions
Released versions < 20240821v2
As of 2024.12.10, all versions of the repository code
Details
The vulnerability originates from the _path_audio_ functions implemented in three classes: AudioPre, MDXNetDereverb, and BsRoformer_Loader. The UVR5 module relies on these classes to pre-process audio data.
Under the hood, these classes call the external ffmpeg program to pre-process the data, and the shell command they build can be polluted by user input, as one of the three vulnerable snippets shows:
The tool can be reached in the web UI.
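To illustrate the pattern described above, the following minimal sketch builds an ffmpeg command line by string interpolation and inspects how it tokenizes, without executing anything. The template string and the helper names (build_unquoted, build_quoted) are assumptions made for illustration and are not copied from the GPT-SoVITS source.

```python
import shlex

# Hypothetical command template modeled on the description above.
TEMPLATE = "ffmpeg -i %s -vn %s -q:a 2 -y"


def build_unquoted(src: str, dst: str) -> str:
    """Interpolate paths directly into the command string (the unsafe pattern)."""
    return TEMPLATE % (src, dst)


def build_quoted(src: str, dst: str) -> str:
    """Interpolate paths after shell-quoting each one."""
    return TEMPLATE % (shlex.quote(src), shlex.quote(dst))


if __name__ == "__main__":
    src = "input.wav"
    # A benign stand-in for a user-controlled output path.
    dst = "vocals; echo INJECTED"

    # shlex.split approximates shell word splitting; nothing is executed here.
    print(shlex.split(build_unquoted(src, dst)))  # dst is broken into several words
    print(shlex.split(build_quoted(src, dst)))    # dst survives as a single word
```

A real shell goes further than shlex.split: it treats `;` as a command separator, so the text after it would run as a separate command. Quoting narrows the problem, but it is easy to miss one call site, which is why avoiding the shell altogether is generally the more robust choice.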
An attacker can exploit this command injection vulnerability by crafting malicious inputs. These inputs can be provided via the HTML forms or modified in the HTTP request, as highlighted in the screenshot below:
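Because the request can be edited outside the HTML form, any check on these inputs has to run on the server. The following is a minimal sketch of such a check, assuming the output folder is expected to live under a fixed base directory. The function name resolve_output_dir and the allowed character set are hypothetical choices, not part of GPT-SoVITS.

```python
import os
import re

# Assumed allow-list: letters, digits, underscore and hyphen per path component.
_SAFE_COMPONENT = re.compile(r"[A-Za-z0-9_-]+")


def resolve_output_dir(user_value: str, base_dir: str) -> str:
    """Return an absolute output folder under base_dir, or raise ValueError."""
    parts = user_value.strip("/").split("/")
    # Reject empty values, "..", spaces and shell metacharacters in one pass.
    if not all(_SAFE_COMPONENT.fullmatch(part) for part in parts):
        raise ValueError("output folder contains disallowed characters")

    base = os.path.realpath(base_dir)
    candidate = os.path.realpath(os.path.join(base, *parts))
    # realpath follows symlinks, so this also catches links that leave base.
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError("output folder escapes the allowed base directory")
    return candidate
```

Validation of this kind is a second line of defense. It limits what reaches the command, but it does not replace building the command safely.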
PoC (Proof of Concept)
An attacker can easily achieve remote command execution (RCE) by first uploading an audio file and then injecting a malicious payload into the vocal output folder path of a UVR5 conversion operation.
Install and deploy GPT-SoVITS following the official instructions (with dependencies such as ffmpeg and the UVR5 models) and with the WebUI enabled.
_path_audio_ in the AudioPre class. It is almost the same in the other two vulnerable functions.
Replace the cmd value with your desired command to trigger an RCE attack:
Patch
Replace this block:
With:
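One common remediation for this class of bug, sketched here under stated assumptions, is to pass ffmpeg an argument list and skip the shell entirely, so that path contents are never parsed as shell syntax. This is not necessarily the change adopted upstream; the function names build_ffmpeg_argv and convert are hypothetical, and the flags simply follow the kind of conversion described in Details.

```python
import os
import subprocess
from typing import List


def build_ffmpeg_argv(src: str, dst: str) -> List[str]:
    """Build the ffmpeg invocation as a list, one element per argument."""
    # Absolute paths cannot begin with "-", so ffmpeg will not read them as options.
    return [
        "ffmpeg",
        "-i", os.path.abspath(src),
        "-vn",
        os.path.abspath(dst),
        "-q:a", "2",
        "-y",
    ]


def convert(src: str, dst: str) -> None:
    """Run ffmpeg without a shell; raises CalledProcessError on failure."""
    # shell=False is the default: no shell ever interprets the path contents.
    subprocess.run(build_ffmpeg_argv(src, dst), check=True)
```

With this shape, a path such as `vocals; echo INJECTED.flac` reaches ffmpeg as one literal file name, and ffmpeg either writes to it or fails, instead of a shell running a second command.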