Remote OS Command Injection Vulnerability in GPT-SoVITS UVR5 Module

Summary

The GPT-SoVITS web backend server does not properly sanitize user input in the UVR5 module, which leads to a remote OS command injection vulnerability. This flaw allows attackers to execute arbitrary commands on the host, compromising the system.

Due to this vulnerability and the web server's default public exposure, the GPT-SoVITS server is unsuitable for deployment in public production environments until the vulnerability is patched.

Affected Versions

Released versions < 20240821v2
As of today (2024.12.10), all versions of the repository code

Details

The vulnerability originates from the _path_audio_ functions implemented in three classes: AudioPre, MDXNetDereverb, and BsRoformer_Loader. The UVR5 module relies on these classes to pre-process audio data.

Under the hood, these classes invoke the external ffmpeg program to pre-process the data. The shell command is built by interpolating user input into a string that is passed to os.system, so the input can alter the command itself, as one of the three vulnerable snippets shows:

#https://github.com/RVC-Boss/GPT-SoVITS/blob/a70e1ad30c072cdbcfb716962abdc8008fa41cc2/tools/uvr5/vr.py#L151
#https://github.com/RVC-Boss/GPT-SoVITS/blob/a70e1ad30c072cdbcfb716962abdc8008fa41cc2/tools/uvr5/mdxnet.py#L223
#https://github.com/RVC-Boss/GPT-SoVITS/blob/a70e1ad30c072cdbcfb716962abdc8008fa41cc2/tools/uvr5/bsroformer.py#L180
def run_folder(self,input, vocal_root, others_root, format):
    ...
    path_vocal = "%s/%s_vocals.wav" % (vocal_root, os.path.basename(path)[:-4])
    ...
    opt_path_vocal = path_vocal[:-4] + ".%s" % format
    ...
    os.system(
        "ffmpeg -i '%s' -vn '%s' -q:a 2 -y" % (path_vocal, opt_path_vocal)
    )
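
To make the injection concrete, the following is a minimal sketch that rebuilds the command string the same way the snippet above does and then tokenizes it roughly as a POSIX shell would, without executing anything. The helper name build_vulnerable_command, the hostile vocal_root value, and the /tmp/pwned path are hypothetical illustrations and are not taken from the PoC script. The tokenizer settings assume Python 3.8 or later.

```python
import os
import shlex


def build_vulnerable_command(vocal_root, path, fmt):
    """Rebuild the command string exactly as the vulnerable snippet does."""
    path_vocal = "%s/%s_vocals.wav" % (vocal_root, os.path.basename(path)[:-4])
    opt_path_vocal = path_vocal[:-4] + ".%s" % fmt
    return "ffmpeg -i '%s' -vn '%s' -q:a 2 -y" % (path_vocal, opt_path_vocal)


# Hypothetical attacker-controlled output folder: the first single quote
# closes the quoting that the format string opened, and ';' starts a new command.
hostile_root = "/tmp/out'; touch /tmp/pwned; echo '"

cmd = build_vulnerable_command(hostile_root, "ENG_M.wav", "flac")
print(cmd)

# Tokenize without running anything. punctuation_chars=True makes ';' a
# separate token, which approximates how a shell splits command lists.
lexer = shlex.shlex(cmd, posix=True, punctuation_chars=True)
lexer.whitespace_split = True
tokens = list(lexer)
print(tokens)
```

In the printed token list, the ";" separators and the "touch" command appear as standalone tokens rather than as part of a file name, which is what os.system would hand to the shell.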

The tool can be reached in the web UI.

An attacker can exploit this command injection vulnerability by crafting malicious inputs. These inputs can be provided via the HTML forms or modified in the HTTP request, as highlighted in the screenshot below:

PoC (Proof of Concept)

An attacker can achieve remote command execution (RCE) by first uploading an audio file and then injecting a malicious payload into the vocal output folder path of a UVR5 conversion operation.

Install and deploy GPT-SoVITS following the official instructions (including dependencies such as ffmpeg and the UVR5 models) with the WebUI enabled
Turn on the UVR5-WebUI

Download a legitimate audio file: https://www.signalogic.com/melp/EngSamples/Orig/ENG_M.wav
Run the exploitation script: https://gist.github.com/superboy-zjc/2ed271471849418580686ac3c5de30cd. This PoC script demonstrates how to exploit the vulnerable function _path_audio_ in the AudioPre class. The other two vulnerable functions can be exploited in almost the same way.

Replace the cmd value with your desired command to trigger an RCE attack:

python SoVITS-uvr5-exp.py -f test.wav -u http://proof-of-concept:9873 -cmd "ping XXX"

Patch

Replace this block:

os.system(
    "ffmpeg -i '%s' -vn '%s' -q:a 2 -y" % (path_vocal, opt_path_vocal)
)

With:

import subprocess

subprocess.run(
    ["ffmpeg", "-i", path_vocal, "-vn", opt_path_vocal, "-q:a", "2", "-y"],
    check=True
)
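
As a hedged illustration of why the list form helps, the sketch below builds the same argument list and passes it to a stand-in child process (the Python interpreter instead of ffmpeg, so it runs without ffmpeg installed). The helper build_ffmpeg_argv and the ALLOWED_FORMATS set are assumptions added for this example and are not part of the project. Checking the format against an allowlist is an extra precaution beyond the patch above.

```python
import json
import subprocess
import sys

# Assumed set of output formats; adjust to whatever the UI actually offers.
ALLOWED_FORMATS = {"wav", "flac", "mp3", "m4a"}


def build_ffmpeg_argv(path_vocal, fmt):
    """Return the ffmpeg argument list; no shell ever parses these strings."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError("unsupported output format: %r" % (fmt,))
    opt_path_vocal = path_vocal[:-4] + "." + fmt
    return ["ffmpeg", "-i", path_vocal, "-vn", opt_path_vocal, "-q:a", "2", "-y"]


# Hypothetical hostile path, shaped like the one the vulnerable code would build.
hostile = "/tmp/out'; touch /tmp/pwned; echo '/ENG_M_vocals.wav"
argv = build_ffmpeg_argv(hostile, "flac")

# Stand-in for ffmpeg: a child process that reports the arguments it received.
child = subprocess.run(
    [sys.executable, "-c", "import sys, json; print(json.dumps(sys.argv[1:]))"]
    + argv[1:],
    check=True,
    capture_output=True,
    text=True,
)
received = json.loads(child.stdout)
print(received)
```

The child receives the hostile string as a single, literal argument, so the quote and semicolon characters have no special meaning. A path that begins with "-" could still be read by ffmpeg as an option, so validating or normalizing the output folder remains worthwhile.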

RVC-Boss / GPT-SoVITS