RVC-Boss / GPT-SoVITS

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
MIT License

Remote OS Command Injection Vulnerability in GPT-SoVITS UVR5 Module #1771

Open superboy-zjc opened 2 weeks ago

superboy-zjc commented 2 weeks ago

Summary

The web backend server for GPT-SoVITS does not properly sanitize user input in the UVR5 module, which leads to a remote OS command injection vulnerability. This flaw allows attackers to execute arbitrary commands on the host, compromising the system and causing critical security risks.

Due to this vulnerability and the web server's default public exposure, the GPT-SoVITS server is unsuitable for deployment in public production environments until the vulnerability is patched.

Affected Versions

Details

The vulnerability originates from the _path_audio_ functions implemented in three classes: AudioPre, MDXNetDereverb, and BsRoformer_Loader. The UVR5 module relies on these classes to pre-process audio data.

Under the hood, these classes call the external ffmpeg program to perform the pre-processing, and the shell command they build can be polluted by user input, as one of the three vulnerable snippets shows:

#https://github.com/RVC-Boss/GPT-SoVITS/blob/a70e1ad30c072cdbcfb716962abdc8008fa41cc2/tools/uvr5/vr.py#L151
#https://github.com/RVC-Boss/GPT-SoVITS/blob/a70e1ad30c072cdbcfb716962abdc8008fa41cc2/tools/uvr5/mdxnet.py#L223
#https://github.com/RVC-Boss/GPT-SoVITS/blob/a70e1ad30c072cdbcfb716962abdc8008fa41cc2/tools/uvr5/bsroformer.py#L180
def run_folder(self,input, vocal_root, others_root, format):
    ...
    path_vocal = "%s/%s_vocals.wav" % (vocal_root, os.path.basename(path)[:-4])
    ...
    opt_path_vocal = path_vocal[:-4] + ".%s" % format
    ...
    os.system(
        "ffmpeg -i '%s' -vn '%s' -q:a 2 -y" % (path_vocal, opt_path_vocal)
    )
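
To see why the single quotes in the format string offer no protection, the sketch below rebuilds the command string the same way as the snippet above and tokenizes it roughly as a POSIX shell would, without executing anything. The helper names build_ffmpeg_command and shell_tokens, the sample paths, and the harmless echo payload are illustrative assumptions, not code from the repository.

```python
import os
import shlex


def build_ffmpeg_command(vocal_root, path, fmt):
    """Rebuild the command string the same way the vulnerable snippet does."""
    path_vocal = "%s/%s_vocals.wav" % (vocal_root, os.path.basename(path)[:-4])
    opt_path_vocal = path_vocal[:-4] + ".%s" % fmt
    return "ffmpeg -i '%s' -vn '%s' -q:a 2 -y" % (path_vocal, opt_path_vocal)


def shell_tokens(command):
    """Approximate POSIX shell tokenization; nothing is executed here."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    return list(lexer)


# Hypothetical inputs: a normal output folder and one containing a quote.
benign = build_ffmpeg_command("output/vocals", "test.wav", "flac")
hostile = build_ffmpeg_command("x'; echo INJECTED; '", "test.wav", "flac")

print(shell_tokens(benign))
print(shell_tokens(hostile))
```

For the hostile input, the token list contains standalone `;` separators followed by `echo` and `INJECTED`. The user-supplied quote closes the quoted argument early, so a real shell would treat the rest as separate commands.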

The tool can be reached in the web UI.

image-20241110213213738

An attacker can exploit this command injection vulnerability by crafting malicious inputs. These inputs can be provided via the HTML forms or modified in the HTTP request, as highlighted in the screenshot below:

image-20241110213320464

PoC (Proof of Concept)

An attacker can easily achieve remote command execution (RCE) by first uploading an audio file and then injecting a malicious payload into the vocal output folder path of a UVR5 conversion operation.

  1. Install and deploy GPT-SoVITS following the official instructions (including dependencies such as ffmpeg and the UVR5 models) with the WebUI enabled
  2. Turn on the UVR5-WebUI

image-20241110171732806

  3. Download a legitimate audio file: https://www.signalogic.com/melp/EngSamples/Orig/ENG_M.wav
  4. Run the exploitation script: https://gist.github.com/superboy-zjc/2ed271471849418580686ac3c5de30cd. This PoC script demonstrates how to exploit the vulnerable function _path_audio_ in the AudioPre class. The other two vulnerable functions can be exploited in almost the same way.

Replace the cmd value with your desired command to trigger an RCE attack:

python SoVITS-uvr5-exp.py -f test.wav -u http://proof-of-concept:9873 -cmd "ping XXX"

image-20241110215104106

Patch

Replace this block:

os.system(
    "ffmpeg -i '%s' -vn '%s' -q:a 2 -y" % (path_vocal, opt_path_vocal)
)

With:

import subprocess

subprocess.run(
    ["ffmpeg", "-i", path_vocal, "-vn", opt_path_vocal, "-q:a", "2", "-y"],
    check=True
)
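
A slightly more defensive variant of the patch above might look like the following sketch. It keeps the list-based subprocess.run call, restricts the output format to an allowlist, resolves the path to absolute form so it cannot start with `-` and be read as an ffmpeg option, and places the options before the output file. The helper names and the ALLOWED_FORMATS set are assumptions for illustration, not the project's actual code.

```python
import os
import subprocess

# Assumption: the set of output formats the UI is meant to offer.
ALLOWED_FORMATS = {"wav", "flac", "mp3", "m4a"}


def build_ffmpeg_argv(path_vocal, fmt):
    """Build an argument list; no shell ever parses these strings."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError("unsupported output format: %r" % (fmt,))
    # An absolute path never begins with '-', so it cannot pose as an option.
    src = os.path.abspath(path_vocal)
    dst = os.path.splitext(src)[0] + "." + fmt
    return ["ffmpeg", "-y", "-i", src, "-vn", "-q:a", "2", dst]


def convert_vocal(path_vocal, fmt):
    """Run ffmpeg without a shell and return the output path."""
    argv = build_ffmpeg_argv(path_vocal, fmt)
    subprocess.run(argv, check=True)
    return argv[-1]
```

Because each path is passed as a single list element, quotes and semicolons in a user-supplied folder name reach ffmpeg as literal filename characters. The format allowlist is an extra layer on top of that, not a replacement for it.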

superboy-zjc commented 5 days ago

Any update? @RVC-Boss I can help with the patch after a confirmation of the vulnerability. Thanks!