Open BACMiao opened 1 week ago
Hi @BACMiao! I'm here to help you with any bugs, questions, or contributions you have.
The `SemanticAgent` in pandasai v2.2.14 can indeed be exploited to trigger remote code execution (RCE) through conversation. This vulnerability arises from the `query` method, which generates and executes code based on user input without proper validation or sanitization. Here's a simplified explanation of how this can be exploited:
Code Generation and Execution: The `query` method uses a pipeline to generate code from user queries and then executes this code. If the generated code contains malicious commands, they will be executed, leading to RCE.
```python
def query(self, query):
    query_pipeline = Pipeline(
        context=self.context,
        logger=self.logger,
        steps=[
            CodeGenerator(),
        ],
    )
    code = query_pipeline.run(query)
    self.execute_code(code)
```
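To make the risk concrete, here is a minimal, self-contained sketch (not pandasai's actual code) of what ultimately happens when an LLM-generated string reaches Python's built-in `exec` without validation; the "generated" string below is a hypothetical stand-in for attacker-influenced model output:

```python
import os

# Illustrative only: a hypothetical stand-in for LLM-generated code.
# A prompt-injected model could just as easily emit os.system(...) here.
generated_code = "import os; result = os.getcwd()"

namespace = {}
exec(generated_code, namespace)  # no sanitization: any Python statement runs
print(namespace["result"])       # attacker-chosen code has full interpreter access
```

Because `exec` grants the generated string the full capabilities of the host interpreter, any bypass of the upstream checks is equivalent to arbitrary code execution.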
Bypassing Security Checks: The `execute_code` method from the `BaseAgent` class executes the generated code without performing any security checks, making it possible for malicious code to be executed.
```python
def execute_code(self, code: Optional[str] = None, output_type: Optional[str] = None):
    if not self.pipeline:
        return (
            "Unfortunately, I was not able to get your answers, "
            "because of the following error: No pipeline exists to execute try Agent class"
        )
    try:
        if code is None:
            code = self.last_code_generated
        self.logger.log(f"Code: {code}")
        self.logger.log(
            f"Running PandasAI with {self.context.config.llm.type} LLM..."
        )
        self.assign_prompt_id()
        pipeline_input = CodeExecutionPipelineInput(
            code, output_type, self.conversation_id, self.last_prompt_id
        )
        return self.pipeline.run_execute_code(pipeline_input)
    except Exception as exception:
        return (
            "Unfortunately, I was not able to get your answers, "
            "because of the following error:\n"
            f"\n{exception}\n"
        )
```
To mitigate this risk, it is crucial to implement proper input validation and sanitization in the `CodeGenerator` step and to ensure that the `execute_code` method includes security checks to prevent the execution of malicious code [1][2].
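One possible shape for such a check, sketched here as an assumption rather than pandasai's actual API, is a static AST pass that rejects generated code importing modules or calling builtins outside an allowlist; the function name and the allow/block sets below are illustrative:

```python
import ast

# Illustrative policy: names are assumptions, not pandasai configuration.
ALLOWED_MODULES = {"pandas", "numpy", "math"}
BLOCKED_CALLS = {"exec", "eval", "compile", "__import__", "open", "system"}

def is_code_safe(code: str) -> bool:
    """Statically reject generated code that strays outside the allowlist."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # `import os` -> check the top-level package of each alias
            if any(a.name.split(".")[0] not in ALLOWED_MODULES for a in node.names):
                return False
        elif isinstance(node, ast.ImportFrom):
            # `from os import system` -> check the source module
            if (node.module or "").split(".")[0] not in ALLOWED_MODULES:
                return False
        elif isinstance(node, ast.Call):
            func = node.func
            name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", "")
            if name in BLOCKED_CALLS:
                return False
    return True
```

A static pass like this is a mitigation, not a sandbox: determined payloads can still evade denylists, so it should complement, not replace, executing generated code in an isolated environment.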
The first PoC is not triggered through `query`. Tracing how this RCE fires shows that the execution path goes through the `chat()` function, meaning all of the following pipeline steps run, yet the malicious code still bypasses the security checks in `CodeGenerator` and `CodeCleaning`:
```
[INFO] Executing Step 0: ValidatePipelineInput
[INFO] Executing Step 1: CacheLookup
[INFO] Executing Step 2: SemanticPromptGeneration
[INFO] Executing Step 3: LLMCall
[INFO] Executing Step 4: CodeGenerator
[INFO] Executing Step 5: CodeCleaning
[INFO] Executing Step 6: CodeExecution
```
System Info
OS version: macOS 14.3.1
Python version: Python 3.12.4
The current version of pandasai being used: v2.2.14
🐛 Describe the bug
Hi, Team

While using the `SemanticAgent`, I discovered that users can bypass existing security checks by manipulating the provided `schema` and forcing the system to return specific `measures` values during conversations with the LLM, potentially triggering the execution of risky code through `exec`.

PoC: (from the example code)
Log: arbitrary instructions executed through generated code (e.g. reading file contents).
This is the log information printed to `pandasai.log`.
Additionally, I found that directly calling the `execute_code` method from the `BaseAgent` can also bypass some security checks.

PoC: