assafelovic / gpt-researcher

GPT based autonomous agent that does online comprehensive research on any given topic
MIT License
12.98k stars 1.61k forks

Research topic string needs to be sanitized before using it as filename #613

Open barsuna opened 1 week ago

barsuna commented 1 week ago

I found that if the research topic (i.e. the 'what i should research next' UI element) contains '/', the output filenames are malformed and the final research report cannot be written as a result.

assafelovic commented 1 week ago

Hey @barsuna, can you give an example of when '/' is part of a research task? I'll take a look at resolving it.

barsuna commented 1 week ago

@assafelovic, apologies for being too terse. Here is an example research task:

What are the latest developments in EBPF/XDP

It produces the following error:

  File "/usr/lib/python3.12/concurrent/futures/", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
FileNotFoundError: [Errno 2] No such file or directory: 'outputs/task_1719048446_What are the latest developments in EBPF/'
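
To illustrate the failure, here is a minimal sketch of how a '/' in the topic ends up being read as a directory separator. The `posixpath.join` call is an assumption made for illustration only; the real path is assembled inside the project's writer helpers, which are not shown in this thread.

```python
import posixpath

# Filename built the way the handler builds it (timestamp copied from the traceback).
task = "What are the latest developments in EBPF/XDP"
filename = f"task_1719048446_{task}"

# Hypothetical join, for illustration only.
path = posixpath.join("outputs", filename)

# The '/' inside the topic splits the name into a directory part and a file part.
parent = posixpath.dirname(path)
leaf = posixpath.basename(path)

print(parent)
print(leaf)
```

The directory part ends in `EBPF`, a directory that was never created, which matches the `FileNotFoundError` above.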

vanRaco commented 1 day ago

@barsuna make these changes to your backend/ file:

  1. Add import re at the top.

  2. At line 40, add def sanitize_filename(filename): return re.sub(r'[^\w\s-]', '', filename).strip()

  3. At line 55 (after filename is assigned), add sanitized_filename = sanitize_filename(filename)

  4. Below that, replace each use of filename with sanitized_filename.
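
As a standalone sketch of the steps above, the snippet below applies the same regex to the failing topic from this thread. The type hints and the sample `task` value are added here for illustration.

```python
import re
import time


def sanitize_filename(filename: str) -> str:
    # Drop every character that is not a word character, whitespace, or a hyphen,
    # then trim leading and trailing whitespace.
    return re.sub(r"[^\w\s-]", "", filename).strip()


task = "What are the latest developments in EBPF/XDP"
filename = f"task_{int(time.time())}_{task}"
sanitized_filename = sanitize_filename(filename)

print(sanitized_filename)
```

Note that this removes the '/' outright, so "EBPF/XDP" becomes "EBPFXDP". Substituting a character such as '_' instead would be an equally valid choice if the word boundary should stay visible.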

These changes should work. Alternatively, feel free to grab my code and paste it over yours 👇

from fastapi import FastAPI, Request, WebSocket, WebSocketDisconnect
from fastapi.staticfiles import StaticFiles
from fastapi.templating import Jinja2Templates
from pydantic import BaseModel
from backend.websocket_manager import WebSocketManager
from backend.utils import write_md_to_pdf, write_md_to_word, write_text_to_md
import time
import json
import os
import re

class ResearchRequest(BaseModel):
    task: str
    report_type: str
    agent: str

app = FastAPI()

app.mount("/site", StaticFiles(directory="./frontend"), name="site")
app.mount("/static", StaticFiles(directory="./frontend/static"), name="static")

templates = Jinja2Templates(directory="./frontend")

manager = WebSocketManager()

# Create the outputs directory on startup (if missing) and serve it as static files
@app.on_event("startup")
def startup_event():
    if not os.path.isdir("outputs"):
        os.makedirs("outputs")
    app.mount("/outputs", StaticFiles(directory="outputs"), name="outputs")

@app.get("/")
async def read_root(request: Request):
    return templates.TemplateResponse('index.html', {"request": request, "report": None})

# Add the sanitize_filename function here
def sanitize_filename(filename):
    return re.sub(r'[^\w\s-]', '', filename).strip()

@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    await manager.connect(websocket)
    try:
        while True:
            data = await websocket.receive_text()
            if data.startswith("start"):
                # The payload follows the "start " prefix as JSON
                json_data = json.loads(data[6:])
                task = json_data.get("task")
                report_type = json_data.get("report_type")
                filename = f"task_{int(time.time())}_{task}"
                sanitized_filename = sanitize_filename(filename)  # Sanitize the filename
                report_source = json_data.get("report_source")
                if task and report_type:
                    report = await manager.start_streaming(task, report_type, report_source, websocket)
                    # Saving report as pdf
                    pdf_path = await write_md_to_pdf(report, sanitized_filename)
                    # Saving report as docx
                    docx_path = await write_md_to_word(report, sanitized_filename)
                    # Saving report as markdown
                    md_path = await write_text_to_md(report, sanitized_filename)
                    # Returning the paths of the saved report files
                    await websocket.send_json({"type": "path", "output": {"pdf": pdf_path, "docx": docx_path, "md": md_path}})
                else:
                    print("Error: not enough parameters provided.")

    except WebSocketDisconnect:
        await manager.disconnect(websocket)

barsuna commented 1 hour ago

Thanks @vanRaco, it is all good on my end. I was just sharing this so it can be fixed in master.