assafelovic / gpt-researcher

GPT based autonomous agent that does online comprehensive research on any given topic
https://gptr.dev
MIT License
12.98k stars, 1.61k forks

Research topic string needs to be sanitized before using it as filename #613

Open barsuna opened 1 week ago

barsuna commented 1 week ago

If the research topic (i.e. the 'what i should research next' UI element) contains a '/', the output filename is malformed and the final report cannot be written.

assafelovic commented 1 week ago

Hey @barsuna, can you give an example of when '/' is part of a research task? I'll take a look at resolving it.

barsuna commented 1 week ago

@assafelovic, apologies for being too terse. Here is an example research task:

What are the latest developments in EBPF/XDP

It produces the following error:

  File "/usr/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'outputs/task_1719048446_What are the latest developments in EBPF/XDP.md'
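
The failure can be reproduced outside the server. The following is a minimal sketch, assuming the report is written with a plain open() into an existing outputs directory (the actual write helper in backend/utils.py may differ, and the temporary directory here is only a stand-in for the project root). The '/' in the topic is treated as a path separator, so Python looks for a parent directory ending in "EBPF" that was never created.

```python
import os
import tempfile

# Hypothetical reproduction; the temp directory stands in for the project root.
task = "What are the latest developments in EBPF/XDP"
filename = f"task_1719048446_{task}"

with tempfile.TemporaryDirectory() as root:
    outputs = os.path.join(root, "outputs")
    os.makedirs(outputs)
    path = os.path.join(outputs, f"{filename}.md")

    # The '/' splits the name into a parent directory and a file name.
    parent, leaf = os.path.split(path)

    try:
        with open(path, "w", encoding="utf-8") as f:
            f.write("report")
        error = None
    except FileNotFoundError as exc:
        # The parent directory "...developments in EBPF" does not exist.
        error = exc

print(leaf)
print(type(error).__name__)
```

Only outputs exists on disk, so the write fails before any file is created.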

vanRaco commented 1 day ago

@barsuna make these changes to your backend/server.py file:

  1. Add import re at the top.

  2. At line 40, add def sanitize_filename(filename): return re.sub(r'[^\w\s-]', '', filename).strip()

  3. At line 55 (after filename is assigned), add sanitized_filename = sanitize_filename(filename)

  4. In the lines below that, replace filename with sanitized_filename.
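
The steps above can be sketched in isolation as follows. This is a standalone illustration, not the server code itself: the timestamp is copied from the error message earlier in the thread, and the function body is the one-liner from step 2.

```python
import re


def sanitize_filename(filename: str) -> str:
    # Drop every character that is not a word character, whitespace, or '-',
    # then trim leading and trailing whitespace.
    return re.sub(r"[^\w\s-]", "", filename).strip()


task = "What are the latest developments in EBPF/XDP"
filename = f"task_1719048446_{task}"
sanitized_filename = sanitize_filename(filename)

print(sanitized_filename)
# task_1719048446_What are the latest developments in EBPFXDP
```

Note that the pattern also strips '.', so it should be applied to the base name before an extension such as .md is appended, as it is in step 3.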

These changes should work. Alternatively, feel free to grab my server.py code and paste it over yours 👇

from fastapi import FastAPI, Request, WebSocket, WebSocketDisconnect
from fastapi.staticfiles import StaticFiles
from fastapi.templating import Jinja2Templates
from pydantic import BaseModel
from backend.websocket_manager import WebSocketManager
from backend.utils import write_md_to_pdf, write_md_to_word, write_text_to_md
import time
import json
import os
import re

class ResearchRequest(BaseModel):
    task: str
    report_type: str
    agent: str

app = FastAPI()

app.mount("/site", StaticFiles(directory="./frontend"), name="site")
app.mount("/static", StaticFiles(directory="./frontend/static"), name="static")

templates = Jinja2Templates(directory="./frontend")

manager = WebSocketManager()

# Dynamic directory for outputs once first research is run
@app.on_event("startup")
def startup_event():
    if not os.path.isdir("outputs"):
        os.makedirs("outputs")
    app.mount("/outputs", StaticFiles(directory="outputs"), name="outputs")

@app.get("/")
async def read_root(request: Request):
    return templates.TemplateResponse('index.html', {"request": request, "report": None})

# Strip characters that are not word characters, whitespace, or '-'
# (this removes '/' so the task text cannot introduce path separators)
def sanitize_filename(filename):
    return re.sub(r'[^\w\s-]', '', filename).strip()

@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    await manager.connect(websocket)
    try:
        while True:
            data = await websocket.receive_text()
            if data.startswith("start"):
                json_data = json.loads(data[6:])
                task = json_data.get("task")
                report_type = json_data.get("report_type")
                filename = f"task_{int(time.time())}_{task}"
                sanitized_filename = sanitize_filename(filename)  # Sanitize the filename
                report_source = json_data.get("report_source")
                if task and report_type:
                    report = await manager.start_streaming(task, report_type, report_source, websocket)
                    # Saving report as pdf
                    pdf_path = await write_md_to_pdf(report, sanitized_filename)
                    # Saving report as docx
                    docx_path = await write_md_to_word(report, sanitized_filename)
                    # Saving report as markdown
                    md_path = await write_text_to_md(report, sanitized_filename)
                    # Returning the paths of the saved report files
                    await websocket.send_json({"type": "path", "output": {"pdf": pdf_path, "docx": docx_path, "md": md_path}})
                else:
                    print("Error: not enough parameters provided.")

    except WebSocketDisconnect:
        await manager.disconnect(websocket)

barsuna commented 1 hour ago

Thanks @vanRaco, it is all good on my end; I was just sharing this so it can be fixed in master.