expectedparrot / edsl

Design, conduct and analyze results of AI-powered surveys and experiments. Simulate social science and market research with large numbers of AI agents and LLMs.
https://docs.expectedparrot.com
MIT License

Add method for creating a single `Scenario` for a complete PDF #713

Status: Open. rbyh opened this issue 2 weeks ago.

rbyh commented 2 weeks ago

Add the following to `Scenario.py`:

# At the top of Scenario.py
import os

import fitz  # PyMuPDF


# Inside the Scenario class
@classmethod
def from_pdf(cls, pdf_path):
    """Return a single Scenario holding the filename and full text of a PDF."""
    # Ensure the file exists
    if not os.path.exists(pdf_path):
        raise FileNotFoundError(f"The file {pdf_path} does not exist.")

    # Get the filename from the path
    filename = os.path.basename(pdf_path)

    # Open the PDF; the with-block closes the document when extraction is done
    with fitz.open(pdf_path) as document:
        # Extract the text of every page and concatenate it in page order
        text = "".join(page.get_text() for page in document)

    # Create a dictionary for the combined text and wrap it in a Scenario
    page_info = {"filename": filename, "text": text}
    return cls(page_info)
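The page-joining step can be checked without a real PDF by separating it from file handling. The sketch below is only an illustration: `combine_page_texts` and `FakePage` are hypothetical names, not part of edsl, and the helper assumes only that each page object exposes a `get_text()` method returning a string, as PyMuPDF pages do. It returns the same `{"filename", "text"}` dictionary that the proposed method wraps in a `Scenario`.

```python
from typing import Iterable, Protocol


class HasText(Protocol):
    """Anything with a get_text() method, e.g. a PyMuPDF page (assumption)."""

    def get_text(self) -> str: ...


def combine_page_texts(filename: str, pages: Iterable[HasText]) -> dict:
    """Hypothetical helper: join every page's text into one scenario dict."""
    # str.join builds the string once instead of re-copying it on every page
    text = "".join(page.get_text() for page in pages)
    return {"filename": filename, "text": text}


class FakePage:
    """Hypothetical stand-in for a PDF page, so no file is needed."""

    def __init__(self, text: str) -> None:
        self._text = text

    def get_text(self) -> str:
        return self._text


result = combine_page_texts("report.pdf", [FakePage("Page one.\n"), FakePage("Page two.\n")])
print(result)
```

As in the original loop, pages are concatenated with no extra separator, so any page breaks depend on the trailing newlines that `get_text()` happens to return.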
rbyh commented 2 weeks ago

`from_pdf()` could be implemented as a method of `Scenario` (new) or of `ScenarioList` (existing).
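To make the choice concrete, here is a sketch of the two output shapes using plain dictionaries rather than edsl classes. Both functions are hypothetical; the per-page layout and its `page` key are assumptions for illustration and do not reproduce any existing `ScenarioList` behavior. A `Scenario`-level method yields one dictionary for the whole document, while a list-based approach yields one dictionary per page.

```python
def pages_to_single(filename: str, page_texts: list[str]) -> dict:
    """One dict for the whole PDF, matching the proposed Scenario.from_pdf."""
    return {"filename": filename, "text": "".join(page_texts)}


def pages_to_list(filename: str, page_texts: list[str]) -> list[dict]:
    """One dict per page (hypothetical layout; the 'page' key is assumed)."""
    return [
        {"filename": filename, "page": number, "text": text}
        for number, text in enumerate(page_texts, start=1)
    ]


pages = ["Intro.\n", "Results.\n"]
single = pages_to_single("report.pdf", pages)
per_page = pages_to_list("report.pdf", pages)
print(single)
print(len(per_page))
```

The single-dictionary form suits questions about the document as a whole; the per-page form keeps each prompt smaller at the cost of losing cross-page context.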