How to read large text file (having tokens more than context limit of LLM)?

azaylamba commented 2 months ago

I need to read a text file of around 500k tokens. How can I read the file content efficiently using FileReaderTool or some other tool? The agents need to consider the content of the entire file while performing tasks, so techniques like summarization might not be sufficient.

Any ideas please?

theCyberTech commented 2 months ago

Try Google Gemini 1.5 Flash with its 1M-token context window.

azaylamba commented 2 months ago

@theCyberTech Thanks for the suggestion. Is there any way to read the file in chunks so that LLMs with a smaller context window can be used?
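
For reference, chunked reading can be sketched with the standard library alone. This is a minimal illustration, not CrewAI's FileReaderTool or any other library API: the helper names `chunk_words` and `chunk_text_file` are hypothetical, and whitespace-separated words stand in for model tokens, which a real tokenizer would count differently.

```python
from typing import Iterator, List


def chunk_words(words: List[str], chunk_size: int, overlap: int = 0) -> Iterator[str]:
    """Yield chunks of at most `chunk_size` words, sharing `overlap` words
    between neighbouring chunks so context is not cut cleanly at a boundary."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    for start in range(0, len(words), step):
        yield " ".join(words[start:start + chunk_size])
        # Stop once this chunk has reached the end of the input, so the
        # last chunk is not followed by a shorter duplicate of its tail.
        if start + chunk_size >= len(words):
            break


def chunk_text_file(path: str, chunk_size: int, overlap: int = 0,
                    encoding: str = "utf-8") -> Iterator[str]:
    """Read a text file and yield it as overlapping word chunks.

    Assumption: the file fits in memory, which holds for a few megabytes
    of text; words are only a rough proxy for LLM tokens.
    """
    with open(path, "r", encoding=encoding) as handle:
        words = handle.read().split()
    yield from chunk_words(words, chunk_size, overlap)
```

Each chunk can then be passed to the model in a separate call. No single call sees the whole file, so the per-chunk results still have to be combined in a later step.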

rezzie-rich commented 2 months ago

There is a plan to integrate MemGPT into the framework. However, that was mentioned back in January and there hasn't been any update since.

@joaomdmoura can you please confirm if that is still part of the plan or the status of it?

azaylamba commented 2 months ago

@rezzie-rich Integrating MemGPT would be very helpful. It would make CrewAI more useful for production-level applications, where the context is usually large.

rezzie-rich commented 2 months ago

@azaylamba I know, I'm asking the same question, lol

rezzie-rich commented 2 months ago

@joaomdmoura, AutoGen is currently integrated to work with MemGPT. I assume the process will be similar for CrewAI as well.

https://memgpt.readme.io/docs/autogen

lorenzejay commented 1 month ago

@rezzie-rich @azaylamba can you try using our tools? https://docs.crewai.com/core-concepts/Tools/?h=tools

If it's a PDF, we have a PDF search tool:

from crewai import Agent, Crew, Process, Task
from crewai_tools import PDFSearchTool
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo")

# Define the agents
researcher = Agent(
    role="Researcher",
    goal="Load 'https://arxiv.org/pdf/2406.04151' and understand it",
    backstory="Backstory 1",
    verbose=True,
    llm=llm,
    tools=[PDFSearchTool()],
)
analyzer = Agent(
    role="Analyzer",
    goal="Understand the paper from https://arxiv.org/pdf/2406.04151 and tell me about Evolving Large Language Model-based Agents across Diverse Environments",
    backstory="Backstory 2",
    verbose=True,
    llm=llm,
    tools=[PDFSearchTool()],
)

task1 = Task(
    name="Paper Ingestor",
    description="Search 'https://arxiv.org/pdf/2406.04151' and understand the paper",
    expected_output="give me a summary about the project",
    agent=researcher,
)
task2 = Task(
    name="Paper Analyzer",
    description="Analyze the paper and tell me about Evolving Large Language Model-based Agents across Diverse Environments. Ensure all information is accurate and comes from the searches.",
    expected_output="give a 3 paragraph summary about the project",
    agent=analyzer,
)

# Create a crew with the tasks
crew = Crew(
    agents=[researcher, analyzer],
    tasks=[task1, task2],
    verbose=True,
    process=Process.sequential,
    # memory=True,
)

result = crew.kickoff()
print("results", result)

lorenzejay commented 1 month ago

Alternatively, you can load the PDF when instantiating the tool, like this:

from crewai import Agent, Crew, Process, Task
from crewai_tools import PDFSearchTool
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo")

pdf_rag_tool = PDFSearchTool(pdf="https://arxiv.org/pdf/2406.04151")

# Define the agents
researcher = Agent(
    role="Researcher",
    goal="Load 'https://arxiv.org/pdf/2406.04151' and understand it",
    backstory="Backstory 1",
    verbose=True,
    llm=llm,
    tools=[pdf_rag_tool],
)

task1 = Task(
    name="Paper Ingestor",
    description="Search 'https://arxiv.org/pdf/2406.04151' and understand the paper then generate a 3 paragraph summary",
    expected_output="give me a 3 paragraph summary about the project",
    agent=researcher,
)

# Create a crew with the tasks
crew = Crew(
    agents=[researcher],
    tasks=[task1],
    verbose=True,
    process=Process.sequential,
    # memory=True,
)

result = crew.kickoff()
print("results", result)
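
The search-tool approach retrieves only the passages relevant to a query instead of loading the whole document into the prompt. As a rough, framework-free illustration of that idea for a plain-text file, the sketch below ranks pre-split chunks by how often they contain the query's words. This is an assumption-laden stand-in: simple word-overlap scoring replaces the embedding-based semantic search that a RAG tool would normally use, and the names `tokenize` and `top_chunks` are hypothetical, not part of crewai_tools.

```python
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def top_chunks(chunks: List[str], query: str, k: int = 3) -> List[Tuple[int, str]]:
    """Return up to `k` (index, chunk) pairs, ranked by how many times the
    query's words occur in each chunk. Chunks with no match are dropped."""
    query_terms = set(tokenize(query))
    scored = []
    for index, chunk in enumerate(chunks):
        counts = Counter(tokenize(chunk))
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            scored.append((score, index, chunk))

    # Highest score first; ties keep the original document order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(index, chunk) for _, index, chunk in scored[:k]]
```

Only the selected chunks would then be placed in the prompt, which keeps the request within a smaller context window at the cost of the model never seeing the unselected parts of the file.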

azaylamba commented 1 month ago

Thanks for the suggestion @lorenzejay, will try this.

github-actions[bot] commented 3 days ago

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

crewAIInc / crewAI

How to read large text file (having tokens more than context limit of LLM)? #781