geekan / MetaGPT

🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming
https://deepwisdom.ai/
MIT License

Researcher Role Cannot Serialize SearchEngine After Exception #687

Open AidanShipperley opened 8 months ago

AidanShipperley commented 8 months ago

Bug description It looks like I'm consistently encountering an issue when using the Researcher role. At some point during the research, an ssl.SSLWantReadError exception is raised, and when the framework then attempts to serialize the project to support breakpoint recovery, the SearchEngine fails to serialize. I've attached the entire terminal log output for clarity. If this is not a mistake on my end and is just something that needs to be added, let me know and I can try to look into it myself, since I need this for work.
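
If it helps whoever picks this up: my read of the traceback is that json.dump eventually reaches the live SearchEngine object, which pydantic cannot turn into JSON. A minimal sketch of one possible direction, assuming the engine is held as a field on a pydantic model; the class and field names here are illustrative, not MetaGPT's actual code:

    # Hypothetical sketch, not MetaGPT's actual code: keep the live engine out
    # of the JSON dump so breakpoint recovery can still write role_info to disk.
    from pydantic import BaseModel, ConfigDict, Field

    class SearchEngine:  # stand-in for metagpt.tools.search_engine.SearchEngine
        pass

    class CollectLinks(BaseModel):
        model_config = ConfigDict(arbitrary_types_allowed=True)
        # exclude=True keeps the field out of model_dump()/JSON serialization;
        # the engine would then be rebuilt from config on resume.
        search_engine: SearchEngine = Field(default_factory=SearchEngine, exclude=True)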

Bug solved method I'm not sure whether the Researcher role has been fully fleshed out yet. Here are the steps to reproduce the issue:

  1. I made a conda environment via conda create -n metagpt -y python==3.9 and activated it with conda activate metagpt
  2. I set the environment variables for OPENAI and SERPAPI via conda env config vars set KEY=VALUE
  3. I ran pip install metagpt, and I also had to run pip install playwright==1.40.0
  4. I opened metagpt/roles/__init__.py and added the Researcher role: from metagpt.roles.researcher import Researcher, plus "Researcher", in __all__
  5. I modified metagpt/startup.py to include the Researcher role in company.hire() (sketched just after this list)
  6. Run the following:
    python startup.py "Research and develop a system to interact with an open source LLM, such as Llama 2 13b chat, and perform Retrieval Augmented Generation based on documents from a file stored as various text files. The researcher should be responsible for choosing the vector database to use, and they should debate the pros and cons of open source vector databases and closed source ones. The project should all be done in Python."
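
For step 5, the change I made looks roughly like this (a sketch from memory against the 0.6.0 layout, so the default role list in your copy may differ slightly):

    # metagpt/startup.py -- sketch of the step-5 change (0.6.0 layout assumed)
    from metagpt.roles import (
        Architect,
        Engineer,
        ProductManager,
        ProjectManager,
        Researcher,  # exported in step 4
    )
    from metagpt.team import Team

    company = Team()
    company.hire(
        [
            ProductManager(),
            Architect(),
            ProjectManager(),
            Engineer(),
            Researcher(),  # added so the Researcher takes part in the run
        ]
    )
    # ...rest of startup() unchanged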

Environment information

Screenshots or logs Terminal_Log.txt

voidking commented 7 months ago

The Terminal_Log.txt is empty. Could you add it again?

nickjtch commented 7 months ago

@AidanShipperley faced the same issue. Were you able to resolve it?

AidanShipperley commented 7 months ago

@voidking my apologies, I'll just paste the terminal log below:

(metagpt) C:\Users\DL_User\Documents\MetaGPT\MetaGPT-0.6.0\MetaGPT-0.6.0\metagpt>python startup.py "Research and develop a system to interact with an open source LLM, such as Llama 2 13b chat, and perform Retrieval Augmented Generation based on documents from a file stored as various text files. The researcher should be responsible for choosing the vector database to use, and they should debate the pros and cons of open source vector databases and closed source ones. The project should all be done in Python."
2024-01-04 13:22:03.778 | INFO     | metagpt.const:get_metagpt_package_root:32 - Package root set to C:\Users\DL_User\Documents\MetaGPT\MetaGPT-0.6.0\MetaGPT-0.6.0\metagpt
2024-01-04 13:22:04.142 | INFO     | metagpt.config:get_default_llm_provider_enum:124 - LLMProviderEnum.OPENAI Model: gpt-4
2024-01-04 13:22:04.150 | INFO     | metagpt.config:get_default_llm_provider_enum:126 - API: LLMProviderEnum.OPENAI
2024-01-04 13:22:11.202 | INFO     | metagpt.config:get_default_llm_provider_enum:124 - LLMProviderEnum.OPENAI Model: gpt-4
2024-01-04 13:22:11.202 | INFO     | metagpt.config:get_default_llm_provider_enum:126 - API: LLMProviderEnum.OPENAI
2024-01-04 13:22:11.312 | INFO     | metagpt.config:get_default_llm_provider_enum:124 - LLMProviderEnum.OPENAI Model: gpt-4
2024-01-04 13:22:11.312 | INFO     | metagpt.config:get_default_llm_provider_enum:126 - API: LLMProviderEnum.OPENAI
2024-01-04 13:22:11.343 | INFO     | metagpt.config:get_default_llm_provider_enum:124 - LLMProviderEnum.OPENAI Model: gpt-4
2024-01-04 13:22:11.343 | INFO     | metagpt.config:get_default_llm_provider_enum:126 - API: LLMProviderEnum.OPENAI
2024-01-04 13:22:11.375 | INFO     | metagpt.config:get_default_llm_provider_enum:124 - LLMProviderEnum.OPENAI Model: gpt-4
2024-01-04 13:22:11.375 | INFO     | metagpt.config:get_default_llm_provider_enum:126 - API: LLMProviderEnum.OPENAI
2024-01-04 13:22:11.390 | INFO     | metagpt.config:get_default_llm_provider_enum:124 - LLMProviderEnum.OPENAI Model: gpt-4
2024-01-04 13:22:11.406 | INFO     | metagpt.config:get_default_llm_provider_enum:126 - API: LLMProviderEnum.OPENAI
2024-01-04 13:22:11.421 | INFO     | metagpt.config:get_default_llm_provider_enum:124 - LLMProviderEnum.OPENAI Model: gpt-4
2024-01-04 13:22:11.421 | INFO     | metagpt.config:get_default_llm_provider_enum:126 - API: LLMProviderEnum.OPENAI
2024-01-04 13:22:11.562 | INFO     | metagpt.config:get_default_llm_provider_enum:124 - LLMProviderEnum.OPENAI Model: gpt-4
2024-01-04 13:22:11.578 | INFO     | metagpt.config:get_default_llm_provider_enum:126 - API: LLMProviderEnum.OPENAI
2024-01-04 13:22:11.594 | INFO     | metagpt.config:get_default_llm_provider_enum:124 - LLMProviderEnum.OPENAI Model: gpt-4
2024-01-04 13:22:11.594 | INFO     | metagpt.config:get_default_llm_provider_enum:126 - API: LLMProviderEnum.OPENAI
2024-01-04 13:22:11.625 | INFO     | metagpt.team:invest:86 - Investment: $6.0.
2024-01-04 13:22:11.625 | INFO     | metagpt.roles.role:_act:357 - Alice(Product Manager): to do PrepareDocuments(PrepareDocuments)
2024-01-04 13:22:12.335 | INFO     | metagpt.roles.researcher:_act:56 - David(Researcher): to do CollectLinks(David)
2024-01-04 13:22:12.367 | INFO     | metagpt.utils.file_repository:save:60 - save to: C:\Users\DL_User\Documents\MetaGPT\MetaGPT-0.6.0\MetaGPT-0.6.0\metagpt\workspace\20240104132211\docs\requirement.txt
["Retrieval Augmented Generation", "open source LLM"]Warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613.
2024-01-04 13:22:14.557 | INFO     | metagpt.utils.cost_manager:update_cost:48 - Total running cost: $0.005 | Max budget: $10.000 | Current cost: $0.005, prompt_tokens: 148, completion_tokens: 14
2024-01-04 13:22:20.935 | INFO     | metagpt.config:get_default_llm_provider_enum:124 - LLMProviderEnum.OPENAI Model: gpt-4
2024-01-04 13:22:20.935 | INFO     | metagpt.config:get_default_llm_provider_enum:126 - API: LLMProviderEnum.OPENAI
["What is the process of Retrieval Augmented Generation?", "How can Retrieval Augmented Generation improve the quality of LLM-generated responses?", "What are the top open-source Large Language Models for 2024?", "What are the benefits and risks of using open source large language models?"]Warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613.
2024-01-04 13:22:27.105 | INFO     | metagpt.utils.cost_manager:update_cost:48 - Total running cost: $0.050 | Max budget: $10.000 | Current cost: $0.045, prompt_tokens: 1366, completion_tokens: 60
[1, 0, 2, 4, 6, 3, 5]Warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613.
2024-01-04 13:22:35.525 | INFO     | metagpt.utils.cost_manager:update_cost:48 - Total running cost: $0.074 | Max budget: $10.000 | Current cost: $0.024, prompt_tokens: 760, completion_tokens: 21
[2, 1, 0, 4, 5, 6, 7, 3]Warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613.
2024-01-04 13:22:42.161 | INFO     | metagpt.utils.cost_manager:update_cost:48 - Total running cost: $0.101 | Max budget: $10.000 | Current cost: $0.027, prompt_tokens: 858, completion_tokens: 24
[0, 1, 3, 4, 5, 7]Warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613.
2024-01-04 13:22:46.214 | INFO     | metagpt.utils.cost_manager:update_cost:48 - Total running cost: $0.129 | Max budget: $10.000 | Current cost: $0.028, prompt_tokens: 910, completion_tokens: 18
[0, 1, 3, 4, 7]Warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613.
2024-01-04 13:22:49.742 | INFO     | metagpt.utils.cost_manager:update_cost:48 - Total running cost: $0.155 | Max budget: $10.000 | Current cost: $0.026, prompt_tokens: 836, completion_tokens: 15
2024-01-04 13:22:49.758 | INFO     | metagpt.roles.researcher:_act:56 - David(Researcher): to do WebBrowseAndSummarize(David)
[The next ~100 lines of the log are the summaries from several concurrent WebBrowseAndSummarize tasks streaming to stdout at the same time, so the text is interleaved and unreadable. Each stream ended with "Warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613." Only the cost log lines from that span are reproduced below.]
2024-01-04 13:23:46.942 | INFO     | metagpt.utils.cost_manager:update_cost:48 - Total running cost: $0.164 | Max budget: $10.000 | Current cost: $0.009, prompt_tokens: 284, completion_tokens: 3
2024-01-04 13:23:48.061 | INFO     | metagpt.utils.cost_manager:update_cost:48 - Total running cost: $0.172 | Max budget: $10.000 | Current cost: $0.007, prompt_tokens: 244, completion_tokens: 3
2024-01-04 13:23:49.640 | INFO     | metagpt.utils.cost_manager:update_cost:48 - Total running cost: $0.179 | Max budget: $10.000 | Current cost: $0.007, prompt_tokens: 244, completion_tokens: 3
2024-01-04 13:24:25.623 | INFO     | metagpt.utils.cost_manager:update_cost:48 - Total running cost: $0.270 | Max budget: $10.000 | Current cost: $0.091, prompt_tokens: 2226, completion_tokens: 398
2024-01-04 13:24:26.002 | INFO     | metagpt.utils.cost_manager:update_cost:48 - Total running cost: $0.328 | Max budget: $10.000 | Current cost: $0.058, prompt_tokens: 1289, completion_tokens: 320
2024-01-04 13:24:35.784 | INFO     | metagpt.utils.cost_manager:update_cost:48 - Total running cost: $0.456 | Max budget: $10.000 | Current cost: $0.128, prompt_tokens: 3513, completion_tokens: 384
2024-01-04 13:24:36.237 | INFO     | metagpt.utils.cost_manager:update_cost:48 - Total running cost: $0.547 | Max budget: $10.000 | Current cost: $0.091, prompt_tokens: 2231, completion_tokens: 404
2024-01-04 13:24:36.940 | INFO     | metagpt.utils.cost_manager:update_cost:48 - Total running cost: $0.555 | Max budget: $10.000 | Current cost: $0.008, prompt_tokens: 246, completion_tokens: 3
2024-01-04 13:24:55.809 | INFO     | metagpt.utils.cost_manager:update_cost:48 - Total running cost: $0.669 | Max budget: $10.000 | Current cost: $0.114, prompt_tokens: 2830, completion_tokens: 481
2024-01-04 13:25:06.213 | INFO     | metagpt.utils.cost_manager:update_cost:48 - Total running cost: $0.870 | Max budget: $10.000 | Current cost: $0.202, prompt_tokens: 5853, completion_tokens: 433
2024-01-04 13:25:07.162 | INFO     | metagpt.utils.cost_manager:update_cost:48 - Total running cost: $0.888 | Max budget: $10.000 | Current cost: $0.018, prompt_tokens: 595, completion_tokens: 3
2024-01-04 13:25:08.567 | INFO     | metagpt.utils.cost_manager:update_cost:48 - Total running cost: $0.900 | Max budget: $10.000 | Current cost: $0.012, prompt_tokens: 380, completion_tokens: 3

2024-01-04 13:25:18.725 | WARNING  | metagpt.utils.common:wrapper:506 - There is a exception in role's execution, in order to resume, we delete the newest role communication message in the role's memory.
2024-01-04 13:25:18.978 | ERROR    | metagpt.utils.common:wrapper:488 - Exception occurs, start to serialize the project, exp:
Traceback (most recent call last):
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\anyio\streams\tls.py", line 140, in _call_sslobject_method
    result = func(*args)
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\ssl.py", line 888, in read
    v = self._sslobj.read(len)
ssl.SSLWantReadError: The operation did not complete (read) (_ssl.c:2621)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\anyio\_core\_tasks.py", line 115, in fail_after
    yield cancel_scope
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\httpcore\_backends\anyio.py", line 34, in read
    return await self._stream.receive(max_bytes=max_bytes)
asyncio.exceptions.CancelledError: Cancelled by cancel scope 208d94db370

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\httpcore\_exceptions.py", line 10, in map_exceptions
    yield
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\httpcore\_backends\anyio.py", line 36, in read
    return b""
TimeoutError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\httpx\_transports\default.py", line 67, in map_httpcore_exceptions
    yield
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\httpx\_transports\default.py", line 252, in __aiter__
    async for part in self._httpcore_stream:
httpcore.ReadTimeout

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\metagpt\utils\common.py", line 497, in wrapper
    return await func(self, *args, **kwargs)
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\metagpt\roles\role.py", line 482, in run
    rsp = await self.react()
httpx.ReadTimeout

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\metagpt\utils\common.py", line 483, in wrapper
    result = await func(self, *args, **kwargs)
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\metagpt\team.py", line 133, in run
    await self.env.run()
Exception: Traceback (most recent call last):
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\anyio\streams\tls.py", line 140, in _call_sslobject_method
    result = func(*args)
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\ssl.py", line 888, in read
    v = self._sslobj.read(len)
ssl.SSLWantReadError: The operation did not complete (read) (_ssl.c:2621)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\anyio\_core\_tasks.py", line 115, in fail_after
    yield cancel_scope
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\httpcore\_backends\anyio.py", line 34, in read
    return await self._stream.receive(max_bytes=max_bytes)
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\anyio\streams\tls.py", line 205, in receive
    data = await self._call_sslobject_method(self._ssl_object.read, max_bytes)
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\anyio\streams\tls.py", line 147, in _call_sslobject_method
    data = await self.transport_stream.receive()
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\anyio\_backends\_asyncio.py", line 1123, in receive
    await self._protocol.read_event.wait()
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\asyncio\locks.py", line 226, in wait
    await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 208d94db370

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\httpcore\_exceptions.py", line 10, in map_exceptions
    yield
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\httpcore\_backends\anyio.py", line 36, in read
    return b""
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\contextlib.py", line 135, in __exit__
    self.gen.throw(type, value, traceback)
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\anyio\_core\_tasks.py", line 118, in fail_after
    raise TimeoutError
TimeoutError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\httpx\_transports\default.py", line 67, in map_httpcore_exceptions
    yield
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\httpx\_transports\default.py", line 252, in __aiter__
    async for part in self._httpcore_stream:
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\httpcore\_async\connection_pool.py", line 361, in __aiter__
    async for part in self._stream:
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\httpcore\_async\http11.py", line 337, in __aiter__
    raise exc
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\httpcore\_async\http11.py", line 329, in __aiter__
    async for chunk in self._connection._receive_response_body(**kwargs):
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\httpcore\_async\http11.py", line 198, in _receive_response_body
    event = await self._receive_event(timeout=timeout)
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\httpcore\_async\http11.py", line 212, in _receive_event
    data = await self._network_stream.read(
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\httpcore\_backends\anyio.py", line 36, in read
    return b""
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\contextlib.py", line 135, in __exit__
    self.gen.throw(type, value, traceback)
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\httpcore\_exceptions.py", line 14, in map_exceptions
    raise to_exc(exc) from exc
httpcore.ReadTimeout

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\metagpt\utils\common.py", line 497, in wrapper
    return await func(self, *args, **kwargs)
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\metagpt\roles\role.py", line 482, in run
    rsp = await self.react()
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\metagpt\roles\researcher.py", line 105, in react
    msg = await super().react()
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\metagpt\roles\role.py", line 452, in react
    rsp = await self._act_by_order()
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\metagpt\roles\role.py", line 439, in _act_by_order
    rsp = await self._act()
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\metagpt\roles\researcher.py", line 74, in _act
    summaries = await asyncio.gather(*todos)
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\metagpt\actions\research.py", line 227, in run
    summary = await self._aask(prompt, [system_text])
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\metagpt\actions\action.py", line 67, in _aask
    return await self.llm.aask(prompt, system_msgs)
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\metagpt\provider\base_llm.py", line 50, in aask
    rsp = await self.acompletion_text(message, stream=stream, timeout=timeout)
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\tenacity\_asyncio.py", line 88, in async_wrapped
    return await fn(*args, **kwargs)
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\tenacity\_asyncio.py", line 47, in __call__
    do = self.iter(retry_state=retry_state)
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\tenacity\__init__.py", line 314, in iter
    return fut.result()
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\concurrent\futures\_base.py", line 433, in result
    return self.__get_result()
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\concurrent\futures\_base.py", line 389, in __get_result
    raise self._exception
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\tenacity\_asyncio.py", line 50, in __call__
    result = await fn(*args, **kwargs)
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\metagpt\provider\openai_api.py", line 134, in acompletion_text
    async for i in resp:
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\metagpt\provider\openai_api.py", line 94, in _achat_completion_stream
    async for chunk in response:
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\openai\_streaming.py", line 116, in __aiter__
    async for item in self._iterator:
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\openai\_streaming.py", line 129, in __stream__
    async for sse in iterator:
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\openai\_streaming.py", line 120, in _iter_events
    async for sse in self._decoder.aiter(self.response.aiter_lines()):
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\openai\_streaming.py", line 231, in aiter
    async for line in iterator:
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\httpx\_models.py", line 967, in aiter_lines
    async for text in self.aiter_text():
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\httpx\_models.py", line 954, in aiter_text
    async for byte_content in self.aiter_bytes():
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\httpx\_models.py", line 933, in aiter_bytes
    async for raw_bytes in self.aiter_raw():
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\httpx\_models.py", line 991, in aiter_raw
    async for raw_stream_bytes in self.stream:
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\httpx\_client.py", line 147, in __aiter__
    async for chunk in self._stream:
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\httpx\_transports\default.py", line 253, in __aiter__
    yield part
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\contextlib.py", line 135, in __exit__
    self.gen.throw(type, value, traceback)
  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\httpx\_transports\default.py", line 84, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.ReadTimeout

C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\pydantic\functional_serializers.py:272: UserWarning: Pydantic serializer warnings:
  Expected `str` but got `dict` - serialized value may not be as expected
  lambda x, h: h(x), schema=core_schema.any_schema()
In cases involving domain-specific knowledge, RAG can drastically improve the accuracy of an LLM's responses
Traceback (most recent call last):

  File "C:\Users\DL_User\Documents\MetaGPT\MetaGPT-0.6.0\MetaGPT-0.6.0\metagpt\startup.py", line 81, in <module>
    app()

  File "C:\Users\DL_User\Documents\MetaGPT\MetaGPT-0.6.0\MetaGPT-0.6.0\metagpt\startup.py", line 77, in startup
    asyncio.run(company.run(n_round=n_round))

  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\asyncio\runners.py", line 44, in run
    return loop.run_until_complete(main)

  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\asyncio\base_events.py", line 642, in run_until_complete
    return future.result()

  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\metagpt\utils\common.py", line 489, in wrapper
    self.serialize()  # Team.serialize

  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\metagpt\team.py", line 57, in serialize
    self.env.serialize(stg_path.joinpath("environment"))  # save environment alone

  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\metagpt\environment.py", line 56, in serialize
    role.serialize(stg_path=stg_path.joinpath(f"roles/{role.__class__.__name__}_{role.name}"))

  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\metagpt\roles\role.py", line 190, in serialize
    write_json_file(role_info_path, role_info)

  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\site-packages\metagpt\utils\common.py", line 461, in write_json_file
    json.dump(data, fout, ensure_ascii=False, indent=4, default=to_jsonable_python)

  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\json\__init__.py", line 179, in dump
    for chunk in iterable:

  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\json\encoder.py", line 431, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)

  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\json\encoder.py", line 405, in _iterencode_dict
    yield from chunks

  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\json\encoder.py", line 325, in _iterencode_list
    yield from chunks

  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\json\encoder.py", line 405, in _iterencode_dict
    yield from chunks

  File "C:\Users\DL_User\anaconda3\envs\metagpt\lib\json\encoder.py", line 438, in _iterencode
    o = _default(o)

pydantic_core._pydantic_core.PydanticSerializationError: Unable to serialize unknown type: <class 'metagpt.tools.search_engine.SearchEngine'>

@nickjtch No, I have not been able to solve this.

nickjtch commented 7 months ago

@AidanShipperley I added exception handling and a 1-second sleep. It worked for me.

AidanShipperley commented 7 months ago

> @AidanShipperley I added exception handling and a 1-second sleep. It worked for me.

Would you be able to provide a little more detail? Exception handling around the app() call for SSLWantReadError or for the serialization error? Or somewhere else?

nickjtch commented 7 months ago

Replace the _act method inside metagpt/roles/researcher.py with the following, and add the attempt_with_retry helper below it:

    # Note: attempt_with_retry catches httpx.RemoteProtocolError, so add
    # "from httpx import RemoteProtocolError" to researcher.py's imports.
    async def _act(self) -> Message:
        logger.info(f"{self._setting}: to do {self.rc.todo}({self.rc.todo.name})")
        todo = self.rc.todo
        msg = self.rc.memory.get(k=1)[0]
        if isinstance(msg.instruct_content, Report):
            instruct_content = msg.instruct_content
            topic = instruct_content.topic
        else:
            topic = msg.content

        research_system_text = self.research_system_text(topic, todo)
        try:
            if isinstance(todo, CollectLinks):
                links = await self.attempt_with_retry(todo.run, topic, 4, 4)
                ret = Message(
                    content="", instruct_content=Report(topic=topic, links=links), role=self.profile, cause_by=todo
                )
            elif isinstance(todo, WebBrowseAndSummarize):
                links = instruct_content.links
                todos = [self.attempt_with_retry(todo.run, *url, query=query, system_text=self.research_system_text(topic, todo)) 
                            for (query, url) in links.items()]
                summaries = await asyncio.gather(*todos)
                summaries = list((url, summary) for i in summaries for (url, summary) in i.items() if summary)
                ret = Message(
                    content="", instruct_content=Report(topic=topic, summaries=summaries), role=self.profile, cause_by=todo
                )
            else:
                summaries = instruct_content.summaries
                summary_text = "\n---\n".join(f"url: {url}\nsummary: {summary}" for (url, summary) in summaries)
                content = await self.attempt_with_retry(todo.run, topic, summary_text, system_text=research_system_text)
                ret = Message(
                    content="",
                    instruct_content=Report(topic=topic, content=content),
                    role=self.profile,
                    cause_by=self.rc.todo,
                )
        except Exception as e:
            logger.error(f"An error occurred during {todo.name}: {e}")
            # Handle the error as appropriate for your application
            # For example, you could return a message indicating the failure
            return Message(content=f"Failed to complete {todo.name} due to network error.", role=self.profile, cause_by=todo)

        self.rc.memory.add(ret)
        return ret

    async def attempt_with_retry(self, func, *args, max_attempts=3, **kwargs):
        for attempt in range(max_attempts):
            try:
                return await func(*args, **kwargs)
            except RemoteProtocolError as e:  # the failure in the log above is httpx.ReadTimeout; consider catching it here too
                logger.warning(f"Attempt {attempt + 1} failed for {func.__name__}: {e}")
                if attempt == max_attempts - 1:
                    raise  # Reraise the exception on the last attempt
                await asyncio.sleep(1)  # Simple backoff strategy; adjust as needed

I didn't stress test this, but it has worked for me so far.
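
A small note on the design: since MetaGPT already depends on tenacity (it appears in the traceback above), the same retry-with-backoff could be written declaratively. A rough sketch under that assumption; the helper name and the exact exception types are illustrative (the log above actually shows httpx.ReadTimeout):

    # Hypothetical tenacity-based alternative to attempt_with_retry.
    from httpx import ReadTimeout, RemoteProtocolError
    from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

    @retry(
        stop=stop_after_attempt(3),                         # same cap as max_attempts=3
        wait=wait_exponential(multiplier=1, min=1, max=8),  # backoff instead of a flat 1s sleep
        retry=retry_if_exception_type((RemoteProtocolError, ReadTimeout)),
        reraise=True,                                       # surface the original error on the final attempt
    )
    async def run_with_retry(func, *args, **kwargs):
        return await func(*args, **kwargs)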

archliu321 commented 5 months ago

Faced the same issue...

But I just want to serialize a role that has a search engine.
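
Something like this is what I'm after (a rough, untested sketch assuming pydantic v2; the real SearchEngine constructor may differ):

    # Rough sketch: dump only the engine's config and rebuild it on load,
    # instead of excluding it. Field and constructor names are illustrative.
    from pydantic import BaseModel, ConfigDict, Field, field_serializer, field_validator

    class SearchEngine:  # stand-in for metagpt.tools.search_engine.SearchEngine
        def __init__(self, engine: str = "serpapi"):
            self.engine = engine

    class ResearcherState(BaseModel):
        model_config = ConfigDict(arbitrary_types_allowed=True)
        search_engine: SearchEngine = Field(default_factory=SearchEngine)

        @field_serializer("search_engine")
        def _dump_engine(self, engine: SearchEngine, _info):
            return {"engine": engine.engine}  # only what is needed to rebuild

        @field_validator("search_engine", mode="before")
        @classmethod
        def _load_engine(cls, v):
            return SearchEngine(**v) if isinstance(v, dict) else v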

archliu321 commented 5 months ago

@shenchucheng Could you please take a look at this issue? Thanks a lot!