TransformerOptimus / SuperAGI

<⚡️> SuperAGI - A dev-first open source autonomous AI agent framework. Enabling developers to build, manage & run useful autonomous agents quickly and reliably.
https://superagi.com/
MIT License
14.98k stars 1.79k forks

WebScraperTool return Error #265

Closed Mingzefei closed 1 year ago

Mingzefei commented 1 year ago

Tool WebScraperTool returned: Error while extracting text from HTML (bs4): 403

I asked the Agent to search for information on the internet and summarize it into a file. The GoogleSearch tool worked well and returned the URLs correctly. But when the Agent used the WebScraperTool to extract relevant information from those pages, I got the error message above.
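For context, this failure mode can be reproduced outside SuperAGI: many publisher sites reject requests that don't look like a real browser, and the tool surfaces the HTTP status code in its error string. A minimal sketch using only the Python standard library (the real WebScraperTool uses BeautifulSoup; the function names here are hypothetical):

```python
import urllib.error
import urllib.request


def scrape_text(url):
    """Fetch a page and return its raw HTML, mapping HTTP failures
    to the same kind of error string the WebScraperTool reports."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as exc:
        # Publisher sites often answer 403 Forbidden to non-browser clients.
        return format_scrape_error(exc.code)


def format_scrape_error(status):
    # Mirrors the message seen in this issue report.
    return f"Error while extracting text from HTML (bs4): {status}"
```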

Autocop-Agent commented 1 year ago

Can you tell me what your goal was? And if possible, can you send the logs too?

Mingzefei commented 1 year ago

Sure.

GOALS:

  1. Search and read 10 academic papers on lithium battery capacity prediction on Google Scholar, and summarize them in the workspace in a timely manner.
  2. Investigate lithium battery capacity prediction, and summarize the results in a report saved to the workspace.

And here are some messages that may be useful:

You are SuperAGI an AI assistant to solve complex problems. Your decisions must always be made independently without seeking user assistance.
Play to your strengths as an LLM and pursue simple strategies with no legal complications.
If you have completed all your tasks or reached end state, make sure to use the "finish" tool.

GOALS:
1. Search for and read 10 academic papers on lithium battery capacity prediction on Google Scholar, and promptly summarize them in the workspace
2. Investigate lithium battery capacity prediction, and save a summary report to the workspace

CONSTRAINTS:
1. ~4000 word limit for short term memory. Your short term memory is short, so immediately save important information to files.
2. If you are unsure how you previously did something or want to recall past events, thinking about similar events will help you remember.
3. No user assistance
4. Ensure the command and args are as per current plan and reasoning
5. Exclusively use the commands listed in double quotes e.g. "command name"

TOOLS:
1. ThinkingTool: Intelligent problem-solving assistant that comprehends tasks, identifies key variables, and makes efficient decisions, all while providing detailed, self-driven reasoning for its choices., args json schema: {"task_description": {"title": "Task Description", "description": "Task description which needs reasoning.", "type": "string"}}
2. WebScraperTool: Used to scrape website urls and extract text content, args json schema: {"website_url": {"title": "Website Url", "description": "Valid website url without any quotes.", "type": "string"}}
3. GoogleSearch: A tool for performing a Google search and extracting snippets and webpages.Input should be a search query., args json schema: {"query": {"title": "Query", "description": "The search query for Google search.", "type": "string"}}
4. Write File: Writes text to a file, args json schema: {"file_name": {"title": "File Name", "description": "Name of the file to write. Only include the file name. Don't include path.", "type": "string"}, "content": {"title": "Content", "description": "File content to write", "type": "string"}}
5. Read File: Reads the file content in a specified location, args json schema: {"file_name": {"title": "File Name", "description": "Path of the file to read", "type": "string"}}
6. finish: use this to signal that you have finished all your objectives, args: "response": "final response to let people know you have finished your objectives"

PERFORMANCE EVALUATION:
1. Continuously review and analyze your actions to ensure you are performing to the best of your abilities.
2. Constructively self-criticize your big-picture behavior constantly.
3. Reflect on past decisions and strategies to refine your approach.
4. Every tool has a cost, so be smart and efficient.
5. Aim to complete tasks in the least number of steps.

I should only respond in JSON format as described below.
Response Format:
{
"thoughts": {
"text": "thought",
"reasoning": "reasoning",
"plan": "- short bulleted\n- list that conveys\n- long-term plan",
"criticism": "constructive self-criticism",
"speak": "thoughts summary to say to user"
},
"tool": {
"name": "tool name/task name",
"description": "tool or task description",
"args": {
"arg name": "value"
}
}
}

Ensure the response can be parsed by Python json.loads.
Tool GoogleSearch returned: There are several academic papers on lithium battery capacity prediction. One paper introduces literature on predicting the residual life of lithium batteries, while another paper discusses the estimation and prediction of the state of health (SOH) of lithium-ion batteries, particularly the remaining useful life (RUL). A third paper presents a lithium battery life prediction model and grey model MDGM(1,1) based on data mining, which showed higher precision and could provide help for the prediction and development of mobile phone battery life. Overall, these papers demonstrate the importance of predicting lithium battery capacity and highlight different methods for doing so.

Links:
- https://journals.sagepub.com/doi/full/10.1177/0144598720911724
- https://onlinelibrary.wiley.com/doi/abs/10.1002/er.5002
- https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9359824/
Thoughts: I decided to use the WebScraperTool because it can extract text content from websites and save it to a file, which will make it easier for me to analyze and summarize the papers later. I also decided to save the summaries to a separate file to keep the information organized.
Plan: - Use WebScraperTool to extract text content from each academic paper and save it to a file
- Analyze the papers and summarize their findings
- Save the summaries to a separate file
Criticism: I need to make sure that the WebScraperTool is properly configured to extract text content from the academic papers. I should also make sure to save important information to files as soon as possible to avoid losing it.
Tool: WebScraperTool

It seemed to work well, until:

Tool WebScraperTool returned: Error while extracting text from HTML (bs4): 403

Mingzefei commented 1 year ago

Furthermore, even though I deliberately did not select the WebScraperTool when creating the agent, it still used this tool.

Autocop-Agent commented 1 year ago

Right now the web scraper tool is added by default; we are making it optional. Also, some websites don't allow bot scraping. In such cases it returns a 403 Forbidden error, but the flow shouldn't be affected by it.
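One common mitigation (my own suggestion, not necessarily what SuperAGI does) is to send browser-like request headers, which satisfies the naive bot checks on some sites, and to treat a 403 as skippable so one blocked URL doesn't derail the agent's run. A sketch with the standard library; the exact User-Agent string is illustrative:

```python
import urllib.error
import urllib.request


def browser_headers():
    """Headers that make the request resemble a desktop browser.
    The User-Agent value below is an example, not authoritative."""
    return {
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/114.0 Safari/537.36"
        ),
        "Accept": "text/html,application/xhtml+xml",
        "Accept-Language": "en-US,en;q=0.9",
    }


def fetch_or_skip(url):
    """Return page HTML, or None when the site refuses bots,
    so a 403 on one URL does not abort the whole run."""
    req = urllib.request.Request(url, headers=browser_headers())
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as exc:
        if exc.code == 403:
            return None  # skip this source and move on
        raise
```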

Mingzefei commented 1 year ago

Yes, I found that the cause is indeed the web scraping. I am trying to make the Agent access and browse these web pages like a human through a browser, although this method may not be elegant. Or do you have any better suggestions?
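Driving a real browser is one way to do that. A hedged sketch with Selenium (assumes the `selenium` package and a Chrome driver are installed; the headless flag and the helper below are illustrative, not part of SuperAGI):

```python
from urllib.parse import urlparse


def is_fetchable(url):
    """Cheap sanity check before spending a browser page load on a URL."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def fetch_like_browser(url):
    """Load a page in headless Chrome and return the rendered page source.
    Selenium imports are deferred so is_fetchable works without it."""
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    opts = Options()
    opts.add_argument("--headless=new")  # run without a visible window
    driver = webdriver.Chrome(options=opts)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()
```

Rendering through a real browser also executes JavaScript, so it can recover content from pages that serve little or no static HTML to plain HTTP clients, at the cost of being much slower per page.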

FrancisVarga commented 1 year ago

I think the web scraping tool is the most important tool. In AutoGPT it works more or less.

neelayan7 commented 1 year ago

We have fixed this issue. Can you check again? @Mingzefei

Mingzefei commented 1 year ago

Yes, it no longer produces the previous errors. Thanks again!

neelayan7 commented 1 year ago

Resolving this issue. Thanks! Do let us know if you face any other issues.