chatchat-space / Langchain-Chatchat

Langchain-Chatchat (formerly langchain-ChatGLM), local knowledge based LLM (like ChatGLM, Qwen and Llama) RAG and Agent app with langchain
Apache License 2.0
29.88k stars 5.23k forks

[BUG] A request to a specific web page, added inside an agent tool, does not return the correct page content #3226

Closed saltedfish2 closed 4 months ago

saltedfish2 commented 4 months ago

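Problem Description

I added a feature to a custom agent that uses requests to fetch a specific web page (Ctrip). When I test it from a local main function, the page is fetched correctly, but when the agent calls it, the correct page is not returned; a 404 page comes back instead. I looked at how the Bing API access is implemented: Langchain also uses the requests library there, and there does not seem to be any additional configuration. I am not sure whether this is related to FastAPI's network settings.

My debugging so far:

  1. Making the request through run_in_threadpool from fastapi.concurrency: the call from main gets the page, the call from the agent does not.
  2. I printed the request body: it is exactly the same for the main call and the agent call.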

Steps to Reproduce

  1. Run python startup.py -a --lite
  2. Select the custom agent Q&A dialogue mode
  3. Ask a normal question that requires calling the newly added tool
  4. The problem occurs: after the tool is invoked successfully, it sends a request to the specific URL endpoint, and the request only gets a 404 page back.

Expected Result: the content of the requested page should be output normally.

Actual Result: a 404 page is returned: <!doctype html>携程旅行网

Environment Information

Additional Information

dosubot[bot] commented 4 months ago

Hello @saltedfish2! 👋 I'm Dosu, a bot here to lend a hand with bugs, answer your questions, and guide you on contributing while we await a human maintainer. I'm diving into your issue and will get back to you with a full answer shortly. Stay tuned! 🤖

saltedfish2 commented 4 months ago

When will my question get a reply?

dosubot[bot] commented 4 months ago

Based on the information you've provided and the context from the Langchain-Chatchat repository, it seems like you're encountering a 404 error when your custom agent tool makes a request to a specific webpage. This issue is not directly visible or diagnosable from the given context, as the provided code does not directly interact with webpages or handle HTTP requests.

The 404 error typically indicates that the requested resource could not be found on the server. This could be due to a variety of reasons, such as changes in the webpage's availability or URL structure, network issues between the server where the agent is deployed and the webpage's server, or issues with how the request is made (e.g., the URL being accessed, the method of request, or any specific headers or parameters included in the request).

To troubleshoot this issue, you would need to examine the part of the code that constructs and sends the HTTP request to the webpage, including the URL and any relevant headers or parameters. Additionally, checking the webpage's availability and ensuring the URL is correct and accessible from the server where the agent is deployed would be necessary steps.
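One way to check these points is to log the same few response fields from both the standalone run and the agent run, then compare them. The following is a minimal sketch, assuming the tool uses requests; `summarize_response` is a hypothetical helper (not part of Langchain-Chatchat), and the stand-in object only mimics the attributes that a `requests.Response` exposes.

```python
from types import SimpleNamespace


def summarize_response(resp, body_chars=200):
    """Collect the fields most useful for comparing two runs of the same request."""
    return {
        "status_code": resp.status_code,
        "final_url": resp.url,
        # Each entry in resp.history is an earlier response in a redirect chain.
        "redirects": [(r.status_code, r.url) for r in resp.history],
        "content_type": resp.headers.get("Content-Type"),
        "body_start": resp.text[:body_chars],
    }


# Stand-in object so the sketch runs without network access; a real
# requests.Response provides the same attribute names.
fake = SimpleNamespace(
    status_code=404,
    url="https://example.com/missing",
    history=[SimpleNamespace(status_code=301, url="http://example.com/missing")],
    headers={"Content-Type": "text/html"},
    text="<!doctype html><title>Not Found</title>",
)
print(summarize_response(fake))
```

In the real tool, the object passed in would be the return value of `requests.post(...)`. A difference in `final_url` or `redirects` between the two runs would point at the network path rather than at the request body.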

I found a similar closed issue in the Langchain-Chatchat repository: POST /chat/knowledge_base_chat HTTP/1.1" 404 Not Found. This issue was last updated on November 13, 2023, and it was referenced by two other issues: 无法访问chachat的api (cannot access the chatchat API) and [BUG] 未找到知识库 samples (knowledge base "samples" not found). These issues might provide some insight into your problem.

Unfortunately, without more information about how the request to the specific webpage (Ctrip website) was added in the custom agent tool in Langchain-ChatGLM, it's difficult to provide a more specific solution. If you could provide more details about this part of the code, it would be helpful in diagnosing and resolving the issue.

#### About Dosu This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
saltedfish2 commented 4 months ago

@dosu-bot Here is the code I use to send the request:

import json
import os

import requests

# cityCodes (city name -> Ctrip city code) is defined elsewhere in the
# poster's module and is not shown in this issue.


def flights_query(dcity, acity, date, output_file):
    print("Calling the API.......")
    print("Looking up city codes.......")
    dcityCode = str(cityCodes.get(dcity))
    acityCode = str(cityCodes.get(acity))
    print("City codes:", dcityCode, acityCode)
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:75.0) Gecko/20100101 Firefox/75.0",
        "Content-Type": "application/json",
        "referer": "https://flights.ctrip.com/itinerary/oneway/" + dcityCode + "-" + acityCode + "?date=" + date
    }
    url = 'http://flights.ctrip.com/itinerary/api/12808/products'
    request_payload = {"flightWay": "Oneway",
                       "classType": "ALL",
                       "hasChild": 'false',
                       "hasBaby": 'false',
                       "searchIndex": 1,
                       "airportParams": [
                           {"dcity": dcityCode, "acity": acityCode, "dcityname": dcity, "acityname": acity,
                            "date": date}]}
    print("Sending request.....\n")
    print("Request parameters:", request_payload)
    # Debug aid: dump the environment so the main run and the agent run can be compared.
    print("Environment variables at request time.....\n")
    for key, value in os.environ.items():
        print(f"{key}: {value}")
    try:
        response = requests.post(url, data=json.dumps(
            request_payload), headers=headers).text
    except Exception as e:
        print("Request failed:", e)
        return str(e)  # return text rather than the exception object
    # print("\n\nResponse:", response)
    routeList = None
    try:
        routeList = json.loads(response).get('data').get('routeList')
    except Exception as e:
        print("json.loads failed:\t", e)
        raise
    # Keep only routes that consist of a single leg.
    route_filter = []
    if routeList is not None:
        for route in routeList:
            if len(route.get('legs')) == 1:
                route_filter.append(route)
    print("Number of filtered routes:", len(route_filter), "\n")
    return json.dumps(route_filter, ensure_ascii=False, indent=4)

dosubot[bot] commented 4 months ago

Based on the information you've provided, it seems like the issue might be related to the way the HTTP request is being handled when it's made within the custom agent tool. Here are a few potential reasons and solutions:

  1. User-Agent Header: Some websites block requests from non-standard user agents to prevent scraping or automated access. If your custom agent tool doesn't set a "User-Agent" header that mimics a standard web browser, the Ctrip website might be rejecting your request. You can try changing the User-Agent to a more common one, like Chrome or Safari.

  2. Cookies and Session State: The Ctrip website might require cookies or a specific session state that your custom agent tool isn't handling correctly. When you test the request in your main function locally, your HTTP client (like a web browser or a tool like Postman) might automatically handle cookies and sessions for you, but your custom agent tool might not. You can try using a session object in requests which will persist certain parameters across requests.

  3. IP Address Block: If you're making many requests to the Ctrip website from the same IP address, especially in a short period, the website might temporarily block your IP address, resulting in a 404 error. This would be more likely if the request works from a different network or after some time has passed. You can try adding a delay between your requests or use a proxy to change your IP address.

  4. Incorrect URL or Parameters: Ensure that the URL and any parameters you're using in your custom agent tool match exactly what works in your local main function. A slight difference in the URL path or query parameters could result in a 404 error. Double-check the URL and parameters in your code.

  5. HTTP vs. HTTPS: Make sure you're using the correct protocol (HTTP or HTTPS) in your request URL. Some websites redirect HTTP requests to HTTPS or vice versa, and this redirection might not be handled correctly by your custom agent tool. You can try changing the protocol in your URL.
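Suggestions 1 and 2 can be combined in a small sketch: a `requests.Session` keeps cookies and default headers across calls, and `raise_for_status()` turns a 404 into an exception instead of silently handing back the error page. The helper names `make_session` and `post_json` are hypothetical, the User-Agent string is the one from the code posted above, and nothing here is confirmed to resolve the Ctrip 404.

```python
import requests


def make_session():
    """Return a Session that reuses cookies and default headers across calls."""
    session = requests.Session()
    session.headers.update({
        "User-Agent": "Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:75.0) "
                      "Gecko/20100101 Firefox/75.0",
        "Content-Type": "application/json",
    })
    return session


def post_json(session, url, payload, timeout=10):
    """POST a JSON payload and raise on 4xx/5xx rather than return the error page."""
    resp = session.post(url, json=payload, timeout=timeout)
    resp.raise_for_status()  # raises requests.HTTPError for a 404
    return resp.text
```

A caller would create one session with `make_session()` and pass it to `post_json` for every request, so any cookies set by an earlier response are sent with the later ones.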

If none of these solutions work, it would be helpful to have more information about the error. Specifically, the exact error message and stack trace would be useful. Additionally, if you could provide the code for the custom agent tool, it would be easier to diagnose the problem.

For more information on how to handle HTTP requests in Python, you can refer to the requests library documentation. For more information on how to handle cookies and sessions, you can refer to the requests library user guide.

I hope this helps! If you have any other questions or need further clarification, feel free to ask.

zRzRzRzRzRzRzR commented 4 months ago

In that case, the content should be returned to the LLM as a str.
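A minimal sketch of this suggestion, assuming the tool is an ordinary Python function: a decorator that guarantees the caller always receives a str, whether the function returns text, a list or dict, or raises. The names `returns_text` and `demo_tool` are hypothetical and are not part of Langchain-Chatchat.

```python
import functools
import json


def returns_text(func):
    """Wrap a tool function so the caller always receives a str."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            result = func(*args, **kwargs)
        except Exception as exc:
            # Report the failure as text instead of raising or returning the exception.
            return f"Tool error: {exc}"
        if isinstance(result, str):
            return result
        # Serialize lists and dicts; fall back to str() for anything else.
        try:
            return json.dumps(result, ensure_ascii=False)
        except TypeError:
            return str(result)
    return wrapper


@returns_text
def demo_tool(query):  # hypothetical stand-in for flights_query
    if not query:
        raise ValueError("empty query")
    return [{"route": query}]


print(demo_tool("PEK-SHA"))
print(demo_tool(""))
```

Converting errors to text is a design choice: the model sees a readable message it can react to, at the cost of hiding the stack trace, so logging the exception before returning may still be worthwhile.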