aio-libs / aiohttp

Asynchronous HTTP client/server framework for asyncio and Python
https://docs.aiohttp.org

The set request parameters were not used #8464

Open JoeanAmier opened 3 months ago

JoeanAmier commented 3 months ago

Describe the bug

My project is a crawler. Over its lifetime the program creates a single ClientSession object, and all requests are made through it. However, some requests fail and return incorrect response codes. When I debugged, I tested the same URL and headers in standalone code, and the response code was normal. I found that the problem was with the ClientSession: when I used a new ClientSession object to make the request at the location where the error occurred, the response code was correct. Every request the program makes through the ClientSession passes in the URL and headers, so I don't know why the bad response code only occurs at that location. Could the ClientSession be contaminated? To work around this, I may need to create two ClientSession objects for separate use. Do you have any better suggestions?

To Reproduce

Due to the unique nature of the project, it is not convenient for others to reproduce the bug. I am testing the TikTok download feature of the project and reproducing the bug 100%.

Expected behavior

I took out the URL and headers separately for testing, and the response code was 206, normal. When there was an exception, the response code was 403.

Logs/tracebacks

I took out the URL and headers separately for testing, and the response code was 206, normal. When there was an exception, the response code was 403.

Python Version

$ python --version
Python 3.12.3

aiohttp Version

$ python -m pip show aiohttp
Name: aiohttp
Version: 3.9.5
Summary: Async http client/server framework (asyncio)
Home-page: https://github.com/aio-libs/aiohttp
Author: 
Author-email: 
License: Apache 2
Location: C:\Users\youyq\PycharmProjects\general_venv\Lib\site-packages
Requires: aiosignal, attrs, frozenlist, multidict, yarl
Required-by: pythonmonkey

multidict Version

$ python -m pip show multidict
Name: multidict
Version: 6.0.4
Summary: multidict implementation
Home-page: https://github.com/aio-libs/multidict
Author: Andrew Svetlov
Author-email: andrew.svetlov@gmail.com
License: Apache 2
Location: C:\Users\youyq\PycharmProjects\general_venv\Lib\site-packages
Requires:
Required-by: aiohttp, grpclib, yarl

yarl Version

$ python -m pip show yarl
Name: yarl
Version: 1.9.3
Summary: Yet another URL library
Home-page: https://github.com/aio-libs/yarl
Author: Andrew Svetlov
Author-email: andrew.svetlov@gmail.com
License: Apache-2.0
Location: C:\Users\youyq\PycharmProjects\general_venv\Lib\site-packages
Requires: idna, multidict
Required-by: aiohttp

OS

Windows 11

Related component

Client

Additional context

This is my project address: https://github.com/JoeanAmier/TikTokDownloader. It is still under development.

ClientSession

def base_session(
        user_agent=USERAGENT,
        timeout=TIMEOUT,
        headers: dict = None,
) -> ClientSession:
    return ClientSession(
        headers=headers or {"User-Agent": user_agent, },
        timeout=ClientTimeout(connect=timeout),
    )


Dreamsorcerer commented 3 months ago

Due to the unique nature of the project, it is not convenient for others to reproduce the bug. I am testing the TikTok download feature of the project and reproducing the bug 100%.

Could you at least post a snippet of the code you used, along with comments explaining what happened at different points? I'm rather struggling to understand your issue from the description.

JoeanAmier commented 3 months ago

    @PrivateRetry.retry
    async def request_file(
            self,
            url: str,
            temp: Path,
            actual: Path,
            show: str,
            id_: str,
            count: SimpleNamespace,
            progress: Progress,
            headers: dict = None,
            tiktok=False,
            unknown_size=False,
            semaphore: Semaphore = None,
    ) -> bool:
        async with semaphore or self.semaphore:
            try:
                async with self.session.get(
                        url,
                        proxy=self.proxy_tiktok if tiktok else self.proxy,
                        headers=self.__adapter_headers(headers, tiktok, ), ) as response:
                    if not (
                            content := int(
                                response.headers.get(
                                    'content-length',
                                    0))) and not unknown_size:  # check response content size
                        self.log.warning(f"{url} response content is empty")
                        return False
                    if response.status > 400:  # check response status code
                        self.log.warning(
                            f"{response.url} abnormal response code: {response.status}")
                        return False
                    elif all((self.max_size, content, content > self.max_size)):  # check whether to skip the download
                        self.log.info(f"{show} file size exceeds the limit, skipping download")
                        return True
                    return await self.download_file(
                        temp,
                        actual,
                        show,
                        id_,
                        response,
                        content,
                        count,
                        progress)
            except ClientError as e:
                self.log.warning(f"network error: {e}")
                return False

The ClientSession in this location has encountered an exception. I used a new ClientSession object here to restore it to normal. My friend said that the parameter may not have been passed successfully. The code here downloads a file and requires a cookie. The normal response code is 206; with an incorrect cookie the response code is 403. I suspect that the headers were not passed successfully here.

Dreamsorcerer commented 3 months ago

I'm still not clear what the issue is.

I used a new ClientSession object here to restore it to normal.

The code you shared does not create a new ClientSession, it simply returns False after an exception.

I'm rather unclear what you want us to do. If the headers are wrong, that's not something we can help with...

JoeanAmier commented 3 months ago

Replacing aiohttp with httpx or requests solves the problem.

Dreamsorcerer commented 3 months ago

Are you putting parameters in the URL? The most common difference between those libraries is that URLs are escaped by default. See (in particular, the note): https://docs.aiohttp.org/en/stable/client_quickstart.html#passing-parameters-in-urls
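
To make the escaping point concrete, here is a minimal stdlib-only sketch that lists the percent-escapes in a URL which encode RFC 3986 unreserved characters. These are escapes a canonicalizing client is free to decode, so a URL containing them may look different on the wire than in the source string. The helper name `decodable_escapes` is made up for illustration, and the check is an approximation, not the exact rule aiohttp or yarl applies.

```python
import re
import string

# RFC 3986 section 2.3: characters that never need percent-encoding.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def decodable_escapes(url: str) -> list[str]:
    """Return the percent-escapes in url that encode unreserved characters."""
    found = []
    for match in re.finditer(r"%([0-9A-Fa-f]{2})", url):
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            found.append(match.group(0))
    return found


# "%30" and "%31" encode the digits 0 and 1; "%2F" encodes "/" and is left out.
print(decodable_escapes("http://example.com/%30?a=%31&b=%2F"))
```

An empty result does not prove the URL is sent unchanged; it only rules out this one kind of rewrite.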

JoeanAmier commented 3 months ago

I used aiohttp in my project, and at the location where an exception occurred, I passed in the URL and headers parameters. I copied the URL and headers parameters and tested them using aiohttp code. The test passed, but I suspect that parameter passing failed. My friend has also experienced parameter passing failures, which is not a coding issue. If the headers parameter is not passed, the running result will be the same as this exception result.

Dreamsorcerer commented 3 months ago

I don't think there's any difference between these libraries regarding passing headers. You're either passing them or you're not. aiohttp isn't going to lose headers.
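
One way to check this rather than guess is aiohttp's client tracing, which can report the headers of each request as they are sent. A minimal sketch, assuming aiohttp 3.8 or later (where the `on_request_headers_sent` signal is available); the function names `log_sent_headers` and `make_trace_config` are illustrative and not taken from the project above.

```python
import aiohttp


async def log_sent_headers(session, trace_config_ctx, params):
    """Print the method, URL and headers aiohttp reports as sent."""
    print(params.method, params.url)
    for name, value in params.headers.items():
        print(f"  {name}: {value}")


def make_trace_config() -> aiohttp.TraceConfig:
    """Build a TraceConfig that logs headers for every request."""
    trace_config = aiohttp.TraceConfig()
    trace_config.on_request_headers_sent.append(log_sent_headers)
    return trace_config


# Usage (inside a coroutine):
#     session = aiohttp.ClientSession(trace_configs=[make_trace_config()])
```

Attaching the same trace config to both the long-lived session and a fresh one would show whether the two actually send different headers.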

JoeanAmier commented 3 months ago

Testing the URL and headers parameters using aiohttp, httpx, and requests separately is normal, and aiohttp only experiences exceptions at a specific location in the project.

JoeanAmier commented 3 months ago

I recorded a video, including the location and results of the anomalies, as well as the results of individual tests.

https://youtu.be/v7b8NiqbrrY

If you think this anomaly is not related to aiohttp, I will delete the video.

JoeanAmier commented 3 months ago

If it's a problem with my code, I should get a response code 403 when testing using URL and headers, which is consistent with the response code at the exception location. However, the response code I tested was 206, indicating that it's not an exception in my code.

Dreamsorcerer commented 3 months ago

The URL in your video is a string with query parameters. Therefore, you probably need to pre-encode (or pass them using params) as I mentioned in the previous comment: https://github.com/aio-libs/aiohttp/issues/8464#issuecomment-2178577282

The difference in behaviour could be the proxy? Maybe one proxy is actually decoding the URL before passing it through to the endpoint, while the other passes it through unchanged.

JoeanAmier commented 3 months ago

The URL and headers are both directly copied for testing, and the proxy is also set to http://127.0.0.1:10809. If it were a coding issue, the copied test results should be consistent with the abnormal results.
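
A claim like this can be checked mechanically by dumping the headers from both code paths and diffing them. A minimal sketch; `diff_headers` is a hypothetical helper, and the sample values are placeholders rather than the real headers from the project.

```python
def diff_headers(expected: dict[str, str], actual: dict[str, str]) -> dict[str, tuple]:
    """Return headers whose values differ, ignoring case in header names.

    Each entry maps the lower-cased header name to (expected, actual);
    a missing header is reported as None.
    """
    exp = {name.lower(): value for name, value in expected.items()}
    act = {name.lower(): value for name, value in actual.items()}
    return {
        name: (exp.get(name), act.get(name))
        for name in sorted(exp.keys() | act.keys())
        if exp.get(name) != act.get(name)
    }


# Placeholder values for illustration only.
standalone = {"User-Agent": "x", "Cookie": "a=1"}
in_project = {"user-agent": "x", "cookie": "a=2", "Range": "bytes=0-"}
print(diff_headers(standalone, in_project))
```

An empty dict means the two header sets match; anything else points at the header worth investigating.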

JoeanAmier commented 3 months ago

The code at the exception location and the parameters used in the test code are exactly the same, and the results should also be consistent, but in reality, they are not consistent. I don't know why the headers failed to pass, but as an aiohttp developer, you should have a better understanding.

Dreamsorcerer commented 3 months ago

Have you tried passing it as a pre-encoded URL, as mentioned twice already? Without your code, I can't give any more suggestions than that.

JoeanAmier commented 3 months ago

I tried many methods, including encoding the parameters, but couldn't solve it until I created a new ClientSession or replaced aiohttp with requests or httpx. When I tested, I did not perform any encoding on the URL, but the test results were normal. Isn't it enough to indicate that it's not an encoding issue?

Dreamsorcerer commented 3 months ago

When I tested, I did not perform any encoding on the URL, but the test results were normal. Isn't it enough to indicate that it's not an encoding issue?

Well, if you just use encoded=True then we know for sure that it's not the URL encoding. Unfortunately, without being able to run your code, I have no further ideas. I can't remember anyone else reporting an issue like this that was solved by creating new sessions. Are you definitely using sessions in httpx/requests as well?
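
For reference, a minimal sketch of what the `encoded=True` suggestion could look like. It wraps the URL string in `yarl.URL(..., encoded=True)`, which the aiohttp documentation describes as the way to disable URL canonicalization. The helper names `as_sent` and `fetch_status` are illustrative, and the fresh session here is an assumption made for the sketch, not the project's setup.

```python
import aiohttp
from yarl import URL


def as_sent(raw_url: str) -> URL:
    """Wrap a URL string so that its path and query are not requoted."""
    return URL(raw_url, encoded=True)


async def fetch_status(raw_url: str, headers: dict[str, str]) -> int:
    """Return the status code of a GET made with the URL left untouched."""
    # A fresh session is used here; pass the shared one instead to compare.
    async with aiohttp.ClientSession() as session:
        async with session.get(as_sent(raw_url), headers=headers) as response:
            return response.status


# The escapes survive because requoting is skipped.
print(as_sent("http://example.com/%30?a=%31"))
```

If the shared session still returns 403 with the URL passed this way, URL encoding can be ruled out as the cause.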