aio-libs / aiohttp

Asynchronous HTTP client/server framework for asyncio and Python
https://docs.aiohttp.org

[BUG] Can't get responses content (multidict seems like doesnt work) #7233

Closed Evil0ctal closed 1 year ago

Evil0ctal commented 1 year ago

Describe the bug

The following code reproduces the problem. It starts from two variables, url and headers.

import aiohttp
import requests

url  = "https://api16-core-useast5.us.tiktokv.com/aweme/v1/aweme/post/?source=0&user_avatar_shrink=96_96&video_cover_shrink=248_330&max_cursor=0&sec_user_id=MS4wLjABAAAArFyeo4ABoR3n04XDlNWjexQXcwbhLbnAxSYWgphk0uFtnYeqM9kD_uThGw3si-QX&count=20&locate_item_id=7202594866214309162&iid=7208312964359833387&device_id=7170857108601833002&ac=wifi&channel=googleplay&aid=1233&app_name=musical_ly&version_code=260802&version_name=26.8.2&device_platform=android&ab_version=26.8.2&ssmix=a&device_type=SM-G996U1&device_brand=samsung&language=zh-hant&os_api=28&os_version=9&openudid=6c42d18f9012249b&manifest_version_code=2022608020&dpi=450&update_version_code=2022608020&_rticket=1678316270464&current_region=US&app_type=normal&mcc_mnc=310260&timezone_name=Asia%2FShanghai&carrier_region_v2=310&residence=US&app_language=zh-Hant&carrier_region=US&ac2=wifi5g&uoo=1&op_region=US&timezone_offset=28800&build_number=26.8.2&host_abi=arm64-v8a&locale=zh-Hant-TW&region=TW&ts=1678316270&content_language=en%2C&cdid=6e3e02dd-c362-4f14-8ca1-f014a8a792e9"

headers = {'Cookie': 'install_id=7207567550891460398', 'passport-sdk-version': '19', 'sdk-version': '2', 'x-ss-req-ticket': '1678342747102', 'x-vc-bdturing-sdk-version': '2.2.1.i18n', 'x-tt-dm-status': 'login=1;ct=1;rt=6', 'x-tt-store-idc': 'alisg', 'x-tt-store-region': 'am', 'x-tt-store-region-src': 'did', 'Accept-Encoding': 'gzip, deflate, br', 'user-agent': 'com.zhiliaoapp.musically/2022608020 (Linux; U; Android 9; en; SM-G996U1; Build/N2G48H;tt-ok/3.10.0.2)', 'X-Ladon': 'oYXFWhrfJggcYvzwUy0lt60dLipq1aJJ6+0OH+TEBF/shGCS', 'X-khronos': '1678342748', 'X-Argus': 'Rw77CHFYhC6/ZTVfbq+2xh8rlL1VT/e+CHkY20KPReDsLy+IfIGyh8Uf/QGwjw9t/8qRkHx7AvEGA7+flS4btCDwhq0kUiEs3uiYH0TJWctO+lnvks3laWvbR6gwQHl3RcdfPe39REvc3SsJd1Mv8dWQg+rACDvPobNhIzcfYG/Co9utjfxwwJTwmhPBdyHOoZ4kFcyj9YKbia7E/tymO8f4nIi7E0wl48UZV+bAqWQR9RrpGV2fXoGP3LkTGmPzoeTezH2GgPt90WiIArxJqWfz', 'X-Gorgon': '040420920000e71e7722d2a7a0bb1580a67542a4ef56084c5b06'}

# Using requests

response = requests.get(url, headers=headers)
print("response_by_requests: ", response.text[:100])

# Using AioHTTP

# async with aiohttp.ClientSession(headers=headers) as session:
#     async with session.get(url) as response:
#         print('url: ', response.url)
#         print('headers: ', dict(session.headers))
#         print("response: ", await response.text())

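# Note: a top-level `async with` only runs in a notebook/IPython session.
# In a plain script, put this block inside an `async def` and call it with asyncio.run().
async with aiohttp.ClientSession() as session:
    async with session.get(url, headers=headers) as response:
        print("response_by_aiohttp: ", (await response.text())[:100])
        print('status: ', response.status)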

Output:

response_by_requests:  {"aweme_list":[{"anchors":null,"anchors_extras":"","author":{"accept_private_policy":false,"account_
response_by_aiohttp:  
status:  200

To Reproduce

See code above.

Expected behavior

Not sure, but it looks like the headers didn't work.

Logs/tracebacks

See code above.

Python Version

$ python --version
tried Python 3.9 and 3.11

aiohttp Version

$ python -m pip show aiohttp
Name: aiohttp
Version: 3.8.3

multidict Version

$ python -m pip show multidict
Name: multidict
Version: 6.0.4

yarl Version

$ python -m pip show yarl
Name: yarl
Version: 1.8.2

OS

Windows 11

Related component

Client

Additional context

No response

bizzyvinci commented 1 year ago

Hi @Evil0ctal, requests and aiohttp behaved the same way in this notebook. It has editor access, so you can make changes to it. Thanks.

Evil0ctal commented 1 year ago

OK, I found that requests and the asynchronous httpx client both get the data normally; only aiohttp can't.

The failure to obtain data when running on Colab may be due to the IP being blocked by TikTok, or to other reasons (more tests are needed).

I will update the complete code for debugging, including requests, aiohttp, and httpx, along with their output.

Dreamsorcerer commented 1 year ago

Probably the URL is being escaped. You can either use a pre-encoded URL, or pass the parameters with the params argument. https://github.com/aio-libs/aiohttp/issues/4307#issuecomment-548699254
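One way to check this hypothesis is to compare the raw query string you passed with the one the client reports having used (in the original snippet, `response.url`). The helper below is a minimal sketch using only the standard library; the function name `changed_query_fields`, the `example.invalid` host, and the `sent` URL are hypothetical and stand in for the real values.

```python
from urllib.parse import urlsplit


def changed_query_fields(requested: str, sent: str) -> list[tuple[str, str, str]]:
    """Return (key, requested_raw, sent_raw) for each query field whose raw,
    still-encoded value differs between the two URLs.

    Keys are compared as raw text; if a key repeats, the last value wins.
    """
    def raw_fields(url: str) -> dict[str, str]:
        fields = {}
        for field in urlsplit(url).query.split("&"):
            key, _, value = field.partition("=")
            fields[key] = value
        return fields

    before, after = raw_fields(requested), raw_fields(sent)
    return [
        (key, value, after.get(key, "<missing>"))
        for key, value in before.items()
        if after.get(key, "<missing>") != value
    ]


# Hypothetical URLs: `requested` is what the caller passed in,
# `sent` is what the client reported (e.g. str(response.url)).
requested = "https://example.invalid/api/?timezone_name=Asia%2FShanghai&count=10"
sent = "https://example.invalid/api/?timezone_name=Asia/Shanghai&count=10"

for key, was, now in changed_query_fields(requested, sent):
    print(f"{key}: {was!r} -> {now!r}")
```

Any field printed here was rewritten somewhere between the call and the wire, which is the kind of change a server that checks the exact query string might reject.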

Evil0ctal commented 1 year ago

Now things are getting weirder. I can't get the data even when I put the same parameters into Postman, but the requests library and the asynchronous httpx library can still get it. Is there any reasonable explanation for this? Could it be magic :)

Evil0ctal commented 1 year ago

# The code below reproduces my issue with requests, httpx, and aiohttp.
# reference: https://github.com/aio-libs/aiohttp/issues/7233

import aiohttp
import requests
import httpx

# Using requests
def fetch_requests(__url, __headers):
    response = requests.get(__url, headers=__headers)
    print('status_by_requests: ', response.status_code)
    print("response_by_requests: ", response.text[:100])
    return response

# Using httpx
async def fetch_httpx(__url, __headers):
    async with httpx.AsyncClient() as client:
        response = await client.get(__url, headers=__headers)
        print('status_by_httpx: ', response.status_code)
        print("response_by_httpx: ", response.text[:100])
        return response

# Using AioHTTP
async def fetch_aiohttp(__url, __headers):
    async with aiohttp.ClientSession() as session:
        async with session.get(__url, headers=__headers) as response:
            print('status_by_aiohttp: ', response.status)
            print("response_by_aiohttp: ", (await response.text())[:100])

if __name__ == '__main__':
    import asyncio

    url = "https://api16-normal-useast5.us.tiktokv.com/aweme/v1/aweme/post/?source=0&user_avatar_shrink=96_96&video_cover_shrink=248_330&max_cursor=0&sec_user_id=MS4wLjABAAAArFyeo4ABoR3n04XDlNWjexQXcwbhLbnAxSYWgphk0uFtnYeqM9kD_uThGw3si-QX&count=10&iid=7208312964359833387&device_id=7170857108601833002&ac=wifi&channel=googleplay&aid=1233&app_name=musical_ly&version_code=260802&version_name=26.8.2&device_platform=android&ab_version=26.8.2&ssmix=a&device_type=SM-G996U1&device_brand=samsung&language=en&os_api=28&os_version=9&openudid=6c42d18f9012249b&manifest_version_code=2022608020&resolution=1080*2265&dpi=450&update_version_code=2022608020&_rticket=1678316270464&current_region=US&app_type=normal&mcc_mnc=310260&timezone_name=Asia%2FShanghai&carrier_region_v2=310&residence=US&app_language=en&carrier_region=US&ac2=wifi5g&uoo=1&op_region=US&timezone_offset=28800&build_number=26.8.2&host_abi=arm64-v8a&locale=en-US&region=US&ts=1678437555&content_language=en%2C&cdid=6e3e02dd-c362-4f14-8ca1-f014a8a792e9"

    headers = {'Cookie': 'install_id=7207567550891460398', 'passport-sdk-version': '19', 'sdk-version': '2', 'x-ss-req-ticket': '1678437555023', 'x-vc-bdturing-sdk-version': '2.2.1.i18n', 'x-tt-dm-status': 'login=1;ct=1;rt=6', 'x-tt-store-idc': 'alisg', 'x-tt-store-region': 'am', 'x-tt-store-region-src': 'did', 'user-agent': 'com.zhiliaoapp.musically/2022608020 (Linux; U; Android 9; en; SM-G996U1; Build/N2G48H;tt-ok/3.10.0.2)', 'content-type': 'application/x-www-form-urlencoded; charset=UTF-8', 'X-Ladon': 'GfTYUm2SZ061+9gCMCGHlydrZcoSaquxTIjP3lXH/Jr1noHQ', 'X-khronos': '1678437555', 'X-Argus': 'h6Ob+2vC1+jphF+G7uWJI+BnBTy5PR3qULM9jsQexkNNMhqzM5l0FNIIArGDfgmVmWD4XkN0iEvxDbX/XbWCt7T47TVVQQwNcFSQ/ndnqxIeeMC6XFLcUFptQVHhnP11aOVCG6BUEm2LAmGOcHDfD8J/tYukuuat7v05TJZdA0kXpdqP70STdfjPI+knITH1+npPfSpJB3Y8FDzhZ8Yupa12yNzAVOBUj3jQdMruj/aoKuPw+E0HGqIRRBg7gPVM41gAkZExEOx1kJ7runVoiXr/', 'X-Gorgon': '0404209200003df64784d2a7a0bb1580a67542a4ef56c8293a22'}

    # Using requests
    fetch_requests(url, headers)

    # Using httpx
    asyncio.run(fetch_httpx(url, headers))

    # Using AioHTTP
    asyncio.run(fetch_aiohttp(url, headers))

    """
    Output:
    status_by_requests:  200
    response_by_requests:  {"aweme_list":[{"anchors":null,"anchors_extras":"","author":{"accept_private_policy":false,"account_
    status_by_httpx:  200
    response_by_httpx:  {"aweme_list":[{"anchors":null,"anchors_extras":"","author":{"accept_private_policy":false,"account_
    status_by_aiohttp:  200
    response_by_aiohttp:  
    """

Evil0ctal commented 1 year ago

@bizzyvinci

Please run the code ASAP, because the headers from the TikTok app expire.

bizzyvinci commented 1 year ago

Yes, I get your expected output locally and on Google Colab.

Evil0ctal commented 1 year ago

Can you try a US proxy? TikTok is not available in a few countries. I am still able to get the response text.

bizzyvinci commented 1 year ago

The suggested comment above works. Use yarl.URL(url, encoded=True)

import yarl
...
#async with session.get(__url, headers=__headers) as response:
async with session.get(yarl.URL(__url, encoded=True), headers=__headers) as response:
...

bizzyvinci commented 1 year ago

I've updated the notebook to include the solution. For more context: when encoded is not True, Asia%2FShanghai in the query string is changed to Asia/Shanghai before the request is sent. I hope the issue is resolved and this can be closed.
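To see why that change matters, the sketch below imitates a decode-then-re-encode pass over the query using only the standard library. It is an illustration, not yarl's actual implementation: the function `requote_query`, its `safe` character set, and the `example.invalid` URL are assumptions chosen so that `/` is left unescaped, as in the Asia%2FShanghai case.

```python
from urllib.parse import parse_qsl, quote, unquote, urlsplit


def requote_query(url: str, safe: str = "/:@?") -> str:
    """Illustrative normalization: percent-decode every query key and value,
    then re-encode it while leaving the characters in `safe` unescaped."""
    parts = urlsplit(url)
    fields = []
    for field in parts.query.split("&"):
        key, sep, value = field.partition("=")
        fields.append(quote(unquote(key), safe=safe) + sep + quote(unquote(value), safe=safe))
    return parts._replace(query="&".join(fields)).geturl()


# Hypothetical URL reusing one parameter from the report.
original = "https://example.invalid/api/?timezone_name=Asia%2FShanghai&count=10"
normalized = requote_query(original)

print(normalized)
# The decoded parameters are identical, but the raw strings are not.
same_values = parse_qsl(urlsplit(original).query) == parse_qsl(urlsplit(normalized).query)
print(same_values, normalized == original)
```

Both URLs decode to the same parameters, yet their raw text differs. Passing `yarl.URL(url, encoded=True)`, as shown above, tells aiohttp to treat the string as already encoded so it is sent unchanged.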

Evil0ctal commented 1 year ago

Looks like the issue has been fixed.

Thank you for the help! I will close this issue.

# Using AioHTTP
async def fetch_aiohttp(__url, __headers):
    async with aiohttp.ClientSession() as session:
        async with session.get(yarl.URL(__url, encoded=True), headers=__headers) as response:
            print('status_by_aiohttp: ', response.status)
            print("response_by_aiohttp: ", (await response.text())[:100])

Output:

status_by_requests:  200
response_by_requests:  {"aweme_list":[{"anchors":null,"anchors_extras":"","author":{"accept_private_policy":false,"account_
status_by_httpx:  200
response_by_httpx:  {"aweme_list":[{"anchors":null,"anchors_extras":"","author":{"accept_private_policy":false,"account_
status_by_aiohttp:  200
response_by_aiohttp:  {"aweme_list":[{"anchors":null,"anchors_extras":"","author":{"accept_private_policy":false,"account_