bogdanfinn / tls-client

net/http.Client like HTTP Client with options to select specific client TLS Fingerprints to use for requests.
BSD 4-Clause "Original" or "Old" License

Rubbish body if the server has returned data encoded using gzip #32

Closed sergey-scat closed 11 months ago

sergey-scat commented 1 year ago

In some cases (when the "Accept-Encoding" header is used) rubbish is returned in the response body instead of readable data.

Here is a Python example:

import ctypes
import json
import os.path
from io import BytesIO

SCRIPT_FOLDER = os.path.dirname(__file__)

# load the tls-client shared library built for the OS you are running on (this example loads the 64-bit Windows DLL)
library = ctypes.cdll.LoadLibrary(
    os.path.join(SCRIPT_FOLDER, 'tls-client-windows-64-1.3.8.dll')
)

# extract the exposed request function from the shared package
request = library.request
request.argtypes = [ctypes.c_char_p]
request.restype = ctypes.c_char_p

def main():
    requestPayload = {
        "tlsClientIdentifier": "chrome_107",
        "followRedirects": True,
        "insecureSkipVerify": False,
        "withoutCookieJar": False,
        "withDefaultCookieJar": False,
        "isByteRequest": False,
        "additionalDecode": "",
        "forceHttp1": False,
        "withDebug": False,
        "catchPanics": False,
        "withRandomTLSExtensionOrder": False,
        "session": 0,
        "timeoutSeconds": 30,
        "timeoutMilliseconds": 0,
        "certificatePinningHosts": {},
        "proxyUrl": "",
        "isRotatingProxy": False,
        "headers": {
            "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
            "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36",
            "accept-encoding": "gzip, deflate, br",
            "accept-language": "de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7"
        },
        "headerOrder": [
            "user-agent",
            "accept",
            "accept-encoding",
            "accept-language"
        ],
        "requestUrl": "https://outlook.live.com/owa/?nlp=1",
        "requestMethod": "GET",
        "requestBody": "",
        "requestCookies": []
    }
    response = request(json.dumps(requestPayload).encode('utf-8'))
    response_object = json.load(BytesIO(ctypes.string_at(response)))

    print('Content-Encoding: ', response_object['headers'].get('Content-Encoding'))
    print('Body: ', response_object['body'][:40], '...')

if __name__ == '__main__':
    main()

The script above returns the following:

Content-Encoding:  ['gzip']
Body:  �\000\000\000\000\000\000��[o�J�'����v�-�)Y_�0�n�K�" ...

Everything works fine if you remove "accept-encoding" from the headers:

Content-Encoding:  None
Body:  <!-- Copyright (C) Microsoft Corporation ...
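
A minimal sketch of that workaround, assuming a payload shaped like requestPayload above. The helper name without_accept_encoding is hypothetical and not part of tls-client; it returns a copy of the payload with the header dropped from both "headers" and "headerOrder":

```python
import copy


def without_accept_encoding(payload):
    """Return a copy of payload with the accept-encoding header removed.

    Hypothetical helper for illustration; not part of tls-client.
    """
    cleaned = copy.deepcopy(payload)
    # Compare header names case-insensitively.
    cleaned["headers"] = {
        name: value
        for name, value in cleaned.get("headers", {}).items()
        if name.lower() != "accept-encoding"
    }
    # Keep the header order list consistent with the headers that remain.
    cleaned["headerOrder"] = [
        name
        for name in cleaned.get("headerOrder", [])
        if name.lower() != "accept-encoding"
    ]
    return cleaned
```

Calling request(json.dumps(without_accept_encoding(requestPayload)).encode('utf-8')) would then send the request without advertising any compression, which usually means a larger but directly readable response.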

bogdanfinn commented 1 year ago

@sergey-scat maybe this short article explains what is happening here and why: https://bogdanfinn.gitbook.io/open-source-oasis/tls-client/response-body-encoding-decoding

The article also shows an example of how to avoid it.

sergey-scat commented 1 year ago

@bogdanfinn Oh, I see, thank you. I tried your example and it decompressed the gzip body automatically (because it was using HTTP/2, as I now understand), so I assumed the response was always decompressed according to the value of the Content-Encoding header.

So could you please tell me why there is no automatic decompression when using HTTP/1?

bogdanfinn commented 1 year ago

@sergey-scat It was never implemented, I think. I forked the fhttp package dependency and only adjusted the parts I needed for this TLS client. The decompression behavior was left as it was back then and never touched.

clouedoc commented 1 year ago

Another way to think about encoding issues: if the Content-Encoding header is present on a response object, the body must be decoded manually.
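
A rough sketch of that manual decoding in Python, using only the standard library. It assumes you have the raw compressed bytes of the body; the JSON string body returned in the example at the top of this issue may not preserve them, so this is an illustration of the idea and not a drop-in fix. The function name decode_body is hypothetical, and "br" is not handled because it needs a third-party package:

```python
import gzip
import zlib


def decode_body(raw, content_encoding):
    """Decompress raw bytes according to a Content-Encoding value.

    Hypothetical helper for illustration; not part of tls-client.
    """
    # The header may arrive as a list such as ['gzip'] or as a plain string.
    if isinstance(content_encoding, (list, tuple)):
        content_encoding = ",".join(content_encoding)
    encodings = [
        part.strip().lower()
        for part in (content_encoding or "").split(",")
        if part.strip()
    ]
    # Encodings are listed in the order they were applied, so undo them in reverse.
    for encoding in reversed(encodings):
        if encoding in ("gzip", "x-gzip"):
            raw = gzip.decompress(raw)
        elif encoding == "deflate":
            try:
                raw = zlib.decompress(raw)  # zlib-wrapped deflate
            except zlib.error:
                raw = zlib.decompress(raw, -zlib.MAX_WBITS)  # raw deflate stream
        elif encoding == "identity":
            continue
        else:
            # For example "br", which the standard library cannot decode.
            raise ValueError(f"unsupported Content-Encoding: {encoding}")
    return raw
```

If no Content-Encoding header is present (None or an empty value), the bytes are returned unchanged.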

bogdanfinn commented 1 year ago

@sergey-scat I changed the behavior in version 1.4.0.

bogdanfinn commented 11 months ago

Closed due to inactivity