bogdanfinn / tls-client

net/http.Client like HTTP Client with options to select specific client TLS Fingerprints to use for requests.
BSD 4-Clause "Original" or "Old" License

[Needs Documentation]: Memory leak in python example #121

Closed. kastger closed this issue 4 months ago

kastger commented 4 months ago

TLS client version

v1.7.5

System information

NAME="Linux Mint" VERSION="21.2 (Victoria)" ID=linuxmint ID_LIKE="ubuntu debian" PRETTY_NAME="Linux Mint 21.2" VERSION_ID="21.2" VERSION_CODENAME=victoria UBUNTU_CODENAME=jammy

Issue description

The issue was noticed in the tls-client wrapper, but it is also present when using the compiled library directly. The code is taken from example_python and executed in a loop. For this issue I made 50 requests to the same URL and tracked memory usage:

__________ 0 _________
Memory usage: 17.0703125 MB (before making request)
Memory usage: 396.18359375 MB (after response)
Memory usage: 727.16796875 MB (after clearing response)
Memory usage: 727.16796875 MB (after destroying session)
__________ 1 _________
Memory usage: 341.15625 MB (before making request)
Memory usage: 438.55859375 MB (after response)
Memory usage: 769.31640625 MB (after clearing response)
Memory usage: 769.31640625 MB (after destroying session)
__________ 2 _________
Memory usage: 383.29296875 MB (before making request)
Memory usage: 494.23046875 MB (after response)
Memory usage: 824.9921875 MB (after clearing response)
Memory usage: 824.9921875 MB (after destroying session)
...
__________ 49 _________
Memory usage: 3040.84375 MB (before making request)
Memory usage: 3130.79296875 MB (after response)
Memory usage: 3461.6796875 MB (after clearing response)
Memory usage: 3461.6796875 MB (after destroying session)

[Image: memory-profile]

I have tested memory on pure Go and everything seemed to be fine. Is there an issue in the example or is something wrong with how memory is managed in Python in this case?

Steps to reproduce / Code Sample

import ctypes
import psutil
import json

process = psutil.Process()
library = ctypes.cdll.LoadLibrary('./tls-client-linux-ubuntu-amd64-1.7.5.so')

freeMemory = library.freeMemory
freeMemory.argtypes = [ctypes.c_char_p]

destroySession = library.destroySession
destroySession.argtypes = [ctypes.c_char_p]
destroySession.restype = ctypes.c_char_p

destroyAll = library.destroyAll
destroyAll.argtypes = []
destroyAll.restype = ctypes.c_char_p

request = library.request
request.argtypes = [ctypes.c_char_p]
request.restype = ctypes.c_char_p

getCookiesFromSession = library.getCookiesFromSession
getCookiesFromSession.argtypes = [ctypes.c_char_p]
getCookiesFromSession.restype = ctypes.c_char_p

addCookiesToSession = library.addCookiesToSession
addCookiesToSession.argtypes = [ctypes.c_char_p]
addCookiesToSession.restype = ctypes.c_char_p

def memory_usage():
    return process.memory_info().rss / 1024 / 1024

def make_request():
    requestPayload = {
        "tlsClientIdentifier": "chrome112",
        "followRedirects": False,
        "insecureSkipVerify": False,
        "withoutCookieJar": False,
        "withDefaultCookieJar": False,
        "isByteRequest": False,
        "forceHttp1": False,
        "withDebug": False,
        "catchPanics": False,
        "withRandomTLSExtensionOrder": False,
        "timeoutSeconds": 30,
        "timeoutMilliseconds": 0,
        "sessionId": "my-session-id",
        "proxyUrl": "",
        "isRotatingProxy": False,
        "certificatePinningHosts": {},
        "headers": {
            "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
            "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36",
            "accept-encoding": "gzip, deflate, br",
            "accept-language": "de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7"
        },
        "headerOrder": [
            "accept",
            "user-agent",
            "accept-encoding",
            "accept-language"
        ],
        "requestUrl": 'https://www.globus.ch/media/akeneo/2000402933133_MO_1/1693574178?v=gallery&width=500',
        "requestMethod": "GET",
        "requestBody": "",
        "requestCookies": []
    }

    # this is a pointer to the response
    response = request(json.dumps(requestPayload).encode('utf-8'))

    after_response = memory_usage()
    print(f"Memory usage: {after_response} MB (after response)")

    # we dereference the pointer to a byte array
    response_bytes = ctypes.string_at(response)

    # convert our byte array to a string (tls client returns json)
    response_string = response_bytes.decode('utf-8')

    # convert response string to json
    response_object = json.loads(response_string)

    # measure memory after decoding the response (nothing has been freed at this point)
    after_response_clear = memory_usage()
    print(f"Memory usage: {after_response_clear} MB (after clearing response)")

    cookiePayload = {
        "sessionId": "my-session-id",
    }

    cookieResponse = getCookiesFromSession(json.dumps(cookiePayload).encode('utf-8'))
    # we dereference the pointer to a byte array
    cookieResponse_bytes = ctypes.string_at(cookieResponse)
    # convert our byte array to a string (tls client returns json)
    cookieResponse_string = cookieResponse_bytes.decode('utf-8')
    # convert response string to json
    cookieResponse_object = json.loads(cookieResponse_string)

    destroySessionPayload = {
        "sessionId": "my-session-id",
    }

    destroySessionResponse = destroySession(json.dumps(destroySessionPayload).encode('utf-8'))
    # we dereference the pointer to a byte array
    destroySessionResponse_bytes = ctypes.string_at(destroySessionResponse)
    # convert our byte array to a string (tls client returns json)
    destroySessionResponse_string = destroySessionResponse_bytes.decode('utf-8')
    # convert response string to json
    destroySessionResponse_object = json.loads(destroySessionResponse_string)
    destroyAll()

    after_session_destroy = memory_usage()
    print(f"Memory usage: {after_session_destroy} MB (after destroying session)")
    return {
        "after_response": after_response,
        "after_response_clear": after_response_clear,
        "after_session_destroy": after_session_destroy,
    }

RANGE = 50  # matches the 50 requests (iterations 0 to 49) shown in the log above
for i in range(RANGE):
    print(f"__________ {i} _________")
    print(f"Memory usage: {memory_usage()} MB (before making request)")
    if i == 0:
        start_data = {"before_request": memory_usage()}
        start_data.update(make_request())
    elif i == RANGE - 1:
        end_data = {"before_request": memory_usage()}
        # merge the results instead of overwriting "before_request"
        end_data.update(make_request())
    else:
        make_request()

print(start_data)
print(end_data)

bogdanfinn commented 4 months ago

@kastger Thank you for taking the time to write up the issue and for providing such a good PoC.

Here is the chapter of the docs where I wrote about these memory issues: https://bogdanfinn.gitbook.io/open-source-oasis/shared-library/memory-issues

The important part:

By design, memory is allocated for every response passed from the Go implementation (shared library) to the invoking application (Python, Node, etc.). The caller has to free that memory once it is done handling the response; otherwise the memory is never freed and you run into memory issues.

So you are already doing the right thing by destroying the sessions (which, by the way, is not needed if you reuse sessions). But you forgot to free the memory that has to be allocated for communication between the shared library and the invoking application. There is a dedicated freeMemory() function that you need to call after every interaction with the shared library in order to free this memory.

kastger commented 4 months ago

Thank you for such a quick answer.

I tested this by replacing destroyAll() with freeMemory(response_object["id"].encode("utf-8")) in my previous example, and it seems to have worked.
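
For anyone landing here later, the fix can be wrapped in a small helper so that every call into the shared library is followed by a freeMemory() call. This is only a minimal sketch: call_and_free is a hypothetical name, not part of tls-client, and it assumes that each JSON response carries an "id" field that freeMemory() accepts, as was the case for request() in this thread. The library functions are passed in as arguments, so the helper does not load the .so file itself.

```python
import ctypes
import json


def call_and_free(func, free_memory, payload):
    """Call a shared-library export, parse its JSON reply, then free it.

    func and free_memory are passed in (for example library.request and
    library.freeMemory), so this hypothetical helper has no dependency
    on the .so file itself.
    """
    raw = func(json.dumps(payload).encode("utf-8"))
    if raw is None:
        # ctypes maps a NULL char pointer to None
        raise RuntimeError("library returned a NULL pointer")

    # With restype = c_char_p, ctypes already hands back a bytes copy;
    # string_at covers the case where a raw address is returned instead.
    data = raw if isinstance(raw, bytes) else ctypes.string_at(raw)
    response_object = json.loads(data.decode("utf-8"))

    # Assumption: the reply contains an "id" field that freeMemory accepts,
    # as in the call reported to work in this thread.
    response_id = response_object.get("id")
    if response_id:
        free_memory(response_id.encode("utf-8"))
    return response_object
```

With the declarations from the example above, the request in make_request() would then become response_object = call_and_free(request, freeMemory, requestPayload), and the same wrapper could be tried for getCookiesFromSession and destroySession, provided their responses also include an id.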