ValvePython / dota2

🐸 Python package for interacting with Dota 2 Game Coordinator
http://dota2.readthedocs.io
204 stars 32 forks source link

Graceful shutdown of execution in long-running process #57

Closed DEV-ONI closed 3 years ago

DEV-ONI commented 3 years ago

Thanks for creating such a great package!

I was hoping to ask for insight on some things regarding the package I've experimented with some code that automates some interaction with the Dota client that:

  1. instantiates a SteamClient
  2. attaches event handlers to this instance of the SteamClient
  3. passes the instance to to DotaClient
  4. attaches event handlers to this instance of the DotaClient
  5. calls the login method of the SteamClient instance
  6. performs some event handling through the registered callbacks
  7. call logout when this is finished

As I understand, the logout method of the steam client

  1. disconnects the user from the dota client and steam client
  2. kills the heartbeat and message receiving greenlet loops of the application necessary for communicating with the GC servers

In a long-running process where the above is repeatedly run, would killing the greenlets through logout be the most graceful way of shutting down the greenlets and free the resources?

I saw increasing memory usage through memory profiling despite calling logout and confirming no greenlets are running and wondered what I might to do mitigate this on repeated executions

DEV-ONI commented 3 years ago

I hope this may be of some help.

I ran the above several times, and used tracemalloc to find allocations with each repeated execution.

event emitter allocations were also accumulating until I added remove_listeners on teardown for the dota and steam client. Below is a sample of the trace allocating the most memory per run, with total KiB accumulating after several iterations

I'm not sure if this is what would classify as a memory leak, or am I misunderstanding something? Please let me know what you think.


                    new KiB:  9.03125
                    total KiB: 36.453125
                    new:  161
                    total memory blocks: 646

  File "/home/user/.pyenv/versions/myvenv/lib/python3.8/site-packages/steam/client/__init__.py", line 155
    CMClient._handle_cm_list(self, msg)  # clear and merge
  File "/home/user/.pyenv/versions/myvenv/lib/python3.8/site-packages/steam/core/cm.py", line 397
    self.cm_servers.merge_list(new_servers)
  File "/home/user/.pyenv/versions/myvenv/lib/python3.8/site-packages/steam/core/cm.py", line 570
    self.mark_good((ip, port))
  File "/home/user/.pyenv/versions/myvenv/lib/python3.8/site-packages/steam/core/cm.py", line 549
    self.list[server_addr].update({'quality': CMServerList.Good, 'timestamp': time()})

                    new KiB: 7.7109375
                    total KiB: 29.0390625
                    new: 47
                    total memory blocks: 177

  File "/home/user/.pyenv/versions/myvenv/lib/python3.8/site-packages/steam/client/__init__.py", line 524
    eresult = self._pre_login()
  File "/home/user/.pyenv/versions/myvenv/lib/python3.8/site-packages/steam/client/__init__.py", line 443
    if not self.connect():
  File "/home/user/projects/user/user_test_code/test_code/game_manager.py", line 618
    if SteamClient.connect(self, retry=cm_connect_retry):
  File "/home/user/.pyenv/versions/myvenv/lib/python3.8/site-packages/steam/client/__init__.py", line 93
    return CMClient.connect(self, *args, **kwargs)
  File "/home/user/.pyenv/versions/myvenv/lib/python3.8/site-packages/steam/core/cm.py", line 135
    if not self.cm_servers.bootstrap_from_webapi():
  File "/home/user/.pyenv/versions/myvenv/lib/python3.8/site-packages/steam/core/cm.py", line 509
    self.merge_list(map(str_to_tuple, serverlist))
  File "/home/user/.pyenv/versions/myvenv/lib/python3.8/site-packages/steam/core/cm.py", line 570
    self.mark_good((ip, port))
  File "/home/user/.pyenv/versions/myvenv/lib/python3.8/site-packages/steam/core/cm.py", line 549
    self.list[server_addr].update({'quality': CMServerList.Good, 'timestamp': time()})
rossengeorgiev commented 3 years ago

Sounds like you may be registering event listeners over and over again. I've ran processes over the span of months and even a year with stable memory usage. The only issues I've encountered are bugs in gevent. Need to code to reproduce the issue

DEV-ONI commented 3 years ago

Thank you for responding.

I'll attempt to simplify my code to isolate the issue, as it might be too large to submit here. I'll also post the memory graph.

I don't seem to have the problem anymore with event listeners after I removed them before logging out, as stated above, however I believe the tracebacks of my snapshots show that memory allocations are accumulating elsewhere. Correct me if I'm wrong though.

DEV-ONI commented 3 years ago

I wrote a little script to represents what I wish to accomplish, albeit with slightly less functionality for simplicity and to isolate my issue.

I achieved the same behavior with this script, but in reality my implementation is within a simple server.

In this code, I have 5 sets of credentials used to start 5 sessions in succession with 20 repetitions. Each session expires after some amount of time. Attached is a graph generated by the memory_profiler package

Is the increase in memory not necessarily a memory leak? Or is there something incorrect with my approach? I've only benchmarked it for about 20 minutes, nowhere near as long as several months, perhaps it is yet to plateau due to the nature of Pythons dynamic memory allocation?

from steam.client import SteamClient
from dota2.client import Dota2Client
import gevent

import time
import logging

logging.basicConfig(
    format='[%(asctime)s] %(levelname)s %(name)s: %(message)s', 
    level=logging.DEBUG
)

credentials = [
    ('username_1', 'credentials_1'), # 1st tuple of credentials
    ...
    ('username_5', 'credentials_5')
]

class DebugDotaLobbyManager:

    def __init__(self):
        self.client = SteamClient()
        self.dota_client = Dota2Client(self.client)

    def login_and_create_lobby(self, username, password):

        # register event callback
        self.client.on('logged_on', self.__on_login)
        self.client.on('start_expire', self.__start_expire)
        self.dota_client.on('ready', self.__on_ready)

        # login with the provided credentials
        self.client.login(username, password)

        # start the lobby expire timer
        self.client.emit('start_expire')

    def __on_login(self):
        self.dota_client.launch()

    def __start_expire(self):
        """
        method to `expire` or logout the client
        and remove all listeners after some time

        1. sleep and yield client greenlet
        2. remove all listeners
        3. logout
        """

        # wait for some time before expiry
        self.client.sleep(10)

        # remove event listeners and callback
        self.dota_client.remove_all_listeners()
        self.client.remove_all_listeners()

        # logout client
        self.client.logout()

    def __on_ready(self):
        """
        on session ready,
        1. destroy existing lobby
        2. create new practice lobby
        """

        options = dict(
            game_name="some automated lobby",
        )
        # destroy existing lobby
        self.dota_client.destroy_lobby()

        # create practice lobby with the provided config
        self.dota_client.create_practice_lobby(
            options=options, 
            password="some password"
        )

for _ in range(20):

    for cred in credentials:

        print(f'Credentials: {cred}')

        lobby_manager = DebugDotaLobbyManager()
        lobby_manager.login_and_create_lobby(*cred)

        time.sleep(1)

    time.sleep(40)

memusage

rossengeorgiev commented 3 years ago

There a number of issues with that code.

  1. You are using time.sleep(), but not monkey patching it with gevent, which means the gevent loop will not run when sleep like that. Your clients are essentially paused. After that 40sec sleep is over, all the client would have been kicked off steam for not responding.

  2. Removing all listeners likely clears builtin listeners, which means SteamClient and DotaClient are likely broken. You don't want to do that. If you want stop a client, just call SteamClient.disconnect().

  3. if login_and_create_lobby() is called multiple times, then you are registering more than one instance of the handler. Handler registration should be handled in __init__().

  4. That memory profile is kind of useless. That amount of time is too short. Python generally doesn't release memory. You will have to read on python memory management in details. You can try running gc manually with gc.collect() or if you suspect object leakage over time keep track of counts with gc.get_count() (these will keep increasing until collect, but over long period it should stabilize unless you are creating more and more objects potentially leaking).

DEV-ONI commented 3 years ago

Ah I apologize. I made the code in a haste, but yes, I intended to use gevent.sleep() instead of time.sleep(). Please disregard that.

I just need to clarify that it does not represent my actual code, but I just put it together because I wasn't certain about what classified as a memory leak. I've run several lobbies with some participants thanks to your code, but never more than a few hours so I suppose I need to try that.

Also, as I mentioned above, I only added remove_all_listeners after seeing what I thought to be the accumulating memory in gevent-emitter's self.__callback, but before this I was just calling client.logout.

Regarding number 3, I intended to instantiate multiple lobby managers as I'm logging into 5 accounts and creating 5 lobbies in succession, and then logging out of each.

Thanks for your help. What would classify as a sufficient period of time to test for leaks? The longest I've run is an hour with 10 matches concurrently being started and stopped repeatedly. I've already tried calling gc.collect() with each iteration. Python generally reallocates memory after a while right? Should I expect to see it plateau eventually?

Will close the issue after this. Thanks again!