LARG / HFO

Half Field Offense in Robocup 2D Soccer
MIT License

Re-start hfo server #75

Closed: andreaBassichUoY closed this issue 5 years ago

andreaBassichUoY commented 5 years ago

Hi @mhauskn,

I'm using this environment in Python through the hfo_py library, and I need to change some parameters dynamically, such as the number of defenders, while staying in the same session. I tried killing the server process and starting a new server with different parameters, but I get the following error:

hfo_py/src/strategy.cpp 198: already initialized. ERROR Failed to read team strategy. Init failed

This happens even when I manually reload the whole hfo_py module. Is there a way to modify the environment's parameters without restarting the server? If I do restart the server, what do I need to do to avoid this error?

Cheers

mhauskn commented 5 years ago

Hi @andreaBassich, as you might imagine HFO was not designed with dynamic reloading of parameters in mind. This is because the rcssserver itself is not designed for reloading parameters. However, it may be possible to hack a solution. Can you provide an example to replicate the error you're getting?

andreaBassichUoY commented 5 years ago

In the class in soccer_env.py, I added a method to try to restart the server:

import os
import time
from importlib import reload

import hfo_py

def _restartServer(self):
    self.env.act(hfo_py.QUIT)
    self.env.step()
    os.system("killall -9 rcssserver")
    reload(hfo_py)
    self.env = hfo_py.HFOEnvironment()
    self._start_hfo_server()
    time.sleep(1)
    self.env.connectToServer(hfo_py.HIGH_LEVEL_FEATURE_SET, config_dir=hfo_py.get_config_path())

The error arises when the last line of the method is executed, which surprises me as I assumed that by killing the server the formations wouldn't be already initialised.

Thanks for your help

mhauskn commented 5 years ago

I believe the formations are loaded by the agents, rather than the server. So it may be useful to kill the agent's processes as well. This should be taken care of by the cleanup function which should be called when the hfo_environment is garbage collected (https://github.com/LARG/HFO/blob/master/bin/HFO#L21).

It seems pretty weird, as if the module is not being properly reloaded. 1) Can you try with just a single offense agent to simplify things? 2) Can you send the full stack trace?

andreaBassichUoY commented 5 years ago

I can't seem to get the stack trace for this specific error. I tried with traceback, i.e.

try:
    self.env.connectToServer(hfo_py.HIGH_LEVEL_FEATURE_SET, config_dir=hfo_py.get_config_path())
except Exception:
    traceback.print_exc()

but it doesn't reach the except clause. By debugging I could follow it as far as line 138 in hfo.py:

hfo_lib.connectToServer(self.obj,
                        feature_set,
                        config_dir.encode('utf-8'),
                        server_port, server_addr.encode('utf-8'),
                        team_name.encode('utf-8'),
                        play_goalie,
                        record_dir.encode('utf-8'))

After that it crashes; I'm guessing this ends up calling line 32 in HFO.cpp. As a side note, for testing purposes I was using only one agent.

I found it weird as well, as I thought that reloading the whole module would avoid any problems. Also, you're right that the problem should be on the agent's side: even if I don't start a new server, I get the same error instead of [ConnectToServer] Server Down!, so it definitely doesn't get to line 63 in HFO.cpp.
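
One possible explanation for the except clause never running, offered as an assumption rather than a confirmed reading of HFO.cpp: if native code ends the process itself (for example by calling exit) instead of raising, there is no Python exception to catch. The sketch below imitates that with os._exit in a child interpreter and does not touch hfo_py at all; CHILD and run_child are hypothetical names.

```python
import subprocess
import sys

# Program run by the child interpreter: a try/except around a call that
# terminates the process directly. os._exit is only a stand-in for a hard
# exit inside a native library (an assumption about what HFO does).
CHILD = """
import os
try:
    os._exit(3)
except BaseException:
    print("caught")
print("after")
"""


def run_child():
    """Run CHILD in a fresh interpreter and return (exit code, stdout)."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout


if __name__ == "__main__":
    code, out = run_child()
    # Neither "caught" nor "after" is printed; only the exit code survives.
    print(code, repr(out))
```

Running the risky call in a child process like this at least lets the parent observe the exit code and keep going.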

andreaBassichUoY commented 5 years ago

Hi @mhauskn, are there any new developments regarding this issue?

mhauskn commented 5 years ago

Sorry, no updates from my end. I'd be happy to accept a PR if you find a way to add this functionality.

andreaBassichUoY commented 5 years ago

Hi @mhauskn, after a bit of debugging I found that, to restart the server properly, the main process must not initialise or connect to the server itself; that part has to be done in another process. That process can then be killed whenever the server is restarted, which avoids the error mentioned above. In the end, the problem wasn't anything within the library itself.

Cheers
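
A minimal sketch of the arrangement described above, assuming the agent logic can live in a child interpreter. AgentProcess and AGENT_SCRIPT are hypothetical names, and the child here only echoes commands; in real use the child would be the place to create HFOEnvironment and call connectToServer, so that any "already initialized" state dies with it.

```python
import subprocess
import sys

# Code run by the child interpreter. It stands in for the process that would
# own the HFO connection; its only state is the parameter it was started with.
AGENT_SCRIPT = """
import sys
n_defenders = sys.argv[1]
while True:
    line = sys.stdin.readline()
    if not line or line.strip() == "close":
        break
    print(line.strip() + " defenders=" + n_defenders, flush=True)
"""


class AgentProcess:
    """Owns a child process that holds all per-process agent state."""

    def __init__(self, n_defenders):
        self._start(n_defenders)

    def _start(self, n_defenders):
        self._proc = subprocess.Popen(
            [sys.executable, "-c", AGENT_SCRIPT, str(n_defenders)],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def _release(self):
        # Close our ends of the pipes once the child has exited.
        self._proc.stdin.close()
        self._proc.stdout.close()

    def ask(self, cmd):
        """Send one command line and wait for the one-line reply."""
        self._proc.stdin.write(cmd + "\n")
        self._proc.stdin.flush()
        return self._proc.stdout.readline().strip()

    def restart(self, n_defenders):
        # Killing the child discards whatever it initialised, so the
        # replacement starts from a clean slate with the new parameter.
        self._proc.kill()
        self._proc.wait()
        self._release()
        self._start(n_defenders)

    def close(self):
        self._proc.stdin.write("close\n")
        self._proc.stdin.flush()
        self._proc.wait()
        self._release()


if __name__ == "__main__":
    agent = AgentProcess(1)
    print(agent.ask("ping"))
    agent.restart(3)
    print(agent.ask("ping"))
    agent.close()
```

The main process never holds agent state itself, so changing a parameter is a matter of killing the child and starting another.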

mhauskn commented 5 years ago

Great to hear that you found a workaround!

Amrit-pal-Singh commented 3 years ago

Hi @andreaBassich. Could you please elaborate on how you started the server in another process, and what that other process was?

andreaBassichUoY commented 3 years ago

Hi @Amrit-pal-Singh,

Digging through my old code, I found the way I initialised the server process. Hopefully this will help with your current issue.

    async def _start_server_as(self, get_port=False):
        cmd = hfo_py.get_hfo_path()
        for k in self._params.keys():
            v = self._params[k]
            if isinstance(v, bool):
                if v:
                    cmd += ' ' + k
            else:
                cmd += ' ' + k + '=' + str(v)

        successful = False

        while not successful:
            if get_port:
                self._port = self._get_free_port()
            cmd1 = cmd + ' --port=' + str(self._port)
            self._server_proc = await asyncio.create_subprocess_exec(*cmd1.split(), stdout=subprocess.PIPE,
                                                                     stderr=subprocess.STDOUT)
            while True:
                try:
                    line = await asyncio.wait_for(self._server_proc.stdout.readline(), 1)
                except asyncio.TimeoutError:
                    # The call timed out
                    self._server_proc.terminate()
                    for i in range(len(self._players)):
                        self._send(i, ('_close', []))
                    self._players = []
                    break
                else:
                    if not line:  # EOF
                        self._server_proc.terminate()
                        for i in range(len(self._players)):
                            self._send(i, ('_close', []))
                        self._players = []
                        break
                    else:
                        line = str(line)
                        if 'Waiting for player-controlled agent' in line:
                            message = line.split()
                            in_q = Queue()
                            out_q = Queue()
                            Player(
                                feature_set=self._feature_set,
                                name=message[4][:-1],
                                config_dir=message[5].split('=')[1][:-1],
                                server_port=int(message[6].split('=')[1][:-1]),
                                server_addr=message[7].split('=')[1][:-1],
                                team_name=message[8].split('=')[1][:-1],
                                play_goalie=False,
                                in_q=out_q,
                                out_q=in_q,
                            )
                            self._players.append((out_q, in_q))
                            self._send(-1, ('_connect_to_server', []))
                        elif 'Starting game' in line:
                            successful = True
                            break
                        elif 'killall' in line:
                            self._server_proc.terminate()
                            for i in range(len(self._players)):
                                self._send(i, ('_close', []))
                            self._players = []
                            break
                        else:
                            pass
                            # For debugging you can print(line)
                        continue  # Keep reading server output until a break condition is met

As you can see, this was part of a class where the parameters are stored as a dictionary, and self._get_free_port() returns the number of a port that is not currently in use.

Cheers,

Andrea
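
The two helpers mentioned above are not shown in the thread, so here is one way they might look. This is a sketch under assumptions: get_free_port and build_command are hypothetical names, the port check uses a TCP socket (which may not match the protocol the server actually uses), and the flag names in the usage lines are only examples. The command is built as a list so that no string splitting is needed before passing it to a subprocess call.

```python
import socket


def get_free_port():
    """Ask the OS for a currently unused port by binding to port 0.

    The port is released before returning, so another program could grab it
    before the server starts; callers should be ready to retry.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def build_command(executable, params, port):
    """Turn a parameter dictionary into an argument list.

    Boolean values become bare flags (included only when True); every other
    value becomes key=value, mirroring the loop in the code above.
    """
    cmd = [executable]
    for key, value in params.items():
        if isinstance(value, bool):
            if value:
                cmd.append(key)
        else:
            cmd.append(f"{key}={value}")
    cmd.append(f"--port={port}")
    return cmd


if __name__ == "__main__":
    # Example parameters; the flag names are for illustration only.
    params = {"--headless": True, "--fullstate": False, "--offense-agents": 1}
    print(build_command("HFO", params, get_free_port()))
```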

Amrit-pal-Singh commented 3 years ago

Thank you @andreaBassich for the code. Actually I'm trying to run this code: https://github.com/f-leno/AdHoc_AAMAS-17

They create a new thread for every agent and then initialise the connection to HFO in that thread. In your code, you terminate the process like this:

  self._server_proc.terminate()
  for i in range(len(self._players)):
      self._send(i, ('_close', []))
  self._players = []
  break

but I guess the threads should stop when the program terminates, so the problem should not arise. Can you tell me whether there is any issue with this approach?

And if you have ever worked with this code in the past, any help would be great!

andreaBassichUoY commented 3 years ago

In my code I am doing something similar. I've put the Player class referenced in the code above at the end of this post in case you want to have a look. My code is designed so that the server can be restarted with different parameters, which is why I have it set up that way.

I haven't worked on that repo in the past, so unfortunately I can't give you any tips on it.

class Player:

    def __init__(self, feature_set, name, config_dir, server_port, server_addr, team_name, play_goalie, in_q, out_q):
        self._feature_set = feature_set
        self._name = name
        self._config_dir = config_dir
        self._server_port = server_port
        self._server_addr = server_addr
        self._team_name = team_name
        self._play_goalie = play_goalie
        self._env = hfo.HFOEnvironment()
        self._in_q = in_q
        self._out_q = out_q
        self._done = False
        self._teammate_number = 0
        self._can_kick = False
        # Start the worker thread last so every attribute exists before it runs.
        self._thread = threading.Thread(target=self._execute)
        self._thread.start()

    def _execute(self):
        while not self._done:
            method, args = self._in_q.get()
            res = getattr(self, method)(*args)
            if res is not None:
                self._out_q.put(res)

    def _connect_to_server(self):
        self._env.connectToServer(
            feature_set=self._feature_set,
            config_dir=self._config_dir,
            server_port=self._server_port,
            server_addr=self._server_addr,
            team_name=self._team_name,
            play_goalie=self._play_goalie,
        )
        self.status = hfo_py.IN_GAME

    def _step(self, action):
        action_type = ACTIONS[action]
        if action_type == hfo_py.SHOOT or action_type == hfo_py.PASS or action_type == hfo_py.DRIBBLE:
            if self._can_kick:
                if action_type == hfo_py.PASS:
                    self._env.act(action_type, self._teammate_number)
                else:
                    self._env.act(action_type)
            else:
                self._env.act(hfo_py.NOOP)
        else:
            self._env.act(action_type)
        self.status = self._env.step()
        return self._getState(), self._get_reward(), self.status

    def _getState(self):
        state = self._env.getState()
        if state[15] != -2:
            self._teammate_number = state[15]
        self._can_kick = state[5] == 1
        return state

    def _reset(self):
        while self.status == hfo_py.IN_GAME:
            self._env.act(hfo_py.NOOP)
            self.status = self._env.step()
        return self._getState()

    def _get_reward(self):
        if self.status == hfo_py.GOAL:
            return 1
        if self.status == hfo_py.CAPTURED_BY_DEFENSE:
            return -1
        if self.status == hfo_py.OUT_OF_BOUNDS:
            return -1
        if self.status == hfo_py.OUT_OF_TIME:
            return 0
        return 0

    def _close(self):
        self._env.act(hfo_py.QUIT)
        self._env.step()
        self._done = True
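
The message protocol used by the class above (a (method, args) tuple on one queue, the return value on the other) can be tried without HFO. The sketch below is a stripped-down stand-in, not the Player class itself: Worker and _add are hypothetical, and queue.Queue is assumed because the agents run as threads.

```python
import threading
from queue import Queue


class Worker:
    """Runs commands received as (method_name, args) tuples on its own thread."""

    def __init__(self, in_q, out_q):
        self._in_q = in_q
        self._out_q = out_q
        self._done = False
        self._total = 0
        # Start the thread last so every attribute exists before it runs.
        self._thread = threading.Thread(target=self._execute, daemon=True)
        self._thread.start()

    def _execute(self):
        while not self._done:
            method, args = self._in_q.get()
            res = getattr(self, method)(*args)
            # Methods that return None send no reply, so the caller must not
            # wait on the output queue after calling them.
            if res is not None:
                self._out_q.put(res)

    def _add(self, n):
        self._total += n
        return self._total

    def _close(self):
        self._done = True

    def join(self, timeout=None):
        self._thread.join(timeout)


if __name__ == "__main__":
    to_worker, from_worker = Queue(), Queue()
    worker = Worker(in_q=to_worker, out_q=from_worker)
    to_worker.put(("_add", [2]))
    print(from_worker.get(timeout=5))
    to_worker.put(("_close", []))  # returns None, so there is no reply to read
    worker.join(timeout=5)
```

Note that the caller's "out" queue is the worker's "in" queue, which is why the two are swapped when Player is constructed in the server code.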

Amrit-pal-Singh commented 3 years ago

Thank you @andreaBassich. Your code helped to resolve my issue and I was able to run using threads.

torressliu commented 2 years ago

Hi @andreaBassichUoY, I am wrapping HFO as a multi-agent environment (Gym-style). I use multiple threads so that several agents can connect to the server at the same time. However, the server always goes down. Could you tell me how you restart the server, and how to "not have the main process initialise/connect to the server"?