LucasAlegre / morl-baselines

Implementations of Multi-Objective Reinforcement Learning algorithms.
https://lucasalegre.github.io/morl-baselines
MIT License

Implementation of MOPPO fails #102

Closed. thomaslautenb closed this issue 3 months ago

thomaslautenb commented 5 months ago

I encountered a problem with the implementation of MOPPO. There was a dimension mismatch in the _rewards vector of the sync_vector_env environment.

I worked around the issue by extending the reward vector by an extra dimension equal to the number of objectives (reward_dim). It's hardcoded and not pretty; a less hardcoded sketch follows the snippet below.

You might want to have a look at this.

I can provide a more extensive report on the issue if required.

def step_wait(self) -> Tuple[Any, NDArray[Any], NDArray[Any], NDArray[Any], dict]:
        """Steps through each of the environments returning the batched results.

        Returns:
            The batched environment step results
        """
        # Workaround: allocate the reward buffer with one entry per objective
        # instead of a scalar per env. reward_dim is hardcoded here and should
        # match the environment's number of objectives.
        reward_dim = 2
        self._rewards = np.zeros((self.num_envs, reward_dim), dtype=np.float64)
        observations, infos = [], {}
        for i, (env, action) in enumerate(zip(self.envs, self._actions)):
            (
                observation,
                self._rewards[i],
                self._terminateds[i],
                self._truncateds[i],
                info,
            ) = env.step(action)

            if self._terminateds[i] or self._truncateds[i]:
                old_observation, old_info = observation, info
                observation, info = env.reset()
                info["final_observation"] = old_observation
                info["final_info"] = old_info
            observations.append(observation)
            infos = self._add_info(infos, info, i)
        self.observations = concatenate(
            self.single_observation_space, observations, self.observations
        )

        return (
            deepcopy(self.observations) if self.copy else self.observations,
            np.copy(self._rewards),
            np.copy(self._terminateds),
            np.copy(self._truncateds),
            infos,
        )
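
As a less hardcoded variant, something along these lines could infer the dimension from the sub-environments instead. This is only a rough sketch: it assumes MO-Gymnasium-style envs that expose a reward_space, and infer_reward_dim is just an illustrative name, not an existing helper.

import numpy as np

def infer_reward_dim(envs):
    # Illustrative helper (not part of MORL-Baselines): read the number of
    # objectives from the first sub-env, assuming it exposes a reward_space
    # the way MO-Gymnasium environments do.
    reward_space = getattr(envs[0].unwrapped, "reward_space", None)
    if reward_space is None:
        return 1  # fall back to a scalar reward
    return int(np.prod(reward_space.shape))

# Inside step_wait, the allocation above would then become:
#     reward_dim = infer_reward_dim(self.envs)
#     self._rewards = np.zeros((self.num_envs, reward_dim), dtype=np.float64)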
ffelten commented 5 months ago

Hi @thomaslautenb,

Thanks for reporting this.

Which version of Gymnasium are you using? The vector wrapper heavily relies on Gymnasium (and it changed recently in Gymnasium). The last time I ran that code it worked; that was (I think) with Gymnasium 0.28.1.
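
If it helps, a quick standard-library check prints the installed versions to compare against (the package names here are the PyPI ones):

# Print installed versions to compare against Gymnasium 0.28.1.
from importlib.metadata import version, PackageNotFoundError

for pkg in ("gymnasium", "mo-gymnasium", "morl-baselines"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")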

Now, for the longer term, we are revamping the wrapper implementations to match Gymnasium 1.0. The PR is here: https://github.com/Farama-Foundation/MO-Gymnasium/pull/95. It would be great if we could validate that the new vector wrappers work with MORL-Baselines.

I don't have time to finish it this month, as I'm defending my thesis and moving abroad, but it is on my todo list for this summer. If you have time, you could also try it on your own by working from the current PR on MO-Gymnasium and Gymnasium 1.0 :-).

ffelten commented 3 months ago

Hi @thomaslautenb,

First, the PR for the migration to 1.0 is moving forward; see #109. Second, I've talked with different people running PGMORL these days, and they did not experience any issues. Could you tell me if it is still buggy?

thomaslautenb commented 3 months ago

Hi @ffelten, excuse me for not getting back to you. I was indeed using an older Gymnasium version. Running with 0.28.1 worked. You might close the issue!

Thanks for the feedback!