Farama-Foundation / MO-Gymnasium

Multi-objective Gymnasium environments for reinforcement learning
http://mo-gymnasium.farama.org/
MIT License
254 stars 33 forks source link

The Pareto front generated by minecart is not optimal #78

Open wilrop opened 9 months ago

wilrop commented 9 months ago

I discovered that the Pareto front generated by the minecart environment is not optimal. I found this out by comparing the hypervolume of the Pareto front generated by GPI-LS with the one included in the environment itself. If the Pareto front generated by the environment was optimal, its hypervolume would be greater than or equal to the one resulting from GPI-LS. Instead, I observed that the hypervolume obtained by GPI-LS is greater.

Here is some code to reproduce it. I downloaded the Pareto front generated by the following run of GPI-LS: https://wandb.ai/openrlbenchmark/MORL-Baselines/runs/y6s3uaty Note that the same gamma is used in both solution sets, so the comparison is valid.

import pygmo as pg
import pandas as pd
import mo_gymnasium as mo_gym

env = mo_gym.make("minecart-v0")
pf = np.array(env.unwrapped.pareto_front(gamma=0.98, symmetric=True))

ref_point = np.array([-1, -1, -200])
hv_pf = pg.hypervolume(-pf).compute(-ref_point)

print("Minecart")
print(f"True PF HV: {hv_pf}")
print("----------")

FILE_NAME = "GPI_LS_front.csv"  # Put the filename of the CSV here.
found_vecs = pd.read_csv(FILE_NAME).to_numpy()
hv_pf = pg.hypervolume(-found_vecs).compute(-ref_point)
print(f'GPI-LS PF HV: {hv_pf}')

I think it would be worthwhile to also verify whether the convex hull generated by minecart is correct, but I did not check this myself.

wilrop commented 9 months ago

Here is the file itself as well: GPI_LS_front.csv

LucasAlegre commented 9 months ago

image

It seems that the CCS (convex hull of the return PF here) underestimates the amount of ores that can be collected. The code for computing the Pareto Front (https://github.com/Farama-Foundation/MO-Gymnasium/blob/main/mo_gymnasium/envs/minecart/minecart.py#L218) is quite complex.

I believe it is supposed to be an approximation and not an exact estimate. Maybe we should add a warning explaining this.

Maybe @axelabels (who wrote this method) can help?