LucasAlegre / morl-baselines

Multi-Objective Reinforcement Learning algorithms implementations.
https://lucasalegre.github.io/morl-baselines
MIT License
271 stars · 44 forks

why do the GPI-LS and GPI-PD run for a very long time? #100

Closed yyicc1108 closed 3 months ago

yyicc1108 commented 4 months ago

Hi,

I would like to say thanks for your contributions to maintaining this library, which helped me understand MORL algorithms.

When I run GPI-LS and GPI-PD on the Deep Sea Treasure environment, the running time is very long with total_timesteps=1e5: it takes 7-8 hours on my PC with a single NVIDIA A30 GPU. Is this a common case?

Looking forward to your reply, and thanks again.

Thanks, Tianyang

ffelten commented 4 months ago

Hi,

First, it is always a good idea to browse a bit our Open RL Benchmark repo: https://wandb.ai/openrlbenchmark/MORL-Baselines/workspace?nw=nwuserflorianfelten to get a feel of what we had when we wrote the paper.

Second, do you mean GPI-LS and GPI-PD with neural networks, e.g. https://github.com/LucasAlegre/morl-baselines/blob/main/examples/gpi_pd_minecart.py, or the tabular version (e.g. MPMOQL with GPI): https://github.com/LucasAlegre/morl-baselines/blob/main/examples/mp_mo_q_learning_DST.py ?

LucasAlegre commented 4 months ago

Hi @yyicc1108,

We are glad our library is being useful!

First, notice that for tabular/discrete environments, you can employ a tabular version of GPI-LS and GPI-PD by passing the appropriate parameters to https://github.com/LucasAlegre/morl-baselines/blob/main/morl_baselines/multi_policy/multi_policy_moqlearning/mp_mo_q_learning.py
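For intuition, the core of the tabular approach is Generalized Policy Improvement (GPI): given a set of vector-valued Q-tables (one per previously learned policy), act greedily with respect to the best scalarized value across all of them. The sketch below is a minimal illustration of that idea, not the morl-baselines API; the Q-table layout and function name are assumptions for the example.

```python
import numpy as np

def gpi_action(q_tables, state, w):
    """GPI action selection: maximize w . Q_i(s, a) over all policies i and actions a.

    q_tables: list of dicts mapping state -> array of shape (n_actions, n_objectives)
    w: weight vector of shape (n_objectives,)
    """
    best_value, best_action = -np.inf, None
    for q in q_tables:
        scalarized = q[state] @ w          # shape: (n_actions,)
        a = int(np.argmax(scalarized))
        if scalarized[a] > best_value:
            best_value, best_action = scalarized[a], a
    return best_action, best_value

# Toy setup: two policies, one state, two actions, two objectives.
q1 = {0: np.array([[1.0, 0.0], [0.0, 0.5]])}   # policy 1 is strong on objective 0
q2 = {0: np.array([[0.0, 1.0], [0.2, 0.2]])}   # policy 2 is strong on objective 1

w = np.array([0.9, 0.1])                       # mostly care about objective 0
action, value = gpi_action([q1, q2], state=0, w=w)
```

With w weighted toward objective 0, GPI picks policy 1's best action; shifting w toward objective 1 would make it switch to policy 2, which is what lets a small set of policies cover many weight vectors.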

For the function approximation case with neural networks, the high training time comes from the fact that GPI-LS uses techniques such as DroQ (https://arxiv.org/pdf/2110.02034), which performs many gradient steps per environment step to increase sample efficiency. If you care more about training time than sample efficiency, you can run the algorithm with a smaller value for gradient_updates. If you set it to 1, for instance, it should run roughly 20x faster than with the default value of 20.
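To make the tradeoff concrete, here is a toy sketch (not the actual GPI-LS code) of why gradient_updates dominates wall-clock time: DroQ-style agents take several gradient steps per environment step, so total gradient work scales linearly with that setting.

```python
# Toy training loop illustrating the update-to-data ratio. The function name
# and structure are illustrative assumptions, not the morl-baselines API.

def train(total_timesteps, gradient_updates):
    n_grad_steps = 0
    for _ in range(total_timesteps):
        # ... collect one environment transition into the replay buffer ...
        for _ in range(gradient_updates):
            # ... one gradient step on a sampled minibatch ...
            n_grad_steps += 1
    return n_grad_steps

default = train(1000, gradient_updates=20)  # DroQ-style: 20 updates per env step
fast = train(1000, gradient_updates=1)      # 20x fewer gradient steps, faster wall clock
```

Since almost all wall-clock time is spent in the gradient steps, cutting gradient_updates from 20 to 1 yields roughly the 20x speedup mentioned above, at the cost of needing more environment steps to reach the same performance.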

The good news is that I'm currently working on a JAX version of GPI-LS which runs way faster than the PyTorch implementation, and it should be added to the library in the near future.

ffelten commented 3 months ago

@yyicc1108 any update on this?

yyicc1108 commented 3 months ago

No, thanks very much for your reply.
