Open qgallouedec opened 1 year ago
Just so you know, I had in mind to implement rainbow: https://github.com/DLR-RM/stable-baselines3/issues/622 and make double dqn / dueling dqn special case of it. But so far I had no time, so I think it's good to have reference implementation here =)
btw, double dqn is implemented here: https://github.com/osigaud/stable-baselines3/blob/feat/double-dqn/stable_baselines3/dqn/dqn.py#L170-L176
Related: https://github.com/DLR-RM/stable-baselines3/issues/487
Just so you know, I had in mind to implement rainbow: https://github.com/DLR-RM/stable-baselines3/issues/622 and make double dqn / dueling dqn special case of it.
Agree, it would be much better
btw, double dqn is implemented here: https://github.com/osigaud/stable-baselines3/blob/feat/double-dqn/stable_baselines3/dqn/dqn.py#L170-L176
Thanks for the ref!
@araffin I want your opinion on this: Wouldn't it make more sense to modify DQN to make DuelingDQN a special case of DQN? Rather than rewriting DQN, as I am doing on this PR. Eventually, we would have the same structure as DDPG/TD3. Moreover, the code written would be more easily reusable for Rainbow.
e to modify DQN to make DuelingDQN a special case of DQN?
you mean DQN a special case of DuelingDQN? how would that look like?
Rather than rewriting DQN, as I am doing on this PR.
you are mainly changing the policy, no?
you are mainly changing the policy, no?
I'm changing only the policy.
you mean DQN a special case of DuelingDQN?
Yes, it makes more sense in that order. But this would require putting DuelingDQN in the main repo. That said, the cleanest way for future implementation would be that DuelingDQN is a special case of RAINBOW (in which all tricks except dueling are disabled), and DQN is another special case of RAINBOW (in which all tricks are disabled).
how would that look like?
I'll make a draft PR in SB3
That said, the cleanest way for future implementation would be that DuelingDQN is a special case of RAINBOW
yes, so for now, I would keep it as is. My goal with Vanilla DQN in SB3 was to keep it as simple as possible to be a reference when learning or implementing other things.
On second thought, it might be more practical to work in reverse. I mean, instead of implementing RAINBOW all at once, I was thinking of implementing the features one by one upon DQN, disabling them by default. And at the renaming DQN to RAINBOW, changing the defaults. Roughly we would have:
Currently:
class DQN:
def __init__(self, policy, env, ...): ...
We implement dueling:
class DQN:
def __init__(self, policy, env, ..., dueling=False): ...
class DuelingDQN(DQN):
def __init__(self, policy, env, ...):
super().__init__(policy, env, ..., dueling=True)
Then we implement PER
class DQN:
def __init__(self, policy, env, ..., dueling=False, per=False): ...
class DuelingDQN(DQN):
def __init__(self, policy, env, ..., per=False):
super().__init__(policy, env, ..., dueling=True, per=per)
class PERDQN(DQN):
def __init__(self, policy, env, ..., dueling=False):
super().__init__(policy, env, ..., dueling=dueling, per=True)
And so on.
Once all the features have been implemented, we can rename DQN to RAINBOW, changing only the default values, and add a DQN class that inherits from RAINBOW:
class RAINBOW:
def __init__(self, policy, env, ..., dueling=True, per=True): ...
class DuelingDQN(RAINBOW):
def __init__(self, policy, env, ..., per=False):
super().__init__(policy, env, ..., dueling=True, per=per)
class PERDQN(RAINBOW):
def __init__(self, policy, env, ..., dueling=False):
super().__init__(policy, env, ..., dueling=dueling, per=True)
class DQN(RAINBOW):
def __init__(self, policy, env, ...):
super().__init__(policy, env, ..., dueling=False, per=False)
The main constraint is that when renaming DQN -> RAINBOW, and then creating a DQN class that inherits from RAINBOW, we introduce a breaking change, since arguments from DQN are removed (per, dueling, etc.). They would have to be deprecated first.
I mean, instead of implementing RAINBOW all at once, I was thinking of implementing the features one by one upon DQN, disabling them by default.
why not implement all at once?
Compared to DDPG vs TD3, obtaining DQN from Rainbow is not trivial (correct me if I'm wrong). As DQN is a key algorithm in RL, I would really like to keep it simple so people can study it (even if it means duplicated code when implementing DQN extensions).
And I would see two "family" of algorithms: DQN vs RAINBOW (with more or less extensions), so PERDQN
, DuelingDQN
, ... would all derive from RAINBOW
but not DQN
.
What you propose is what I had in mind (so Dueling special case of RAINBOW) except for DQN.
why not implement all at once?
We could.
I don't know exactly what changes will be needed in the library to implement RAINBOW. But let's take PER as an example. I think it would be best to implement it as HER, i.e. as a subclass of ReplayBuffer
. To make PER compatible with all off-policy algorithms, we will have to find a way to update the weights after the TD error calculation. Probably we will have to add in all algorithms a call to the method self.replay_buffer.update_weight(td_errors)
or (or self.replay_buffer.update_weight(idxs, td_errors)
, depending on how we go about it), and thus add this method to the parent class ReplayBuffer
.
All in all, it shouldn't be complicated but it concerns a lot of files, so I think it would be much easier to work on it on a dedicated PR.
I don't know if the same reasoning applies to other rainbow features.
Compared to DDPG vs TD3, obtaining DQN from Rainbow is not trivial (correct me if I'm wrong).
Right now I imagine that it should not be complicated, but I may be wrong, I would have to look into it in detail.
As DQN is a key algorithm in RL, I would really like to keep it simple so people can study it (even if it means duplicated code when implementing DQN extensions). And I would see two "family" of algorithms: DQN vs RAINBOW (with more or less extensions), so
PERDQN
,DuelingDQN
, ... would all derive fromRAINBOW
but notDQN
.
So DQN would be a special case for which we would keep a readable implementation by not deriving it from RAINBOW, right? It seems legitimate.
Description
TODOs
Naming convention
Method : Dueling DQN Class name : DuelingDQN Lowercase : dueling_dqn branch : dueling-dqn
Context
Types of changes
Checklist:
make format
(required)make check-codestyle
andmake lint
(required)make pytest
andmake type
both pass. (required)Note: we are using a maximum length of 127 characters per line