DuelingDQN - Githubissues

qgallouedec commented 1 year ago

Description

TODOs

[ ] Make learning curves
[ ] Complete DuelingDQN short presentation
[ ] Fill the result tabular
[ ] Results validation against another implementation
[ ] Create a branch in rl-zoo3
[ ] Check if that command lines in the doc work

Naming convention

Method : Dueling DQN Class name : DuelingDQN Lowercase : dueling_dqn branch : dueling-dqn

Context

[ ] I have raised an issue to propose this change (required)

Types of changes

[ ] Bug fix (non-breaking change which fixes an issue)
[ ] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to change)
[ ] Documentation (update in the documentation)

Checklist:

[ ] I've read the CONTRIBUTION guide (required)
[ ] The functionality/performance matches that of the source (required for new training algorithms or training-related features).
[ ] I have updated the tests accordingly (required for a bug fix or a new feature).
[ ] I have included an example of using the feature (required for new features).
[ ] I have included baseline results (required for new training algorithms or training-related features).
[ ] I have updated the documentation accordingly.
[ ] I have updated the changelog accordingly (required).
[ ] I have reformatted the code using make format (required)
[ ] I have checked the codestyle using make check-codestyle and make lint (required)
[ ] I have ensured make pytest and make type both pass. (required)

Note: we are using a maximum length of 127 characters per line

araffin commented 1 year ago

Just so you know, I had in mind to implement rainbow: https://github.com/DLR-RM/stable-baselines3/issues/622 and make double dqn / dueling dqn special case of it. But so far I had no time, so I think it's good to have reference implementation here =)

btw, double dqn is implemented here: https://github.com/osigaud/stable-baselines3/blob/feat/double-dqn/stable_baselines3/dqn/dqn.py#L170-L176

qgallouedec commented 1 year ago

Just so you know, I had in mind to implement rainbow: https://github.com/DLR-RM/stable-baselines3/issues/622 and make double dqn / dueling dqn special case of it.

Agree, it would be much better

btw, double dqn is implemented here: https://github.com/osigaud/stable-baselines3/blob/feat/double-dqn/stable_baselines3/dqn/dqn.py#L170-L176

Thanks for the ref!

qgallouedec commented 1 year ago

@araffin I want your opinion on this: Wouldn't it make more sense to modify DQN to make DuelingDQN a special case of DQN? Rather than rewriting DQN, as I am doing on this PR. Eventually, we would have the same structure as DDPG/TD3. Moreover, the code written would be more easily reusable for Rainbow.

araffin commented 1 year ago

e to modify DQN to make DuelingDQN a special case of DQN?

you mean DQN a special case of DuelingDQN? how would that look like?

Rather than rewriting DQN, as I am doing on this PR.

you are mainly changing the policy, no?

qgallouedec commented 1 year ago

you are mainly changing the policy, no?

I'm changing only the policy.

qgallouedec commented 1 year ago

you mean DQN a special case of DuelingDQN?

Yes, it makes more sense in that order. But this would require putting DuelingDQN in the main repo. That said, the cleanest way for future implementation would be that DuelingDQN is a special case of RAINBOW (in which all tricks except dueling are disabled), and DQN is another special case of RAINBOW (in which all tricks are disabled).

how would that look like?

I'll make a draft PR in SB3

araffin commented 1 year ago

That said, the cleanest way for future implementation would be that DuelingDQN is a special case of RAINBOW

yes, so for now, I would keep it as is. My goal with Vanilla DQN in SB3 was to keep it as simple as possible to be a reference when learning or implementing other things.

qgallouedec commented 1 year ago

On second thought, it might be more practical to work in reverse. I mean, instead of implementing RAINBOW all at once, I was thinking of implementing the features one by one upon DQN, disabling them by default. And at the renaming DQN to RAINBOW, changing the defaults. Roughly we would have:

Currently:

class DQN:
    def __init__(self, policy, env, ...): ...

We implement dueling:

class DQN:
    def __init__(self, policy, env, ..., dueling=False): ...

class DuelingDQN(DQN):
    def __init__(self, policy, env, ...):
        super().__init__(policy, env, ..., dueling=True)

Then we implement PER

class DQN:
    def __init__(self, policy, env, ..., dueling=False, per=False): ...

class DuelingDQN(DQN):
    def __init__(self, policy, env, ..., per=False):
        super().__init__(policy, env, ..., dueling=True, per=per)

class PERDQN(DQN):
    def __init__(self, policy, env, ..., dueling=False):
        super().__init__(policy, env, ..., dueling=dueling, per=True)

And so on.

Once all the features have been implemented, we can rename DQN to RAINBOW, changing only the default values, and add a DQN class that inherits from RAINBOW:

class RAINBOW:
    def __init__(self, policy, env, ..., dueling=True, per=True):  ...

class DuelingDQN(RAINBOW):
    def __init__(self, policy, env, ..., per=False):
        super().__init__(policy, env, ..., dueling=True, per=per)

class PERDQN(RAINBOW):
    def __init__(self, policy, env, ..., dueling=False):
        super().__init__(policy, env, ..., dueling=dueling, per=True)

class DQN(RAINBOW):
    def __init__(self, policy, env, ...):
        super().__init__(policy, env, ..., dueling=False, per=False)

The main constraint is that when renaming DQN -> RAINBOW, and then creating a DQN class that inherits from RAINBOW, we introduce a breaking change, since arguments from DQN are removed (per, dueling, etc.). They would have to be deprecated first.

araffin commented 1 year ago

I mean, instead of implementing RAINBOW all at once, I was thinking of implementing the features one by one upon DQN, disabling them by default.

why not implement all at once?

Compared to DDPG vs TD3, obtaining DQN from Rainbow is not trivial (correct me if I'm wrong). As DQN is a key algorithm in RL, I would really like to keep it simple so people can study it (even if it means duplicated code when implementing DQN extensions). And I would see two "family" of algorithms: DQN vs RAINBOW (with more or less extensions), so PERDQN, DuelingDQN, ... would all derive from RAINBOW but not DQN.

What you propose is what I had in mind (so Dueling special case of RAINBOW) except for DQN.

qgallouedec commented 1 year ago

why not implement all at once?

We could. I don't know exactly what changes will be needed in the library to implement RAINBOW. But let's take PER as an example. I think it would be best to implement it as HER, i.e. as a subclass of ReplayBuffer. To make PER compatible with all off-policy algorithms, we will have to find a way to update the weights after the TD error calculation. Probably we will have to add in all algorithms a call to the method self.replay_buffer.update_weight(td_errors) or (or self.replay_buffer.update_weight(idxs, td_errors), depending on how we go about it), and thus add this method to the parent class ReplayBuffer. All in all, it shouldn't be complicated but it concerns a lot of files, so I think it would be much easier to work on it on a dedicated PR.

I don't know if the same reasoning applies to other rainbow features.

Compared to DDPG vs TD3, obtaining DQN from Rainbow is not trivial (correct me if I'm wrong).

Right now I imagine that it should not be complicated, but I may be wrong, I would have to look into it in detail.

As DQN is a key algorithm in RL, I would really like to keep it simple so people can study it (even if it means duplicated code when implementing DQN extensions). And I would see two "family" of algorithms: DQN vs RAINBOW (with more or less extensions), so PERDQN, DuelingDQN, ... would all derive from RAINBOW but not DQN.

So DQN would be a special case for which we would keep a readable implementation by not deriving it from RAINBOW, right? It seems legitimate.

Stable-Baselines-Team / stable-baselines3-contrib

DuelingDQN #127

Description

TODOs

Context

Types of changes

Checklist: